LM Evaluation Harness

Evaluation & Testing | Open Source | Verified

Unified framework by EleutherAI for evaluating language models across hundreds of academic benchmarks and tasks. The de facto standard for open-source model benchmarking.