OpenAI Evals
Evaluation & Testing · Open Source · Verified
Open-source framework for evaluating LLMs and LLM-based systems, backed by a registry of pre-built benchmarks. Provides eval templates, including model-graded evals, a completion-function protocol for plugging in custom systems, and YAML-based configuration so new evals can be created without writing code. Best suited for systematically testing model behavior across tasks before deploying or upgrading LLM-powered applications.
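As an illustration of the YAML-based, code-free eval creation, a minimal registry entry for an exact-match eval might look like the sketch below. The eval name, file path, and sample data are hypothetical; the registry layout and the pre-built `evals.elsuite.basic.match:Match` template follow the conventions documented in the repo.

```yaml
# registry/evals/arithmetic.yaml -- hypothetical eval name and path
arithmetic:
  id: arithmetic.dev.v0
  description: Check basic arithmetic answers by exact match
  metrics: [accuracy]
arithmetic.dev.v0:
  class: evals.elsuite.basic.match:Match  # pre-built exact-match eval template
  args:
    samples_jsonl: arithmetic/samples.jsonl  # dataset: one JSON object per line
```

Each line of the referenced JSONL file pairs a chat-format `input` with an `ideal` answer, e.g. `{"input": [{"role": "user", "content": "2+2="}], "ideal": "4"}`. The eval can then be run against a model (here an example model name) with `oaieval gpt-3.5-turbo arithmetic`.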
Price: From $0
License: MIT