OpenAI Evals
Evaluation & Testing · Open Source · Verified
Open-source framework for evaluating LLMs and LLM-based systems, backed by a registry of pre-built benchmarks. Provides eval templates, including model-graded evals, a completion-function protocol for plugging in custom systems, and YAML-based configuration so new evals can be created without writing code. Best suited for systematically testing model behavior across tasks before deploying or upgrading LLM-powered applications.
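As an illustration of the YAML-based, code-free eval creation, a minimal registry entry for an exact-match eval might look like the sketch below. The eval name, file path, and sample data are hypothetical; the registry layout and the pre-built `evals.elsuite.basic.match:Match` template follow the conventions documented in the repo.

```yaml
# registry/evals/arithmetic.yaml -- hypothetical eval name and path
arithmetic:
  id: arithmetic.dev.v0
  description: Check basic arithmetic answers by exact match
  metrics: [accuracy]
arithmetic.dev.v0:
  class: evals.elsuite.basic.match:Match  # pre-built exact-match eval template
  args:
    samples_jsonl: arithmetic/samples.jsonl  # dataset: one JSON object per line
```

Each line of the referenced JSONL file pairs a chat-format `input` with an `ideal` answer, e.g. `{"input": [{"role": "user", "content": "2+2="}], "ideal": "4"}`. The eval can then be run against a model (here an example model name) with `oaieval gpt-3.5-turbo arithmetic`.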
Price: From $0
License: MIT