Evaluation & Testing
Frameworks for measuring LLM output quality, accuracy, and reliability
13 tools
Arthur AI
Enterprise: Full lifecycle AI evaluation and monitoring platform supporting ML, GenAI, and agentic systems. Offers built-in guardrai...
Braintrust
Freemium: End-to-end LLM evaluation and development platform with dataset management, prompt playground, automated scoring, and tr...
DeepEval
Freemium: Open-source LLM evaluation framework with 50+ research-backed metrics for testing AI applications. Differentiates with n...
Evidently AI
Freemium: Open-source ML and LLM observability platform. Provides pre-built evaluators for text quality, LLM output correctness, t...
Giskard
Freemium: Open-source testing and vulnerability scanning framework for LLM applications. Detects hallucinations, bias, harmful con...
Inspect AI
Open Source: LLM evaluation framework developed by the UK AI Safety Institute (AISI). Designed for safety and capability evaluations ...
LM Evaluation Harness
Open Source: Unified framework by EleutherAI for evaluating language models across hundreds of academic benchmarks and tasks. The de ...
OpenAI Evals
Open Source: Open-source framework for evaluating LLMs and LLM-based systems with a registry of pre-built benchmarks. Provides eval t...
Parea AI
Freemium: Platform for testing, evaluating, and observing LLM applications with experiment tracking, human annotation, and product...
Promptfoo
Open Source: Open-source tool for testing and evaluating LLM prompts and models. Supports automated red teaming, A/B comparison of mo...
RAGAS
Open Source: Open-source framework for evaluating Retrieval-Augmented Generation (RAG) pipelines using LLM-assisted metrics. Provides...
TruLens
Open Source: Evaluation and tracking library for LLM applications built by TruEra. Provides feedback functions to evaluate RAG qualit...
UpTrain
Freemium: Open-source LLM evaluation and observability platform with 20+ pre-built checks including response quality, factual accu...