Aidena

Evaluation & Testing

Frameworks for measuring LLM output quality, accuracy, and reliability

13 tools
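At their core, most tools in this category automate a loop like the one below: run a model over a dataset of prompt/expected pairs, score each output with a metric, and aggregate. This is a minimal, dependency-free sketch of that pattern; all names and data are illustrative, not any listed tool's API.

```python
# Generic evaluation loop: score model outputs against expected answers.
# Real frameworks layer on richer metrics, tracing, and dashboards.

def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the normalized output matches the expected answer."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

def run_eval(model, dataset, metric=exact_match) -> float:
    """Run `model` (a callable prompt -> str) over the dataset, average the metric."""
    scores = [metric(model(ex["prompt"]), ex["expected"]) for ex in dataset]
    return sum(scores) / len(scores)

# Toy model and dataset for demonstration only.
dataset = [
    {"prompt": "2 + 2 = ?", "expected": "4"},
    {"prompt": "Capital of France?", "expected": "Paris"},
]
toy_model = lambda prompt: {"2 + 2 = ?": "4", "Capital of France?": "paris"}[prompt]

accuracy = run_eval(toy_model, dataset)  # 1.0: both answers match after normalization
```

The tools below differ mainly in what they plug into each slot: the metrics (exact match, LLM-as-judge, safety checks), the dataset management, and the reporting around the loop.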

Arthur AI

Enterprise

Full-lifecycle AI evaluation and monitoring platform supporting ML, GenAI, and agentic systems. Offers built-in guardrails.

Braintrust

Freemium

End-to-end LLM evaluation and development platform with dataset management, a prompt playground, automated scoring, and tracing.

DeepEval

Freemium

Open-source LLM evaluation framework with 50+ research-backed metrics for testing AI applications. Differentiates with native Pytest integration for unit-test-style evals.

Evidently AI

Freemium

Open-source ML and LLM observability platform. Provides pre-built evaluators for text quality and LLM output correctness.

Giskard

Freemium

Open-source testing and vulnerability scanning framework for LLM applications. Detects hallucinations, bias, and harmful content.

Inspect AI

Open Source

LLM evaluation framework developed by the UK AI Safety Institute (AISI). Designed for safety and capability evaluations.

LM Evaluation Harness

Open Source

Unified framework by EleutherAI for evaluating language models across hundreds of academic benchmarks and tasks. The de facto standard for benchmarking open models.

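Benchmark harnesses like this typically frame tasks as multiple-choice: the model scores every candidate answer, the highest-scoring candidate becomes the prediction, and accuracy is aggregated across the task. Below is a dependency-free sketch of that pattern; the scorer and tasks are toys, and none of these names come from the lm-eval API.

```python
# Multiple-choice scoring pattern common to academic LLM benchmarks:
# the model rates each candidate answer; the top-rated one is the prediction.

def pick_choice(score_fn, question: str, choices: list[str]) -> int:
    """Return the index of the choice `score_fn` rates highest for this question."""
    scores = [score_fn(question, c) for c in choices]
    return scores.index(max(scores))

def accuracy(score_fn, tasks) -> float:
    """Fraction of tasks where the top-rated choice is the labeled answer."""
    correct = sum(
        pick_choice(score_fn, t["q"], t["choices"]) == t["answer"] for t in tasks
    )
    return correct / len(tasks)

# Toy "model": prefers the choice sharing more words with the question.
# Real harnesses use the model's log-likelihood of each choice instead.
def toy_score(question: str, choice: str) -> float:
    return len(set(question.lower().split()) & set(choice.lower().split()))

tasks = [
    {"q": "Which animal barks", "choices": ["the cat", "the dog that barks"], "answer": 1},
    {"q": "Which number is even", "choices": ["an even number", "seven"], "answer": 0},
]
```

Swapping the word-overlap scorer for per-choice log-likelihoods from a real model recovers the standard loglikelihood-based accuracy metric used by many benchmarks.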

OpenAI Evals

Open Source

Open-source framework for evaluating LLMs and LLM-based systems with a registry of pre-built benchmarks. Provides eval templates for building custom evaluations.

Parea AI

Freemium

Platform for testing, evaluating, and observing LLM applications with experiment tracking, human annotation, and production monitoring.

Promptfoo

Open Source

Open-source tool for testing and evaluating LLM prompts and models. Supports automated red teaming and A/B comparison of models and prompts.

RAGAS

Open Source

Open-source framework for evaluating Retrieval-Augmented Generation (RAG) pipelines using LLM-assisted metrics. Provides metrics such as faithfulness, answer relevancy, and context precision.

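Faithfulness, the flagship LLM-assisted RAG metric, is the fraction of claims in an answer that are supported by the retrieved context. Real implementations use an LLM to extract claims and judge support; the sketch below swaps the judge for a crude word-overlap check purely to show the metric's shape. It is not the RAGAS API, and every threshold here is an illustrative assumption.

```python
# Faithfulness-style RAG metric: what fraction of the answer's claims
# are supported by the retrieved context? An LLM judge normally does the
# claim extraction and support check; word overlap stands in for it here.

def claims(answer: str) -> list[str]:
    """Split an answer into claims; naive sentence splitting stands in for LLM extraction."""
    return [s.strip() for s in answer.split(".") if s.strip()]

def supported(claim: str, context: str, threshold: float = 0.5) -> bool:
    """Crude stand-in for an LLM judge: enough of the claim's words appear in the context."""
    words = set(claim.lower().split())
    return len(words & set(context.lower().split())) / len(words) >= threshold

def faithfulness(answer: str, context: str) -> float:
    cs = claims(answer)
    return sum(supported(c, context) for c in cs) / len(cs)

context = "Paris is the capital of France. It lies on the Seine river."
answer = "Paris is the capital of France. Paris has ten million robots."
# The first claim is supported by the context, the second is not -> 0.5
```

Related metrics follow the same judged-fraction shape with the roles swapped: answer relevancy judges claims against the question, and context precision judges retrieved chunks against the answer.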

TruLens

Open Source

Evaluation and tracking library for LLM applications built by TruEra. Provides feedback functions to evaluate RAG quality.

UpTrain

Freemium

Open-source LLM evaluation and observability platform with 20+ pre-built checks including response quality and factual accuracy.
