Guardrails & Safety

Input/output filters, prompt injection defense, and compliance tools

11 tools

Azure Content Safety

Managed Azure AI service for detecting harmful content in text and images across hate speech, violence, self-harm, and s...

Open-source framework for adding structural and semantic validation guardrails to LLM outputs. Provides a hub of reusabl...

Real-time API-based security layer for LLM applications that detects prompt injections, jailbreaks, and unsafe content w...

Meta's open-source LLM-based safety classifier for content moderation of human-AI conversations, fine-tuned on a taxonom...

Open-source security toolkit by ProtectAI providing a suite of scanners to detect prompt injection, PII leakage, toxicit...

Open-source SDK by Microsoft for fast identification and anonymization of PII in text and images, with customizable reco...

Open-source toolkit by NVIDIA for adding programmable guardrails to LLM systems. Declarative Colang language defines top...

Open-source toolkit for adding programmable guardrails to LLM-based conversational applications. Uses Colang, a custom m...

AI evaluation and guardrails platform that detects hallucinations, validates LLM outputs, and scores model quality. Key ...

Perspective API uses machine learning to score text for toxicity, threats, insults, and other attributes that can derail...

Rebuff is a self-hardening prompt injection detection framework that uses multiple layered techniques to identify and bl...