Model Inference
Engines and platforms for running and serving AI models (self-hosted or managed)
18 tools
BentoML
Open Source: Platform for building, shipping, and scaling AI inference services. Supports any ML framework, with GPU autoscaling and deployment to any cloud or self-hosted environment.
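A minimal sketch of a BentoML 1.x service, assuming the current @bentoml.service / @bentoml.api decorator API; the service name and placeholder logic are illustrative:

```python
import bentoml

# Illustrative service: a real one would load a model once in __init__
# and run actual inference in the endpoint method.
@bentoml.service(resources={"gpu": 1})
class Summarizer:
    @bentoml.api
    def summarize(self, text: str) -> str:
        # Placeholder standing in for real model inference.
        return text[:100]
```

Running `bentoml serve` against this file should start a local HTTP server exposing the endpoint.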
Cerebras Inference
Freemium: Inference API built on Cerebras Wafer-Scale Engine hardware, delivering extremely high token throughput. Achieves 2,000+ tokens per second on popular open-weight models.
Cloudflare Workers AI
Freemium: Serverless GPU inference at the network edge. Run LLMs, image generation, speech, and embedding models with no infrastructure to manage.
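A hedged sketch of calling Workers AI over its REST API; the account ID, token, and model slug are placeholders, and the URL follows Cloudflare's documented /ai/run pattern:

```python
import os
import requests

# Placeholders: supply your own Cloudflare account ID and API token.
ACCOUNT_ID = os.environ["CF_ACCOUNT_ID"]
API_TOKEN = os.environ["CF_API_TOKEN"]

resp = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/meta/llama-3.1-8b-instruct",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"messages": [{"role": "user", "content": "Hello from the edge"}]},
)
print(resp.json())
```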
Fireworks AI
Freemium: High-performance inference platform for open-source models with compound AI system support. Offers serverless and dedicated deployments.
Groq
Freemium: Cloud inference API providing access to open-source LLMs, speech, and TTS models via an OpenAI-compatible endpoint. Uses custom LPU (Language Processing Unit) hardware for very low-latency generation.
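Because the endpoint is OpenAI-compatible, the standard OpenAI Python SDK works with a swapped base_url; the model name below is illustrative:

```python
from openai import OpenAI

# Point the OpenAI SDK at Groq's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="YOUR_GROQ_API_KEY",  # placeholder
)

chat = client.chat.completions.create(
    model="llama-3.1-8b-instant",  # illustrative model name
    messages=[{"role": "user", "content": "Explain LPUs in one sentence."}],
)
print(chat.choices[0].message.content)
```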
Lepton AI
Freemium: Cloud platform for running AI inference with a Pythonic SDK and OpenAI-compatible endpoints. Built by ex-Meta AI researchers.
llama.cpp
Open Source: C/C++ LLM inference engine. Runs quantized models on CPU and GPU with minimal dependencies; home of the GGUF model format standard.
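A minimal local-inference sketch using the community llama-cpp-python bindings; the GGUF file path is a placeholder for any quantized model you have downloaded:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Load a local GGUF model; the path and context size are illustrative.
llm = Llama(model_path="./models/llama-3.1-8b-q4_k_m.gguf", n_ctx=4096)

out = llm("Q: What is GGUF? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```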
llamafile
Open Source: Mozilla project that bundles an LLM and runtime into a single cross-platform executable (llama.cpp + Cosmopolitan Libc). One file runs locally on macOS, Windows, Linux, and the BSDs with no installation.
LocalAI
Open Source: Self-hosted, OpenAI-API-compatible inference server for running LLMs, image generation, and audio models on consumer-grade hardware, no GPU required.
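Since LocalAI mirrors the OpenAI API, existing clients only need a new base_url; port 8080 is LocalAI's default, and the model name must match one configured on your server:

```python
from openai import OpenAI

# LocalAI accepts any API key string since auth is handled locally.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

chat = client.chat.completions.create(
    model="llama-3.2-1b-instruct",  # must match a locally configured model
    messages=[{"role": "user", "content": "Hello, local model!"}],
)
print(chat.choices[0].message.content)
```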
MLC LLM
Open Source: Framework for compiling and running LLMs natively on any hardware, including mobile, edge, and browser. Uses machine learning compilation (built on Apache TVM) to optimize models for each target platform.
Predibase
Paid: Serverless fine-tuning and inference platform for open LLMs using LoRA adapters. Train and serve hundreds of custom LoRA adapters from a single GPU deployment.
Ray Serve
Open Source: Scalable model-serving library built on Ray. Composes multiple ML models and Python functions into a single production endpoint, with autoscaling and fractional-GPU support.
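A minimal Ray Serve deployment sketch; the replica count and echo logic are illustrative:

```python
from ray import serve  # pip install "ray[serve]"

# One deployment behind an HTTP endpoint; Serve handles replication.
@serve.deployment(num_replicas=2)
class Echo:
    async def __call__(self, request):
        body = await request.json()
        return {"echo": body.get("text", "")}

serve.run(Echo.bind())  # serves on http://localhost:8000 by default
```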
Replicate
Freemium: Cloud platform for running, fine-tuning, and deploying machine learning models via API. Offers thousands of community and official models with pay-per-use pricing.
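A short sketch using the official Python client; the model identifier is one public example, and any "owner/name" slug works:

```python
import replicate  # pip install replicate; set REPLICATE_API_TOKEN

# Run a hosted model by its slug; inputs depend on the model's schema.
output = replicate.run(
    "meta/meta-llama-3-8b-instruct",
    input={"prompt": "Write a haiku about inference."},
)
print("".join(output))  # language models stream output as text chunks
```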
TensorRT-LLM
Open Source: Library for optimizing large language model inference on NVIDIA GPUs. Leverages deep hardware-software co-design, with optimized attention kernels, in-flight batching, and low-precision quantization.
Text Generation Inference
Open Source: Hugging Face's production-ready inference server. Optimized for Transformer models with tensor parallelism support.
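A sketch of querying a running TGI server with huggingface_hub's InferenceClient; the localhost URL assumes a server you have launched yourself:

```python
from huggingface_hub import InferenceClient

# Point the client at your own TGI deployment (URL is an assumption).
client = InferenceClient("http://localhost:8080")

print(client.text_generation("What is tensor parallelism?", max_new_tokens=64))
```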
Together AI
Freemium: Cloud inference platform for running open-source and proprietary AI models via serverless or dedicated endpoints. Differentiates on inference speed and cost for open models.
vLLM
Open Source: High-throughput LLM inference and serving engine. Uses PagedAttention for efficient KV-cache memory management and continuous batching for high GPU utilization.
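An offline batch-inference sketch with vLLM's Python API; the model identifier is illustrative:

```python
from vllm import LLM, SamplingParams  # pip install vllm

# Load any Hugging Face model identifier you have access to.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=64)

for out in llm.generate(["What is PagedAttention?"], params):
    print(out.outputs[0].text)
```

vLLM also ships an OpenAI-compatible HTTP server (`vllm serve`) for online serving of the same models.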
Xinference
Open Source: Platform for running inference with LLMs, embedding models, multimodal models, and more on cloud or on-premises infrastructure.