vLLM
Model Inference · Open Source · Verified
Open-source, high-throughput LLM inference and serving engine. Uses PagedAttention for efficient KV-cache memory management, plus continuous batching and optimized CUDA kernels to maximize GPU utilization. Best suited for self-hosted production LLM serving under high concurrency.
Price
From $0
License: Apache-2.0
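As a minimal sketch of typical usage: vLLM can be launched as an OpenAI-compatible HTTP server and queried with standard chat-completion requests. The model name below is an illustrative example, not part of this listing; a CUDA-capable GPU is assumed.

```shell
# Install vLLM and start an OpenAI-compatible server (GPU required).
# The model ID here is an example placeholder; substitute your own.
pip install vllm
vllm serve meta-llama/Llama-3.1-8B-Instruct

# Query the server via the standard /v1/chat/completions endpoint:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct",
       "messages": [{"role": "user", "content": "Hello"}]}'
```

Because the server speaks the OpenAI API, existing OpenAI client libraries can usually be pointed at it by changing only the base URL.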