TensorRT-LLM
Model Inference · Open Source · Verified
Open-source library for optimizing large language model inference on NVIDIA GPUs. Combines deep hardware–software co-design with custom kernels, FP8/FP4 quantization, in-flight batching, paged KV caching, and speculative decoding to maximize throughput. Best suited for high-performance LLM serving on NVIDIA hardware in production environments.
Price: Free
License: Apache-2.0