Cerebras Inference
Model Inference · Freemium · Verified
Inference API built on Cerebras Wafer-Scale Engine hardware, delivering extremely high token throughput: it achieves 2000+ tokens/sec on Llama models, making it one of the fastest inference providers available.
Price
From $0 / 1K tokens
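Cerebras Inference exposes an OpenAI-compatible chat-completions API. The sketch below shows one way to call it from the Python standard library; the base URL, model name (`llama3.1-8b`), and the `CEREBRAS_API_KEY` environment variable are assumptions for illustration, so check the official docs for the exact values.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint; verify against the Cerebras docs.
BASE_URL = "https://api.cerebras.ai/v1"


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str, model: str = "llama3.1-8b") -> str:
    """Send a single-turn chat request and return the reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # API key read from the environment; never hard-code secrets.
            "Authorization": f"Bearer {os.environ['CEREBRAS_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

At 2000+ tokens/sec, a 500-token completion returns in roughly a quarter of a second, which is what makes this provider attractive for latency-sensitive applications.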