Abu Qader
Software Engineer
Model performance
Kimi K2 Thinking at 140+ TPS on NVIDIA Blackwell
Abu Qader
2 others
Model performance
How we made the fastest GPT-OSS on NVIDIA GPUs 60% faster
Tri Dao
2 others
Model performance
How Baseten achieved 2x faster inference with NVIDIA Dynamo
Abu Qader
2 others
Model performance
How we run GPT OSS 120B at 500+ tokens per second on NVIDIA GPUs
Amir Haghighat
4 others
News
Introducing our Speculative Decoding Engine Builder integration for ultra-low-latency LLM inference
Justin Yi
3 others
Model performance
How to double tokens per second for Llama 3 with Medusa
Abu Qader
1 other
News
Introducing automatic LLM optimization with TensorRT-LLM Engine Builder
Abu Qader
1 other
Model performance
Benchmarking fast Mistral 7B inference
Abu Qader
3 others
Model performance
Introduction to quantizing ML models
Abu Qader
1 other