Abu Qader

Software Engineer

Model performance

Kimi K2 Thinking at 140+ TPS on NVIDIA Blackwell

Abu Qader

Tri Dao

Philip Kiely

Model performance

How we made the fastest GPT-OSS on NVIDIA GPUs 60% faster

Tri Dao

Abu Qader

Philip Kiely

650+ TPS on GPT OSS 120B

Model performance

How Baseten achieved 2x faster inference with NVIDIA Dynamo

Abu Qader

Michael Feil

Abu Qader

2 others

Model performance

How we run GPT OSS 120B at 500+ tokens per second on NVIDIA GPUs

Amir Haghighat

Tri Dao

Abu Qader

Bryce Dubayah

Philip Kiely

News

Introducing our Speculative Decoding Engine Builder integration for ultra-low-latency LLM inference

Abu Qader

Bryce Dubayah

Justin Yi

3 others

Model performance

How to double tokens per second for Llama 3 with Medusa

Abu Qader

Philip Kiely

News

Introducing automatic LLM optimization with TensorRT-LLM Engine Builder

Abu Qader

Philip Kiely

Model performance

Benchmarking fast Mistral 7B inference

Abu Qader

Pankaj Gupta

Philip Kiely

Abu Qader

3 others

Model performance

Introduction to quantizing ML models

Abu Qader

Philip Kiely
