Justin Yi

Software Engineer

Model performance

How we built production-ready speculative decoding with TensorRT-LLM

Pankaj Gupta

Philip Kiely

Pankaj Gupta

2 others

Model performance

A quick introduction to speculative decoding

Pankaj Gupta

Philip Kiely

Pankaj Gupta

2 others

News

Introducing our Speculative Decoding Engine Builder integration for ultra-low-latency LLM inference

Abu Qader

Bryce Dubayah

Justin Yi

3 others

Model performance

Benchmarking fast Mistral 7B inference

Abu Qader

Pankaj Gupta

Philip Kiely

Abu Qader

3 others

Model performance

High performance ML inference with NVIDIA TensorRT

Philip Kiely

Justin Yi

1 other

Model performance

40% faster Stable Diffusion XL inference with NVIDIA TensorRT

Pankaj Gupta

Philip Kiely

Pankaj Gupta

2 others

AI engineering

Build with OpenAI’s Whisper model in five minutes

Justin Yi
