Philip Kiely
Lead Developer Advocate, Baseten
- Infrastructure: Using fractional H100 GPUs for efficient model serving (with Matt Howard and 3 others)
- Model performance: Benchmarking fast Mistral 7B inference (with Abu Qader and 3 others)
- Model performance: 33% faster LLM inference with FP8 quantization (with Pankaj Gupta and 1 other)
- Model performance: High performance ML inference with NVIDIA TensorRT (with Justin Yi and 1 other)
- Model performance: FP8: Efficient model inference with 8-bit floating point numbers (with Pankaj Gupta and 1 other)
- Infrastructure: The benefits of globally distributed infrastructure for model serving (with Phil Howes and 1 other)
- Model performance: 40% faster Stable Diffusion XL inference with NVIDIA TensorRT (with Pankaj Gupta and 2 others)
- Model performance: Why GPU utilization matters for model inference (with Marius Killinger and 1 other)
- AI engineering: The best open source large language model (Philip Kiely)