Customer stories
We're creating a platform for progressive AI companies to build their products on the fastest, most performant infrastructure available.
What our customers are saying
Sahaj Garg,
Co-Founder and CTO
Inference for custom-built LLMs can be a major headache. Thanks to Baseten, we’re getting cost-effective high-performance model serving without any extra burden on our internal engineering teams. Instead, we get to focus our expertise on creating the best possible domain-specific LLMs for our customers.
Lifelike text-to-speech requires models to operate with very low latency and very high quality. We chose Baseten as our preferred inference provider for Orpheus TTS because we want our customers to have the best performance possible. Baseten’s Inference Stack allows our customers to create voice applications that sound as close to human as possible.
Troy Astorino,
Co-founder and CTO
Our AI engineers build domain-specific models that beat frontier labs in medical record interpretation. With Baseten Training, we can stay focused on our research and value to customers, not hardware and job orchestration. The Baseten platform powers our workflows from training through to production, saving us tons of time and stress.
Customer Stories

Praktika delivers ultra-low-latency transcription for global language education with Baseten
With Baseten, Praktika delivers sub-300 ms latency, empowering language learners worldwide with a seamless conversational learning experience.
Zed Industries serves 2x faster code completions with the Baseten Inference Stack
By partnering with Baseten, Zed achieved 45% lower latency, 3.6x higher throughput, and 100% uptime for their Edit Prediction feature.

Wispr Flow creates effortless voice dictation with Llama on Baseten
Wispr Flow runs fine-tuned Llama models with Baseten and AWS to provide seamless dictation across every application.
Rime serves its speech synthesis API with stellar uptime using Baseten
Rime AI chose Baseten to serve its custom generative speech synthesis model and achieved state-of-the-art p99 latencies with 100% uptime in 2024.

Baseten powers real-time translation tool toby to Product Hunt podium
The founders of toby worked with Baseten to deploy an optimized Whisper model on autoscaling hardware just one week ahead of their Product Hunt launch, finishing in the top three with zero downtime.
Custom medical and financial LLMs from Writer see 60% higher tokens per second with Baseten
Writer, the leading full-stack generative AI platform, launched new industry-specific LLMs for medicine and finance. Using TensorRT-LLM on Baseten, they increased their tokens per second by 60%.
Patreon saves nearly $600k/year in ML resources with Baseten
With Baseten, Patreon deployed and scaled the open-source foundation model Whisper at record speed without hiring an in-house ML infra team.