Lead Developer Advocate
SPC hackathon winners build with Llama 3.1 on Baseten
SPC hackathon winner TestNinja and finalist VibeCheck used Baseten to power apps for test generation and mood board creation.
Introducing automatic LLM optimization with TensorRT-LLM Engine Builder
The TensorRT-LLM Engine Builder empowers developers to deploy extremely efficient and performant inference servers for open source and fine-tuned LLMs.
Ten reasons to join Baseten
Baseten is a Series B startup building infrastructure for AI. We're actively hiring for many roles — here are ten reasons to join the Baseten team.
How to serve 10,000 fine-tuned LLMs from a single GPU
LoRA swapping with TRT-LLM supports in-flight batching and loads LoRA weights in 1-2 ms, enabling each request to hit a different fine-tune.
Control plane vs workload plane in model serving infrastructure
A separation of concerns between a control plane and workload planes enables multi-cloud, multi-region model serving and self-hosted inference.
Comparing tokens per second across LLMs
To accurately compare tokens per second between different large language models, we need to adjust for tokenizer efficiency.
CI/CD for AI model deployments
In this article, we outline a continuous integration and continuous deployment (CI/CD) pipeline for using AI models in production.
Streaming real-time text to speech with XTTS V2
In this tutorial, we'll build a streaming endpoint for the XTTS V2 text to speech model with real-time narration and 200 ms time to first chunk.