Philip Kiely, Lead Developer Advocate

Community

SPC hackathon winners build with Llama 3.1 on Baseten

SPC hackathon winner TestNinja and finalist VibeCheck used Baseten to power apps for test generation and mood board creation.

News

Introducing automatic LLM optimization with TensorRT-LLM Engine Builder

The TensorRT-LLM Engine Builder empowers developers to deploy extremely efficient and performant inference servers for open source and fine-tuned LLMs.

Community

Ten reasons to join Baseten

Baseten is a Series B startup building infrastructure for AI. We're actively hiring for many roles — here are ten reasons to join the Baseten team.

Model performance

How to serve 10,000 fine-tuned LLMs from a single GPU

LoRA swapping with TRT-LLM supports in-flight batching and loads LoRA weights in 1-2 ms, enabling each request to hit a different fine-tune.

Glossary

Control plane vs workload plane in model serving infrastructure

A separation of concerns between a control plane and workload planes enables multi-cloud, multi-region model serving and self-hosted inference.

Glossary

Comparing tokens per second across LLMs

To accurately compare tokens per second between different large language models, we need to adjust for tokenizer efficiency.
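As a rough sketch of the adjustment described above (the helper name and all numbers are illustrative, not from the post): divide each model's raw tokens per second by its tokenizer's tokens-per-word ratio to get a comparable words-per-second figure.

```python
# Illustrative sketch: raw tokens/second is not comparable across LLMs
# because different tokenizers split the same text into different
# numbers of tokens. Normalizing by each tokenizer's tokens-per-word
# ratio yields a comparable words/second figure.
# All measurements below are made up for illustration.

def words_per_second(tokens_per_second: float, tokens_per_word: float) -> float:
    """Adjust raw throughput for tokenizer efficiency."""
    return tokens_per_second / tokens_per_word

# Hypothetical benchmark: model B looks faster in raw tokens/second...
model_a = words_per_second(tokens_per_second=90.0, tokens_per_word=1.2)
model_b = words_per_second(tokens_per_second=100.0, tokens_per_word=1.5)

print(f"Model A: {model_a:.1f} words/s")  # 75.0 words/s
print(f"Model B: {model_b:.1f} words/s")  # 66.7 words/s
# ...but after adjusting, model A delivers more text per second.
```

The same normalization works with any per-unit measure (tokens per character, tokens per sentence) as long as it is computed on the same reference text for every model.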

Hacks & projects

CI/CD for AI model deployments

In this article, we outline a continuous integration and continuous deployment (CI/CD) pipeline for using AI models in production.

Hacks & projects

Streaming real-time text to speech with XTTS V2

In this tutorial, we'll build a streaming endpoint for the XTTS V2 text to speech model with real-time narration and 200 ms time to first chunk.

Machine learning infrastructure that just works

Baseten provides all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently.