Model performance: How Baseten achieved 2x faster inference with NVIDIA Dynamo (Abu Qader and 2 others)
Infrastructure: How we built Multi-cloud Capacity Management (MCM) (William Lau and 3 others)
Infrastructure: How Baseten multi-cloud capacity management (MCM) unifies deployments (Rachel Rapp and 1 other)
News: Introducing Baseten Embeddings Inference: The fastest embeddings solution available (Michael Feil and 1 other)
News: Baseten Chains is now GA for production compound AI systems (Marius Killinger and 2 others)
News: New observability features: activity logging, LLM metrics, and metrics dashboard customization (Suren Atoyan and 4 others)
News: Introducing our Speculative Decoding Engine Builder integration for ultra-low-latency LLM inference (Justin Yi and 3 others)
Model performance: Generally Available: The fastest, most accurate and cost-efficient Whisper transcription (William Gao and 3 others)
News: Introducing Custom Servers: Deploy production-ready model servers from Docker images (Tianshu Cheng and 2 others)