Deployment options

Baseten Self-hosted: speed and control in your cloud

Get the low latency, high throughput, and dev experience you expect from a managed service, right in your own VPC.

Baseten built for the enterprise

Engineered for compliance

Control data residency, align with customer requirements, and meet stringent in-house, government, and industry standards such as GDPR and HIPAA.

Tailored performance

Gain the white-glove support of our dedicated engineers, focused on meeting or exceeding your performance targets with highly scalable, optimized inference.

Use cloud credits and commits

Leverage your current cloud provider credits and commitments to optimize inference costs, secure volume discounts, and streamline your billing process.

Choosing Self-hosted, Cloud or Hybrid

Feature	Baseten Self-hosted	Baseten Cloud	Baseten Hybrid
Data control	Full data control	Managed data security; we never store model inputs or outputs	Full data control in your VPC; managed data security on Baseten Cloud
Data residency requirements	Region-locked data and deployments	Multi-region support with global deployment options	Region-locked data and deployments with multi-region support
Compute capacity	Leverage existing in-house resources	Leverage on-demand compute with SOTA GPUs	Leverage existing resources or Baseten compute for overflow
Cost efficiency	Utilize dedicated resources without extra spend on hardware	Gain cost-effective, on-demand compute	Use in-house compute whenever available for optimized costs
Integration with internal systems	Custom or out-of-the-box integrations	Easy integration via Baseten's ecosystem	Custom or out-of-the-box integrations
Performance optimization	SOTA on-chip model performance and low network latency	SOTA on-chip model performance and low network latency	SOTA on-chip model performance and low network latency
Scalability	High, tailored scalability	High, flexible scaling options	High, tailored scalability with flex capacity on Baseten Cloud
Security and compliance	Adhere to custom organizational policies	SOC 2 Type II certified, HIPAA compliant, and GDPR compliant by default	Adhere to custom policies and our SOC 2 Type II, HIPAA, and GDPR compliance
Support and maintenance	Comprehensive support and managed services	Comprehensive support and managed services	Comprehensive support and managed services
Utilization of existing cloud commits	Use credits or commits	Spend down existing cloud commits	Use credits or commits

Don't sacrifice performance for security

Millisecond-level response times

Model performance is our specialty. Get ultra-low latency and high throughput inference with dedicated engineering support and out-of-the-box optimizations.

Scale on demand

We optimized autoscaling so you don't have to. Effortlessly scale to infinity or down to zero to accommodate any traffic level.

Secure by design

Baseten Self-hosted gives you full control over data residency, keeps clients' intellectual property on your servers, and follows established security practices.

Meet strict compliance

Keep data where you need it and address strict compliance and regulatory needs. Inference inputs and outputs will never hit our premises.

Use custom hardware

With complete control over your hardware and infrastructure, you can buy or use any hardware in-house to meet specific performance requirements.

Optimize resource usage

Fully utilize existing investments across cloud providers and in-house hardware to make optimal use of your resources.

Baseten supports billions of custom, fine-tuned LLM calls per week from OpenEvidence, serving high-stakes medical information to healthcare providers in every major healthcare facility in the country. If you see a doctor today, chances are that they are leveraging OpenEvidence for trustworthy, up-to-date medical information at their fingertips. Baseten's tireless dedication to reliability and deep support at scale has proven up to the task of supporting this at times literally life-or-death mission.
Speed is critical for Gamma. We're a PLG company: the faster we can deliver something great to our users, the happier they are with the product. That's why we partner with Baseten to serve our open-source image generation models. We generate millions of images a day on Baseten for our 70+ million users with ultra-low latency and high throughput.
Jon Noronha
Co-founder and CPO, Gamma