Case study

How Baseten and Sully.ai returned 30M+ clinical minutes to healthcare using open-source models.

90%

Inference cost savings

65%

Lower median latency

30M

Minutes added to workforce

21x

Return on Agent Spend (ROAS)

Company Overview

Sully.ai is a healthcare technology company that builds AI-powered autonomous agents designed to automate clinical and administrative workflows for hospitals and enterprise healthcare organizations. Their suite of AI “employees” includes automated clinical note generation, an AI Receptionist, an AI Medical Coder, an AI Nurse, and more. These agents are intended to reduce documentation burden, improve clinical efficiency, free clinicians to focus on patient care, reduce the cost of care delivery, and ultimately enhance patient and provider experiences.

The company's technical foundation reflects a research-first approach to healthcare AI. Sully holds three patents, including one of the first patents on live clinical decision support using AI and a novel ensembling method that underpins its proprietary architecture. This architecture has surpassed all frontier foundation models on medical QA benchmarks. 

Sully’s solutions integrate directly with major electronic health record (EHR) systems and support real-time clinical documentation, patient intake, coding, triage, and other routine tasks through natural language and voice interactions. Their technology helps practices streamline operations, reduce burnout, and scale without proportional increases in staffing, delivered not as isolated tools but as a unified platform where context-aware agents learn and adapt to each provider's workflow.

Inference Challenges

Like many AI-first companies, Sully originally relied on proprietary closed-source models to power their agents and run their real-time inference workloads. While these closed-source services provided strong capabilities early on, demand on the system grew significantly as Sully’s platform scaled across healthcare practices, and they began running into critical limitations that were incompatible with Sully’s real-time, clinical-grade use cases. These challenges fell into three main categories:

Latency at scale:

Sully’s AI agents operate in live clinical environments, assisting healthcare providers with critical tasks such as live decision support, clinical note generation, medical coding, and post-visit workflows. In these scenarios, latency is not just a performance metric; it directly impacts clinician experience and workflow efficiency. As request volumes increased, Sully experienced inconsistent and elevated response times. This variability made it difficult to guarantee the fast, predictable interactions required for real-time documentation, leading Sully to seek an inference solution with more control over performance and deployment characteristics.

Unsustainable inference costs:

Inference costs quickly became a second major constraint. Pricing models based on tokens or per-request usage scaled linearly with Sully’s growth, making it increasingly expensive to support their workloads. 

This challenge reflects a broader industry shift: as AI products move from experimentation to production at scale, closed-source inference costs often balloon and become a bottleneck. For Sully, cost-efficiency wasn’t just about sustainable growth; it was about accessibility. Switching to open-source models allowed them to price their AI agents competitively enough to serve all types of health systems and hospitals.

Model quality:

In healthcare, accuracy is non-negotiable. Even small inconsistencies in clinical documentation can create compliance risks and erode trust with providers. Sully observed that closed models could exhibit quality regressions or unpredictable behavior, particularly as providers adjusted pricing or rolled out model updates. This lack of control made it difficult to guarantee consistent, high-quality outputs across all clinical scenarios.

In response to these rising costs and the desire for more control over their stack, Sully began transitioning key workloads to open-source models and started looking for an inference provider that could help them sustainably and cost-effectively scale inference while supporting their demanding performance requirements.


Solution

To address their latency, cost, and quality challenges, Sully transitioned its inference stack to open-source models running on Baseten. Beyond cost savings, Sully needed a platform that could move at the same pace as the rapidly evolving open-source ecosystem without sacrificing performance or reliability.

A key differentiator for Baseten was its speed of execution and responsiveness to new model releases. As new open-source models emerge, the window to gain a competitive advantage can be measured in days, not months. Sully required a partner that could operationalize these models immediately.

For example, when GPT OSS 120b was released, Baseten made the model available in its model library within two days, fully optimized for production inference using NVIDIA Dynamo, TensorRT-LLM, and NVFP4, and running on NVIDIA HGX B200. This allowed Sully to integrate the model into their product just days after its public release and quickly realize improvements in quality and performance without needing to manage infrastructure, optimization, or deployment themselves.
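Because models in Baseten's model library are served behind OpenAI-compatible APIs, integrating a newly released model like this typically amounts to swapping a model identifier. Below is a minimal sketch of how such a request might be constructed; the model name, system prompt, and parameters are illustrative assumptions, not Sully's actual configuration:

```python
# Sketch: building a chat-completion payload for clinical note generation
# against an OpenAI-compatible endpoint. All values here are illustrative.

def build_note_request(transcript: str, model: str = "openai/gpt-oss-120b") -> dict:
    """Build an OpenAI-style chat-completion payload for note generation."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Summarize the visit transcript into a structured clinical note."},
            {"role": "user", "content": transcript},
        ],
        "temperature": 0.2,  # low temperature for consistent clinical output
        "stream": True,      # stream tokens to keep perceived latency low
    }

payload = build_note_request("Patient reports mild headache for two days...")
# This payload would be POSTed to the provider's /v1/chat/completions
# endpoint, e.g. using the `openai` client pointed at a Baseten base URL.
```

Because only the `model` field and base URL change, an upgrade to a newer open-source model can ship without touching the rest of the application.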

In addition to rapid model availability, Sully benefited from Baseten’s deep expertise in performance optimization. Models served through Baseten are carefully tuned for low latency, high throughput, and predictable behavior at scale. This combination of fast access to cutting-edge models and production-grade performance enabled Sully to continuously improve their AI agents while maintaining a stable and reliable user experience for medical providers.

“The open-source ecosystem is moving incredibly fast, and Baseten moves just as fast with it. Having access to newly released models like GPT OSS 120b within days, already optimized for production, gives us a real competitive edge. It means we can continuously improve model quality without slowing down product development.”
- Amit Kumthekar, Head of Research, Sully.ai

Results

Since transitioning to open-source inference with Baseten, Sully has realized major business and operational impacts:

Dramatic cost efficiency gains
Sully reports an over 90% reduction in inference costs from switching from closed-source platforms to open-source models accessed via Baseten. These savings were reinvested in the business to improve the agents that provide value to practitioners and their patients.

Performance gains

Latency is critical for real-time clinical applications like live medical note generation or live decision support. Following the transition to open-source models running on Baseten, Sully saw significant performance improvements:

  • P50 latency decreased from ~70 seconds with closed-source models to ~25 seconds for note generation.

  • P50 latency decreased from ~20 seconds to ~5 seconds for decision support, with a significant reduction in random latency spikes.

This 65% reduction in median latency resulted in faster, more predictable responses during clinical workflows, improving the overall experience for physicians.
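The headline figure is a simple percentage-reduction calculation over the numbers above; a quick sanity check:

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100

# Using the P50 figures reported above:
note_gen = pct_reduction(70, 25)  # note generation: ~64%
decision = pct_reduction(20, 5)   # decision support: 75%
```

Both workloads land around the ~65% median-latency reduction the case study cites.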

Workforce Impact – “Return on Agent Spend” (ROAS) and “Minutes added to Workforce”
Sully tracks two key metrics that show the impact of its solution for practitioners:

First is minutes added to the workforce. By automating scribing and administrative tasks that previously consumed clinician time, Sully’s agents have significantly increased usable clinician capacity, with impressive results:

  • January 2025: Sully counted ~750,000 minutes added to the workforce

  • December 2025: Sully has added ~29 million minutes to their customers’ workforce overall

Minutes Added to Workforce (MAW)

Second is Return on Agent Spend (ROAS), which measures the return on investment for organizations adopting Sully’s AI agents, analogous to return on ad spend as a monetization framework in digital advertising. The results are compelling: an overall ROAS of 21x, broken down into:

  • 5% increase in patient retention

  • 2.4+ hours time saving per physician

  • 18.5% increase in number of patients

  • 98% decrease in abandonment rate
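ROAS itself is a simple ratio of value generated to agent spend. The sketch below uses purely illustrative numbers (not Sully's actual financials) to show how the 21x figure is computed:

```python
def roas(value_returned: float, agent_spend: float) -> float:
    """Return on Agent Spend: dollars of value per dollar spent on agents."""
    return value_returned / agent_spend

# Illustrative only: $10,000/month of agent spend generating $210,000 of
# combined value (clinician time saved, retention, added patients) is 21x.
example = roas(210_000, 10_000)  # -> 21.0
```

In practice the numerator would aggregate the component gains listed above, each converted to a dollar value.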


Quality at Scale with Open Models
Closed-source models often adjust performance when pricing drops, which can lead to variation or even decline in output quality over time. By controlling both the inference stack and model choice with open-source alternatives, Sully maintains consistent, high-quality performance tailored to clinical data, ensuring that quality remains a priority rather than a function of vendor pricing decisions.

“At our scale, inference efficiency matters as much as model quality. Baseten enabled us to cut costs by 90% while delivering significantly faster, more predictable performance. That efficiency is what allows us to add millions of productive minutes back to our customers.”
- Ahmed Omar, Co-Founder and CEO, Sully.ai

Results for Apogee Behavioral Medicine

Apogee Behavioral Medicine faced a crisis that’s all too common in healthcare: its team of skilled clinicians was drowning in paperwork instead of focusing on patient care.

Implementing Sully.ai’s AI Employees transformed the practice overnight, slashing documentation time and boosting morale, patient engagement, retention, and revenue.

As a result of implementing Sully.ai, they were able to:

  • Realize ~$1M in incremental revenue

  • Increase patient engagement score from 73% to 78%

  • Spend 92% less time on notes and redirect that time to patient care

“We’re getting paid faster, our compliance risk is much lower, and our administrative overhead is down. Most importantly, we have a happier, more effective clinical team that’s providing better care to more engaged patients. Sully has become a major competitive advantage for us in the marketplace.”
-Derek Ayers, CMO, Apogee Behavioral Medicine

What’s Next

Sully's transition to open-source inference is just the beginning. The team is now focused on pushing performance further: faster response times, higher accuracy, and agents that operate more seamlessly in the background of clinical workflows. As the platform matures, Sully is investing in tighter coordination across its suite of AI agents, enabling them to share context and build real memory. The goal is a more unified experience that adapts to how each provider practices, reducing friction and making the technology feel less like a set of features and more like an integrated part of the care team.

Powered by open source models and Baseten’s inference platform, Sully continues to explore novel architectures and publish findings that advance the state of healthcare AI. With Baseten's rapid model deployment capabilities, the team can quickly experiment with emerging open-source models and bring improvements to production without disrupting existing workflows, enabling Sully to stay at the frontier of what's possible in clinical ML.

