
NVIDIA Nemotron Nano 2 VL is the new best-in-class open-source vision language model from NVIDIA. Built on a Nemotron Nano 2 LLM backbone, this 12-billion-parameter model offers leading accuracy and efficiency for everything from video understanding to document processing. Enterprises in financial services and beyond can leverage Nemotron Nano 2 VL on Baseten for performant, scalable inference.
Baseten is excited to offer day-zero support for NVIDIA Nemotron Nano 2 VL, a highly accurate and efficient vision language model, alongside other models in the Nemotron family.
NVIDIA Nemotron provides open weights and data, making these models an excellent choice for enterprises that want to build specialized AI agents. Nemotron Nano 2 VL is a 12-billion-parameter model built on NVIDIA Nemotron Nano 2, a 9-billion-parameter foundation model with a hybrid Mamba-Transformer architecture and reasoning capabilities.
Nemotron Nano 2 VL replaces Llama Nemotron Nano VL, offering a larger, more capable, and more scalable model with better performance across both vision language benchmarks and real-world testing.
Nemotron Nano 2 VL delivers improved accuracy across visual benchmarks for multi-image understanding, document intelligence, and video captioning. It is available to deploy today on Baseten and leverages NVIDIA NIM microservices for high-throughput, low-latency performance out of the box.
Using vision language models in financial services
While every industry, from healthcare to media to manufacturing, benefits from high-quality open vision language models, the financial services industry has a number of particularly compelling use cases.
Great applications of vision language models include:
KYC and Identity Verification: know-your-customer compliance requires securely extracting PII from documents like driver's licenses and passports.
Intelligent Document Processing: run OCR on everything from receipts and invoices to mortgage documents and SEC filings to extract structured information.
Fraud Detection: scan customer interactions for unusual patterns across images and videos, not just textual records.
As enterprises in the financial services space build out differentiated agentic capabilities, adding vision agents for these kinds of use cases requires both highly accurate models and secure, scalable infrastructure.
When to use Nemotron Nano 2 VL and Nemotron Parse 1.1
Nemotron Nano 2 VL is a general-purpose vision language model that supports a wide range of tasks, including extracting and summarizing images, tables, charts, formulas, and other diagrams from documents.
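A common pattern for these document tasks is to send an image to the model through an OpenAI-compatible chat completions API and ask for structured output. The snippet below is a minimal sketch of that pattern, assuming such an endpoint is available; the base URL, API key, model identifier, and file name are placeholders to substitute for your own deployment.

```python
# Minimal sketch: structured extraction from a document image via an
# OpenAI-compatible chat completions endpoint. The base URL, API key, model
# identifier, and file name below are placeholders, not real values.
import base64

from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment-url>/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",                       # placeholder credential
)

# Encode a scanned invoice (or any document image) as a base64 data URL.
with open("invoice.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="nvidia/nemotron-nano-2-vl",  # placeholder model identifier
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the line items from this invoice as JSON."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The same request shape works for multi-image inputs by appending additional image entries to the message content.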
Additionally, this expansion of the Nemotron family includes a smaller, faster, less expensive model that is particularly well-suited for straightforward optical character recognition (OCR) tasks: Nemotron Parse 1.1.
This one-billion-parameter vision language model provides accurate and efficient information parsing in a 10x-smaller footprint. Nemotron Parse 1.1 runs faster and with higher throughput than Nemotron Nano 2 VL, making it an ideal choice for simple, high-volume, cost-sensitive OCR tasks.
Accelerating AI workflows in financial services enterprises with NVIDIA Nemotron
Financial services enterprises from banks to insurance companies to investment firms need secure, reliable AI models to power their agents. Nemotron models are built with open data and offer recipes for developers to create trustworthy custom AI for their specific needs.
Reliability is a function of model quality; models should provide accurate outputs. Nemotron Nano 2 VL offers best-in-class accuracy for a 12B model. The model was trained using NVIDIA-curated synthetic data to ensure high quality.
But reliability is also a function of inference infrastructure. Models need to be fast and scalable to support demanding enterprise workloads.
Achieving enterprise scale with NVIDIA NIM on Baseten
At Baseten, a leading AI infrastructure company focused on high-performance inference, we have day-zero support for Nemotron Nano 2 VL.
Baseten’s platform as a whole accelerates enterprise AI with:
Model performance: run generative AI models with the lowest latencies, like our industry-best GPT-OSS API built with NVIDIA Dynamo and NVIDIA Blackwell.
Multi-cloud infrastructure: access autoscaling GPUs across every hyperscaler and neocloud as a single unified compute layer, with active-active reliability, independent workload planes, and multi-cloud capacity management.
Forward-deployed engineers: build alongside inference experts with hands-on technical support from Baseten’s forward-deployed engineers.
Enterprise-grade security: SOC 2 Type II and HIPAA compliance, self-hosting, audit logs, SSO, and more for a complete enterprise platform.
The Baseten Inference Stack leverages a wide range of NVIDIA technologies, including NVIDIA Dynamo, NVIDIA TensorRT-LLM, NVIDIA TensorRT, and custom CUDA kernels.
Our inference stack lets developers swap components in and out as needed. One way to run Nemotron models on Baseten is with NVIDIA NIM, NVIDIA’s pre-packaged model serving containers. NIM microservices bundle runtime, dependencies, and performance optimizations together.
Caption: NVIDIA NIM microservices package open-source models and runtime technologies together in a portable, easy-to-deploy container.
Baseten offers out-of-the-box support for NVIDIA NIM (or any other model container) for convenient deployments of Nemotron.
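To make that concrete, here is a minimal sketch of querying a deployed NIM container over the OpenAI-compatible routes these microservices expose. The base URL, API key, and model identifier are placeholders; the exact values depend on your deployment.

```python
# Minimal sketch: querying a NIM container behind an HTTPS endpoint (for
# example, deployed on Baseten). NIM microservices expose an OpenAI-compatible
# API, so the routes below follow that convention. The base URL, API key, and
# model identifier are placeholders.
import requests

BASE_URL = "https://<your-nim-endpoint>"            # placeholder deployment URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # placeholder credential

# List the models the container is serving.
models = requests.get(f"{BASE_URL}/v1/models", headers=HEADERS, timeout=30)
print(models.json())

# Send a simple text prompt through the chat completions route.
payload = {
    "model": "nvidia/nemotron-nano-2-vl",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Summarize the key risks in this 10-K excerpt."}
    ],
    "max_tokens": 512,
}
completion = requests.post(
    f"{BASE_URL}/v1/chat/completions", headers=HEADERS, json=payload, timeout=120
)
print(completion.json()["choices"][0]["message"]["content"])
```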
Building with NVIDIA Nemotron Nano 2 VL in production
Whether you’re building agents for KYC, document processing, fraud detection, or another mission-critical workflow, give them an intelligence upgrade with Nemotron Nano 2 VL.
You can deploy Nemotron Nano 2 VL on Baseten today for scalable inference on NVIDIA’s latest vision model, or get in touch with our engineers to learn more about the performance, scale, security, and flexibility we offer enterprises, including our self-hosting capabilities.


