"Inference Engineering" is now available. Get your copy here

NVIDIA

GPU manufacturer and open-source model developer

Publisher details

In addition to manufacturing GPUs, NVIDIA's research teams create original and fine-tuned frontier models across modalities and domains to showcase the performance and capabilities of their hardware.

NVIDIA's Nemotron 3 models

Nemotron 3 is NVIDIA's latest model family: a set of hybrid mixture-of-experts models built for agentic workloads. All three share a hybrid Mamba-Transformer architecture that keeps memory use constant at long context lengths, latent MoE routing that consults more experts at lower compute cost, and NVFP4 training for high throughput without loss of accuracy.
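The total-versus-active parameter split is what makes these models cheap to run: in a sparse mixture-of-experts model, each token is routed to only a few experts, so per-token compute scales with the active parameters rather than the total. A minimal sketch of that arithmetic, using the Nano and Super figures quoted below (the simple ratio is an illustrative approximation, not NVIDIA's published methodology):

```python
# Per-token compute in a sparse MoE scales with the parameters that
# are activated for that token, not with the total stored parameters.
# Parameter counts are the Nano/Super figures from this page; the
# ratio itself is a rough illustration, ignoring shared layers.

def dense_equivalent_speedup(total_params: float, active_params: float) -> float:
    """Approximate per-token FLOPs advantage over a dense model
    with the same total parameter count."""
    return total_params / active_params

# Nemotron 3 Nano: 30B total, 3B active
nano_ratio = dense_equivalent_speedup(30e9, 3e9)
# Nemotron 3 Super: 120B total, 12B active
super_ratio = dense_equivalent_speedup(120e9, 12e9)

print(nano_ratio, super_ratio)  # 10.0 10.0
```

Both released models keep roughly one tenth of their parameters active per token, which is why a 120B-parameter model can generate tokens at speeds closer to a 12B dense model.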

They're fully open-source — weights, training data, and recipes — making them well-suited for sensitive deployments with strict compliance requirements.

Which Nemotron 3 model to use

Nemotron 3 Nano

Nano (30B total, 3B active parameters) is optimized for high-volume, targeted tasks: summarization, retrieval, classification, and routing. With 4x the throughput of Nemotron 2 Nano, it's built to run as the high-frequency worker in a pipeline, handling intermediate steps at scale without becoming a bottleneck.

Nemotron 3 Super

Super (120B total, 12B active parameters) is the coordination and reasoning layer. It combines inputs across a 1M token context window, calls tools reliably, and orchestrates specialized sub-agents while generating tokens 50%+ faster than comparable models in the Qwen, Llama, and Mistral families.

For most agentic applications, Super is the right starting point.
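The division of labor described above (Super as the coordinating layer, Nano as the high-frequency worker) can be wired up with a simple dispatch rule in front of whatever serving endpoint hosts the models. The sketch below is a hypothetical illustration: the model identifiers and the task taxonomy are placeholders, not names confirmed by NVIDIA.

```python
# Tiered dispatch sketch: route high-volume, targeted tasks to Nano
# and multi-step reasoning / orchestration to Super. Model IDs and
# task names are hypothetical placeholders.

NANO_TASKS = {"summarize", "retrieve", "classify", "route"}

def pick_model(task: str) -> str:
    """Send targeted, high-frequency work to Nano so it never
    bottlenecks the pipeline; everything else goes to Super."""
    return "nemotron-3-nano" if task in NANO_TASKS else "nemotron-3-super"

print(pick_model("classify"))  # nemotron-3-nano
print(pick_model("plan"))      # nemotron-3-super
```

In practice the orchestrating Super instance would emit these intermediate tasks itself (for example via tool calls) and fan them out to Nano workers, keeping the expensive model reserved for coordination and reasoning steps.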

Nemotron 3 Ultra

Ultra (parameter count not yet released) is NVIDIA's forthcoming high-end reasoning model, expected in early 2026. It's designed for the most demanding tasks in a pipeline — deep analysis, long-horizon planning, strategic decision-making — and can delegate routine work down to Super or Nano.