Publisher details
Beyond manufacturing GPUs, NVIDIA runs research teams that create original and fine-tuned frontier models across modalities and domains, showcasing the performance and possibilities of the company's hardware.
NVIDIA's Nemotron 3 models
The Nemotron 3 family (Nano, Super, and Ultra) is NVIDIA's latest model family: a set of hybrid mixture-of-experts models built for agentic workloads. All three share a hybrid Mamba-Transformer architecture that keeps memory use roughly constant at long context lengths, latent MoE routing that consults more experts at lower compute cost, and NVFP4 training for high throughput without accuracy loss.
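The active-vs-total parameter split these models advertise comes from sparse expert routing: only a few experts run per token, so active compute stays far below total parameter count. The details of Nemotron's latent MoE routing aren't spelled out here, so the sketch below shows generic top-k MoE gating as a stand-in for the idea:

```python
# Toy sketch of top-k mixture-of-experts gating (a generic illustration,
# not NVIDIA's exact latent-routing scheme). Only k of n experts run per
# token, which is why active parameters (e.g. 3B) can be a small fraction
# of total parameters (e.g. 30B).
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def route(router_logits, k=2):
    """Pick the top-k experts by router score and renormalize their gates."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    # Each selected expert gets a weight; the token's output is the
    # weighted sum of just these experts' outputs.
    return [(i, probs[i] / total) for i in top]

# 8 experts, only 2 active for this token:
print(route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], k=2))
```

The renormalization step keeps the selected gate weights summing to 1, so per-token output scale doesn't depend on which experts were chosen.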
They're fully open source (weights, training data, and training recipes are all published), so the entire stack can be audited, making them well-suited for sensitive deployments with strict compliance requirements.
Which Nemotron 3 model to use
Nemotron 3 Nano
Nano (30B total, 3B active parameters) is optimized for high-volume, targeted tasks: summarization, retrieval, classification, and routing. With 4x the throughput of Nemotron 2 Nano, it's built to run as the high-frequency worker in a pipeline, handling intermediate steps at scale without becoming a bottleneck.
Nemotron 3 Super
Super (120B total, 12B active parameters) is the coordination and reasoning layer. It combines inputs across a 1M-token context window, calls tools reliably, and orchestrates specialized sub-agents while generating tokens more than 50% faster than comparable models in the Qwen, Llama, and Mistral families.
For most agentic applications, Super is the right starting point.
Nemotron 3 Ultra
Ultra (parameter count not yet released) is NVIDIA's forthcoming high-end reasoning model, expected in early 2026. It's designed for the most demanding tasks in a pipeline — deep analysis, long-horizon planning, strategic decision-making — and can delegate routine work down to Super or Nano.
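The tiering guidance above can be sketched as a small routing helper: Nano for high-volume worker tasks, Super as the default orchestrator, Ultra for the most demanding long-horizon work. The model identifiers and task categories below are illustrative placeholders, not official names:

```python
# Hypothetical tier-selection helper reflecting the guidance above.
# Model ids are placeholders, not NVIDIA's published identifiers.

# High-volume, targeted tasks named for Nano in this catalog:
WORKER_TASKS = {"summarization", "retrieval", "classification", "routing"}

def pick_model(task: str, long_horizon: bool = False) -> str:
    if long_horizon:
        return "nemotron-3-ultra"   # deep analysis, long-horizon planning
    if task in WORKER_TASKS:
        return "nemotron-3-nano"    # high-frequency pipeline worker
    return "nemotron-3-super"       # default: orchestration and reasoning

print(pick_model("classification"))                          # nemotron-3-nano
print(pick_model("tool_orchestration"))                      # nemotron-3-super
print(pick_model("strategic_planning", long_horizon=True))   # nemotron-3-ultra
```

Defaulting to Super matches the catalog's own advice that it's the right starting point for most agentic applications, with Nano and Ultra as deliberate opt-ins.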