Nvidia Unveils Nemotron 3 Super: A New Era in AI Model Efficiency
Nvidia has launched Nemotron 3 Super, a 120-billion-parameter AI model designed to tackle the inefficiencies of handling long-horizon tasks in enterprise settings. Released under an open model license, Nemotron 3 Super combines three distinct architectural approaches to surpass competitors such as GPT-OSS and Qwen in throughput, offering a promising tool for industries ranging from software engineering to cybersecurity.
### The Model and Its Architecture
Nemotron 3 Super is built on a hybrid architecture that merges state-space models, transformers, and a novel “Latent” mixture-of-experts design, a combination aimed at balancing memory efficiency with precise reasoning. The model’s backbone, a Hybrid Mamba-Transformer, interleaves Mamba-2 state-space layers with Transformer attention layers. This setup lets the model handle a 1-million-token context window while keeping its memory footprint manageable, addressing the “needle in a haystack” problem in tasks that demand deep associative recall.
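The memory argument can be made concrete with a back-of-the-envelope sketch. The layer counts, head dimensions, and attention-to-Mamba ratio below are illustrative assumptions, not Nvidia’s published configuration; the point is only that an attention KV cache grows linearly with context length, while a state-space layer carries a fixed-size recurrent state:

```python
# Hypothetical numbers for illustration only (not Nemotron's actual config):
# attention layers cache keys+values for every token, so cache size scales
# with context length; a Mamba-style layer keeps a constant-size state.

def kv_cache_bytes(context_len: int, n_heads: int = 32, head_dim: int = 128,
                   bytes_per_el: int = 2) -> int:
    """Per-layer KV cache for full attention: keys + values for every token."""
    return 2 * context_len * n_heads * head_dim * bytes_per_el

def ssm_state_bytes(d_model: int = 4096, state_dim: int = 128,
                    bytes_per_el: int = 2) -> int:
    """Per-layer recurrent state of a state-space layer: constant in context."""
    return d_model * state_dim * bytes_per_el

context = 1_000_000
n_layers, n_attn = 60, 6  # assumed split: mostly SSM, a few attention layers

all_attention = n_layers * kv_cache_bytes(context)
hybrid = (n_attn * kv_cache_bytes(context)
          + (n_layers - n_attn) * ssm_state_bytes())

print(f"all-attention cache: {all_attention / 2**30:.0f} GiB")
print(f"hybrid cache:        {hybrid / 2**30:.0f} GiB")
```

Under these toy numbers the hybrid stack needs roughly a tenth of the cache memory of an all-attention stack at the same context length, which is the intuition behind interleaving the two layer types.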
Additionally, the Latent Mixture-of-Experts (LatentMoE) design improves computational efficiency. By compressing tokens into a smaller latent space before routing them to experts, the model can consult more experts per token without increasing compute cost. This matters for applications that require diverse reasoning skills, such as switching between programming languages or modes of logical reasoning.
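As a rough illustration of the routing idea, here is a toy latent-MoE layer in plain Python. All dimensions, projection matrices, and the gating scheme are invented for this sketch, not taken from Nvidia’s implementation; the sketch shows only why routing in a compressed latent space keeps expert cost tied to the latent width rather than the full model width:

```python
import math
import random

# Toy latent mixture-of-experts (our own construction, not Nemotron's):
# compress the token, route and run experts in the latent space, then
# project the mixed result back to model width.

random.seed(0)
D_MODEL, D_LATENT, N_EXPERTS, TOP_K = 8, 4, 6, 2

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

down_proj = rand_matrix(D_LATENT, D_MODEL)   # compress token to latent
up_proj = rand_matrix(D_MODEL, D_LATENT)     # decompress mixed result
router = rand_matrix(N_EXPERTS, D_LATENT)    # routing happens in latent space
experts = [rand_matrix(D_LATENT, D_LATENT) for _ in range(N_EXPERTS)]

def latent_moe(token):
    z = matvec(down_proj, token)             # token -> latent
    scores = matvec(router, z)               # one score per expert
    top = sorted(range(N_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    gates = [math.exp(scores[i]) for i in top]
    gates = [g / sum(gates) for g in gates]  # softmax over the chosen experts
    out = [0.0] * D_LATENT
    for g, i in zip(gates, top):             # mix top-k expert outputs
        y = matvec(experts[i], z)
        out = [o + g * yi for o, yi in zip(out, y)]
    return matvec(up_proj, out)              # latent -> token

result = latent_moe([1.0] * D_MODEL)
print(len(result))
```

Because the experts operate on `D_LATENT`-sized vectors instead of `D_MODEL`-sized ones, each expert call is cheaper, so the router can afford a larger `TOP_K` for the same budget.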
The model also employs Multi-Token Prediction (MTP), predicting multiple future tokens simultaneously, which accelerates structured generation tasks. In practice, this yields speed improvements of up to 3x in real-world applications.
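The speedup mechanism can be sketched as a propose-and-verify loop. The drafter below is an oracle that always agrees with the base model, so the sketch shows the best case of `K` tokens accepted per step; the token pattern and function names are toy constructions for illustration, not Nemotron’s actual MTP heads:

```python
# Toy sketch of multi-token prediction decoding (assumed mechanics, not
# Nemotron's actual heads): propose K future tokens in one step, then keep
# the longest prefix the base model agrees with. Highly structured output
# (like the JSON-ish pattern below) advances several tokens per step.

K = 4

def base_next(seq):
    """Stand-in base model: deterministically continues a repeating pattern."""
    pattern = ["{", '"key"', ":", "1", "}"]
    return pattern[len(seq) % len(pattern)]

def mtp_propose(seq):
    """Stand-in MTP heads: draft the next K tokens at once (here, an oracle)."""
    draft = []
    for _ in range(K):
        draft.append(base_next(seq + draft))
    return draft

def decode(n_tokens):
    seq, steps = [], 0
    while len(seq) < n_tokens:
        draft = mtp_propose(seq)
        # Verify: accept the longest draft prefix the base model agrees with.
        for tok in draft:
            if tok == base_next(seq) and len(seq) < n_tokens:
                seq.append(tok)
            else:
                break
        steps += 1
    return seq, steps

seq, steps = decode(20)
print(f"{len(seq)} tokens in {steps} steps")  # 4 tokens accepted per step
```

A real model’s draft sometimes diverges from the verified continuation, in which case only the matching prefix is kept, so observed speedups fall between 1x and the best-case `K`.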
### Competitive Context and Industry Implications
Nvidia positions Nemotron 3 Super as a leader in AI model throughput, achieving up to 2.2x higher throughput than GPT-OSS-120B and 7.5x higher than Qwen3.5-122B. This efficiency is particularly beneficial for enterprises that process data at high volume.
Optimized for Nvidia’s Blackwell GPU platform, the model delivers 4x faster inference than previous architectures without compromising accuracy. It currently ranks first on the DeepResearch Bench, underscoring its capability in multi-step research across extensive document sets.
The release of Nemotron 3 Super under the Nvidia Open Model License offers enterprises a flexible framework for commercial use, with provisions for creating derivative works. However, it includes safeguard clauses to prevent misuse and protect Nvidia’s intellectual property.
### Future Prospects
The launch of Nemotron 3 Super has sparked significant interest in the tech community. Companies like CodeRabbit and Siemens are already integrating the model to enhance their operations, from codebase analysis to automating complex manufacturing workflows. Nvidia’s VP of AI Software, Kari Briski, emphasizes the model’s role in addressing the “context explosion” faced by companies moving beyond simple chatbot applications.
As Nemotron 3 Super becomes more widely adopted, it represents a pivotal shift in AI model deployment, offering enterprises a powerful tool to reduce operational inefficiencies. This development signals a new phase in AI, where advanced models can deliver both high performance and cost-effectiveness, redefining the landscape for enterprise applications.