The Information Machine

NVIDIA Nemotron 3 Ultra: Hybrid SSM/MoE Architecture Launch and Benchmarks

Version 2

2026-06-05 18:33 UTC · 72 items

What

NVIDIA announced Nemotron 3 Ultra, a 550B-total-parameter (55B active) hybrid Mamba SSM/MoE open-weights model, at Jensen Huang's Computex 2026 keynote and shipped it on June 4, 2026 [1][2]. Independent benchmarks place it slightly ahead of Qwen3.6-27B (47.7 versus 45.8), and a side-by-side comparison shows near-identical code output to GPT-5.5 at roughly one-tenth the cost [5][7]. A 'Nemotron Coalition', apparently an NVIDIA-organized consortium for American open-source model development, has drawn sharp criticism from SemiAnalysis, which called it 'communist committee-style' and said it would use Chinese open models instead [13][14]. Enterprise integrators announced support the same day [11][12]; production reliability remains an open question [9][10].

Why it matters

The model's MoE-driven cost efficiency lowers the barrier for enterprises that would otherwise depend on proprietary APIs, and the 10x cost gap versus GPT-5.5 is large enough to shift model-selection decisions if reliability concerns prove manageable. The SemiAnalysis critique of the Nemotron Coalition adds a governance dimension: how American open-source AI development should be organized is now an active debate alongside the model's technical merits.

Open questions

  • What exactly is the Nemotron Coalition, who are its members, and what is its governance structure? [13][14]

  • Does Nemotron 3 Ultra meet production reliability standards? CodeRabbit flagged a specific reliability caveat for production workloads [9][10].

  • Will independent benchmark suites beyond NVIDIA's chosen GDPval-AA confirm its ranking claims, or does the keynote-citation circularity persist? [4]

  • Does the ~10x cost advantage over GPT-5.5 [7] hold across task types beyond code generation, and does latency stay competitive at scale?

Narrative

NVIDIA announced Nemotron 3 Ultra at Jensen Huang's Computex 2026 keynote and shipped the model on June 4, 2026. It has 550B total parameters with 55B active per token, a 10:1 total-to-active ratio from its mixture-of-experts (MoE) design, and uses a hybrid architecture combining Mamba-style state-space model (SSM) layers with a standard Transformer backbone [1][2]. The SSM component processes long sequences in time linear in sequence length, avoiding the quadratic cost of full attention, which suits the architecture to extended agentic reasoning and multi-step tool use. NVIDIA positions it as the most intelligent US open-weights model and cited Artificial Analysis benchmark data (GDPval-AA) in the Computex keynote to support that claim [3][4][18].
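The two scaling arguments in this paragraph can be made concrete with a little arithmetic. The sketch below is illustrative only: the parameter counts come from the text, while the function names and the state_size value are hypothetical placeholders, and the cost formulas are generic textbook approximations rather than a description of NVIDIA's actual implementation.

```python
# Illustrative arithmetic only. Parameter counts are from the text;
# the scaling functions are generic approximations, not NVIDIA's design.

TOTAL_PARAMS_B = 550   # total parameters, in billions (from the text)
ACTIVE_PARAMS_B = 55   # parameters active per token, in billions (from the text)


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active_b / total_b


def attention_score_entries(seq_len: int) -> int:
    """Pairwise scores computed by full self-attention: grows as n^2."""
    return seq_len * seq_len


def ssm_state_updates(seq_len: int, state_size: int = 16) -> int:
    """Recurrent SSM work: one fixed-size state update per token, grows as n.

    state_size is a hypothetical placeholder, not a published figure.
    """
    return seq_len * state_size


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    print(f"active fraction per token: {frac:.0%}")
    for n in (1_000, 10_000, 100_000):
        ratio = attention_score_entries(n) / ssm_state_updates(n)
        print(f"n={n:>7,}: attention/SSM work ratio = {ratio:,.1f}")
```

Under these assumptions the attention-to-SSM work ratio grows in proportion to sequence length, which is the intuition behind the long-context claim; real hybrid models interleave both layer types, so the practical advantage depends on the mix.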

Early performance data is mixed but largely favorable on cost grounds. Independent Artificial Analysis tests scored Nemotron 3 Ultra at 47.7 versus 45.8 for Qwen3.6-27B, with faster task completion [5][6]. A side-by-side code generation comparison, published the day after launch, showed output nearly identical to GPT-5.5 on an HTML5 physics simulation task at $0.051 versus GPT-5.5's $0.57, roughly a 10x cost difference [7]. On agent-specific benchmarks, one observer cited a score of 89.9% versus Claude Opus 4.7's 93.5% [8], and CodeRabbit's evaluation found the model close to baseline but flagged reliability as a specific concern for production use [9][10]. Enterprise integrators including Glean and Nebius announced same-day support [11][12].
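For readers who want to check the headline numbers, the sketch below recomputes the cost ratio and score gaps from the figures quoted above. Nothing here is newly measured; the helper names are hypothetical and exist only for this illustration.

```python
# Figures are the ones quoted in the text; helper names are hypothetical.

def cost_ratio(baseline_cost: float, candidate_cost: float) -> float:
    """How many times more expensive the baseline run was."""
    return baseline_cost / candidate_cost


def point_gap(score_a: float, score_b: float) -> float:
    """Absolute difference between two benchmark scores, in points."""
    return score_a - score_b


def relative_gap(score_a: float, score_b: float) -> float:
    """Difference expressed as a fraction of the second score."""
    return (score_a - score_b) / score_b


if __name__ == "__main__":
    # Single code-generation task: $0.57 (GPT-5.5) vs $0.051 (Nemotron 3 Ultra)
    print(f"cost ratio: {cost_ratio(0.57, 0.051):.1f}x")
    # Artificial Analysis score: 47.7 (Nemotron 3 Ultra) vs 45.8 (Qwen3.6-27B)
    print(f"AA lead: {point_gap(47.7, 45.8):.1f} pts "
          f"({relative_gap(47.7, 45.8):.1%})")
    # Agent benchmark: 93.5% (Claude Opus 4.7) vs 89.9% (Nemotron 3 Ultra)
    print(f"agent gap: {point_gap(93.5, 89.9):.1f} pts")
```

On these inputs the cost ratio comes to about 11.2x, so "roughly 10x" slightly understates the reported gap. It is also a single-task comparison, which is why the open question about other task types matters.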

Alongside the model, NVIDIA appears to be organizing a 'Nemotron Coalition' — a consortium-based effort to advance American open-source model development. SemiAnalysis published sharp criticism on June 5, calling the coalition 'communist committee-style' and arguing that free-market competition, not coordinated coalition development, is the correct path for American open models to close the capability gap with Chinese ones [13][14]. SemiAnalysis went further, stating its team would use Chinese open models like Kimi for OSS work, deliberately inverting the US-China framing by labeling an American consortium as the non-market option and a Chinese model as the free-market one [14]. This critique introduces a governance debate that runs parallel to the model's technical performance discussion.

The broader competitive context includes skeptics who argue that Nemotron models historically rank around fifth in open-source rankings and that the 550B scale is primarily a compute showcase [15][16], as well as Grok/xAI's explicit statement that it will not adopt Nemotron 3 Ultra because xAI trains its own models end-to-end [17]. The benchmark selection also drew scrutiny for circularity: independent evaluators noted that the GDPval-AA metric NVIDIA showed in the keynote is the same Artificial Analysis metric behind its performance claims [4].

Timeline

  • 2026-05-29: Social media attention builds around an upcoming NVIDIA model release before any official announcement. [23][24]
  • 2026-06-01: Rohan Paul announces Nemotron 3 Ultra will ship within days and summarizes its hybrid SSM/MoE architecture and long-context advantages. [2]
  • 2026-06-01: Skeptical commentary emerges: Nemotron models typically rank ~5th in open-source rankings and 550B scale is called a 'compute flex.' [15][16]
  • 2026-06-03: Artificial Analysis notes NVIDIA used its GDPval-AA benchmark in Jensen Huang's Computex 2026 keynote to support Nemotron 3 Ultra's performance claims. [4][3]
  • 2026-06-04: Nemotron 3 Ultra officially ships; becomes available on Nebius Token Factory and through Kilocode. [12][25][26]
  • 2026-06-04: Glean announces same-day enterprise support for Nemotron 3 Ultra. [11][22]
  • 2026-06-04: Artificial Analysis independent benchmarks on launch day: Nemotron 3 Ultra scores 47.7 versus Qwen3.6-27B's 45.8 with faster task completion. [6][5]
  • 2026-06-04: CodeRabbit evaluation: Nemotron 3 Ultra lands close to baseline with a reliability caveat for production use. [9][21][10]
  • 2026-06-04: One observer cites an agent benchmark score of 89.9% for Nemotron 3 Ultra versus 93.5% for Claude Opus 4.7. [8]
  • 2026-06-04: Grok/xAI denies adopting Nemotron 3 Ultra; xAI builds and trains its own models end-to-end. [17]
  • 2026-06-05: Side-by-side comparison shows Nemotron 3 Ultra producing near-identical code output to GPT-5.5 at roughly 10x lower cost ($0.051 vs $0.57). [7]
  • 2026-06-05: SemiAnalysis calls the Nemotron Coalition 'communist committee-style' and states it will use Chinese open models (Kimi) instead for OSS work. [13][14]

Perspectives

NVIDIA (official)

Nemotron 3 Ultra is the most intelligent US open-weights model, built for long-running agentic workloads, running 5x faster at inference with reduced costs.

Evolution: Consistent with NVIDIA's positioning of the Nemotron line as enterprise-grade open-weights leaders; Ultra is the largest and most capable in the series.

Artificial Analysis (@ArtificialAnlys)

Independent benchmarks confirm Nemotron 3 Ultra edges Qwen3.6-27B (47.7 vs 45.8) with faster task completion; their GDPval-AA metric was cited in NVIDIA's keynote.

Evolution: Consistent empirical testing role; elevated to keynote-slide prominence by NVIDIA's citation.

Rohan Paul (@rohanpaul_ai)

Positive on architectural rationale and cost case: SSM enables long-context processing without attention bottlenecks, and the ~10x cost gap versus GPT-5.5 on code tasks is compelling for developers.

Evolution: Consistent; provided a pre-release architecture summary and followed up with a direct GPT-5.5 cost comparison the day after launch.

CodeRabbit (@coderabbitai)

Measured: Nemotron 3 Ultra performs close to baseline in their code review benchmark but reliability is a specific concern for production workloads.

Evolution: Consistent; the full blog post confirms the same reliability caveat raised at launch.

Skeptics (ECLresearch, Aero)

Nemotron 3 Ultra is primarily a compute showcase; the open-source ecosystem has moved past raw scale, and Nemotron models historically rank around fifth while benefiting from NVIDIA-optimized inference.

Evolution: Consistent skeptical position; not updated by launch-day benchmark data.

SemiAnalysis (@SemiAnalysis_)

The Nemotron Coalition is 'communist committee-style' and the wrong approach for American open-source AI; free-market competition — including using Chinese open models like Kimi — is preferable.

Evolution: New voice in the thread; introduced a governance critique that did not previously appear.

Enterprise integrators (Glean, Nebius)

Enthusiastic same-day adoption; framing Nemotron 3 Ultra as expanding enterprise model choice for agentic applications.

Evolution: Consistent; both announced support on launch day.

Grok/xAI (@grok)

Explicitly not adopting Nemotron 3 Ultra; xAI builds and trains its own models end-to-end.

Evolution: Reactive clarification; consistent with xAI's closed-ecosystem approach.

Tensions

  • NVIDIA claims Nemotron 3 Ultra is the most intelligent US open-weights model; skeptics argue Nemotron models historically land ~5th in open-source rankings, rely on NVIDIA-specific inference optimizations for speed, and that 550B scale is primarily a compute showcase. [18][15][16]
  • NVIDIA's keynote benchmark (GDPval-AA from Artificial Analysis) drives the launch narrative; independent evaluators including CodeRabbit and an agent-benchmark observer show more modest or mixed results. [4][9][8]
  • Rohan Paul's comparison shows ~10x cost savings versus GPT-5.5 on code generation; CodeRabbit's reliability caveat suggests those savings may come with production stability trade-offs. [7][9][10]
  • One observer reports Nemotron 3 Ultra scoring 89.9% on an agent benchmark versus Claude Opus 4.7's 93.5%, which qualifies NVIDIA's 'most intelligent' framing when measured against the closed-source frontier. [8]
  • SemiAnalysis argues the Nemotron Coalition's coordinated development model contradicts American free-market principles and prefers Chinese open models; the coalition's implicit premise is that coordination advances American open-source AI. [13][14]

Sources

  1. [1] NVIDIA has announced Nemotron 3 Ultra at Jensen Huang's Computex 2026 keynote. At 550 billion total parameters (55 billi... — reactive:nvidia-nemotron-ultra (2026-06-04)
  2. [2] Nemotron 3 Ultra will be available from Nvidia in few days. — Rohan Paul Twitter (2026-06-01)
  3. [3] Jensen Huang’s keynote at Computex used Artificial Analysis benchmarks to communicate the performance of Nemotron 3 Ultr... — reactive:nvidia-nemotron-ultra (2026-06-03)
  4. [4] NVIDIA used the GDPval-AA benchmark to promote the launch of NVIDIA Nemotron 3 Ultra https://t.co/N6e4ItzYev — reactive:nvidia-nemotron-ultra (2026-06-03)
  5. [5] Not only did Nemotron 3 Ultra beat Qwen3.6-27b in raw AA benchmark (47.7 vs 45.8) but it also completed the tasks faster... — reactive:nvidia-nemotron-ultra (2026-06-04)
  6. [6] Nemotron 3 Ultra was launched today, including a focus on low latency agentic performance. We tested it against peers un... — reactive:nvidia-nemotron-ultra (2026-06-04)
  7. [7] Nemotron 3 Ultra vs GPT-5.5 on atomic[.]chat, a desktop app that runs LLMs locally. — Rohan Paul Twitter (2026-06-05)
  8. [8] @turingou I have always firmly believed that fixed-workflow agents don't need SoTA; NVIDIA's Nemotron 3 Ultra scored 89.9% on an agent benchmark, and Opus 4.7 scored 93.5%. — reactive:nvidia-nemotron-ultra (2026-06-04)
  9. [9] In our CodeRabbit benchmark, Nemotron 3 Ultra landed close to the baseline: — reactive:nvidia-nemotron-ultra (2026-06-04)
  10. [10] Nemotron 3 Ultra makes the case for fast, open coding models — reactive:nvidia-nemotron-ultra
  11. [11] 🎉 We’re announcing support for @nvidia Nemotron Ultra 3! — reactive:nvidia-nemotron-ultra (2026-06-04)
  12. [12] NVIDIA Nemotron™ 3 Ultra is now live on Nebius Token Factory. — reactive:nvidia-nemotron-ultra (2026-06-04)
  13. [13] The path for American open models to catch up to Chinese open models is not through a “coalition,” but through the found… — SemiAnalysis Twitter (2026-06-05)
  14. [14] We fundamentally disagree with the communist committee-style “Nemotron Coalition” approach to developing OSS models, and… — SemiAnalysis Twitter (2026-06-05)
  15. [15] @jun_song Solid skepticism. Nemotron 3 Ultra at 550B is a compute flex, but the open-weight ecosystem moved past "just b... — reactive:nvidia-nemotron-ultra (2026-06-01)
  16. [16] @scaling01 Nemotron models are always released to be #5 on open-source while using the latest nvidia tricks to be fast a... — reactive:nvidia-nemotron-ultra (2026-06-01)
  17. [17] @xiaosun86 @NVIDIAAI No, Grok isn't adopting Nemotron 3 Ultra. xAI builds and trains its own models end-to-end. Strong b... — reactive:nvidia-nemotron-ultra (2026-06-01)
  18. [18] Nemotron 3 Ultra 550B (A55B) is the largest Nemotron 3 model to date, and the most intelligent US open weights model rig... — reactive:nvidia-nemotron-ultra (2026-06-01)
  19. [19] NVIDIA ships Nemotron 3 Ultra, a 550B MoE model for long-running agents. It runs 5x faster at inference and reduces cost... — reactive:nvidia-nemotron-ultra (2026-06-04)
  20. [20] Build Agentic AI with Multimodal Foundation Models - NVIDIA — reactive:nvidia-nemotron-ultra
  21. [21] The caveat: reliability. — reactive:nvidia-nemotron-ultra (2026-06-04)
  22. [22] Excited to expand model choice in @Glean with @NVIDIA Nemotron 3 Ultra. — reactive:nvidia-nemotron-ultra (2026-06-04)
  23. [23] THIS IS THE FUTURE OF AI and it just dropped — reactive:nvidia-nemotron-ultra (2026-05-29)
  24. [24] Day 2 of Trajectory!! — reactive:nvidia-nemotron-ultra (2026-05-29)
  25. [25] Nemotron 3 ultra, Qwen 3.7 plus, laguna m.1, step 3.7 flash ALL FREE on kilocode. but which to use? BENCHMARK TEST TIME.... — reactive:nvidia-nemotron-ultra (2026-06-04)
  26. [26] 550 billion parameters. Open weights. Ships today. — reactive:nvidia-nemotron-ultra (2026-06-04)