Fastest, Largest, Strongest: NVIDIA Blackwell Sweeps MLPerf Training 6.0

NVIDIA Blog · Shruti Koparkar · 2026-06-16

NVIDIA Blackwell achieved the fastest training times on all seven MLPerf Training 6.0 benchmarks, including the new DeepSeek-V3 671B and GPT-OSS-20B mixture-of-experts workloads. Scaled runs reached 8,192 GPUs, and GB300 NVL72 delivered up to 1.6x faster training than GB200 NVL72 at the same scale.

Appears in

NVIDIA Launches Vera CPU and Vera Rubin NVL72 at COMPUTEX / GTC Taipei

Extraction

Topics: mlperf-benchmarks, nvidia-blackwell, ai-training, gpu-performance, mixture-of-experts

Claims

NVIDIA Blackwell was the only platform submitted across all seven MLPerf Training 6.0 benchmarks and delivered the fastest training time on each.
GB300 NVL72 delivered up to 1.6x faster training than GB200 NVL72 at the same scale, driven by NVFP4 compute density, expanded memory, and a higher power ceiling.
NVIDIA scaled to 8,192 GPUs on DeepSeek-V3 671B, the largest Blackwell-based submission in MLPerf Training history.
NVFP4 training methods increase performance while meeting strict accuracy requirements across pretraining and fine-tuning workloads.
CoreWeave achieved the fastest DeepSeek-V3 671B training time of 2.02 minutes at 8,192-GPU scale using GB300 NVL72 with Spectrum-X Ethernet.

Key quotes

The NVIDIA platform was the only one to be submitted across every benchmark, and delivered the fastest time to train on all seven.

GB300 NVL72 delivered up to 1.6x faster training than GB200 NVL72 at the same scale.

NVIDIA Resiliency Extension, or NVRx, minimizes the time lost when faults do occur, with capabilities spanning fault detection, recovery and health monitoring across the cluster.