Blackwell’s MLPerf Run Puts AI Training Bottlenecks At Rack Scale

BySendTech Times DeskNewsroom-edited technology coverage|Source: NVIDIA Blog

Newsroom brief

NVIDIA says Blackwell led MLPerf Training 6.0 across all seven benchmarks, with submissions scaling to 8,192 GPUs and GB300 NVL72 training up to 1.6x faster than GB200 NVL72 at the same scale.

Verified against source materialEdited by SendTech Times Desk

Blackwell’s MLPerf Run Puts AI Training Bottlenecks At Rack Scale

Image source: NVIDIA Blog

Blackwell Moves The Benchmark Fight To Full Racks

NVIDIA’s latest MLPerf Training 6.0 results put the AI training race at rack and cluster scale, not just at the level of an individual accelerator.

The company says its Blackwell platform delivered the fastest time to train across all seven MLPerf Training 6.0 benchmarks.

It also submitted across every benchmark in the suite and scaled one Blackwell NVL72 training run to 8,192 GPUs.

Large AI training is increasingly limited by networking, memory, reliability and power envelopes.

Faster chips help, but frontier-model training also depends on whether thousands of GPUs can behave like a coordinated system long enough to finish a run.

MoE Workloads Raise The Networking Burden

MLPerf Training 6.0 added two mixture-of-experts pretraining workloads: DeepSeek-V3 671B and GPT-OSS-20B.

Those workloads make the network more important because tokens have to move across GPUs to reach the right expert subnetworks.

NVIDIA submitted results on GB200 NVL72 and GB300 NVL72 rack-scale systems.

In each NVL72 rack, fifth-generation NVLink Switches connect 72 GPUs into a shared pool of compute and memory.

NVIDIA put the GB300 NVL72 advantage at up to 1.6x versus GB200 NVL72 when both systems were measured at the same scale.

The company tied the gain to Blackwell Ultra features including NVFP4, expanded memory capacity and a higher power ceiling.

Scale Claims Now Depend On Reliability

The largest Blackwell submission used 8,192 GPUs on DeepSeek-V3 671B with GB200 NVL72 systems.

NVIDIA also submitted at 5,120 GPUs on Llama 3.1 405B.

Partner submissions show how hyperscalers and AI clouds are turning the same platform into cluster-scale results.

Microsoft Azure ran Llama 3.1 405B training across 8,192 GPUs and hit the reference quality target in 7.07 minutes.

CoreWeave used GB300 NVL72 with Spectrum-X Ethernet to reach the DeepSeek-V3 671B quality target in 2.02 minutes at 8,192-GPU scale.

At that scale, reliability becomes part of performance.

NVIDIA says production training can run for weeks or months across hundreds of thousands of GPUs, so checkpointing, fault detection and network rerouting affect whether theoretical throughput becomes usable training time.

Partner Demand Is The Commercial Proof

The ecosystem list is broad.

NVIDIA says 19 organizations submitted in this MLPerf round, including Microsoft Azure, CoreWeave, Google Cloud, Dell Technologies, Hewlett Packard Enterprise, Supermicro, Cisco and others.

Customer examples add another layer.

Cohere said its North agentic AI platform trained 3x faster on GB200 NVL72.

Thinking Machines Lab reported 2x faster training and serving speeds on GB300 NVL72 through Google Cloud.

Higgsfield said Nebius infrastructure cut model training time by 30%, while its service has 22 million users generating more than 6 million pieces of AI content per day.

The unresolved issue is not whether NVIDIA can post benchmark wins.

It is whether data-center operators can keep feeding these rack-scale systems with power, networking, cooling and customer workloads at the pace Blackwell demand implies.

#AI infrastructure #NVIDIA Blackwell #MLPerf Training #GB300 NVL72

Cloud & Data Centers

Buzz HPC’s £220 Million Deal Puts Canada’s Sovereign AI Push on Nvidia Racks

Buzz HPC signed a three-year sovereign AI contract with Bell Canada and Cohere, with 2,304 Nvidia Grace Blackwell GPUs planned for Bell Canada’s Merritt data center.

Cloud & Data Centers

CAS Star’s Photonics Bet Turns Into an AI Infrastructure Test

CAS Star founder Mi Lei says the AI boom has validated a decade-long investment thesis around photonics and other hard-tech fields. The firm has more than 200 photonics-related companies among roughly 600 portfolio companies, spanning sensing, communications, computing, storage and display. The next test is whether optical links, laser chips and photonic computing companies can turn AI data-centre demand into durable commercial scale.

Cloud & Data Centers

Equinix HK6 Links Hong Kong AI Capacity To Shenzhen’s Innovation Corridor

Equinix has opened the first phase of HK6 in Hong Kong, adding 1,000 cabinets, direct-to-chip liquid cooling and a private low-latency link to the Hong Kong-Shenzhen Innovation and Technology Park.

Cloud & Data Centers

KDDI’s Osaka AI Data Center Turns Liquid Cooling Into A Power Test

KDDI is moving liquid cooling into an Osaka AI data center after a 2023 immersion test cut server cooling energy use by 94 percent and lowered PUE to 1.05.