The Information Machine

AMD and Google TPU Closing the Gap on NVIDIA

Version 4

2026-05-25 06:42 UTC · 87 items

What

AMD, Google, IBM, Red Hat, and NVIDIA are competing across hardware benchmarks, software infrastructure governance, and market-scale demand in mid-2026. MLCommons officially published MLPerf Inference v6.0 results in April 2026 [1], prompting Forbes to ask whether AMD had beaten NVIDIA in AI performance [2], while Nebius framed the same round as showcasing top-tier NVIDIA results [4]. That split captures the competitive ambiguity of the round. IBM, Red Hat, and Google jointly donated llm-d and its TPU drivers to the CNCF [8], and Red Hat's formal inclusion broadens the enterprise credibility of the vendor-neutral Kubernetes inference coalition. Against these challengers, Jensen Huang's projection of $1 trillion in GPU purchase orders through 2027 [16][17] has been amplified across mainstream financial and social media, anchoring NVIDIA's incumbency narrative.

Why it matters

NVIDIA's dominance has rested on raw hardware performance and ecosystem lock-in. AMD now has both a reported inference cost advantage and official benchmark presence in MLPerf v6.0, while Red Hat's entry into the llm-d coalition signals that the CNCF-governed inference stack is gaining enterprise distribution reach. AMD's continued absence from that stack is the sharpest gap between its hardware credibility and its software positioning. Whether AMD closes that gap before NVIDIA's projected demand wave locks in another hardware cycle is the defining question for AI infrastructure in 2026.

Open questions

  • MLPerf v6.0 results are generating divergent framings — Forbes asks if AMD beat NVIDIA [2] while Nebius emphasizes NVIDIA's top-tier position [4]. The actual head-to-head token throughput and efficiency comparison between AMD MI355 and NVIDIA B200 on identical benchmark tasks is the unresolved data point that would settle the interpretation.

  • Red Hat's formal inclusion in the llm-d donation [8] adds enterprise distribution reach to the CNCF coalition. Does this accelerate operator adoption of the framework in ways that make AMD's continued absence from llm-d's supported hardware list [15] more consequential than a simple missing-driver story?

  • Jensen Huang's $1 trillion GPU purchase order projection [16] has become a mainstream investor narrative. How much of this demand is contractually locked versus indicative, and can AMD or Google TPU capture any share of it if inference cost advantages persist?

  • Does AMD's 40% cost advantage on GLM5 FP8 inference [5] generalize to other frontier model architectures, or is it specific to GLM5's design and speculative decoding configuration via SGLang?

Narrative

AMD's competitiveness in AI inference has moved from vendor claims to formalized benchmark territory. In April 2026, MLCommons officially published MLPerf Inference v6.0 results [1], prompting Forbes to ask directly whether AMD had just beaten NVIDIA in AI performance [2], a question that captures the genuine competitive ambiguity of the round. Spheron's analysis positioned the results as providing clear GPU performance rankings for AI workloads [3], while Nebius highlighted top-tier NVIDIA performance from the same data [4]. The divergent framings suggest AMD's results were strong enough to force the question while NVIDIA retained the absolute top position on the leaderboard. In May, SemiAnalysis added a cost data point: AMD's MI355 was 40% cheaper than NVIDIA's B200 on single-node GLM5 FP8 inference, 14 weeks after GLM5's launch, using SGLang v0.12 across both ROCm and CUDA backends [5]. SemiAnalysis framed the result as evidence that ROCm's software maturity has crossed a threshold where AMD hardware can realize its cost advantages in production settings. AMD's MI355X reportedly carries higher manufacturing costs than NVIDIA's competing chip while being priced below cost [6], a deliberate margin sacrifice to gain inference market share. Separately, AMD's MI350 GPU has seen a 66.7% price increase amid NVIDIA rivalry [7], which points to a selective rather than uniformly below-cost pricing posture across AMD's accelerator portfolio.
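The cost comparison above comes down to simple arithmetic, which the following minimal Python sketch illustrates. Every number in it is a hypothetical placeholder, not a figure from [5] or [7], and the function names are invented for illustration. It shows that a 40% cost-per-token advantage can come from a lower hourly price, from higher throughput at the same price, or from a mix of both, and that a 66.7% price rise corresponds to a 5:3 price ratio.

```python
def cost_per_million_tokens(hourly_price_usd: float, tokens_per_second: float) -> float:
    """USD cost to serve one million tokens on a single node."""
    if hourly_price_usd < 0 or tokens_per_second <= 0:
        raise ValueError("price must be >= 0 and throughput must be > 0")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


def relative_saving(challenger_cost: float, incumbent_cost: float) -> float:
    """Fraction by which the challenger is cheaper (0.40 means 40% cheaper)."""
    return 1.0 - challenger_cost / incumbent_cost


def percent_change(old_price: float, new_price: float) -> float:
    """Percentage change from old_price to new_price."""
    return (new_price - old_price) / old_price * 100.0


# Hypothetical node prices and throughputs, chosen only to make the math visible.
incumbent = cost_per_million_tokens(hourly_price_usd=40.0, tokens_per_second=10_000)

# Route 1: same throughput, 40% lower hourly price.
cheaper_node = cost_per_million_tokens(hourly_price_usd=24.0, tokens_per_second=10_000)

# Route 2: same hourly price, throughput higher by a factor of 1 / 0.6.
faster_node = cost_per_million_tokens(hourly_price_usd=40.0, tokens_per_second=10_000 / 0.6)

print(f"saving via price:      {relative_saving(cheaper_node, incumbent):.0%}")
print(f"saving via throughput: {relative_saving(faster_node, incumbent):.0%}")

# Any 3 -> 5 price move (in arbitrary units) is a rise of about 66.7%.
print(f"price change:          {percent_change(3.0, 5.0):.1f}%")
```

Because both routes produce the same headline percentage, a single "40% cheaper" figure does not by itself say whether the advantage comes from pricing or from software-driven throughput, which is why the result's sensitivity to model architecture and serving configuration matters.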

On the software infrastructure front, IBM, Red Hat, and Google jointly donated llm-d, an open-source Kubernetes-native distributed inference framework, and its TPU drivers to the Cloud Native Computing Foundation [8][9][10]. The New Stack framed the contribution as IBM, Red Hat, and Google aligning around a shared future for AI infrastructure [11]; Red Hat's formal inclusion, as an enterprise Linux and OpenShift distribution leader, adds enterprise deployment reach beyond what IBM and Google alone could provide. Google added nightly CI coverage for TPU hardware to llm-d [12], and SemiAnalysis noted that TPU has reached parity with NVIDIA in llm-d code quality, with one community commentator flagging the narrowing software moat as an under-appreciated signal [13]. CoreWeave, a major NVIDIA-centric cloud provider, publicly endorsed llm-d's CNCF acceptance as significant for production inference infrastructure [14], a notable cross-vendor validation. AMD's absence from this stack remains a concrete production blocker: an open GitHub issue reports that llm-d images do not support AMD MI300 GPUs [15], leaving AMD hardware unsupported in the inference framework that enterprise Kubernetes operators are increasingly standardizing on, even as NVIDIA and Google TPU both have active CI coverage.

Against these competitive signals, Jensen Huang projected $1 trillion in GPU purchase orders through 2027 at NVIDIA's GTC conference [16], a claim that Data Center Dynamics, Yahoo Finance, and CryptoBriefing all covered as a primary macro-demand confidence signal [17][18][19], with further amplification across retail investor and social platforms [20]. Huang also articulated a 'low MFU by design' philosophy [21], where MFU (model FLOPs utilization) is the fraction of a chip's peak compute that a workload actually uses. He argued that deliberate GPU over-provisioning is a strategic asset that provides flexibility, a framing that ranks capacity headroom above cost-per-token efficiency and directly counters AMD's inference cost-advantage narrative. If hyperscalers internalize this philosophy and lock in GPU procurement at the scale Huang projects, cost-efficiency comparisons matter less than incumbency and demand volume.
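For readers unfamiliar with the metric, MFU (model FLOPs utilization) is the ratio of the FLOP/s a workload actually achieves to the hardware's peak FLOP/s. The sketch below is a rough illustration under stated assumptions, not a description of how NVIDIA or SemiAnalysis measure it: it uses the common approximation of about 2 FLOPs per parameter per generated token for a dense model's forward pass, and the model size, throughput, and peak figures are hypothetical.

```python
def model_flops_utilization(
    tokens_per_second: float,
    parameter_count: float,
    peak_flops_per_second: float,
    flops_per_parameter_per_token: float = 2.0,  # assumption: dense forward pass
) -> float:
    """Achieved model FLOP/s divided by peak hardware FLOP/s."""
    if peak_flops_per_second <= 0:
        raise ValueError("peak_flops_per_second must be > 0")
    achieved = tokens_per_second * parameter_count * flops_per_parameter_per_token
    return achieved / peak_flops_per_second


def capacity_headroom(mfu: float) -> float:
    """Share of peak compute left unused at a given MFU."""
    return 1.0 - mfu


# Hypothetical deployment: 70B-parameter dense model, 5,000 tokens/s,
# on hardware with a 2 PFLOP/s peak. None of these figures come from the sources.
mfu = model_flops_utilization(
    tokens_per_second=5_000,
    parameter_count=70e9,
    peak_flops_per_second=2e15,
)

print(f"MFU:      {mfu:.2f}")
print(f"headroom: {capacity_headroom(mfu):.2f}")
```

Under this framing, a low-MFU-by-design stance accepts a larger headroom value in exchange for burst capacity, whereas a cost-per-token argument pushes MFU up so that each purchased FLOP serves more tokens.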

One signal cuts across the competitive frame: AMD contributed upstream to NVIDIA's AIPerf benchmarking sub-project within the Dynamo repository [22], believed to be the first such cross-competitor open-source merge, which suggests a shared interest in measurement infrastructure even as hardware and pricing strategies diverge sharply. Google also reported 3X inference speedups on TPUs using diffusion-style speculative decoding [23], extending its TPU inference work. The competitive picture in mid-2026 is thus multi-layered: AMD leads on published inference cost efficiency and holds formal MLPerf presence; Google, IBM, and Red Hat are building an enterprise-credible, CNCF-governed inference stack that supports NVIDIA GPUs and Google TPUs but not AMD hardware; and NVIDIA is projecting demand incumbency at a scale that could absorb competitive pressure before it becomes structural.

Timeline

  • 2026-03-24: CNCF formally welcomes llm-d to its sandbox; IBM, Red Hat, and Google co-donate llm-d and TPU drivers to the CNCF under neutral governance. [8][9][10][34]
  • 2026-04-06: MLCommons officially publishes MLPerf Inference v6.0 results. Forbes asks 'Did AMD just beat NVIDIA in AI performance?'; Nebius highlights top-tier NVIDIA performance from the same round; Spheron publishes GPU rankings analysis. [1][2][3][4]
  • 2026-05-16: AMD's contribution accepted into NVIDIA's AIPerf benchmarking repository — believed to be a first cross-competitor upstream merge. [22]
  • 2026-05-17: Jensen Huang at Stanford CS153 articulates 'low MFU by design' as a deliberate over-provisioning philosophy. [21]
  • 2026-05-19: AMD MI355 confirmed 40% cheaper than NVIDIA B200 on GLM5 single-node FP8 inference, 14 weeks post-launch. [5]
  • 2026-05-21: Google adds nightly CI for llm-d on TPU hardware; SemiAnalysis notes TPU has reached parity with NVIDIA in llm-d code quality. [12][13]
  • 2026-05: Jensen Huang's $1 trillion GPU purchase order projection through 2027 amplified across mainstream financial media, social platforms, and retail investor communities. [16][17][18][19][20][35][36]
  • 2026-05: AMD MI350 GPU price rises 66.7% amid NVIDIA rivalry, adding complexity to the AMD below-cost pricing narrative. [7]

Perspectives

SemiAnalysis

Bullish on AMD and Google TPU progress; frames inference cost-efficiency and software ecosystem parity as decisive competitive dimensions. Views cross-competitor open-source collaboration as historically notable. Analytically provocative on Jensen Huang's low-MFU philosophy.

Evolution: Consistent throughout — SemiAnalysis has been the primary reporting voice, maintaining a pro-competition, anti-NVIDIA-moat framing.

Jensen Huang / NVIDIA

Projects $1 trillion in GPU purchase orders through 2027, a claim now amplified across mainstream financial media and social platforms. Simultaneously reframes low GPU utilization as intentional over-provisioning strategy, positioning deliberate capacity headroom as a strategic asset that blunts cost-efficiency comparisons.

Evolution: The $1 trillion forecast has crossed from conference statement to broad investor and retail narrative this pass, significantly widening the audience for NVIDIA's incumbency counter-argument.

IBM Research and Red Hat

Co-donated llm-d to the CNCF alongside Google, with Red Hat's formal inclusion signaling enterprise Linux and OpenShift distribution reach beyond what IBM and Google alone represent. The coalition frames the donation as establishing a shared AI infrastructure future.

Evolution: Red Hat is explicitly named as a co-contributor this pass, expanding the coalition's enterprise-credibility narrative beyond hyperscalers to enterprise IT distribution incumbents.

CoreWeave

Publicly frames llm-d's CNCF acceptance as significant for production inference infrastructure, despite CoreWeave's infrastructure being NVIDIA-centric — a notable cross-vendor endorsement.

Evolution: Consistent with prior pass.

AMD (hardware and pricing actions)

Formally represented in standardized benchmarks via MLPerf Inference v6.0, with results strong enough to prompt direct NVIDIA comparisons in mainstream coverage. Pricing MI355X below manufacturing cost to gain inference market share while MI350 prices have risen 66.7%, revealing a selective rather than uniformly aggressive pricing posture.

Evolution: Official MLPerf v6.0 publication moves AMD's competitiveness claim from self-described 'breakthrough' to external benchmark documentation, though the head-to-head interpretation remains contested.

Nebius (NVIDIA ecosystem)

Frames MLPerf Inference v6.0 as demonstrating top-tier AI performance on NVIDIA, directly countering the AMD-beat-NVIDIA framing in mainstream coverage of the same results.

Evolution: New voice this pass; represents NVIDIA-aligned infrastructure providers interpreting the same benchmark data to sustain NVIDIA's performance leadership narrative.

Community observers (Twitter/X amplifiers)

Broadly validating the SemiAnalysis framing; one commentator specifically flags the software moat angle as under-appreciated, arguing that Google wiring TPU into llm-d CI is a more significant signal than silicon performance alone.

Evolution: Consistent with prior pass; no new named voices in the community cluster.

Tensions

  • SemiAnalysis frames inference cost-per-token as the decisive competitive moat, with AMD's 40% cost advantage on GLM5 as structurally significant [5]. Jensen Huang's $1 trillion demand projection [16][17] and 'low MFU by design' philosophy [21] implicitly counter this: if hyperscalers lock in GPU purchase orders at scale and prioritize over-provisioned capacity flexibility, cost-per-token comparisons matter less than incumbency and demand volume. [5][16][17][21]
  • Forbes and Spheron frame MLPerf Inference v6.0 as a competitive AMD milestone that directly challenges NVIDIA's AI performance leadership [2][3], while Nebius, an NVIDIA-aligned cloud provider, frames the same benchmark round as confirming NVIDIA's top-tier position [4]. The same official results support divergent narratives, leaving the head-to-head comparison genuinely ambiguous. [1][2][3][4]
  • Google, IBM, and Red Hat have granted llm-d neutral governance under the CNCF [8][9][10] and Google has achieved nightly CI parity with NVIDIA on the framework [12], while AMD's llm-d support is absent — confirmed by an open GitHub issue showing llm-d images do not support AMD MI300 GPUs [15]. AMD leads on hardware cost and holds formal MLPerf results, but trails on the open-source inference stack that enterprise Kubernetes operators are standardizing on. [8][9][10][12][15][25]

Sources

  1. [1] MLCommons Releases New MLPerf Inference v6.0 Benchmark ... — reactive:gpu-accelerator-competition
  2. [2] Did AMD Just Beat Nvidia In AI Performance? - Forbes — reactive:gpu-accelerator-competition
  3. [3] MLPerf Inference v6.0 Results Explained: GPU Performance Rankings for AI Workloads (2026) | Spheron Blog — reactive:gpu-accelerator-competition
  4. [4] MLPerf® Inference v6.0: Top-tier AI performance on NVIDIA ... - Nebius — reactive:gpu-accelerator-competition
  5. [5] AMD ALERT 🚀 MI355 is now 40% cheaper than B200 on GLM5 architecture for Single Node serving FP8 14 weeks after the initi… — SemiAnalysis Twitter (2026-05-19)
  6. [6] AMD's MI355X costs more to build but sells for much less than ... — reactive:gpu-accelerator-competition
  7. [7] AMD MI350 Price Jumps 66.7% Amid Nvidia Rivalry - SmBom — reactive:gpu-accelerator-competition
  8. [8] IBM, Red Hat, and Google just donated a Kubernetes blueprint for ... — reactive:gpu-accelerator-competition
  9. [9] Welcome llm-d to the CNCF: Evolving Kubernetes into SOTA AI infrastructure | CNCF — reactive:gpu-accelerator-competition
  10. [10] Donating llm-d to the Cloud Native Computing Foundation - IBM Research — reactive:gpu-accelerator-competition
  11. [11] IBM, Red Hat, and Google are aligning around a shared future for AI ... — reactive:gpu-accelerator-competition
  12. [12] TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for llm-d. Great step by … — SemiAnalysis Twitter (2026-05-21)
  13. [13] @SemiAnalysis_ The under-appreciated bit: it's the *software* moat narrowing, not just silicon. Google wiring TPU into l... — reactive:gpu-accelerator-competition (2026-05-21)
  14. [14] Why llm-d in CNCF Matters for Production Inference — reactive:gpu-accelerator-competition
  15. [15] llm-d image doesn't support AMD MI300 GPU's? · Issue #139 - GitHub — reactive:gpu-accelerator-competition
  16. [16] “I see through 2027, at least $1 trillion.” At Nvidia's annual GTC ... — reactive:jensen-huang-nvidia-thesis
  17. [17] Nvidia CEO Jensen Huang says company has one trillion dollars in orders through 2027 - DCD — reactive:gpu-accelerator-competition
  18. [18] Nvidia CEO Huang says company sees more than $1 ... — reactive:gpu-accelerator-competition
  19. [19] Nvidia CEO Jensen Huang assures investors on growth, $1T sales forecast — reactive:gpu-accelerator-competition
  20. [20] Nvidia CEO sees 2027 as at least one trillion dollars of revenue, and computing demand to be higher than that : r/wallstreetbets — reactive:gpu-accelerator-competition
  21. [21] At Stanford CS153 Frontier Systems, Jensen states word for word that he "would like to be at low MFU all the time" &… — SemiAnalysis Twitter (2026-05-17)
  22. [22] SERIOUS & COOL: AIPerf -- a sub-repo of the Nvidia Dynamo project focused on benchmarking LLM workloads -- just acce… — SemiAnalysis Twitter (2026-05-16)
  23. [23] Supercharging LLM inference on Google TPUs: Achieving 3X ... — reactive:gpu-accelerator-competition
  24. [24] Jensen Huang just made the most audacious prediction in semiconductor history: $1 trillion in GPU purchase orders through 2027. — reactive:gpu-accelerator-competition
  25. [25] AMD Delivers Breakthrough MLPerf Inference 6.0 Results — reactive:gpu-accelerator-competition
  26. [26] RT @SemiAnalysis_: TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for ll... — reactive:gpu-accelerator-competition (2026-05-21)
  27. [27] RT @SemiAnalysis_: TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for ll... — reactive:gpu-accelerator-competition (2026-05-21)
  28. [28] RT @SemiAnalysis_: TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for ll... — reactive:gpu-accelerator-competition (2026-05-21)
  29. [29] RT @SemiAnalysis_: TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for ll... — reactive:gpu-accelerator-competition (2026-05-21)
  30. [30] RT @SemiAnalysis_: TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for ll... — reactive:gpu-accelerator-competition (2026-05-21)
  31. [31] RT @SemiAnalysis_: TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for ll... — reactive:gpu-accelerator-competition (2026-05-21)
  32. [32] RT @SemiAnalysis_: TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for ll... — reactive:gpu-accelerator-competition (2026-05-21)
  33. [33] RT @SemiAnalysis_: TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for ll... — reactive:gpu-accelerator-competition (2026-05-21)
  34. [34] [Sandbox] llm-d · Issue #462 · cncf/sandbox - GitHub — reactive:gpu-accelerator-competition
  35. [35] Nvidia CEO Jensen Huang said at the GTC 2026 conference that ... — reactive:gpu-accelerator-competition
  36. [36] Nvidia CEO Predicts $1 Trillion in AI Chip Sales by 2027 | Fortune posted on the topic | LinkedIn — reactive:gpu-accelerator-competition