AMD and Google TPU Closing the Gap on NVIDIA

Version 5

2026-05-25 12:18 UTC · 95 items

What

AMD, Google TPU, and NVIDIA are competing across hardware benchmarks, software infrastructure governance, and supply dynamics in mid-2026. MLCommons officially published MLPerf Inference v6.0 in April 2026 [1], generating split interpretations — Forbes asked whether AMD had beaten NVIDIA [2] while Nebius framed the same round as confirming NVIDIA's top-tier position [4]. IBM, Red Hat, and Google donated llm-d and its TPU drivers to the CNCF [8], and Red Hat has since integrated llm-d directly into OpenShift AI as part of Red Hat AI 3 [11][12], moving the coalition from a governance story to an enterprise product story. Against this, NVIDIA holds a 3.6 million unit B200/GB200 backlog with enterprise buyers facing 16–24 month waits [18][19], a supply constraint that simultaneously confirms extraordinary demand and creates a structural window for AMD and Google TPU to capture customers who cannot access NVIDIA hardware.

Why it matters

NVIDIA's incumbency has rested on both ecosystem lock-in and hardware availability. AMD now holds a reported inference cost advantage and a formal benchmark presence, Red Hat has embedded llm-d into OpenShift AI, an enterprise Kubernetes platform with a large installed base, and NVIDIA's own supply backlog gives cost-competitive alternatives an unusually long window to establish production footholds before that backlog clears. Whether AMD closes its llm-d support gap fast enough to use that window is the defining question for AI infrastructure in 2026.

Open questions

MLPerf Inference v6.0 generated divergent interpretations — Forbes asks if AMD beat NVIDIA [2] while Nebius frames the same data as NVIDIA-dominant [4]. The actual head-to-head token throughput and power efficiency comparison between AMD MI355 and NVIDIA B200 on identical benchmark tasks is the unresolved data point that would settle the framing.
NVIDIA's B200 and GB200 are sold out through mid-2026 with a 3.6 million unit backlog and 16–24 month enterprise wait times [18][19]. Does this supply constraint translate into measurable AMD or Google TPU market share gains, or are enterprises simply queuing rather than substituting?
Red Hat AI 3 ships llm-d as part of OpenShift AI [11][12], but AMD MI300 GPUs remain unsupported in llm-d [17]. Does Red Hat's enterprise distribution footprint make AMD's absence from llm-d more consequential — effectively blocking AMD from a large share of enterprise Kubernetes inference deployments?
Jensen Huang's $1 trillion GPU purchase order projection through 2027 [21] and his 'low MFU by design' philosophy [22] frame deliberate over-provisioning as strategy. How much of that projected demand is contractually locked, and does NVIDIA's own supply backlog constrain its ability to fulfill it within the stated window?

Narrative

AMD's competitiveness in AI inference has moved from vendor claims to formalized benchmark territory. In April 2026, MLCommons officially published MLPerf Inference v6.0 results [1], prompting Forbes to ask directly whether AMD had beaten NVIDIA in AI performance [2], a question that captures the competitive ambiguity of the round. Spheron's analysis presented the results as GPU performance rankings for AI workloads [3], while Nebius highlighted top-tier NVIDIA performance from the same data [4]. The divergent framings suggest AMD's results were strong enough to force the question while NVIDIA retained the absolute top position on the leaderboard. A later data point reinforces the benchmark picture: on 2026-05-19, SemiAnalysis reported that AMD's MI355 was 40% cheaper than NVIDIA's B200 on single-node GLM5 FP8 inference, 14 weeks after GLM5's launch, using SGLang v0.12 [5]. SemiAnalysis framed the result as evidence that ROCm's software maturity has crossed a threshold where AMD hardware can realize its cost advantages in production. AMD's MI355X costs more to manufacture than NVIDIA's competing chip yet sells for much less [6], a deliberate margin sacrifice to gain inference market share, though AMD's MI350 GPU separately saw a 66.7% price increase [7], revealing a selective rather than uniformly aggressive pricing posture.
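The "40% cheaper" figure is a cost-per-token comparison, which depends on both the price of the hardware and the throughput it sustains. The arithmetic can be sketched as follows. This is a minimal illustration only: the function names, hourly prices, and throughput numbers are hypothetical and are not taken from SemiAnalysis or any other cited source.

```python
def cost_per_million_tokens(hourly_price_usd: float, tokens_per_second: float) -> float:
    """Dollars spent to generate one million tokens at a steady throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


def relative_saving(candidate_cost: float, baseline_cost: float) -> float:
    """Fraction by which the candidate is cheaper than the baseline."""
    return 1.0 - candidate_cost / baseline_cost


# Hypothetical inputs, chosen only so the arithmetic is easy to follow:
# both nodes sustain the same throughput, but one rents for less per hour.
baseline = cost_per_million_tokens(hourly_price_usd=50.0, tokens_per_second=10_000)
candidate = cost_per_million_tokens(hourly_price_usd=30.0, tokens_per_second=10_000)
saving = relative_saving(candidate, baseline)

print(f"baseline:  ${baseline:.3f} per million tokens")
print(f"candidate: ${candidate:.3f} per million tokens")
print(f"saving:    {saving:.0%}")
```

The same saving can come from a lower price at equal throughput, as here, or from higher throughput at equal price, which is why software maturity matters as much as list price in this kind of comparison.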

On the software infrastructure front, the llm-d coalition has moved from governance announcement to enterprise product reality. IBM, Red Hat, and Google jointly donated llm-d, an open-source Kubernetes-native distributed inference framework, and its TPU drivers to the CNCF in March 2026 [8][9][10]. Red Hat has embedded llm-d directly into OpenShift AI as part of Red Hat AI 3 [11][12][13]; its developer documentation covering multi-turn LLM workloads on the platform was published in January 2026, ahead of the donation [11]. This moves the coalition from a neutral-governance story to an enterprise distribution story: Red Hat AI 3 ships to the enterprise Linux and OpenShift installed base, giving llm-d a production deployment path through one of the largest enterprise IT channels. Google added nightly CI coverage for TPU hardware to llm-d [14], and SemiAnalysis noted that TPU has reached parity with NVIDIA in llm-d code quality [15]. CoreWeave, a major NVIDIA-centric cloud provider, publicly endorsed llm-d's CNCF acceptance as significant for production inference infrastructure [16]. AMD's position on this stack remains a concrete production gap: an open GitHub issue reports that llm-d images do not support AMD MI300 GPUs [17], leaving AMD hardware unsupported in the inference framework that enterprise Kubernetes operators are increasingly standardizing on.

NVIDIA's supply picture adds a structural dimension that complicates the standard incumbency narrative. NVIDIA's B200 and GB200 are sold out through mid-2026 with a backlog reported at 3.6 million units [18], and enterprise buyers are facing 16–24 month wait times for GB200 systems [19]. Investing.com's analysis frames this as signaling a long multi-year GPU procurement cycle [20]. This supply constraint is double-edged: it confirms extraordinary demand for NVIDIA hardware, sustaining the incumbency narrative Jensen Huang articulated with his $1 trillion GPU purchase order projection through 2027 [21] and his 'low MFU by design' philosophy that frames over-provisioned capacity as strategic flexibility [22]. But it also means enterprises that need AI inference capacity now — and cannot wait 16–24 months — face a genuine substitution decision. AMD's cost advantage [5] and Google TPU's availability through Google Cloud both become more practically relevant when NVIDIA hardware is simply unavailable.
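The 'low MFU by design' remark is easier to weigh with the metric spelled out. MFU (model FLOPs utilization) is commonly computed as the model FLOPs a system actually delivers per second divided by the hardware's peak FLOPs per second. The sketch below assumes the widely used approximation of roughly 2 FLOPs per parameter per generated token for a dense transformer forward pass; the model size, throughput, and peak figure are hypothetical and do not describe any product or result cited here.

```python
def model_flops_utilization(
    params: float,
    tokens_per_second: float,
    peak_flops_per_second: float,
    flops_per_param_per_token: float = 2.0,
) -> float:
    """Achieved model FLOP/s divided by peak hardware FLOP/s.

    The default of 2 FLOPs per parameter per token is an approximation for
    a dense transformer forward pass (inference); it ignores attention cost.
    """
    achieved_flops_per_second = flops_per_param_per_token * params * tokens_per_second
    return achieved_flops_per_second / peak_flops_per_second


# Hypothetical deployment: a 70-billion-parameter model served at
# 10,000 tokens/s on hardware with a peak of 1e16 FLOP/s.
mfu = model_flops_utilization(
    params=70e9,
    tokens_per_second=10_000,
    peak_flops_per_second=1e16,
)
print(f"MFU: {mfu:.1%}")
```

On this reading, deliberately running at low MFU means buying more peak capacity than the steady-state workload needs, which trades a worse cost-per-token figure for headroom.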

One signal that cuts across the competitive frame: AMD contributed upstream to NVIDIA's AIPerf benchmarking sub-project within the Dynamo repository [23] — believed to be the first such cross-competitor open-source merge — suggesting shared interest in measurement infrastructure even as hardware and pricing strategies diverge sharply. The competitive picture in mid-2026 is thus multi-layered: AMD leads on published inference cost efficiency and holds formal MLPerf benchmark presence but trails on the open-source inference stack enterprises are standardizing on; Google, IBM, and Red Hat have moved llm-d from governance donation to enterprise product; and NVIDIA holds both extraordinary projected demand and a supply backlog that simultaneously validates its position and opens a substitution window for its competitors.

Timeline

2026-01-13: Red Hat publishes developer documentation for accelerating multi-turn LLM workloads on OpenShift AI with llm-d, signaling active product integration ahead of the CNCF donation. [11]
2026-03-24: CNCF formally welcomes llm-d to its sandbox; IBM, Red Hat, and Google co-donate llm-d and TPU drivers. SiliconANGLE covers Red Hat's Kubernetes inference bet at KubeCon EU. [9][10][39][8][13]
2026-04-06: MLCommons officially publishes MLPerf Inference v6.0 results. Forbes asks 'Did AMD just beat NVIDIA in AI performance?'; Nebius highlights top-tier NVIDIA performance from the same round; Spheron publishes GPU rankings analysis. [2][3][1][4]
2026-05-16: AMD's contribution accepted into NVIDIA's AIPerf benchmarking repository — believed to be a first cross-competitor upstream merge. [23]
2026-05-17: Jensen Huang at Stanford CS153 articulates 'low MFU by design' as a deliberate over-provisioning philosophy, countering cost-efficiency narratives. [22]
2026-05-19: AMD MI355 confirmed 40% cheaper than NVIDIA B200 on GLM5 single-node FP8 inference, 14 weeks post-launch via SGLang v0.12. [5]
2026-05-21: Google adds nightly CI for llm-d on TPU hardware; SemiAnalysis notes TPU has reached parity with NVIDIA in llm-d code quality. [14][15]
2026-05: Jensen Huang's $1 trillion GPU purchase order projection through 2027 amplified across mainstream financial media, social platforms, and retail investor communities. [21][24][25][40][41][26][27]
2026-05: AMD MI350 GPU price rises 66.7% amid NVIDIA rivalry, complicating the narrative that AMD is uniformly undercutting NVIDIA on price. [7]
2026-05: Red Hat AI 3 ships llm-d integration as part of OpenShift AI, moving llm-d from CNCF governance story to enterprise product reality. [12][11][28]
2026-05: NVIDIA B200 and GB200 reported sold out through mid-2026 with a 3.6 million unit backlog; enterprise buyers face 16–24 month wait times. Investing.com frames this as a long multi-year GPU cycle signal. [18][19][20][42]

Perspectives

SemiAnalysis

Bullish on AMD and Google TPU progress; frames inference cost-efficiency and software ecosystem parity as decisive competitive dimensions. Views cross-competitor open-source collaboration as historically notable. Analytically provocative on Jensen Huang's low-MFU philosophy.

Evolution: Consistent throughout — SemiAnalysis has been the primary reporting voice, maintaining a pro-competition, anti-NVIDIA-moat framing.

[23][22][5][14]

Jensen Huang / NVIDIA

Projects $1 trillion in GPU purchase orders through 2027, a claim amplified across mainstream financial media and social platforms. Reframes low GPU utilization as intentional over-provisioning strategy, positioning deliberate capacity headroom as a strategic asset that blunts cost-efficiency comparisons. Supply backlog at 3.6 million units confirms extraordinary demand but also constrains near-term delivery.

Evolution: The supply-constraint data added in this pass introduces a complication: NVIDIA's demand is confirmed, but its hardware unavailability creates a substitution window that the $1 trillion narrative did not address.

[21][22][24][25][26][27][20][19][18]

Red Hat

Has moved from co-donating llm-d to the CNCF to actively shipping it in an enterprise product. Red Hat AI 3 integrates llm-d into OpenShift AI, with developer documentation covering production multi-turn inference workloads. Red Hat frames this as a strategic bet on Kubernetes-native inference at enterprise scale.

Evolution: Significantly deepened this pass: Red Hat has moved from governance co-donor to enterprise product owner, giving llm-d distribution reach through the OpenShift installed base.

[8][10][13][11][28][12]

IBM Research

Co-donated llm-d to the CNCF alongside Red Hat and Google. The coalition frames the contribution as establishing a shared AI infrastructure future on neutral governance.

Evolution: Consistent with prior pass; Red Hat's product integration is the more active story this cycle.

[10][29][8]

CoreWeave

Publicly frames llm-d's CNCF acceptance as significant for production inference infrastructure, despite CoreWeave's infrastructure being NVIDIA-centric — a notable cross-vendor endorsement.

Evolution: Consistent with prior pass.

[16]

AMD (hardware and pricing actions)

Formally represented in standardized benchmarks via MLPerf Inference v6.0, with results strong enough to prompt direct NVIDIA comparisons in mainstream coverage. Sells the MI355X for much less than NVIDIA's competing chip despite higher manufacturing costs, in order to gain inference market share, while MI350 prices have risen 66.7%, revealing a selective rather than uniformly aggressive pricing posture. Remains absent from llm-d's supported hardware list.

Evolution: Official MLPerf v6.0 publication moves AMD's competitiveness claim from self-described 'breakthrough' to external benchmark documentation, though the head-to-head interpretation remains contested.

[2][1][30][7][6][17]

Nebius (NVIDIA ecosystem)

Frames MLPerf Inference v6.0 as demonstrating top-tier AI performance on NVIDIA, directly countering the AMD-beat-NVIDIA framing in mainstream coverage of the same results.

Evolution: Consistent with prior pass; represents NVIDIA-aligned infrastructure providers sustaining NVIDIA's performance leadership narrative.

[4]

Community observers (Twitter/X amplifiers)

Broadly validating the SemiAnalysis framing; one commentator specifically flags the software moat angle as under-appreciated, arguing Google wiring TPU into llm-d CI is the more significant signal than silicon performance alone.

Evolution: Consistent with prior pass; no new named voices in the community cluster.

[15][31][32][33][34][35][36][37][38]

Tensions

SemiAnalysis frames inference cost-per-token as the decisive competitive moat, with AMD's 40% cost advantage on GLM5 as structurally significant [5]. Jensen Huang's $1 trillion demand projection [21] and 'low MFU by design' philosophy [22] implicitly counter this: if hyperscalers lock in GPU purchase orders at scale and prioritize over-provisioned capacity flexibility, cost-per-token comparisons matter less than incumbency and demand volume. However, NVIDIA's own 3.6 million unit backlog and 16–24 month enterprise wait times [18][19] undercut the incumbency argument by making cost-competitive alternatives practically necessary for enterprises that cannot wait. [5][21][24][22][18][19]
Forbes and Spheron frame MLPerf Inference v6.0 as a competitive AMD milestone that directly challenges NVIDIA's AI performance leadership [2][3], while Nebius — an NVIDIA-aligned cloud provider — frames the same benchmark round as confirming NVIDIA's top-tier position [4]. The same official results support divergent narratives, leaving the head-to-head comparison genuinely ambiguous. [2][3][1][4]
Google, IBM, and Red Hat have placed llm-d under neutral CNCF governance [8][9][10], Google has added nightly TPU CI to the framework [14], and Red Hat has shipped llm-d in OpenShift AI as part of Red Hat AI 3 [12][11], making llm-d a real enterprise product. AMD's llm-d support is absent, according to an open GitHub issue reporting that llm-d images do not support AMD MI300 GPUs [17]. AMD leads on hardware cost and holds formal MLPerf results, but trails on the open-source inference stack that enterprise Kubernetes operators are standardizing on and that Red Hat now actively ships. [8][9][10][14][17][30][12][11]

Sources

[1] MLCommons Releases New MLPerf Inference v6.0 Benchmark ... — reactive:gpu-accelerator-competition
[2] Did AMD Just Beat Nvidia In AI Performance? - Forbes — reactive:gpu-accelerator-competition
[3] MLPerf Inference v6.0 Results Explained: GPU Performance Rankings for AI Workloads (2026) | Spheron Blog — reactive:gpu-accelerator-competition
[4] MLPerf® Inference v6.0: Top-tier AI performance on NVIDIA ... - Nebius — reactive:gpu-accelerator-competition
[5] AMD ALERT 🚀 MI355 is now 40% cheaper than B200 on GLM5 architecture for Single Node serving FP8 14 weeks after the initi… — SemiAnalysis Twitter (2026-05-19)
[6] AMD's MI355X costs more to build but sells for much less than ... — reactive:gpu-accelerator-competition
[7] AMD MI350 Price Jumps 66.7% Amid Nvidia Rivalry - SmBom — reactive:gpu-accelerator-competition
[8] IBM, Red Hat, and Google just donated a Kubernetes blueprint for ... — reactive:gpu-accelerator-competition
[9] Welcome llm-d to the CNCF: Evolving Kubernetes into SOTA AI infrastructure | CNCF — reactive:gpu-accelerator-competition
[10] Donating llm-d to the Cloud Native Computing Foundation - IBM Research — reactive:gpu-accelerator-competition
[11] Accelerate multi-turn LLM workloads on OpenShift AI with llm-d ... — reactive:gpu-accelerator-competition
[12] Red Hat AI 3 aims to streamline enterprise AI at production scale — reactive:gpu-accelerator-competition
[13] Red Hat bets big on Kubernetes inference with llm-d - SiliconANGLE — reactive:gpu-accelerator-competition
[14] TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for llm-d. Great step by … — SemiAnalysis Twitter (2026-05-21)
[15] @SemiAnalysis_ The under-appreciated bit: it's the *software* moat narrowing, not just silicon. Google wiring TPU into l... — reactive:gpu-accelerator-competition (2026-05-21)
[16] Why llm-d in CNCF Matters for Production Inference — reactive:gpu-accelerator-competition
[17] llm-d image doesn't support AMD MI300 GPU's? · Issue #139 - GitHub — reactive:gpu-accelerator-competition
[18] Nvidia’s Blackwell Dynasty: B200 and GB200 Sold Out Through Mid-2026 as Backlog Hits 3.6 Million Units — reactive:gpu-accelerator-competition
[19] NVIDIA GB200 Delays: Enterprise Buyers Face 16-24 Month Wait | Mark Peters posted on the topic | LinkedIn — reactive:gpu-accelerator-competition
[20] Nvidia: GPU Order Backlog Signals Long Multi Year Cycle | Investing.com — reactive:gpu-accelerator-competition
[21] “I see through 2027, at least $1 trillion.” At Nvidia's annual GTC ... — reactive:jensen-huang-nvidia-thesis
[22] At Stanford CS153 Frontier Systems, Jensen states word for word that he "would like to be at low MFU all the time" &… — SemiAnalysis Twitter (2026-05-17)
[23] SERIOUS & COOL: AIPerf -- a sub-repo of the Nvidia Dynamo project focused on benchmarking LLM workloads -- just acce… — SemiAnalysis Twitter (2026-05-16)
[24] Nvidia CEO Jensen Huang says company has one trillion dollars in orders through 2027 - DCD — reactive:gpu-accelerator-competition
[25] Nvidia CEO Jensen Huang assures investors on growth, $1T sales forecast — reactive:gpu-accelerator-competition
[26] Nvidia CEO Huang says company sees more than $1 ... — reactive:gpu-accelerator-competition
[27] Nvidia CEO sees 2027 as at least one trillion dollars of revenue, and computing demand to be higher than that : r/wallstreetbets — reactive:gpu-accelerator-competition
[28] What's new and what's next for Red Hat AI | Q1 2026 - YouTube — reactive:gpu-accelerator-competition
[29] IBM, Red Hat, and Google are aligning around a shared future for AI ... — reactive:gpu-accelerator-competition
[30] AMD Delivers Breakthrough MLPerf Inference 6.0 Results — reactive:gpu-accelerator-competition
[31] RT @SemiAnalysis_: TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for ll... — reactive:gpu-accelerator-competition (2026-05-21)
[32] RT @SemiAnalysis_: TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for ll... — reactive:gpu-accelerator-competition (2026-05-21)
[33] RT @SemiAnalysis_: TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for ll... — reactive:gpu-accelerator-competition (2026-05-21)
[34] RT @SemiAnalysis_: TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for ll... — reactive:gpu-accelerator-competition (2026-05-21)
[35] RT @SemiAnalysis_: TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for ll... — reactive:gpu-accelerator-competition (2026-05-21)
[36] RT @SemiAnalysis_: TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for ll... — reactive:gpu-accelerator-competition (2026-05-21)
[37] RT @SemiAnalysis_: TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for ll... — reactive:gpu-accelerator-competition (2026-05-21)
[38] RT @SemiAnalysis_: TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for ll... — reactive:gpu-accelerator-competition (2026-05-21)
[39] [Sandbox] llm-d · Issue #462 · cncf/sandbox - GitHub — reactive:gpu-accelerator-competition
[40] Nvidia CEO Jensen Huang said at the GTC 2026 conference that ... — reactive:gpu-accelerator-competition
[41] Nvidia CEO Predicts $1 Trillion in AI Chip Sales by 2027 | Fortune posted on the topic | LinkedIn — reactive:gpu-accelerator-competition
[42] NVIDIA B200 GPU: Specs, Pricing, and Cloud Availability (2026) — reactive:gpu-accelerator-competition