AMD and Google TPU Closing the Gap on NVIDIA

closed · v6 · 2026-05-25 · 110 items

What's new in v6

The main new development in this pass is the AMD-Red Hat strategic collaboration announcement [17] from May 2025. It reframes AMD's absence from llm-d as a transitional gap within an active partnership rather than a clean exclusion, and it makes AMD hardware certification in Red Hat AI 3 a concrete partnership deliverable to watch. The BusinessWire press release [8] confirms that Red Hat AI 3 launched in October 2025, before the CNCF donation in March 2026, which means llm-d was shipping in an enterprise product before it received neutral open-source governance. Google also published its own Cloud Blog confirmation of llm-d's CNCF sandbox status [12], and enterprise analyst coverage (theCUBE Research [16], Digital Chiefs [31]) has broadened the story from technical benchmark reporting into CIO-level strategy framing.

What

AMD, Google TPU, and NVIDIA are competing across hardware benchmarks, software infrastructure governance, and supply dynamics in mid-2026. AMD and Red Hat announced a strengthened strategic collaboration in May 2025 [17], which complicates AMD's apparent absence from llm-d's supported hardware list [18]: the two companies are active partners even though AMD GPU support in Red Hat AI 3's inference stack has not been confirmed against the official hardware configuration documentation [19]. Red Hat AI 3 launched in October 2025 [8] with llm-d integrated into OpenShift AI, and llm-d itself was donated to the CNCF in March 2026 [9]. Meanwhile, NVIDIA holds a 3.6 million unit B200/GB200 backlog, with enterprise buyers facing 16–24 month waits [21][22].

Why it matters

NVIDIA's incumbency rests on ecosystem lock-in and hardware availability, but its own supply backlog creates a structural window for cost-competitive alternatives. AMD leads on published inference cost efficiency and holds a formal Red Hat partnership, yet lacks confirmed hardware support on the inference stack Red Hat ships at enterprise scale. Whether AMD converts that partnership into llm-d hardware support — and whether NVIDIA's backlog drives enterprises to substitute rather than wait — are the defining questions for AI infrastructure in 2026.

Open questions

AMD and Red Hat have a formal strategic collaboration [17], yet AMD MI300 GPU support in llm-d remains unconfirmed [18]. Does the partnership include a concrete roadmap for AMD hardware support in Red Hat AI 3, and on what timeline?
MLPerf Inference v6.0 generated divergent interpretations: Forbes asks whether AMD beat NVIDIA [2], while Nebius frames the same data as NVIDIA-dominant [4]. What are the actual head-to-head token throughput and power efficiency figures for AMD MI355 and NVIDIA B200 on identical workloads?
NVIDIA's B200 and GB200 are sold out through mid-2026 with a 3.6 million unit backlog and 16–24 month enterprise wait times [21][22]. Are enterprises substituting AMD or Google TPU in measurable share, or simply queuing for NVIDIA?
Jensen Huang's $1 trillion GPU purchase order projection through 2027 [23] and 'low MFU by design' philosophy [24] frame over-provisioning as strategy. How much of that projected demand is contractually locked, and does NVIDIA's backlog constrain its ability to fulfill within the stated window?

Narrative

AMD's competitiveness in AI inference has moved from vendor claims to formalized benchmark territory. In April 2026, MLCommons published the MLPerf Inference v6.0 results [1], prompting Forbes to ask directly whether AMD had beaten NVIDIA in AI performance [2], a question that captures the competitive ambiguity of the round. Spheron's analysis presented the results as GPU performance rankings for AI workloads [3], while Nebius highlighted top-tier NVIDIA performance from the same data [4]. AMD's results were strong enough to force the question, while NVIDIA retained the absolute top position. In May 2026, SemiAnalysis added a cost data point: the MI355 was 40% cheaper than NVIDIA's B200 on single-node GLM5 FP8 inference, 14 weeks after launch, using SGLang v0.12 [5]. SemiAnalysis framed the result as evidence that ROCm's software maturity has crossed a threshold where AMD hardware can realize its cost advantages in production. AMD's pricing strategy is selective: the MI355X reportedly costs more to build than NVIDIA's competing chip yet sells for much less, a move aimed at gaining inference market share [6], while AMD's MI350 GPU separately saw a 66.7% price increase [7]. Together these point to a targeted rather than uniformly aggressive posture.

On the software infrastructure front, the llm-d timeline runs from enterprise product launch to open-source governance. Red Hat AI 3 launched in October 2025 [8], integrating llm-d, a Kubernetes-native distributed inference framework, directly into OpenShift AI for enterprise production workloads. In March 2026, IBM, Red Hat, and Google co-donated llm-d and its TPU drivers to the CNCF [9][10][11], moving the framework into neutral governance while it was already shipping in an enterprise product. Google confirmed the CNCF sandbox acceptance on its own Cloud Blog [12] and separately added nightly CI coverage for TPU hardware in llm-d [13], with SemiAnalysis noting that TPU has reached code quality parity with NVIDIA on the framework [14]. CoreWeave, a major NVIDIA-centric cloud provider, publicly endorsed llm-d's CNCF acceptance as significant for production inference infrastructure [15]. theCUBE Research has analyzed Red Hat AI 3 as a platform for distributed inference and agentic intelligence [16].

AMD's relationship with the llm-d stack is more nuanced than simple absence. In May 2025, AMD and Red Hat announced a strengthened strategic collaboration [17] — a formal partnership that predates Red Hat AI 3's October 2025 launch. This makes AMD's absence from llm-d's supported hardware list [18] a transitional gap within an active partnership rather than a clean exclusion. The Red Hat AI 3 hardware configurations documentation [19] is the authoritative reference for exactly which AMD hardware, if any, is currently certified. AMD also contributed upstream to NVIDIA's AIPerf benchmarking sub-project within the Dynamo repository [20] — believed to be the first cross-competitor open-source merge — suggesting shared interest in measurement infrastructure even as hardware and pricing strategies diverge sharply.

NVIDIA's supply picture adds a structural dimension that complicates the standard incumbency narrative. NVIDIA's B200 and GB200 are sold out through mid-2026 with a backlog of 3.6 million units [21], and enterprise buyers face 16–24 month wait times for GB200 systems [22]. This supply constraint is double-edged: it confirms extraordinary demand, sustaining the incumbency narrative Jensen Huang articulated with his $1 trillion GPU purchase order projection through 2027 [23] and his 'low MFU by design' philosophy that frames over-provisioned capacity as strategic flexibility [24]. But enterprises that need AI inference capacity now face a genuine substitution decision. AMD's cost advantage [5] and Google TPU's availability through Google Cloud both become practically relevant when NVIDIA hardware is simply unavailable. The competitive picture in mid-2026 is thus multi-layered: AMD leads on published inference cost efficiency and holds a formal Red Hat partnership but awaits hardware validation on the enterprise inference stack; Google has achieved CI parity on llm-d and occupies the governing coalition; and NVIDIA holds both extraordinary projected demand and a supply backlog that simultaneously validates its position and opens a substitution window.

Timeline

2025-05-20: AMD and Red Hat announce a strengthened strategic collaboration, expanding their joint enterprise AI commitment. [17]
2025-10-14: Red Hat AI 3 officially launches, bringing distributed AI inference to production workloads by integrating llm-d into OpenShift AI. [8]
2026-01-13: Red Hat publishes developer documentation for accelerating multi-turn LLM workloads on OpenShift AI with llm-d. [29]
2026-03-24: CNCF formally welcomes llm-d to its sandbox; IBM, Red Hat, and Google co-donate llm-d and TPU drivers; Google confirms acceptance on its Cloud Blog. [10][11][32][9][28][12]
2026-04-06: MLCommons officially publishes MLPerf Inference v6.0 results; Forbes asks 'Did AMD just beat NVIDIA?'; Nebius highlights top-tier NVIDIA performance from the same round. [2][3][1][4]
2026-05-16: AMD's contribution accepted into NVIDIA's AIPerf benchmarking repository — believed to be a first cross-competitor upstream merge. [20]
2026-05-17: Jensen Huang at Stanford CS153 articulates 'low MFU by design' as a deliberate over-provisioning philosophy, countering cost-efficiency narratives. [24]
2026-05-19: SemiAnalysis reports AMD MI355 is 40% cheaper than NVIDIA B200 on GLM5 single-node FP8 inference, 14 weeks post-launch via SGLang v0.12. [5]
2026-05-21: Google adds nightly CI for llm-d on TPU hardware; SemiAnalysis notes TPU has reached parity with NVIDIA in llm-d code quality. [13][14]
2026-05: Jensen Huang's $1 trillion GPU purchase order projection through 2027 amplified across mainstream financial media and retail investor communities. [23][25][26][33][34][35][36]
2026-05: AMD MI350 GPU price rises 66.7% amid NVIDIA rivalry; AMD MI355X separately reported to cost more to build than NVIDIA's competing chip while selling for much less, to gain inference market share. [7][6]
2026-05: NVIDIA B200 and GB200 reported sold out through mid-2026 with a 3.6 million unit backlog; enterprise buyers face 16–24 month wait times. [21][22][27][37]

Perspectives

SemiAnalysis

Bullish on AMD and Google TPU progress; frames inference cost-efficiency and software ecosystem parity as decisive competitive dimensions. Views cross-competitor open-source collaboration as historically notable.

Evolution: Consistent throughout — primary reporting voice maintaining a pro-competition, anti-NVIDIA-moat framing.

[20][24][5][13][14]

Jensen Huang / NVIDIA

Projects $1 trillion in GPU purchase orders through 2027 and reframes low GPU utilization as intentional over-provisioning strategy. Supply backlog at 3.6 million units confirms extraordinary demand but constrains near-term delivery.

Evolution: Supply constraint data complicates the demand narrative: NVIDIA's hardware unavailability creates a substitution window that the $1 trillion projection did not address.

[23][24][25][26][27][22][21]

Red Hat

Launched Red Hat AI 3 in October 2025 with llm-d integrated into OpenShift AI, then co-donated llm-d to CNCF in March 2026. Also holds a strengthened strategic collaboration with AMD, creating a potential path for AMD hardware support on the Red Hat AI stack.

Evolution: Now simultaneously enterprise product owner for llm-d and formal AMD partner — a dual role that makes AMD's hardware absence from llm-d a transition story rather than a clean exclusion.

[9][11][28][29][17][8]

AMD

Formally represented in MLPerf Inference v6.0 with results strong enough to prompt direct NVIDIA comparisons. Sells the MI355X for much less than NVIDIA's competing chip despite higher build costs, to gain inference market share. Holds a formal strategic collaboration with Red Hat, although AMD MI300 GPU support in llm-d remains unconfirmed.

Evolution: The AMD-Red Hat collaboration announcement adds a partnership dimension absent from the prior characterization of AMD as simply missing from the enterprise inference stack.

[2][1][6][18][17][7]

IBM / Google (llm-d coalition)

Co-donated llm-d and TPU drivers to CNCF alongside Red Hat. Google separately achieved nightly CI coverage for TPU hardware in llm-d and published CNCF confirmation on its Cloud Blog.

Evolution: Google's Cloud Blog post adds an official Google-branded confirmation to what had been covered primarily through third-party reporting.

[10][11][30][9][13][12]

Nebius (NVIDIA ecosystem)

Frames MLPerf Inference v6.0 as demonstrating top-tier AI performance on NVIDIA, directly countering the AMD-beat-NVIDIA framing from mainstream coverage of the same results.

Evolution: Consistent; represents NVIDIA-aligned infrastructure providers sustaining NVIDIA's performance leadership narrative.

[4]

CoreWeave

Publicly frames llm-d's CNCF acceptance as significant for production inference infrastructure, despite CoreWeave's own infrastructure being NVIDIA-centric — a notable cross-vendor endorsement.

Evolution: Consistent with prior reporting.

[15]

Enterprise media (theCUBE, BusinessWire, Digital Chiefs)

Covers Red Hat AI 3 and the broader GPU competition as CIO-level strategic decisions. theCUBE Research frames Red Hat AI 3 as a platform for distributed inference and agentic intelligence; Digital Chiefs frames CIOs as navigating between NVIDIA dominance and emerging alternatives.

Evolution: Enterprise analyst and trade press coverage has broadened, moving the story from technical benchmark reporting to CIO strategy framing.

[31][16][8]

Tensions

SemiAnalysis frames inference cost-per-token as the decisive competitive moat, with AMD's 40% cost advantage on GLM5 as structurally significant [5]. Jensen Huang's $1 trillion demand projection [23] and 'low MFU by design' philosophy [24] counter this: if hyperscalers lock in GPU orders at scale and prioritize capacity flexibility, cost-per-token comparisons matter less than incumbency. NVIDIA's own 3.6 million unit backlog [21] complicates that counter by making cost-competitive alternatives practically necessary for enterprises that cannot wait. [5][23][24][21][22]
Forbes and Spheron frame MLPerf Inference v6.0 as a competitive AMD milestone that directly challenges NVIDIA's AI performance leadership [2][3]; Nebius frames the same benchmark round as confirming NVIDIA's top-tier position [4]. The same official results support divergent narratives. [2][3][1][4]
AMD holds a formal strategic collaboration with Red Hat [17] and leads on hardware cost efficiency [5], yet AMD MI300 GPU support in llm-d remains unconfirmed [18], leaving AMD absent from the inference framework Red Hat ships to its enterprise installed base [8]. [17][5][18][8]
Google, IBM, and Red Hat moved llm-d into neutral CNCF governance [9], and Google added nightly CI coverage for TPU hardware on the framework [13]. With AMD still absent from llm-d's hardware support list [18], the open-source inference stack that enterprise Kubernetes operators are standardizing on has no confirmed support for AMD silicon. [9][13][18]

Status: active and growing

Sources

[1] MLCommons Releases New MLPerf Inference v6.0 Benchmark ...
[2] Did AMD Just Beat Nvidia In AI Performance? - Forbes
[3] MLPerf Inference v6.0 Results Explained: GPU Performance Rankings for AI Workloads (2026) | Spheron Blog
[4] MLPerf® Inference v6.0: Top-tier AI performance on NVIDIA ... - Nebius
[5] AMD ALERT 🚀 MI355 is now 40% cheaper than B200 on GLM5 architecture for Single Node serving FP8 14 weeks after the initi… — SemiAnalysis Twitter (2026-05-19)
[6] AMD's MI355X costs more to build but sells for much less than ...
[7] AMD MI350 Price Jumps 66.7% Amid Nvidia Rivalry - SmBom
[8] Red Hat Brings Distributed AI Inference to Production AI Workloads ...
[9] IBM, Red Hat, and Google just donated a Kubernetes blueprint for ...
[10] Welcome llm-d to the CNCF: Evolving Kubernetes into SOTA AI infrastructure | CNCF
[11] Donating llm-d to the Cloud Native Computing Foundation - IBM Research
[12] llm-d officially a CNCF Sandbox project | Google Cloud Blog
[13] TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for llm-d. Great step by … — SemiAnalysis Twitter (2026-05-21)
[14] @SemiAnalysis_ The under-appreciated bit: it's the *software* moat narrowing, not just silicon. Google wiring TPU into l... (2026-05-21)
[15] Why llm-d in CNCF Matters for Production Inference
[16] Red Hat AI 3 Unifies Open Source Platform - theCUBE Research
[17] Red Hat and AMD Strengthen Strategic Collaboration, Expand ...
[18] llm-d image doesn't support AMD MI300 GPU's? · Issue #139 - GitHub
[19] [PDF] Red Hat AI 3 Supported product and hardware configurations
[20] SERIOUS & COOL: AIPerf -- a sub-repo of the Nvidia Dynamo project focused on benchmarking LLM workloads -- just acce… — SemiAnalysis Twitter (2026-05-16)
[21] Nvidia’s Blackwell Dynasty: B200 and GB200 Sold Out Through Mid-2026 as Backlog Hits 3.6 Million Units
[22] NVIDIA GB200 Delays: Enterprise Buyers Face 16-24 Month Wait | Mark Peters posted on the topic | LinkedIn
[23] “I see through 2027, at least $1 trillion.” At Nvidia's annual GTC ...
[24] At Stanford CS153 Frontier Systems, Jensen states word for word that he "would like to be at low MFU all the time" &… — SemiAnalysis Twitter (2026-05-17)
[25] Nvidia CEO Jensen Huang says company has one trillion dollars in orders through 2027 - DCD
[26] Nvidia CEO Jensen Huang assures investors on growth, $1T sales forecast
[27] Nvidia: GPU Order Backlog Signals Long Multi Year Cycle | Investing.com
[28] Red Hat bets big on Kubernetes inference with llm-d - SiliconANGLE
[29] Accelerate multi-turn LLM workloads on OpenShift AI with llm-d ...
[30] IBM, Red Hat, and Google are aligning around a shared future for AI ...
[31] NVIDIA GPU Strategy for CIOs in 2026 - Digital Chiefs
[32] [Sandbox] llm-d · Issue #462 · cncf/sandbox - GitHub
[33] Nvidia CEO Jensen Huang said at the GTC 2026 conference that ...
[34] Nvidia CEO Predicts $1 Trillion in AI Chip Sales by 2027 | Fortune posted on the topic | LinkedIn
[35] Nvidia CEO Huang says company sees more than $1 ...
[36] Nvidia CEO sees 2027 as at least one trillion dollars of revenue, and computing demand to be higher than that : r/wallstreetbets
[37] NVIDIA B200 GPU: Specs, Pricing, and Cloud Availability (2026)