AI Systems Achieve Verifiable Mathematical Reasoning · history

Version 3

2026-05-25 10:23 UTC · 55 items

Changes since v2

Three additions meaningfully develop the story this pass. First, Terence Tao's GitHub wiki tracking AI contributions to Erdős problems [11] directly addresses the prior open question about his involvement: it confirms sustained, curatorial engagement rather than incidental verification, and is corroborated by The Atlantic profile [12] and YouTube content [14] from earlier in 2026. Second, the Gemini Deep Think IMO gold medal result (previously cited via Nature alone) is now confirmed across Ars Technica, Simon Willison's blog, Hacker News, and Reddit, substantially increasing evidential weight. Third, a new competitive tension has emerged: the framing that OpenAI is 'under fire' as DeepMind claims the IMO gold [9] introduces an organizational-rivalry dimension absent from the prior synthesis, and r/math is now explicitly auditing Tao's earlier AI predictions against 2026 reality [13].

What

In late May 2026, AI-assisted mathematics reached a visible inflection point across multiple fronts. OpenAI's unreleased general-purpose reasoning model disproved the Erdős unit distance conjecture (open since 1946) [1], with a formal arXiv preprint [2] and public acknowledgment from combinatorialist Gil Kalai [3]. Google DeepMind's Gemini with Deep Think achieved official gold-medal standard at the 2025 International Mathematical Olympiad, confirmed across multiple outlets including Ars Technica [4] and Simon Willison's widely read analysis [5]. Terence Tao, a Fields Medalist and one of the most prominent living mathematicians, has emerged as a central figure: he maintains a public GitHub wiki specifically tracking AI contributions to Erdős problems [11], was profiled in The Atlantic on his AI use [12], and is the subject of an r/math thread checking his earlier predictions against current AI performance [13].

Why it matters

Two independent organizations have produced results that survive or invite expert mathematical scrutiny within a compressed window, one in open-conjecture discovery and one in competition mathematics, and the world's most prominent active mathematician is now curating AI's contributions to his own research domain. If the pace holds, AI-assisted mathematics shifts from a demonstration to a workflow, with downstream consequences for cryptography, software verification, and any field anchored in proof-level certainty.

Open questions

Terence Tao maintains a GitHub wiki tracking AI contributions to Erdős problems [11]—what specific results does it record, and does it confirm or qualify the earlier claim that he personally verified three Erdős proofs in seven days [10]?
Multiple sources describe OpenAI as 'under fire' in the wake of DeepMind's IMO gold medal [9]—what specifically is the criticism, and is it directed at the validity of the Erdős disproof, the broader capability claims, or something else?
The Atlantic profiled Tao's use of AI in February 2026 [12], and an r/math thread is actively checking his earlier predictions against current AI performance [13]—has Tao publicly updated his assessment of AI's mathematical trajectory in light of the Erdős and IMO results?
Does Gemini's IMO gold medal [4][5] reflect the same underlying capability as OpenAI's open-conjecture disproof, or is competitive mathematics (time-bounded, well-scoped problems) a structurally distinct task from discovering counterexamples in the wild—and does the 'From Silver to Gold' trajectory [8] suggest a capability curve or a narrow optimization?

Narrative

In late May 2026, a cluster of AI-assisted mathematical results emerged from multiple research organizations and drew engagement from mathematicians, the scientific press, AI skeptics, and a growing broader public. The anchoring event was OpenAI's announcement that an unreleased, general-purpose reasoning model had produced a counterexample disproving the Erdős unit distance conjecture—a discrete-geometry problem posed by Paul Erdős in 1946 [1]. Critically, the model received no special mathematical training or problem-specific scaffolding. Princeton mathematician Will Sawin subsequently sharpened the result, external mathematicians co-signed verification, a formal arXiv preprint appeared [2], and prominent combinatorialist Gil Kalai—who had worked on closely related problems—acknowledged the result on his widely-read mathematics blog as 'Amazing' [3]. Separately and independently, Google DeepMind announced that Gemini with Deep Think had achieved official gold-medal standard at the 2025 International Mathematical Olympiad, a threshold no AI had previously cleared, with coverage confirmed across Ars Technica [4], Simon Willison's analysis [5], Hacker News [6], and Reddit's r/singularity [7]. A Medium deep-dive traced the trajectory from DeepMind's earlier silver-medal performance to gold [8], and a YouTube video framed the moment as 'Google Takes the Gold' while noting OpenAI is simultaneously 'under fire' [9]—suggesting the competitive and credibility dynamics between the two organizations have sharpened alongside the mathematical results.

The figure who has come most prominently into focus across this period is Terence Tao. Already reported to have personally verified several AI-assisted Erdős proofs [10], Tao is now confirmed to be maintaining a public GitHub wiki on his erdosproblems repository dedicated to tracking AI contributions to Erdős problems [11], which is strong evidence that his engagement is sustained and curatorial rather than incidental. The Atlantic published a longform profile in February 2026 on how Tao uses AI in his work [12], and r/math has an active thread asking how Tao's earlier predictions about AI mathematical capability are holding up now that 2026 has arrived [13]. A YouTube video also covers Tao's AI practices specifically [14]. This convergence of primary evidence positions Tao not merely as an external validator of AI results but as an active participant in an evolving human-AI mathematical workflow.

The architectural debate about what these results demonstrate has sharpened alongside the results themselves. DeepMind's approach pairs large language models (for idea generation) with the Lean theorem prover (for step-by-step verification), with each reasoning move checked before proceeding [15][16]. Harmonic's Aristotle system similarly produces Lean-checkable proofs [17], with co-founder Tudor Achim maintaining that AI could prove the Riemann Hypothesis by 2028. This formal-constraint architecture contrasts with OpenAI's result, which came from a general-purpose model without formal-system grounding. AI commentator Rohan Paul has argued that formal-system success operates inside 'carefully constrained worlds' rather than demonstrating open-ended mathematical reasoning [16], while The Neuron contended that proofs—precisely because they require line-by-line expert review—constitute a stronger test of AI reasoning than benchmarks [1]. Community discussions on Hacker News and Reddit's r/math have examined what Lean's proof-checking actually guarantees in practice [18][19]. AI critic Gary Marcus has published a piece explicitly checking whether AI math headlines match underlying results [20], introducing the first prominent AI-skeptic voice into coverage that had been largely enthusiastic. Coverage has broadened to physics news aggregators [21], academic course materials [22], and YouTube [9][14], indicating the story has moved from specialized AI discourse into the broader scientific education conversation.
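
The propose-then-verify pattern described above can be illustrated with a toy sketch. This is not DeepMind's or Harmonic's implementation, and it does not call Lean: the names `Step`, `RULES`, `propose`, `verify`, and `search` are hypothetical, the "formal system" is a two-rule arithmetic game, and the proposer is a hard-coded stand-in for a language model. The only point it makes is structural: candidate steps are untrusted, and nothing enters the proof until an independent checker has recomputed it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """A claimed inference: applying `rule` to `before` yields `after`."""
    rule: str
    before: int
    after: int


# Toy "formal system" (an assumption for illustration): the only legal rules.
RULES = {"double": lambda x: 2 * x, "inc": lambda x: x + 1}


def verify(step: Step) -> bool:
    """Stand-in for the proof checker: recompute the step, trust nothing."""
    fn = RULES.get(step.rule)
    return fn is not None and fn(step.before) == step.after


def propose(state: int, target: int) -> List[Step]:
    """Stand-in for the idea generator: some candidates are deliberately wrong."""
    return [
        Step("double", state, 2 * state + 1),  # plausible-looking but false
        Step("teleport", state, target),       # uses a rule that does not exist
        Step("double", state, 2 * state),
        Step("inc", state, state + 1),
    ]


def search(start: int, target: int, max_steps: int = 20) -> Optional[List[Step]]:
    """Build a chain from start to target using only verifier-approved steps."""
    state, proof = start, []
    for _ in range(max_steps):
        if state == target:
            return proof
        # Verification gate first; the `<= target` filter is only a search
        # heuristic to avoid overshooting, not part of the correctness check.
        checked = [s for s in propose(state, target)
                   if verify(s) and s.after <= target]
        if not checked:
            return None
        best = max(checked, key=lambda s: s.after)  # greedy choice
        proof.append(best)
        state = best.after
    return proof if state == target else None


if __name__ == "__main__":
    for step in search(1, 10) or []:
        print(step)
```

In this sketch the false `double` claim and the nonexistent `teleport` rule are proposed at every state and rejected every time, which mirrors the argument that the verifier, not the generator, carries the guarantee. A real system would replace `verify` with a call into a proof assistant and `propose` with model sampling.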

Timeline

1946: Paul Erdős poses the unit distance conjecture in discrete geometry [1]
2026-02-01: The Atlantic publishes longform profile 'The Edge of Mathematics' on how Terence Tao uses AI [12]
2026-05-20: OpenAI announces its unreleased reasoning model has disproved the Erdős unit distance conjecture; Harmonic podcast on Aristotle and formal verification published [23][17]
2026-05-21: Widespread media amplification including The Guardian; Gil Kalai publishes blog post acknowledging the disproof as 'Amazing'; social media reaction intensifies [24][3][33][34]
2026-05-22: The Neuron analytical piece published; arXiv preprint 'Remarks on the disproof' appears; Reddit, New Scientist, phys.org amplify; DeepMind Lean-grounded theorem-proving architecture discussed [1][16][2][35][29][36][21]
2026-05-23: Reports emerge that three Erdős problems fell within seven days with Terence Tao verifying each proof; Gary Marcus publishes critical review of AI math headlines [10][20]
2026-05-24: Google DeepMind's Gemini with Deep Think IMO gold-medal result amplified across Ars Technica, Simon Willison's blog, Hacker News, Reddit r/singularity, and YouTube; Tao's erdosproblems GitHub wiki tracking AI contributions surfaces; r/math thread checks Tao's earlier AI predictions against 2026 reality [25][26][5][4][8][6][9][7][11][13][14]

Perspectives

OpenAI

A general-purpose reasoning model with no mathematical specialization disproved an 80-year-old open conjecture, demonstrating that mathematical discovery capability is emerging in frontier models without targeted engineering

Evolution: Credibility has been bolstered by Gil Kalai's public acknowledgment [3] and the arXiv preprint [2], but the 'Google Takes the Gold' framing [9] and reports that OpenAI is 'under fire' suggest new competitive pressure and scrutiny of its broader AI math claims

[1][23][24][2][9]

Google DeepMind

Has produced two distinct landmark results: a Lean-grounded theorem-proving system where every reasoning step is formally verified, and Gemini with Deep Think achieving official gold-medal standard at the International Mathematical Olympiad

Evolution: The IMO gold medal result is now confirmed across multiple independent outlets including Ars Technica, Simon Willison's blog, and community discussions on HN and Reddit, substantially strengthening DeepMind's position relative to the previous synthesis

[16][25][26][27][15][5][4][8][6][7]

Terence Tao

Actively engaged with AI's mathematical contributions: maintains a public GitHub wiki on his erdosproblems repository tracking AI results, was profiled in The Atlantic on how he uses AI, and is reported to have personally verified AI-assisted Erdős proofs

Evolution: Significantly expanded from previous synthesis; the GitHub wiki [11] confirms Tao's engagement is sustained and curatorial rather than incidental, and the Atlantic profile [12] provides depth on his AI workflow. He is now the most prominent mathematician visibly integrating AI into his own research practice

[11][12][14][10][13]

Harmonic (Tudor Achim)

Formal verification—machine-checkable proofs in Lean—is the key epistemological shift; AI could reach the Riemann Hypothesis by 2028; near-term applications in software and hardware are already within reach

Evolution: Consistent and promotional; no new substantive claims beyond the founding thesis

[17][28][29][30][31][32]

Gil Kalai (mathematician)

Acknowledged the Erdős unit distance disproof publicly as 'Amazing' and attributed it explicitly to AI—a notable signal given his prominence in combinatorics and prior work on related problems

Evolution: Consistent with previous synthesis; his engagement remains a key marker of mathematical establishment acceptance

[3]

Gary Marcus (AI critic)

Published a piece explicitly checking whether AI math headlines match underlying results; skeptical of headline claims about both OpenAI and Anthropic mathematical achievements

Evolution: Consistent with previous synthesis; remains the most prominent AI-skeptic voice in an otherwise enthusiastic coverage landscape

[20]

Rohan Paul (AI commentator)

Deliberately nuanced: AI's formal math success operates inside 'carefully constrained worlds' where every step is verifier-checked—not evidence of open-ended mathematical reasoning; the architectural constraint is the mechanism, not broad generalization

Evolution: Consistent with previous synthesis; DeepMind's publicly explained LLM+Lean pairing continues to support his framing

[16][15]

r/math and Hacker News communities

Engaged and increasingly interrogatory: r/math is explicitly checking Tao's earlier AI predictions against 2026 reality [13]; Hacker News and r/math threads are scrutinizing what Lean's verification actually guarantees [18][19]; r/singularity is enthusiastic about the IMO gold medal [7]

Evolution: Community discourse has fragmented along subreddit lines—r/math takes a skeptical-evaluative posture toward predictions, r/singularity reads the IMO result as confirmatory of rapid AI capability growth; HN occupies a technical-interrogatory middle

[13][18][19][6][7]

Tensions

The framing shared by Rohan Paul and DeepMind holds that AI mathematical success is confined to 'carefully constrained worlds' where every step is verifier-checked, and so is not evidence of open-ended mathematical reasoning; The Neuron and OpenAI's result suggest instead that a general-purpose model can discover counterexamples to open problems without formal-system constraints or mathematical specialization [16][1][17][15]
Gary Marcus is explicitly checking whether AI math headlines match underlying results, positioning himself against enthusiastic coverage from The Neuron, Nature, Ars Technica, and Simon Willison, which have treated the OpenAI and DeepMind results as genuine breakthroughs [20][1][25][4][5]
Tudor Achim (Harmonic) predicts AI could prove the Riemann Hypothesis by 2028, an extraordinarily aggressive timeline that sits in unresolved tension with the cautious, constraint-emphasizing framing of DeepMind's published research and the skeptical register of Gary Marcus's critique [17][16][20][25]
OpenAI is reported to be 'under fire' as DeepMind claims the IMO gold medal, suggesting a competitive credibility contest is developing between the two organizations over who has demonstrated the more meaningful or more verifiable form of AI mathematical capability [9][1][25][4]
Whether Terence Tao's active curation of AI contributions to Erdős problems and the r/math community's checking of his earlier predictions indicate growing mathematical-establishment acceptance of AI as a genuine research tool, or merely sophisticated tracking of an external phenomenon, remains unresolved [11][13][12][10]

Sources

[1] 😸 OpenAI solved an 80-year math problem by... disproving it — The Neuron (2026-05-22)
[2] Remarks on the disproof of the unit distance conjecture - arXiv — reactive:openai-erdos-math-breakthrough
[3] Amazing: Erdős' Unit Distance Problem was Disproved! It was ... — reactive:openai-erdos-math-breakthrough
[4] Gemini Deep Think learns math, wins gold medal at International Math Olympiad - Ars Technica — reactive:ai-formal-math-breakthroughs
[5] Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad — reactive:ai-formal-math-breakthroughs
[6] Gemini with Deep Think achieves gold-medal standard at the IMO | Hacker News — reactive:ai-formal-math-breakthroughs
[7] Gemini Deep Think achieved Gold at IMO : r/singularity - Reddit — reactive:ai-formal-math-breakthroughs
[8] From Silver to Gold: An In-Depth Analysis of Google's Gemini Deep ... — reactive:ai-formal-math-breakthroughs
[9] Google Takes the Gold. OpenAI under fire. - YouTube — reactive:ai-formal-math-breakthroughs
[10] Three Erdős Problems Fell in Seven Days, and Terence Tao Verified ... — reactive:ai-formal-math-breakthroughs
[11] AI contributions to Erdős problems · teorth/erdosproblems Wiki — reactive:ai-formal-math-breakthroughs
[12] The Edge of Mathematics - The Atlantic — reactive:ai-formal-math-breakthroughs
[13] Now that it's 2026, how is Terence Tao's prediction holding up? : r/math — reactive:openai-erdos-math-breakthrough
[14] Terence Tao – How the world’s top mathematician uses AI — reactive:ai-formal-math-breakthroughs
[15] @tomflex @prz_chojecki Sure! DeepMind built AI agents that pair LLMs (for generating ideas) with the Lean theorem prover... — reactive:ai-formal-math-breakthroughs (2026-05-24)
[16] Google DeepMind's new paper. — Rohan Paul Twitter (2026-05-22)
[17] 😺 🎙️ PODCAST: Can AI Solve Math's Biggest Mystery? — The Neuron (2026-05-20)
[18] I would say that there is very little danger of a proof in Lean being ... — reactive:ai-formal-math-breakthroughs
[19] Thoughts on LEAN, the proof checker : r/math - Reddit — reactive:ai-formal-math-breakthroughs
[20] Checking the math behind OpenAI and Anthropic's latest headlines — reactive:ai-formal-math-breakthroughs
[21] AI makes a major breakthrough in a math problem that had stumped experts for decades — reactive:openai-erdos-math-breakthrough
[22] Major Breakthroughs in Lean 4-Based Auto-Formalized Mathematics — reactive:ai-formal-math-breakthroughs
[23] OpenAI claims it solved an 80-year-old math problem - TechCrunch — reactive:ai-formal-math-breakthroughs
[24] OpenAI makes breakthrough on 80-year-old maths problem — reactive:openai-erdos-math-breakthrough
[25] Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad — reactive:ai-formal-math-breakthroughs
[26] Olympiad-level formal mathematical reasoning with reinforcement ... — reactive:ai-formal-math-breakthroughs
[27] google-deepmind/formal-conjectures - GitHub — reactive:ai-formal-math-breakthroughs
[28] [PDF] Aristotle: IMO-level Automated Theorem Proving - arXiv — reactive:ai-formal-math-breakthroughs
[29] Aristotle from Harmonic just proved Erdos Problem #124 in Lean all ... — reactive:ai-formal-math-breakthroughs
[30] I work at Harmonic, the company behind Aristotle. To clear up a few misconceptio... | Hacker News — reactive:ai-formal-math-breakthroughs
[31] Harmonic — reactive:ai-formal-math-breakthroughs
[32] Harmonics Proves a Tough Mathematics Problem. — reactive:ai-formal-math-breakthroughs
[33] 🚨 OPENAI MATH BREAKTHROUGH 🚨 — reactive:ai-formal-math-breakthroughs (2026-05-21)
[34] OpenAI's internal model disproves Unit Distance Conjecture of Erdos — reactive:openai-erdos-math-breakthrough
[35] Google DeepMind: "Olympiad-level formal mathematical reasoning ... — reactive:ai-formal-math-breakthroughs
[36] Mathematicians stunned by AI's biggest breakthrough in ... — reactive:ai-formal-math-breakthroughs