AI Systems Achieve Verifiable Mathematical Reasoning
Version 2
2026-05-25 06:00 UTC · 45 items
What
In late May 2026, AI-assisted mathematics produced a rapid cluster of landmark results. OpenAI's unreleased general-purpose reasoning model disproved the Erdős unit distance conjecture (open since 1946), with an arXiv preprint formalizing the result [2] and prominent combinatorialist Gil Kalai acknowledging it publicly [3]. A Medium report claims three Erdős problems fell within a seven-day window, with Fields Medalist Terence Tao reportedly verifying each proof himself [4]—a claim that, if confirmed, would substantially broaden the story. Separately, Google DeepMind announced that Gemini with Deep Think has officially achieved gold-medal standard at the International Mathematical Olympiad, published in Nature [5][6]. AI critic Gary Marcus has now published a critical review checking whether the headlines match the underlying results [15].
Why it matters
Multiple independent organizations have produced verifiable or independently checked mathematical results within days of each other, a convergence that is harder to dismiss than any single demonstration. If AI can routinely resolve longstanding open problems and compete at IMO gold-medal level across diverse mathematical domains, the pace of discovery could accelerate, with downstream effects on software verification, cryptography, and any field where proof-level certainty matters. The entry of prominent skeptics and senior mathematicians into the debate signals that the community is moving from initial reaction to rigorous assessment.
Open questions
A Medium report claims Terence Tao personally verified three Erdős proofs in seven days [4]—what is the precise status of these results, and has this been independently confirmed beyond a single secondary source?
Does Gemini's gold-medal performance at the IMO [5] reflect the same generalizable reasoning capability as OpenAI's open-problem disproof, or is competitive mathematics (with well-scoped, time-bounded problems) a distinct capability from conjecture discovery in the wild?
Gary Marcus's piece explicitly 'checks the math' behind AI math headlines [15]—which specific claims does he identify as overstated, and do working mathematicians share those concerns?
Is Tudor Achim's prediction that AI could prove the Riemann Hypothesis by 2028 [10] a grounded extrapolation from the current trajectory, or does it leap past a meaningful capability discontinuity that the IMO and Erdős results do not actually bridge?
Narrative
In late May 2026, a cluster of AI-assisted mathematical results emerged from three major research organizations, drawing engagement from mathematicians, the scientific press, AI skeptics, and a growing broader public. The anchoring event was OpenAI's announcement that an unreleased, general-purpose reasoning model had produced a counterexample disproving the Erdős unit distance conjecture, a discrete-geometry problem posed by Paul Erdős in 1946 [1]. Critically, the model received no special mathematical training or problem-specific scaffolding. Princeton mathematician Will Sawin subsequently sharpened the result, and external mathematicians, including some former skeptics of OpenAI's math claims, co-signed the verification [1]. A formal arXiv preprint titled 'Remarks on the disproof of the unit distance conjecture' appeared shortly after [2], and prominent combinatorialist Gil Kalai, who has worked on closely related problems, acknowledged the result on his widely read mathematics blog under the headline 'Amazing: Erdős' Unit Distance Problem was Disproved! It was achieved by AI' [3]. The Neuron argued that a mathematical proof provides a stronger test of AI reasoning than standard benchmarks precisely because it must survive line-by-line expert review [1].
The scope then expanded. A report on Medium claimed that three Erdős problems fell within a seven-day window, with Fields Medalist Terence Tao personally verifying every proof [4]—a claim whose sourcing warrants caution but which, if accurate, would represent an extraordinary concentration of AI-assisted mathematical discovery endorsed by one of the most credible figures in the field. Separately and independently, Google DeepMind announced that an advanced version of Gemini with Deep Think has officially achieved gold-medal standard at the International Mathematical Olympiad, with findings published in Nature [5][6]. This adds a second distinct system and a second distinct domain to the picture: competitive mathematics at the IMO is designed to resist pattern-matching, and gold-medal performance is a threshold no AI had previously cleared. DeepMind also maintains a public formal-conjectures GitHub repository [7], suggesting sustained organizational effort rather than an isolated result. Harmonic continued publishing news about its Lean-verified proof system Aristotle [8][9], with co-founder Tudor Achim maintaining his prediction that AI could prove the Riemann Hypothesis by 2028 [10].
The architectural debate about what these results demonstrate has sharpened alongside the results themselves. DeepMind's approach pairs large language models (for idea generation) with the Lean theorem prover (for step-by-step verification), with each reasoning move checked before proceeding [11][12]. Harmonic's Aristotle similarly produces Lean-checkable proofs [10]. This formal-constraint architecture contrasts with OpenAI's result, which came from a general-purpose model without formal-system grounding. Commentator Rohan Paul has framed DeepMind's method as operating inside 'carefully constrained worlds' rather than demonstrating open-ended mathematical reasoning [12]. Community discussions on Hacker News and Reddit's r/math have examined what Lean's verification actually guarantees, with one comment noting 'very little danger of a proof in Lean being wrong' [13][14]. The most pointed external challenge to the overall narrative came from AI critic Gary Marcus, who published a piece explicitly checking whether AI math claims match underlying results [15]—bringing a persistent skeptical perspective that had been absent from initial coverage. Coverage has continued to broaden across YouTube [16][17][18], physics news aggregators [19], and academic course materials [20], indicating the story has become embedded in the scientific education community alongside the AI press.
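The propose-then-verify pattern described above can be illustrated with a minimal Python sketch. It is not DeepMind's or Harmonic's implementation: the names `search_proof`, `toy_propose`, and `toy_check` are hypothetical, the proposer stands in for a language model, and the checker stands in for a proof assistant such as Lean. The toy "proof" is simply a chain of numbers from 1 to a target, where a step is valid only if it doubles or increments the previous value.

```python
from typing import Callable, List, Optional, Sequence

Step = int  # toy stand-in; a real system would use proof tactics or terms
TARGET = 10  # hypothetical goal for the toy problem


def search_proof(
    propose: Callable[[Sequence[Step]], List[Step]],
    check: Callable[[Sequence[Step], Step], bool],
    is_goal: Callable[[Sequence[Step]], bool],
    max_steps: int = 10,
) -> Optional[List[Step]]:
    """Extend a proof one step at a time, keeping only checker-approved steps."""
    accepted: List[Step] = []
    for _ in range(max_steps):
        for candidate in propose(accepted):
            # Nothing enters the proof unless the checker accepts it first.
            if check(accepted, candidate):
                accepted.append(candidate)
                break
        else:
            return None  # every proposal was rejected; give up
        if is_goal(accepted):
            return accepted
    return None  # step budget exhausted


def toy_propose(accepted: Sequence[Step]) -> List[Step]:
    """Stand-in for the idea generator; its first suggestion is always invalid."""
    current = accepted[-1] if accepted else 1
    return [current * 3, current * 2, current + 1]


def toy_check(accepted: Sequence[Step], candidate: Step) -> bool:
    """Stand-in for the verifier: only doubling or +1 is a legal move."""
    current = accepted[-1] if accepted else 1
    return candidate in (current * 2, current + 1) and candidate <= TARGET


def toy_is_goal(accepted: Sequence[Step]) -> bool:
    return bool(accepted) and accepted[-1] == TARGET


print(search_proof(toy_propose, toy_check, toy_is_goal))  # → [2, 4, 8, 9, 10]
```

The design point the sketch tries to capture is that the proposer may be wrong as often as it likes: an invalid suggestion costs search time but cannot end up in the returned chain, because only checked steps are appended.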
Timeline
- 1946-01-01: Paul Erdős poses the unit distance conjecture in discrete geometry [1]
- 2026-05-20: OpenAI announces its unreleased reasoning model has disproved the Erdős unit distance conjecture; TechCrunch reports; Harmonic podcast on Aristotle and formal verification published [21][10]
- 2026-05-21: Widespread media amplification including The Guardian; Gil Kalai publishes blog post acknowledging the disproof as 'Amazing'; social media reaction intensifies [22][3][26][27]
- 2026-05-22: The Neuron analytical piece; DeepMind Lean-grounded theorem-proving paper discussed by commentators; arXiv preprint 'Remarks on the disproof' published; Reddit, New Scientist, phys.org amplify [1][12][2][28][24][29][19]
- 2026-05-23: Reports emerge that three Erdős problems fell within seven days with Terence Tao verifying each proof; Gary Marcus publishes critical review of AI math headlines [4][15]
- 2026-05-24: Google DeepMind announces Gemini with Deep Think achieves gold-medal standard at the 2025 IMO, published in Nature; Grok explains DeepMind's LLM+Lean architecture; YouTube and academic course coverage broadens [5][6][11][16][20][18]
Perspectives
OpenAI
A general-purpose reasoning model with no mathematical specialization disproved an 80-year-old open conjecture, demonstrating that mathematical discovery capability is emerging in frontier models without targeted engineering
Evolution: Consistent with OpenAI's capability-forward framing; Gil Kalai's blog acknowledgment [3] and the arXiv preprint [2] have added credibility from the mathematical establishment that was not present at the initial announcement
Google DeepMind
Has now produced two distinct landmark results: a Lean-grounded theorem-proving system where every reasoning step is verified before proceeding, and Gemini with Deep Think achieving official gold-medal standard at the International Mathematical Olympiad
Evolution: Significantly expanded since previous synthesis; the IMO gold-medal result [5][6] broadens DeepMind's demonstrated scope well beyond formal-verification-constrained open-conjecture work, adding competitive problem-solving to the portfolio
Harmonic (Tudor Achim)
Formal verification—machine-checkable proofs in Lean—is the key epistemological shift; AI could reach the Riemann Hypothesis by 2028; near-term applications in software and hardware are already within reach
Evolution: Consistent and promotional; continued publishing activity [8][9] but no new substantive claims beyond the founding thesis
Gil Kalai (mathematician)
Acknowledged the Erdős unit distance disproof publicly as 'Amazing' and attributed it explicitly to AI—a notable signal given his prominence in combinatorics and prior work on related problems
Evolution: First appearance in thread; his engagement marks the mathematical establishment beginning to formally acknowledge these results rather than treating them as AI-community hype
Gary Marcus (AI critic)
Published a piece explicitly checking whether AI math headlines match underlying results; skeptical of headline claims about both OpenAI's and Anthropic's mathematical achievements
Evolution: First appearance in thread; introduces the first prominent AI-skeptic voice, which had been entirely absent from initial coverage dominated by enthusiastic outlets
Rohan Paul (AI commentator)
Deliberately nuanced: AI's formal math success operates inside carefully constrained worlds and should not be read as the system reasoning like a human mathematician; the architectural constraint of checking each step in Lean is the mechanism, not open-ended reasoning
Evolution: Consistent with previous synthesis; DeepMind's publicly explained LLM+Lean pairing [11] continues to support his framing
The Neuron / Grant Harvey
Enthusiastic and analytical; argues the OpenAI result is a genuinely meaningful signal because proofs require line-by-line expert verification, and notes AI may surface results humans lacked the incentive to pursue
Evolution: Consistent promotional-analytical stance; coverage continues to amplify both OpenAI and Harmonic results
External mathematicians (Will Sawin, Gil Kalai, Terence Tao)
Will Sawin sharpened the unit-distance disproof; Gil Kalai acknowledged it; Terence Tao is reported to have personally verified three Erdős proofs within seven days—lending extraordinary mathematical authority to a broader set of results if confirmed
Evolution: Significantly expanded from previous synthesis; Tao's reported involvement [4] would add the most credentialed name yet to the verification record, though the claim requires independent confirmation
Hacker News and r/math community
Engaged but interrogatory: one HN commenter notes 'very little danger of a proof in Lean being wrong' [13], while r/math users are actively working through what Lean's proof-checking actually guarantees in practice [14]
Evolution: Community scrutiny of formal verification reliability is more prominent than in previous synthesis, reflecting a shift from 'is this real?' to 'what does this actually mean?'
Tensions
- Rohan Paul's and DeepMind's framing holds that AI mathematical success is confined to 'carefully constrained worlds' where every step is verifier-checked, and so is not evidence of open-ended mathematical reasoning; The Neuron and OpenAI's result instead suggest that a general-purpose model can discover counterexamples to open problems without formal-system constraints or mathematical specialization [12][1][10][11]
- Gary Marcus is explicitly checking whether AI math headlines match underlying results, positioning himself against enthusiastic coverage from The Neuron, Nature, and Quanta Magazine, which have treated the OpenAI and DeepMind results as genuine breakthroughs [15][1][5][6]
- Tudor Achim (Harmonic) predicts AI could prove the Riemann Hypothesis by 2028, an extraordinarily aggressive timeline that sits in unresolved tension with the cautious, constraint-emphasizing framing of DeepMind's published research and the skeptical register of Gary Marcus's critique [10][12][15][5]
- Whether the right benchmark for AI mathematical capability is performance on formally verifiable proof tasks and scoped competition problems (Lean-grounded systems, IMO gold medal) or open-ended conjecture discovery without special scaffolding (OpenAI's Erdős result) reflects a deeper disagreement about what 'mathematical reasoning' means for AI systems [1][12][10][5][6]
Sources
- [1] 😸 OpenAI solved an 80-year math problem by... disproving it — The Neuron (2026-05-22)
- [2] Remarks on the disproof of the unit distance conjecture - arXiv — reactive:openai-erdos-math-breakthrough
- [3] Amazing: Erdős' Unit Distance Problem was Disproved! It was ... — reactive:openai-erdos-math-breakthrough
- [4] Three Erdős Problems Fell in Seven Days, and Terence Tao Verified ... — reactive:ai-formal-math-breakthroughs
- [5] Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad — reactive:ai-formal-math-breakthroughs
- [6] Olympiad-level formal mathematical reasoning with reinforcement ... — reactive:ai-formal-math-breakthroughs
- [7] google-deepmind/formal-conjectures - GitHub — reactive:ai-formal-math-breakthroughs
- [8] Harmonic — reactive:ai-formal-math-breakthroughs
- [9] Harmonics Proves a Tough Mathematics Problem. — reactive:ai-formal-math-breakthroughs
- [10] 😺 🎙️ PODCAST: Can AI Solve Math's Biggest Mystery? — The Neuron (2026-05-20)
- [11] @tomflex @prz_chojecki Sure! DeepMind built AI agents that pair LLMs (for generating ideas) with the Lean theorem prover... — reactive:ai-formal-math-breakthroughs (2026-05-24)
- [12] Google DeepMind's new paper. — Rohan Paul Twitter (2026-05-22)
- [13] I would say that there is very little danger of a proof in Lean being ... — reactive:ai-formal-math-breakthroughs
- [14] Thoughts on LEAN, the proof checker : r/math - Reddit — reactive:ai-formal-math-breakthroughs
- [15] Checking the math behind OpenAI and Anthropic's latest headlines — reactive:ai-formal-math-breakthroughs
- [16] AI just disproved the biggest math conjecture so far - YouTube — reactive:ai-formal-math-breakthroughs
- [17] OpenAI has disproved Erdős' unit-distance conjecture - YouTube — reactive:ai-formal-math-breakthroughs
- [18] IMO 2025 - Will AI Finally Win Gold? 🥇 - YouTube — reactive:ai-formal-math-breakthroughs
- [19] AI makes a major breakthrough in a math problem that had stumped experts for decades — reactive:openai-erdos-math-breakthrough
- [20] Major Breakthroughs in Lean 4-Based Auto-Formalized Mathematics — reactive:ai-formal-math-breakthroughs
- [21] OpenAI claims it solved an 80-year-old math problem - TechCrunch — reactive:ai-formal-math-breakthroughs
- [22] OpenAI makes breakthrough on 80-year-old maths problem — reactive:openai-erdos-math-breakthrough
- [23] [PDF] Aristotle: IMO-level Automated Theorem Proving - arXiv — reactive:ai-formal-math-breakthroughs
- [24] Aristotle from Harmonic just proved Erdos Problem #124 in Lean all ... — reactive:ai-formal-math-breakthroughs
- [25] I work at Harmonic, the company behind Aristotle. To clear up a few misconceptio... | Hacker News — reactive:ai-formal-math-breakthroughs
- [26] 🚨 OPENAI MATH BREAKTHROUGH 🚨 — reactive:ai-formal-math-breakthroughs (2026-05-21)
- [27] OpenAI's internal model disproves Unit Distance Conjecture of Erdos — reactive:openai-erdos-math-breakthrough
- [28] Google DeepMind: "Olympiad-level formal mathematical reasoning ... — reactive:ai-formal-math-breakthroughs
- [29] Mathematicians stunned by AI's biggest breakthrough in ... — reactive:ai-formal-math-breakthroughs