The Information Machine

AI Systems Achieve Verifiable Mathematical Reasoning

cooling · v5 · 2026-06-05 · 78 items · history

What's new in v5

Two substantive additions this pass. First, AlphaProof Nexus has moved from a name mention to a system with concrete enumerated results: it solved 9 open Erdős problems and 44 OEIS conjectures [21218][21768] — Zvi Mowshowitz considers this a landmark milestone but notes it received almost no mainstream media coverage, a new coverage-gap tension. Second, Fields Medalist Tim Gowers and University of Toronto professor Daniel Litt have explicitly confirmed the OpenAI Erdős result as substantively exciting rather than merely indicative [23058], the most senior mathematician endorsement of a specific AI result to date. A Google research paper also shows structured proof-decomposition raises LLM formal math performance from under 10% to 70% [24702], adding a new architectural data point to the formal-verification debate.

What

Two AI systems have now produced substantive, expert-validated results in open-ended research mathematics, not just competition benchmarks. OpenAI's general-purpose reasoning model disproved the Erdős unit distance conjecture (open since 1946), and Fields Medal laureate Tim Gowers has since confirmed this as 'a milestone in AI mathematics' [7]. Google DeepMind's AlphaProof Nexus has solved 9 open Erdős problems and 44 OEIS conjectures — results Zvi Mowshowitz calls a landmark milestone that nonetheless received almost no mainstream media coverage [9][8]. A separate Google research paper shows that structuring proof-writing into planning and step-by-step checking raises general LLM performance on formal math benchmarks from under 10% to 70% [12].

Why it matters

Expert mathematician validation — from Tim Gowers, Daniel Litt, and Terence Tao's own curation activity — now confirms that at least some AI-produced results are substantively exciting rather than merely indicative. The volume is growing fast: one system resolving 44 conjectures at once puts direct pressure on whether the mathematical community can verify results at the pace they are being produced.

Open questions

  • AlphaProof Nexus solved 9 Erdős problems and 44 OEIS conjectures [8][9] — what are the specific problems, how were they independently verified, and how do they compare in difficulty to the OpenAI unit distance disproof?

  • Zvi called the AlphaProof Nexus results a landmark milestone that got almost no mainstream coverage [9] — is the formal-verification and mathematical community engaging with them, and what does the Xena Project or peer-review infrastructure make of 44 simultaneous conjecture resolutions?

  • Does the structured proof-decomposition approach (planning + step-by-step checking, <10% to 70% on formal math benchmarks) [12] underlie AlphaProof Nexus's results, or is it an independent architectural line?

  • Tao's concern about 'mass-produced mathematics at scale' [17] predates AlphaProof Nexus's 44-conjecture announcement — has the mathematical community developed any institutional response to the review-capacity question?

Narrative

In 2025 and early 2026, AI systems reached levels of mathematical performance that had previously seemed distant. Google DeepMind and OpenAI both achieved gold-medal-level performance at the 2025 International Mathematical Olympiad [1][2][3] — the first AI systems to clear that threshold. Those results validated competition mathematics, where problems are pre-selected and success criteria are fixed. The more contested question has been whether AI can produce genuine research mathematics: open problems, no curated problem set, no guaranteed solution exists.

On that question, expert mathematician endorsement has now accumulated around OpenAI's Erdős unit distance disproof. The result — a counterexample to a discrete-geometry conjecture open since 1946, produced by a general-purpose reasoning model with no formal-system scaffolding [4] — was confirmed in an arXiv preprint [5] and acknowledged by combinatorialist Gil Kalai [6]. Fields Medal laureate Tim Gowers has since gone further, stating 'there is no doubt that the solution to the unit-distance problem is a milestone in AI mathematics' [7]. University of Toronto professor Daniel Litt called it 'the first example of a result produced autonomously by an AI that I find exciting in itself, as opposed to as a leading indicator' [7]. Google DeepMind's AlphaProof Nexus has produced results in different terms: the system solved 9 open Erdős problems and 44 OEIS conjectures [8][9], a volume of results Zvi Mowshowitz describes as a landmark milestone — one that received almost no mainstream media coverage [9].

The methodological gap between OpenAI and DeepMind's approaches remains substantive. DeepMind's architecture pairs large language models with the Lean theorem prover, formally verifying each reasoning step before proceeding [10][11]. OpenAI's Erdős counterexample came from a general-purpose model with no such scaffolding — a result that survived external mathematician review but does not carry the machine-checkable audit trail. A Google research paper reported separately that breaking proof-writing into planning and step-by-step checking raised general LLM performance on formal math benchmarks from under 10% to 70% [12], suggesting structured decomposition — not necessarily full Lean verification — may be the operative mechanism. The University of Washington Math AI Lab's dataset of erroneous Lean proofs [13] adds pressure on the formal-verification side: if the verification layer itself produces false proofs, the trust model is not as clean as the architecture suggests.

The mathematician who has come most visibly into focus across this period is Terence Tao. He actively curates AI contributions on his Erdős problems GitHub wiki [14], was profiled in The Atlantic on his AI workflow [15], and has personally verified several AI-assisted results [16]. But Tao has also publicly observed 'more and more mass-produced mathematics at scale' [17] — a quality concern whose weight is growing as systems announce dozens of conjecture resolutions at once. Gary Marcus has continued auditing AI math headlines against underlying results [18], occupying the visible skeptical position. Harmonic's Tudor Achim, at the optimistic end, has predicted AI could prove the Riemann Hypothesis by 2028 [19], a timeline that sits in unresolved tension with the caution of DeepMind's published research and Marcus's critique alike.

Timeline

  • 1946-01-01: Paul Erdős poses the unit distance conjecture in discrete geometry [4]
  • 2024-07-01: DeepMind's AlphaProof and AlphaGeometry 2 earn silver medal at IMO 2024 [28]
  • 2025-07-21: NYT reports Google AI wins gold at IMO 2025; OpenAI also claims gold — first AI systems to clear this threshold [1][2][3]
  • 2025-08-03: Xena Project publishes mathematical community round-up of AI performance at IMO 2025 [21]
  • 2026-02-01: The Atlantic publishes longform profile on how Terence Tao uses AI in his research [15]
  • 2026-05-20: OpenAI announces unreleased reasoning model disproved the Erdős unit distance conjecture [4][19]
  • 2026-05-21: Gil Kalai acknowledges disproof as 'Amazing'; widespread media amplification including The Guardian [6][29][30]
  • 2026-05-22: arXiv preprint on the disproof appears; formal verification debate surfaces in coverage [5][10]
  • 2026-05-23: Reports of three Erdős problems falling in seven days with Tao verifying each; Gary Marcus publishes critical review of AI math headlines [16][18]
  • 2026-05-24: Tao's GitHub wiki tracking AI contributions confirmed; Tao quoted on 'mass-produced mathematics at scale'; AlphaProof Nexus first mentioned [14][17][22]
  • 2026-05-28: Zvi Mowshowitz reports AlphaProof Nexus solved 9 Erdős problems and 44 OEIS conjectures; notes landmark result with almost no mainstream media coverage [9][8]
  • 2026-06-01: Ars Technica: Fields Medalist Tim Gowers and U of T professor Daniel Litt explicitly validate OpenAI's Erdős result as substantively exciting rather than merely indicative [7]
  • 2026-06-04: Google research paper: structured proof-writing (planning + step-by-step checking) raises general LLM formal math performance from under 10% to 70% [12]

Perspectives

OpenAI

A general-purpose reasoning model with no mathematical specialization disproved an 80-year-old open conjecture and achieved gold-medal performance at IMO 2025, demonstrating discovery capability without formal-system scaffolding.

Evolution: Stronger now: Fields Medalist Tim Gowers and Daniel Litt have explicitly validated the Erdős result as substantively exciting [23058], moving expert endorsement from acknowledgment to explicit enthusiasm.

Google DeepMind

Achieved gold at IMO 2025 with Gemini Deep Think, maintains a Lean-grounded theorem-proving architecture where every step is formally verified, and AlphaProof Nexus has now solved 9 open Erdős problems and 44 OEIS conjectures.

Evolution: AlphaProof Nexus has moved from a name mention to a system with concrete, enumerated results [21218][21768]; the volume of results (44 OEIS conjectures) is substantially larger than previously reported.

Terence Tao

Actively curates AI contributions on his Erdős problems GitHub wiki and has personally verified several AI-assisted results, while publicly expressing concern about 'more and more mass-produced mathematics at scale.'

Evolution: Consistent dual position as engaged participant and quality skeptic; the quality concern gains weight now that AlphaProof Nexus has announced 44 conjecture resolutions simultaneously.

Tim Gowers and Daniel Litt (mathematician validators)

Both explicitly confirm the OpenAI Erdős disproof as substantively significant — Gowers calls it 'a milestone in AI mathematics'; Litt calls it the first AI result he finds 'exciting in itself, as opposed to as a leading indicator.'

Evolution: New in this pass; these are the most senior mathematician endorsements of any specific AI math result to date, adding a layer of credibility beyond community acknowledgment.

Harmonic (Tudor Achim)

Formal verification is the key epistemological shift enabling trustworthy AI mathematics; AI could prove the Riemann Hypothesis by 2028.

Evolution: Consistent and promotional; no new substantive claims beyond the founding thesis.

Gary Marcus

Explicitly auditing whether AI math headlines match underlying results; skeptical of headline claims about OpenAI and Anthropic mathematical achievements.

Evolution: Consistent; the most prominent skeptical voice, now positioned against explicit Fields Medalist endorsement of the OpenAI result.

Rohan Paul (AI commentator)

AI formal math success operates in structured contexts; now reports Google research showing proof-decomposition into planning and step-by-step checking raises LLM formal math from under 10% to 70%.

Evolution: Expanded from constraint-emphasis to active reporting of architectural improvements [24702]; positions structured decomposition as the operative mechanism rather than full Lean verification.

Mathematical community (Zvi, Xena Project, r/math, LessWrong, Hacker News)

Fragmented: Xena Project provides formal-verification-community context; LessWrong and Hacker News probe what verification actually guarantees; Zvi notes that landmark results like AlphaProof Nexus receive almost no mainstream coverage despite their magnitude.

Evolution: Zvi's coverage-gap observation [21768] is new: the mathematical and AI-tracking communities are registering results the mainstream press is not covering.

Tensions

  • OpenAI's Erdős disproof came from a general-purpose model with no formal-system grounding; DeepMind's architecture requires every reasoning step to be Lean-verified — both have now produced expert-validated results, but they represent different claims about what makes AI mathematical output trustworthy. [4][10][11][5][2][3][7]
  • Tao observes 'more and more mass-produced mathematics at scale' [17] as a quality concern; Rohan Paul and Google researchers frame high-volume proof-search as precisely the mechanism for mathematical progress [12] — the same phenomenon read as a problem versus a feature. [17][12][10]
  • The UW Math AI Lab's dataset of erroneous Lean proofs challenges the 'every step formally verified' claim central to DeepMind's and Harmonic's architectures — if the verification layer itself produces false proofs, formal-system grounding may be less reliable than claimed. [13][10][19][11]
  • Gary Marcus explicitly audits AI math headlines against underlying results [18], while Fields Medalist Tim Gowers and Daniel Litt have now directly endorsed the OpenAI Erdős result as substantively exciting rather than hype [7] — the most direct clash yet between the skeptical and endorsing positions. [18][7][4]
  • AlphaProof Nexus solved 9 Erdős problems and 44 OEIS conjectures — results Zvi calls a landmark milestone — but received almost no mainstream coverage [9], while OpenAI's single Erdős disproof generated sustained press attention; the coverage asymmetry does not obviously track the magnitude of results. [9][8][4][20]
  • Tudor Achim predicts AI could prove the Riemann Hypothesis by 2028 [19]; this timeline sits unresolved against the caution of DeepMind's published research, the UW erroneous-proofs dataset, and Gary Marcus's ongoing skepticism. [19][10][13][18]

Status: active and growing

Sources

  1. [1] Google A.I. System Wins Gold Medal in International Math Olympiad — reactive:ai-formal-math-breakthroughs
  2. [2] OpenAI claims gold-medal performance at IMO 2025 | Hacker News — reactive:ai-formal-math-breakthroughs
  3. [3] Google and OpenAI Get 2025 IMO Gold - LessWrong — reactive:ai-formal-math-breakthroughs
  4. [4] 😸 OpenAI solved an 80-year math problem by... disproving it — The Neuron (2026-05-22)
  5. [5] Remarks on the disproof of the unit distance conjecture - arXiv — reactive:openai-erdos-math-breakthrough
  6. [6] Amazing: Erdős' Unit Distance Problem was Disproved! It was ... — reactive:openai-erdos-math-breakthrough
  7. [7] An OpenAI model solved a famous math problem that stumped humans for 80 years — Ars Technica AI (2026-06-01)
  8. [8] Google DeepMind's AlphaProof Nexus Solves 9 Erdős Problems and 44 OEIS Conjectures | KuCoin — reactive:ai-formal-math-breakthroughs
  9. [9] AI #170: Lack of Executive Order — Zvi's AI Roundups (2026-05-28)
  10. [10] Google DeepMind's new paper. — Rohan Paul Twitter (2026-05-22)
  11. [11] @tomflex @prz_chojecki Sure! DeepMind built AI agents that pair LLMs (for generating ideas) with the Lean theorem prover... — reactive:ai-formal-math-breakthroughs (2026-05-24)
  12. [12] Another great paper from Google. — Rohan Paul Twitter (2026-06-04)
  13. [13] UW Math AI Lab Releases Erroneous Lean Proofs Dataset - LinkedIn — reactive:ai-formal-math-breakthroughs
  14. [14] AI contributions to Erdős problems · teorth/erdosproblems Wiki — reactive:ai-formal-math-breakthroughs
  15. [15] The Edge of Mathematics - The Atlantic — reactive:ai-formal-math-breakthroughs
  16. [16] Three Erdős Problems Fell in Seven Days, and Terence Tao Verified ... — reactive:ai-formal-math-breakthroughs
  17. [17] “I do see more and more mass-produced mathematics at scale." — Rohan Paul Twitter (2026-05-24)
  18. [18] Checking the math behind OpenAI and Anthropic's latest headlines — reactive:ai-formal-math-breakthroughs
  19. [19] 😺 🎙️ PODCAST: Can AI Solve Math's Biggest Mystery? — The Neuron (2026-05-20)
  20. [20] OpenAI announces AI's biggest math breakthrough yet — reactive:openai-erdos-math-breakthrough
  21. [21] AI at IMO 2025: a round-up - Xena Project - WordPress.com — reactive:ai-formal-math-breakthroughs
  22. [22] @tobiamure @Polymarket This is today's big news from Google DeepMind: their new AI agent (AlphaProof Nexus) autonomously... — reactive:ai-formal-math-breakthroughs (2026-05-24)
  23. [23] Now that it's 2026, how is Terence Tao's prediction holding up? : r/math — reactive:openai-erdos-math-breakthrough
  24. [24] [PDF] Aristotle: IMO-level Automated Theorem Proving - arXiv — reactive:ai-formal-math-breakthroughs
  25. [25] Aristotle from Harmonic just proved Erdos Problem #124 in Lean all ... — reactive:ai-formal-math-breakthroughs
  26. [26] Harmonic — reactive:ai-formal-math-breakthroughs
  27. [27] Gemini with Deep Think achieves gold-medal standard at the IMO | Hacker News — reactive:ai-formal-math-breakthroughs
  28. [28] Google DeepMind's AI systems, AlphaProof and AlphaGeometry 2 ... — reactive:ai-formal-math-breakthroughs
  29. [29] OpenAI makes breakthrough on 80-year-old maths problem — reactive:openai-erdos-math-breakthrough
  30. [30] OpenAI's internal model disproves Unit Distance Conjecture of Erdos — reactive:openai-erdos-math-breakthrough