Pythagoras-Prover just made Lean theorem proving look far less dependent on giant models, with a 4B prover beating DeepS…

Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-15

Pythagoras-Prover, a 4B parameter model, outperforms DeepSeek-Prover-V2-671B on the MiniF2F Pass@32 benchmark for Lean theorem proving, demonstrating that training data geometry can compensate for a 167x reduction in model scale.

Open original ↗

Appears in

LLM Efficiency Breakthroughs: Small Models and Sparse Architectures Challenge Scale Assumptions

Extraction

Topics: formal-theorem-provinglean-proversmall-language-modelsdata-efficiencybenchmarks

Claims

Pythagoras-Prover with 4B parameters beats DeepSeek-Prover-V2-671B on the MiniF2F Pass@32 benchmark.
Better training data geometry can recover a substantial amount of the performance gap caused by reduced model scale.
Formal reasoning tasks may be far less dependent on giant models than previously assumed.

Key quotes

Pythagoras-Prover just made Lean theorem proving look far less dependent on giant models, with a 4B prover beating DeepSeek-Prover-V2-671B at MiniF2F Pass@32.

Shows in formal reasoning, better data geometry can buy back an astonishing amount of scale.