VibeThinker is a 3B param model, with almost head to head benchmark result with Opus 4.5 on reasoning with novel SFT+GRP…
Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-24
VibeThinker, a 3-billion-parameter model trained with novel SFT+GRPO, achieves near-parity with Claude Opus 4.5 on reasoning benchmarks, scoring 94.3 on AIME26 and 80.2 Pass@1 on LiveCodeBench v6.
Extraction
Topics: small-language-models, reasoning-benchmarks, model-efficiency, grpo-training
Claims
- VibeThinker achieves 94.3 on AIME26 with only 3 billion parameters.
- VibeThinker scores 80.2 Pass@1 on LiveCodeBench v6.
- VibeThinker achieves 96.1% acceptance on recent unseen LeetCode problems.
- VibeThinker nearly matches Claude Opus 4.5 on reasoning benchmarks despite having only 3 billion parameters.
- VibeThinker uses a novel combination of SFT and GRPO training to achieve its results.
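For context on the Pass@1 figure in the claims above, here is a minimal sketch of how pass@k is commonly estimated from sampled generations. It uses the widely used unbiased estimator 1 - C(n-c, k) / C(n, k), which reduces to c/n for k = 1. The post does not say how the LiveCodeBench v6 score was sampled or computed, so the function names and the sample counts below are hypothetical and only illustrate the metric.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem (hypothetical value)
    c: number of those samples that passed all tests
    k: sample budget being scored (k=1 gives Pass@1)
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples failed, every size-k draw contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k over (n, c) pairs, as a percentage."""
    if not results:
        raise ValueError("results must not be empty")
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Illustrative (made-up) per-problem results: (samples drawn, samples passing).
example = [(10, 10), (10, 5), (10, 0)]
score = benchmark_pass_at_k(example, k=1)
```

With k = 1 the per-problem value is simply the fraction of samples that passed, so a reported Pass@1 of 80.2 would correspond to an average per-problem pass fraction of about 0.802 under this definition.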
Key quotes
VibeThinker is a 3B param model, with almost head to head benchmark result with Opus 4.5 on reasoning with novel SFT+GRPO.
Unusually strong for its size: with only 3B parameters, 94.3 on AIME26, 80.2 Pass@1 on LiveCodeBench v6, and 96.1% acceptance on recent unseen LeetCode