VibeThinker is a 3B param model, with almost head to head benchmark result with Opus 4.5 on reasoning with novel SFT+GRP…

Rohan Paul (@rohanpaul_ai) · Twitter · 2026-06-24

VibeThinker, a 3-billion-parameter model trained with a novel combination of supervised fine-tuning (SFT) and Group Relative Policy Optimization (GRPO), achieves near-parity with Claude Opus 4.5 on reasoning benchmarks, scoring 94.3 on AIME26 and 80.2 Pass@1 on LiveCodeBench v6.
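Pass@1 is the share of problems solved by a single sampled answer. When several samples are drawn per problem, it is commonly computed with the unbiased pass@k estimator from the HumanEval/Codex evaluation, which at k = 1 reduces to c/n (correct samples over total samples). The sketch below assumes that convention; the tweet does not say how many samples per problem were used for VibeThinker's LiveCodeBench v6 score, and `pass_at_k` and `mean_pass_at_k` are hypothetical helper names.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    # 1 - P(all k chosen samples are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over a benchmark; results holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Three problems, four samples each: all correct, half correct, none correct.
print(mean_pass_at_k([(4, 4), (4, 2), (4, 0)], k=1))  # 0.5
```

At k = 1 the estimator equals the plain per-problem success rate, so a reported Pass@1 of 80.2 means that roughly 80% of sampled answers pass, averaged over the benchmark's problems.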


Extraction

Topics: small-language-models, reasoning-benchmarks, model-efficiency, grpo-training

Claims

VibeThinker achieves 94.3 on AIME26 with only 3 billion parameters.
VibeThinker scores 80.2 Pass@1 on LiveCodeBench v6.
VibeThinker achieves 96.1% acceptance on recent unseen LeetCode problems.
VibeThinker nearly matches Claude Opus 4.5 on reasoning benchmarks despite having only 3 billion parameters.
VibeThinker uses a novel combination of SFT and GRPO training to achieve its results.
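GRPO (Group Relative Policy Optimization, introduced with DeepSeekMath) samples a group of completions for each prompt and scores each completion relative to the rest of its group, which removes the need for a learned value model. The source does not say what is novel in VibeThinker's SFT+GRPO recipe, so the sketch below shows only the standard group-normalized advantage and the PPO-style clipped surrogate that GRPO builds on. It assumes scalar per-completion rewards (for example, 1 for a correct answer and 0 otherwise), omits the KL penalty term, and uses hypothetical function names.

```python
import math
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Advantage of each completion = (reward - group mean) / group std."""
    mean = statistics.fmean(rewards)
    # Population std here; some implementations use the sample std instead.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(logp_new: float, logp_old: float, advantage: float,
                      clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective term (to be maximized) for one completion."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


# Four sampled answers to one prompt, two of them correct.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advs])  # [1.0, -1.0, -1.0, 1.0]
```

Because advantages are centered within each group, a prompt whose sampled answers are all right or all wrong contributes no learning signal. Only prompts where the policy sometimes succeeds move the weights.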

Key quotes

VibeThinker is a 3B param model, with almost head to head benchmark result with Opus 4.5 on reasoning with novel SFT+GRPO.

Unusually strong for its size: with only 3B parameters, 94.3 on AIME26, 80.2 Pass@1 on LiveCodeBench v6, and 96.1% acceptance on recent unseen LeetCode