Paper from Meta shows quantized reasoning models often lose because they keep doubting a correct answer instead of finis…

Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-07-01

A Meta paper finds that post-training quantization adds noise at uncertain word choices, which makes reasoning models second-guess answers they have already gotten right. Overthinking failures rose by up to 52%, and penalizing 50 hesitation words cut reasoning length by 12–23% while often keeping or improving accuracy.

Extraction

Topics: model-quantization, reasoning-models, llm-efficiency, overthinking

Claims

Quantized reasoning models frequently fail not because they lack capability but because they second-guess answers they already reached correctly.
Quantization introduces noise at uncertain word choices, increasing the probability of hesitation tokens like 'wait,' 'but,' or 'alternatively' that reopen closed problems.
Aggressive quantization raised overthinking failures by up to 52% across math, coding, and science benchmarks, for models ranging from 1.5B to 32B parameters.
Adding a small decoding penalty on 50 hesitation words reduced reasoning chain length by 12–23% while maintaining or improving accuracy.
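
The decoding penalty in the last claim can be pictured as subtracting a constant from the logits of hesitation tokens before the next token is chosen. The following is a minimal sketch under stated assumptions, not the paper's implementation: the three-word list stands in for the 50-word list (which this summary does not reproduce), and the function names, the penalty value, and the toy logits are all hypothetical.

```python
import math

# Assumption: a stand-in for the paper's 50 hesitation words, which are not listed here.
HESITATION_WORDS = {"wait", "but", "alternatively"}


def softmax(logits):
    """Convert a token -> logit mapping into a token -> probability mapping."""
    peak = max(logits.values())
    exps = {tok: math.exp(value - peak) for tok, value in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def penalize_hesitation(logits, penalty=0.5, words=HESITATION_WORDS):
    """Return a copy of `logits` with `penalty` subtracted from hesitation tokens."""
    return {
        tok: (value - penalty if tok.strip().lower() in words else value)
        for tok, value in logits.items()
    }


def greedy_pick(logits):
    """Pick the highest-scoring token (greedy decoding)."""
    return max(logits, key=logits.get)


# Toy next-token logits (hypothetical values): "Wait" narrowly leads "So".
step_logits = {"Wait": 2.1, "So": 2.0, "Therefore": 1.8, "the": 0.5}

penalized = penalize_hesitation(step_logits, penalty=0.5)
before = greedy_pick(step_logits)   # the hesitation token wins without the penalty
after = greedy_pick(penalized)      # a non-hesitation token wins with it
```

A small penalty only changes the outcome when a hesitation token is narrowly ahead, which is the situation the claims describe; when the model strongly prefers to reconsider, the same penalty leaves the choice unchanged.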

Key quotes

quantization adds noise at uncertain word choices, so the model becomes more likely to pick words like 'wait,' 'but,' or 'alternatively' that reopen the problem
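
One way to see why noise matters most at uncertain word choices is a toy simulation. The sketch below is illustrative only: it assumes quantization error can be modeled as independent Gaussian noise on two competing logits, which is a simplification introduced here and not a claim from the paper, and the function name and all numeric values are hypothetical.

```python
import random


def flip_rate(margin, noise_std, trials=2000, seed=0):
    """Fraction of trials in which noise makes a hesitation token outscore
    the token the unquantized model preferred by `margin`."""
    rng = random.Random(seed)
    flips = 0
    for _ in range(trials):
        # Logit of the token that continues toward the answer, plus noise.
        preferred = margin + rng.gauss(0.0, noise_std)
        # Logit of a hesitation token such as "wait", plus noise.
        hesitation = 0.0 + rng.gauss(0.0, noise_std)
        if hesitation > preferred:
            flips += 1
    return flips / trials


# Same noise level, different confidence in the original choice.
confident = flip_rate(margin=3.0, noise_std=0.5)  # clear preference
uncertain = flip_rate(margin=0.1, noise_std=0.5)  # near tie
```

Under these assumptions the near-tie case flips far more often than the confident one, which matches the quote's point that the added noise bites where the model was already unsure.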

aggressive quantization raised overthinking failures up to 52%, while a small penalty on 50 hesitation words cut reasoning length by 12% to 23% and often kept or improved accuracy