Great Stanford + MIT + Harvard + Anthropic paper.
Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-08
A joint Stanford, MIT, Harvard, and Anthropic paper concludes that larger AI models acquire rare skills smaller models miss because their greater capacity causes less forgetting of weakly-learned abilities during training.
Extraction
Topics: llm-scaling, emergent-abilities, training-dynamics, model-capacity, machine-learning-theory
Claims
- Larger AI models learn rare skills that smaller models fail to acquire.
- The mechanism behind this gap is differential forgetting: larger models forget weakly-learned signals less than smaller models do.
- Extra model capacity functions as a buffer that protects rare, weakly-reinforced abilities during training.
- The paper offers a mechanistic explanation, grounded in training dynamics, for why capabilities emerge at scale.
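The differential-forgetting claim can be illustrated with a toy simulation. This is only a sketch and is not taken from the paper: the function `simulate_rare_skill`, its parameters, and the assumption that the per-step forgetting rate scales as `interference / capacity` are all hypothetical, chosen to show how a capacity buffer could protect a rarely reinforced skill.

```python
def simulate_rare_skill(capacity, steps=1000, rare_every=50, gain=0.2, interference=1.0):
    """Toy model of one rare skill's strength (0..1) over training.

    Hypothetical assumption, not from the paper: every non-reinforcing step
    erodes the skill at a rate of interference / capacity, so a larger
    capacity means less forgetting between rare exposures.
    """
    if capacity < interference:
        raise ValueError("capacity must be >= interference so the forget rate stays <= 1")
    forget_rate = interference / capacity
    strength = 0.0
    for step in range(steps):
        if step % rare_every == 0:
            # Rare exposure: move the skill a fraction of the way toward mastery.
            strength += gain * (1.0 - strength)
        else:
            # Ordinary step: updates for other data partially overwrite the skill.
            strength *= 1.0 - forget_rate
    return strength


small = simulate_rare_skill(capacity=10)
large = simulate_rare_skill(capacity=1000)
print(f"small model: {small:.4f}  large model: {large:.4f}")
```

In this toy setup both models see the rare skill equally often and gain the same amount per exposure. The only difference is how much survives between exposures, which is the distinction the claims above draw.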
Key quotes
Says bigger AI models learn rare skills because they forget them less during training, their extra space protects weak learning