Great Stanford + MIT + Harvard + Anthropic paper.
Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-25
A collaborative paper from Stanford, MIT, Harvard, and Anthropic offers a training-based explanation for why larger AI models acquire rare capabilities that smaller models miss: larger models forget rare skills less during training because extra parameter capacity protects weak learning signals.
Extraction
Topics: ai-scaling, emergent-abilities, machine-learning-theory, model-training
Claims
- Larger AI models learn rare skills because they forget them less during training than smaller models do.
- Extra parameter capacity in larger models protects weak learning signals from being overwritten during training.
- The paper provides a mechanistic, training-based explanation for why model scale unlocks capabilities unavailable to smaller models.
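One way to build intuition for the capacity claim is a toy geometric sketch. It is not the paper's method or experiment, and it trains no model. It assumes a rare skill is stored along one random direction in parameter space and that updates for other data arrive as unrelated random directions. The function names `unit_vector` and `mean_overlap` and the widths used are hypothetical choices made for illustration. Under those assumptions, the average overlap between an unrelated update and the rare-skill direction shrinks as the width grows, so each update disturbs the rare skill less.

```python
import math
import random


def unit_vector(width, rng):
    """Draw a random direction of the given width and scale it to length 1."""
    v = [rng.gauss(0.0, 1.0) for _ in range(width)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_overlap(width, n_updates=200, seed=0):
    """Average |cosine| between one fixed 'rare skill' direction and
    n_updates random 'other data' update directions.

    Assumption (illustrative only): overlap stands in for how much an
    unrelated update would overwrite the rare skill.
    """
    rng = random.Random(seed)
    rare = unit_vector(width, rng)
    total = 0.0
    for _ in range(n_updates):
        update = unit_vector(width, rng)
        total += abs(sum(r * u for r, u in zip(rare, update)))
    return total / n_updates


if __name__ == "__main__":
    # Wider parameter spaces give smaller average overlap.
    for width in (8, 64, 512):
        print(width, round(mean_overlap(width), 3))
```

The sketch captures only the geometric fact that random directions become closer to orthogonal in higher dimensions. Whether this matches the mechanism the paper identifies is not something the summary above confirms.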
Key quotes
bigger AI models learn rare skills because they forget them less during training, their extra space protects weak learning