Great Stanford + MIT + Harvard + Anthropic paper.

Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-25

A collaborative paper from Stanford, MIT, Harvard, and Anthropic offers a training-based explanation for why larger AI models acquire rare capabilities that smaller models miss: larger models forget rare skills less during training because extra parameter capacity protects weak learning signals.

Extraction

Topics: ai-scaling, emergent-abilities, machine-learning-theory, model-training

Claims

Larger AI models learn rare skills because they forget them less during training than smaller models do.
Extra parameter capacity in larger models protects weak learning signals from being overwritten during training.
The paper provides a mechanistic, training-based explanation for why model scale unlocks capabilities unavailable to smaller models.
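
The claims above can be made concrete with a toy simulation. This is not the paper's method or experimental setup. It is a minimal sketch that assumes a linear model whose weight vector has already learned one "rare" skill, and which is then trained only on "frequent" examples pointing in random directions. The function names, the target values, and the use of dimension as a stand-in for parameter capacity are all illustrative assumptions.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a random direction in `dim` dimensions (hypothetical skill feature)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rare_skill_error_after_training(dim, n_frequent=20, steps=500, lr=0.5, seed=0):
    """Toy measure of forgetting (an assumption, not the paper's metric).

    The model output is w . x. The rare skill wants output 1 on `rare`;
    every frequent example wants output 0. We start with the rare skill
    learned perfectly, train only on frequent examples, and report how far
    the rare-skill output has drifted from its target.
    """
    rng = random.Random(seed)
    rare = random_unit_vector(dim, rng)
    frequent = [random_unit_vector(dim, rng) for _ in range(n_frequent)]

    w = list(rare)  # rare skill already learned: dot(w, rare) == 1

    for _ in range(steps):
        x = rng.choice(frequent)
        err = dot(w, x) - 0.0  # squared-error gradient signal for target 0
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]

    return abs(dot(w, rare) - 1.0)


if __name__ == "__main__":
    for dim in (4, 64, 512):
        err = rare_skill_error_after_training(dim)
        print(f"dim={dim:4d}  rare-skill error after frequent-only training: {err:.3f}")
```

In this sketch the small model has fewer dimensions than frequent examples, so the frequent updates cover every direction and erase most of the rare skill. In the wide model, random directions are nearly orthogonal, so the same updates barely touch the rare direction. Real networks are nonlinear and the paper's analysis may differ substantially; the sketch only illustrates the general idea of spare capacity reducing interference.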

Key quotes

bigger AI models learn rare skills because they forget them less during training, their extra space protects weak learning