The other paper that killed deep learning theory
Alignment Forum · LawrenceC · 2026-04-27
Extraction
Topics: deep-learning-theory, generalization-bounds, uniform-convergence, statistical-learning-theory, neural-network-generalization
Claims
- Nagarajan and Kolter (2019) showed empirically that the spectral-norm generalization bounds developed after Zhang et al. scale in the wrong direction: as the training set grows, actual test error decreases but the theoretical bounds get larger.
- In an overparameterized linear setting, the paper proved that uniform convergence bounds cannot explain why gradient descent generalizes: it constructs a 'bad' dataset on which any uniform convergence bound must be vacuous.
- SGD on neural networks produces classifiers that are microscopically complex near training points but macroscopically simple near new data points, simultaneously explaining good generalization and the failure of uniform convergence.
- Any valid future theory of neural network generalization must be algorithm- and data-dependent in a stronger sense than spectral-norm bounds and must forgo worst-case uniform convergence over all hypotheses.
- Nearly a decade after Nagarajan and Kolter's 2019 result, no satisfactory theoretical explanation for neural network generalization has been found.
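
The 'bad dataset' claim above can be illustrated with a toy simulation. This is a minimal sketch loosely in the spirit of the overparameterized linear setting, not a reproduction of the paper's construction: the single signal coordinate, the noise scale `noise_sq_norm`, the sizes `m` and `d_noise`, and the helper names `sample` and `error` are all assumptions chosen for illustration. A linear classifier is fit with one gradient step, then evaluated on fresh data and on a copy of the training set whose noise coordinates are negated. Because the noise is symmetric, the negated copy is distributed like an ordinary training sample, yet the learned classifier gets it almost entirely wrong.

```python
import numpy as np

def sample(rng, n, d_noise, noise_sq_norm):
    """Draw n labeled points: one signal coordinate equal to 2*y,
    plus d_noise Gaussian noise coordinates with E||noise||^2 = noise_sq_norm."""
    y = rng.choice([-1.0, 1.0], size=n)
    signal = 2.0 * y[:, None]                       # shape (n, 1)
    noise = rng.normal(0.0, np.sqrt(noise_sq_norm / d_noise), size=(n, d_noise))
    return signal, noise, y

def error(w_signal, w_noise, signal, noise, y):
    """Fraction of points with non-positive margin y * h(x)."""
    margins = y * (signal @ w_signal + noise @ w_noise)
    return float(np.mean(margins <= 0))

rng = np.random.default_rng(0)
m, d_noise = 50, 20_000          # assumed sizes: far more noise dimensions than samples
noise_sq_norm = 8.0 * m          # assumed scale: noise norm dominates the signal margin 4*m

sig_tr, noi_tr, y_tr = sample(rng, m, d_noise, noise_sq_norm)

# One gradient step from zero on the loss -y * h(x) with learning rate 1,
# which gives w = sum_i y_i * x_i.
w_signal = (y_tr[:, None] * sig_tr).sum(axis=0)
w_noise = (y_tr[:, None] * noi_tr).sum(axis=0)

sig_te, noi_te, y_te = sample(rng, 200, d_noise, noise_sq_norm)

train_err = error(w_signal, w_noise, sig_tr, noi_tr, y_tr)
test_err = error(w_signal, w_noise, sig_te, noi_te, y_te)
# "Bad" dataset: the same training points with their noise coordinates negated.
flipped_err = error(w_signal, w_noise, sig_tr, -noi_tr, y_tr)

print(f"train error:   {train_err:.2f}")
print(f"test error:    {test_err:.2f}")
print(f"flipped error: {flipped_err:.2f}")
```

Under these assumed parameters the classifier memorizes the noise of each training point (the microscopic complexity) while its behavior on fresh points is driven by the signal coordinate (the macroscopic simplicity). A bound that must hold uniformly over datasets like the negated one cannot certify the small test error.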
Key quotes
Not only did it demonstrate that the data-dependent bounds created by the field scaled in the wrong direction, it provided an over-parameterized setting where the entire approach taken by statistical learning theory – uniform convergence bounds – provably did not work.
SGD on neural networks learns classifiers that are simple on the macroscopic scale, but complex on the microscopic scale. The microscopic complexity is what stops uniform convergence bounds from working, while the macroscopic simplicity explains generalization.
But almost a decade later, we still don't have those results. Maybe one of you reading this will write the paper that explains generalization.