[[interpolate|interpolation]] / overparameterization does not contradict generalization; spike in [[test error]] around the [[interpolate|interpolation threshold]]

![[double descent figure.png]]

- In classical statistics (lhs), we expect a [[bias-variance decomposition]] where more complex models tend to overfit the training data
- In [[deep learning]] or even [[ridge regularization|ridge regression]] (rhs): [[model complexity|more complex]] ("overparameterized") [[statistical model]]s generalize even better (even after [[interpolate|interpolating]] the training data)!
- Often observe a [[phase transition]] around the interpolation threshold

# why

[[2024SchaefferEtAlDoubleDescentDemystified]] paints a surprisingly simple picture: the phenomenon is well understood in [[kernel ridge regression]] and closely related to [[ridge regularization]].

[[test error]] explodes because of
1. [[test error|test]] dataset [[covariate]]s aligning with [[principal component]]s of low [[variance]] in the [[train loss|train]]ing dataset
2. [[architectural bias|model misspecification]] (occasionally [[noise]])

(a sketch of mechanism 1 and a small numerical example are at the end of this note)

see [[2024SchaefferEtAlDoubleDescentDemystified]] and [[2024AtanasovEtAlScalingRenormalizationHighdimensional|Scaling and renormalization in high-dimensional regression]] for [[statistical mechanics|statistical physics]] derivations

people have found "**multiple** descent curves"; see [[2021ChenEtAlMultipleDescentDesign]]

![[inductive bias#^inductive-bias]]

see [[simplicity bias in neural networks]]

- [[double descent in ridge regression for binary classification]]
- [[cgmt for ridge regression with squared loss for binary classification]]
- [[2019HastieEtAlSurprisesHighdimensionalRidgeless|Surprises in High-Dimensional Ridgeless Least Squares Interpolation]]

# sources

[[2024SchaefferEtAlDoubleDescentDemystified]]

Key [[history|historical]] papers from Belkin et al.:
- [[2018BelkinEtAlDoesDataInterpolation]] (interpolation does not contradict generalization)
- [[2018BelkinEtAlUnderstandDeepLearning]]
- [[2019BelkinEtAlReconcilingModernMachine]]

[[2019HastieEtAlSurprisesHighdimensionalRidgeless|Surprises in High-Dimensional Ridgeless Least Squares Interpolation]]

[[2023HenighanEtAlSuperpositionMemorizationDouble]]

[[CS 229br lec 06 2023-03-02]]: Boaz jokes that "more money should give better results... [[gradient descent]] is a capitalist algorithm"
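
# sketches

A minimal sketch of mechanism 1, assuming ridgeless (minimum-norm) least squares with training design $X = \sum_i \sigma_i u_i v_i^\top$ (SVD) and labels $y$:

$$
\hat\beta = X^{+} y = \sum_i \frac{u_i^\top y}{\sigma_i}\, v_i,
\qquad
x_{\text{test}}^\top \hat\beta = \sum_i \frac{(u_i^\top y)\,(x_{\text{test}}^\top v_i)}{\sigma_i}.
$$

A test [[covariate]]'s component along a right singular vector $v_i$ of low [[variance]] (small $\sigma_i$) is amplified by $1/\sigma_i$; near the [[interpolate|interpolation threshold]] the smallest nonzero $\sigma_i$ is typically tiny, so the [[test error]] spikes.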
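
A minimal numerical sketch of the curve itself, assuming a Gaussian random design and minimum-norm least squares via the pseudoinverse; the dimensions and the "fit on the first $p$ features" scheme are illustrative choices, not taken from any of the papers above:

```python
# Double descent in ridgeless (minimum-norm) linear regression.
# All sizes below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_train, n_test, d = 40, 1000, 200      # train samples, test samples, ambient dimension
beta = rng.normal(size=d) / np.sqrt(d)  # ground-truth coefficients
noise = 0.1

X_train = rng.normal(size=(n_train, d))
y_train = X_train @ beta + noise * rng.normal(size=n_train)
X_test = rng.normal(size=(n_test, d))
y_test = X_test @ beta + noise * rng.normal(size=n_test)

feature_counts = range(5, d + 1, 5)
test_errors = []
for p in feature_counts:
    # Fit on the first p features only: pinv gives ordinary least squares when
    # p < n_train and the minimum-norm interpolator when p >= n_train.
    beta_hat = np.linalg.pinv(X_train[:, :p]) @ y_train
    preds = X_test[:, :p] @ beta_hat
    test_errors.append(np.mean((preds - y_test) ** 2))

# Test MSE typically peaks near p == n_train (the interpolation threshold)
# and decreases again in the overparameterized regime p > n_train.
peak = feature_counts[int(np.argmax(test_errors))]
print(f"worst test MSE at p = {peak} (n_train = {n_train})")
```

Dropping features when $p < d$ doubles as [[architectural bias|model misspecification]] (mechanism 2), so both error sources from the "why" list are in play here.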