What is **deep learning**? For our purposes: [[statistical model]]s (often [[artificial neural network]]s) with lots of parameters ([[asymptotic|high dimensional]]), typically [[train loss|trained]] by [[gradient based optimization]] on some [[loss function]]. (A minimal sketch of this training loop is at the end of this note.)

Some guiding questions:

- What makes a function easy/hard to approximate? How many samples do you need?
- What [[represent|latent representation]]s do models learn?
- For a given optimizer, can we predict what kinds of solutions it finds?
- Can we explain [[2019SuttonBitterLesson|The Bitter Lesson]]: the unreasonable effectiveness of [[continuous optimization]] techniques on large models?
- Can we understand the [[computational complexity theory|complexity]] of these [[algorithm]]s? Prove [[statistical learning|upper or lower bounds]]? (e.g. via the [[statistical query]] framework)
- Why do polytime algorithms exist for certain problems and not others?

# inductive bias

Can we understand [[inductive bias]]?

![[inductive bias#^inductive-bias]]

Why do [[interpolate|overparameterized]] models often **generalize well**? Equivalently:

- Why does [[double descent]] happen? (See the random-features sketch at the end of this note.)
- Why does [[neural network inductive bias|neural network training prefer simple functions]]?
- Or not: what causes models to fail outside of their training distribution? (See also [[covariate shift|inner alignment]].)

# sources

- [[2021GrosseCSC2541Winter2021]] strongly recommend!
- [[2016GoodfellowEtAlDeepLearning]]
- [ARENA](https://www.arena.education/)
- [MIT Deep Learning 6.S191](http://introtodeeplearning.com/) - MIT intro program, series of lectures. Probably quite good. [Lectures on YouTube across multiple years](https://www.youtube.com/playlist?list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI)
- [Understanding Deep Learning book](https://udlbook.github.io/udlbook/)
- [Neural Networks, Manifolds, and Topology -- colah's blog](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/)
- [cims.nyu.edu/\~matus/neurips.2024.workshop/](https://cims.nyu.edu/~matus/neurips.2024.workshop/)
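To make the opening definition concrete: a minimal sketch of "a parametric model trained by gradient-based optimization on a loss function", using plain numpy. The linear model, squared-error loss, toy data, and hyperparameters are all invented for illustration; they stand in for whatever model/loss/optimizer a real setup uses.

```python
import numpy as np

# Invented toy setup: a linear model as the "statistical model with parameters",
# squared error as the loss function, plain gradient descent as the optimizer.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))        # inputs
true_w = np.array([1.0, -2.0, 0.5])      # hypothetical ground-truth parameters
y = X @ true_w + 0.1 * rng.standard_normal(100)

w = np.zeros(3)   # model parameters, initialized at zero
lr = 0.1          # step size
for step in range(200):
    residual = X @ w - y
    loss = np.mean(residual**2)          # the loss function
    grad = 2 * X.T @ residual / len(y)   # gradient of the loss w.r.t. w
    w -= lr * grad                       # gradient descent update

print("learned:", w.round(2), " true:", true_w)
```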
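One standard setting where [[double descent]] shows up is minimum-norm regression on random features. A minimal numpy sketch, with the target function, noise level, and widths all invented for illustration: sweeping the feature count past the interpolation threshold (width ≈ n_train), test error typically spikes near the threshold and then falls again as width grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented toy problem: noisy samples of sin(2*pi*x) on [-1, 1].
def target(x):
    return np.sin(2 * np.pi * x)

n_train = 20
x_train = rng.uniform(-1, 1, n_train)
y_train = target(x_train) + 0.1 * rng.standard_normal(n_train)
x_test = np.linspace(-1, 1, 200)
y_test = target(x_test)

def relu_features(x, W, b):
    # Random ReLU features: phi_j(x) = max(0, W_j * x + b_j)
    return np.maximum(0.0, np.outer(x, W) + b)

for width in [5, 10, 15, 20, 30, 100, 1000]:
    W = rng.standard_normal(width)
    b = rng.standard_normal(width)
    Phi_train = relu_features(x_train, W, b)
    Phi_test = relu_features(x_test, W, b)
    # pinv returns the minimum-l2-norm least-squares solution; once
    # width >= n_train it (generically) interpolates the training data.
    theta = np.linalg.pinv(Phi_train) @ y_train
    train_mse = np.mean((Phi_train @ theta - y_train) ** 2)
    test_mse = np.mean((Phi_test @ theta - y_test) ** 2)
    print(f"width={width:5d}  train MSE={train_mse:8.4f}  test MSE={test_mse:8.4f}")
```

Note that `np.linalg.pinv` quietly picks the minimum-norm interpolant among all solutions; that implicit preference for small-norm (simple) functions is exactly the kind of [[inductive bias]] the questions above ask about.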