What is **deep learning**?
For our purposes: [[statistical model]]s (often [[artificial neural network]]s)
with lots of parameters ([[asymptotic|high dimensional]]),
typically [[train loss|trained]] by [[gradient based optimization]] on some [[loss function]].
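A minimal sketch of that definition in code, assuming nothing but numpy (the target function, sizes, and learning rate are all illustrative): a model with lots of parameters (a one-hidden-layer network), a loss function (squared error), and gradient-based optimization (plain gradient descent).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy data: learn y = sin(3x) from samples.
X = rng.uniform(-1, 1, size=(256, 1))
y = np.sin(3 * X)

# A statistical model with lots of parameters: one hidden layer, width 64.
width = 64
W1 = rng.normal(0.0, 1.0, (1, width))
b1 = np.zeros(width)
W2 = rng.normal(0.0, 1.0 / np.sqrt(width), (width, 1))

lr = 0.1
for step in range(2000):
    # Forward pass through the network.
    h = np.tanh(X @ W1 + b1)            # hidden activations, (256, width)
    pred = h @ W2                       # predictions, (256, 1)
    loss = np.mean((pred - y) ** 2)     # the loss function (squared error)

    # Backward pass: gradient of the loss w.r.t. every parameter.
    g_pred = 2.0 * (pred - y) / len(X)
    g_W2 = h.T @ g_pred
    g_h = g_pred @ W2.T
    g_z = g_h * (1.0 - h**2)            # tanh derivative
    g_W1 = X.T @ g_z
    g_b1 = g_z.sum(axis=0)

    # Gradient-based optimization: one step of plain gradient descent.
    W1 -= lr * g_W1
    b1 -= lr * g_b1
    W2 -= lr * g_W2

print(f"final train loss: {loss:.4f}")
```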
Some guiding questions:
- What makes a function easy/hard to approximate? How many samples do you need? (see the nearest-neighbor sketch after this list)
- What [[represent|latent representation]]s do models learn?
- For a given optimizer, can we predict what kinds of solutions it finds?
- Can we explain [[2019SuttonBitterLesson|The Bitter Lesson]]:
the unreasonable effectiveness of [[continuous optimization]] techniques on large models?
- Can we understand the [[computational complexity theory|complexity]] of these [[algorithm]]s?
Prove [[statistical learning|upper or lower bounds]]? (e.g. via the [[statistical query]] framework; see the SQ sketch after this list)
- Why do polytime algorithms exist for certain problems and not others?
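On the first question: a toy numpy experiment (the target function, sample sizes, and dimensions are illustrative assumptions). 1-nearest-neighbor regression on a smooth target shows how the number of samples needed for a given error blows up with input dimension, the curse of dimensionality.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(X):
    # A smooth (Lipschitz) target; smoothness is what makes it "easy".
    return np.sin(X.sum(axis=1))

def nn_mse(d, n_train, n_test=200):
    """Test MSE of 1-nearest-neighbor regression in d dimensions."""
    Xtr = rng.uniform(0, 1, (n_train, d))
    Xte = rng.uniform(0, 1, (n_test, d))
    # Pairwise squared distances via |a-b|^2 = |a|^2 + |b|^2 - 2ab.
    d2 = ((Xte**2).sum(1)[:, None] + (Xtr**2).sum(1)[None, :]
          - 2 * Xte @ Xtr.T)
    pred = f(Xtr)[d2.argmin(axis=1)]    # label of the nearest train point
    return np.mean((pred - f(Xte)) ** 2)

# The same sample sizes buy much less accuracy as dimension grows.
for d in (1, 2, 5, 10):
    print(d, [f"{nn_mse(d, n):.4f}" for n in (100, 400, 1600)])
```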
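And on the [[statistical query]] framework: a hypothetical toy oracle that only answers noisy expectations E[q(x, y)] instead of revealing individual examples. For a parity on two or more hidden bits, every single-coordinate correlational query returns roughly zero, which is the intuition behind SQ lower bounds for learning parities.

```python
import numpy as np

rng = np.random.default_rng(0)

def sq_oracle(query, X, y, tau=0.01):
    """A statistical query oracle: E[query(x, y)] up to tolerance tau."""
    return np.mean(query(X, y)) + rng.uniform(-tau, tau)

# Target: y = parity of a hidden subset S of the d bits (illustrative sizes).
d, n = 10, 200_000
S = [2, 5, 7]
X = rng.choice([-1.0, 1.0], size=(n, d))
y = np.prod(X[:, S], axis=1)

# Correlational queries E[x_i * y] are ~0 for every coordinate i, even those
# in S: no single-coordinate query distinguishes the hidden parity.
for i in range(d):
    print(i, round(sq_oracle(lambda X, y, i=i: X[:, i] * y, X, y), 3))
```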
# inductive bias
Can we understand [[inductive bias]]?
![[inductive bias#^inductive-bias]]
Why do [[interpolate|overparameterized]] models often **generalize well**? Equivalently:
- Why does [[double descent]] happen? (a minimal sketch follows this list)
- Why does [[neural network inductive bias|neural network training prefer simple functions]]?
- Or not: What causes models to fail outside of their training distribution?
(See also [[covariate shift|inner alignment]])
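A minimal numpy sketch of the standard random-features setting in which [[double descent]] shows up empirically (the target, sizes, and widths are illustrative assumptions): with a minimum-norm least-squares readout, test error peaks where the number of random features crosses the number of training points, then falls again as the model becomes more overparameterized.

```python
import numpy as np

rng = np.random.default_rng(0)

n_train, n_test, d = 40, 500, 8

def data(n):
    X = rng.normal(size=(n, d))
    y = X[:, 0] + 0.5 * rng.normal(size=n)   # noisy linear target
    return X, y

Xtr, ytr = data(n_train)
Xte, yte = data(n_test)

def features(X, W):
    # Fixed random first layer; only the readout is fit.
    return np.maximum(X @ W, 0.0)            # ReLU random features

for width in (5, 10, 20, 40, 80, 160, 640):
    errs = []
    for _ in range(20):                      # average over random features
        W = rng.normal(size=(d, width)) / np.sqrt(d)
        Ftr, Fte = features(Xtr, W), features(Xte, W)
        # Minimum-norm least squares: interpolates once width >= n_train.
        theta = np.linalg.pinv(Ftr) @ ytr
        errs.append(np.mean((Fte @ theta - yte) ** 2))
    print(f"width {width:4d}  test MSE {np.mean(errs):.3f}")
```

Expect the test MSE to spike near width = n_train (the interpolation threshold) and then decrease again for the widest models.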
# sources
- [[2021GrosseCSC2541Winter2021]] (strongly recommend!)
- [[2016GoodfellowEtAlDeepLearning]]
- [ARENA](https://www.arena.education/)
- [MIT Deep Learning 6.S191](http://introtodeeplearning.com/)
- MIT intro program, a series of lectures. Probably quite good. [Lectures on YouTube across multiple years](https://www.youtube.com/playlist?list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI)
- [Understanding Deep Learning book](https://udlbook.github.io/udlbook/)
- [Neural Networks, Manifolds, and Topology -- colah's blog](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/)
- [cims.nyu.edu/\~matus/neurips.2024.workshop/](https://cims.nyu.edu/~matus/neurips.2024.workshop/)