> The action of standing for, or in the place of, a person, group, or thing, and related senses. ([[Oxford English Dictionary|oed]])

see also [[symmetry]] and [[metaphor]].

# desiderata in [[deep learning]]

- **task-relevant** (good [[inductive bias]] / [[symmetry]]): expressing 2D rotations in Cartesian coordinates is cumbersome and requires trigonometric functions, but in polar coordinates it's trivial (sketched at the end of this note).
- **low-dimensional and disentangled**: even though the ambient space that the observed data lives in is very high-dimensional, it's highly redundant -- the data lies on some lower-dimensional [[manifold]] embedded in the ambient space (sketch below). We often want to [[redundancy reduction|reduce redundancy]] in the representation (ie assume the "efficient coding" hypothesis).
- **robust and invariant**: we often sacrifice (ie are invariant to) "irrelevant" detail that carries a lot of information in the information-theoretic sense: our visual system can identify a given face even when its appearance differs drastically (eg glasses, lighting, expression) (sketch below).
- **meaningful** (under some [[ontology]]): can we [[interpretability|interpret]] the information contained in the representation?

# [[computational neuroscience]]

Is there anything that's *not* representation learning? Yes; there are [[biologically plausible|neural dynamics]] at play, etc.

Lots of different paradigms / frameworks / [[mathematic]]al languages:

- [[object oriented language]]
- [[functional programming]]
- [[effective data dimension|manifold hypothesis]]

Intuition says:

- an [[double descent|overparameterized]] model should be able to simply *memorize* the data without needing to learn useful representations (sketch below);
- an underparameterized model isn't expressive / complex enough to describe useful representations.

However, neither intuition is strictly true: a strong [[inductive bias]] can greatly accelerate learning useful representations, and overparameterized models can still [[grokking|generalize]].

> [!question] What's the difference between [[dimensionality reduction]] and [[encoding|compression]]?
>
> The way I use them: compression is a hardware concept; dimensionality reduction is a semantic concept. Dimensionality reduction is a form of compression but not the other way around: eg quantization is compression but not dimensionality reduction (sketch below).

# learned or handcrafted representations

![[spectrum from deep learning to handcrafted kernels.png|400]]

From *[A blitz through classical statistical learning theory – Windows On Theory](https://windowsontheory.org/2021/01/31/a-blitz-through-classical-statistical-learning-theory/)*

autogenerated:

- [[kernel]]
- [[signal noise decomposition]]
- [[Gaussian process]]
- [[basis function]] and [[basis expansion]]
- [[dimensionality reduction]]
- [[redundancy reduction]]
- [[effective data dimension]]
- [[unsupervised]]
- [[1991KramerNonlinearPrincipalComponent|autoencoder paper]]
- [[tutorial on variational autoencoders]]
- [[deep learning]]
- [[2022ElhageEtAlToyModelsSuperposition|Toy models of superposition]]
- [[neural network inductive bias]]
- [[2023YangEtAlTheoryRepresentationLearning|A theory of representation learning gives a deep generalisation of kernel methods]]
- [[self supervised]]
- [[2023Ben-ShaulEtAlReverseEngineeringSelfsupervised|Reverse Engineering Self-Supervised Learning]]
- [[context-free text representations]]
- [[grokking]]
- [[neural representation alignment]]
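
# sketches

Minimal sketches of the points above, in NumPy (they assume only `numpy`; the data and names are mine, purely illustrative). First, the task-relevance desideratum: the same 2D rotation expressed in Cartesian coordinates, where it needs a trigonometric matrix, and in polar coordinates, where it is a single addition.

```python
import numpy as np

def rotate_cartesian(p, theta):
    # Cartesian representation: rotation requires a trigonometric matrix.
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return R @ p

def rotate_polar(r, phi, theta):
    # Polar representation: the same rotation is just an addition.
    return r, phi + theta

p = np.array([1.0, 1.0])
print(rotate_cartesian(p, np.pi / 4))        # ~[0, 1.414...]

r, phi = np.hypot(p[0], p[1]), np.arctan2(p[1], p[0])
r2, phi2 = rotate_polar(r, phi, np.pi / 4)
print(r2 * np.cos(phi2), r2 * np.sin(phi2))  # ~0.0 1.414...
```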
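The low-dimensional/manifold point, with a toy linear "manifold" for simplicity: data whose ambient dimension is 50 but whose intrinsic dimension is 2. The singular value spectrum exposes the redundancy directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 points that secretly live on a 2D subspace embedded in R^50:
# high ambient dimension, but highly redundant.
latent = rng.normal(size=(1000, 2))    # intrinsic coordinates
embedding = rng.normal(size=(2, 50))   # random linear embedding
X = latent @ embedding

# PCA via SVD: only two singular values are non-negligible.
X_centered = X - X.mean(axis=0)
print(np.linalg.svd(X_centered, compute_uv=False)[:4].round(6))
```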
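The invariance point. A classical handcrafted example (my choice of illustration, not from the note): the Fourier magnitude spectrum throws away phase -- *where* the signal is -- while keeping *what* it is, so the representation is invariant to circular translation even though the discarded phase carries a lot of information.

```python
import numpy as np

rng = np.random.default_rng(1)
signal = rng.normal(size=64)
shifted = np.roll(signal, 10)  # the "irrelevant" nuisance: a translation

# The magnitude spectrum discards phase, so the representation is
# identical for the signal and its shifted copy.
rep = lambda s: np.abs(np.fft.rfft(s))
print(np.allclose(rep(signal), rep(shifted)))  # True
```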
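The memorization intuition: with more parameters than data points, even a linear model can interpolate pure-noise labels exactly -- zero training error with nothing meaningful represented.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 20, 100                        # more parameters than data points
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)   # labels are pure noise

# Minimum-norm interpolating solution: memorizes the noise exactly.
w = np.linalg.pinv(X) @ y
print(np.allclose(X @ w, y))  # True: zero training error
```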
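Finally, the callout's distinction: quantization compresses (fewer bits per value) without touching dimensionality, while a PCA projection reduces dimensionality, and thereby also compresses.

```python
import numpy as np

x = np.random.default_rng(3).normal(size=(100, 8))

# Quantization: compression, not dimensionality reduction --
# still 8 coordinates per point, just fewer bits each.
x_quantized = x.astype(np.float16)
print(x_quantized.shape, x_quantized.nbytes)  # (100, 8) 1600

# Dimensionality reduction: project onto the top-2 principal axes --
# fewer coordinates per point (which also happens to compress).
xc = x - x.mean(axis=0)
_, _, vt = np.linalg.svd(xc, full_matrices=False)
x_reduced = xc @ vt[:2].T
print(x_reduced.shape)                        # (100, 2)
```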