> The action of standing for, or in the place of, a person, group, or thing, and related senses. ([[Oxford English Dictionary|oed]])
see also [[symmetry]] and [[metaphor]].
# desiderata in [[deep learning]]
- **task-relevant** (good [[inductive bias]] / [[symmetry]]): expressing 2D rotations in Cartesian coordinates is cumbersome and requires trigonometric functions, but in polar coordinates it's trivial (see the first sketch after this list).
- **low-dimensional and disentangled**: even though the ambient space that the observed data lives in is very high-dimensional, it's highly redundant -- the data lies on some lower-dimensional [[manifold]] embedded in the ambient space (see the second sketch below). We often want to [[redundancy reduction|reduce redundancy]] in the representation (ie adopt the "efficient coding" hypothesis).
- **robust and invariant**: we often sacrifice (ie are invariant to) "irrelevant" detail that carries a lot of information in the information-theoretic sense: our visual system can identify a given face even when its appearance differs drastically (eg glasses, lighting, expression).
- **meaningful** (under some [[ontology]]): can we [[interpretability|interpret]] the information contained in the representation?
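A minimal numpy sketch of the coordinate-choice point above (the function names are mine):

```python
# Rotating a 2D point by dtheta: a matrix-vector product (with trig) in
# Cartesian coordinates, but a single addition in polar coordinates.
import numpy as np

def rotate_cartesian(xy: np.ndarray, dtheta: float) -> np.ndarray:
    """Rotate about the origin with the 2x2 rotation matrix."""
    c, s = np.cos(dtheta), np.sin(dtheta)
    return np.array([[c, -s], [s, c]]) @ xy

def rotate_polar(r: float, theta: float, dtheta: float) -> tuple[float, float]:
    """In the task-relevant representation, rotation is just addition."""
    return r, theta + dtheta

xy = np.array([1.0, 0.0])
r, theta = np.hypot(*xy), np.arctan2(xy[1], xy[0])
assert np.allclose(rotate_cartesian(xy, np.pi / 2), [0.0, 1.0])
assert rotate_polar(r, theta, np.pi / 2) == (1.0, np.pi / 2)
```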
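And a sketch of the redundancy point, on made-up data: intrinsically 1-dimensional data embedded in a 50-dimensional ambient space, where a single principal component explains essentially all the variance.

```python
# 1D latent data embedded (with small noise) in a 50D ambient space.
import numpy as np

rng = np.random.default_rng(0)
t = rng.normal(size=(1000, 1))                 # 1D latent coordinate
direction = rng.normal(size=(1, 50))           # fixed embedding direction
X = t @ direction + 0.01 * rng.normal(size=(1000, 50))

X = X - X.mean(axis=0)
s = np.linalg.svd(X, compute_uv=False)         # singular values
explained = s**2 / np.sum(s**2)
print(explained[:3])                           # first component dominates
```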
# [[computational neuroscience]]
Is there anything that's *not* representation learning?
Yes: there are [[biologically plausible|neural dynamics]] at play, etc.
Lots of different paradigms / frameworks / [[mathematic]]al languages:
- [[object oriented language]]
- [[functional programming]]
- [[effective data dimension|manifold hypothesis]]
Intuition says:
- an [[double descent|overparameterized]] model should be able to simply *memorize* the data without needing to learn useful representations;
- an underparameterized model isn't expressive / complex enough to describe useful representations.
However, neither intuition holds in general: a strong [[inductive bias]] can greatly accelerate learning useful representations, and overparameterized models can still [[grokking|generalize]].
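A toy illustration of the intuition (not of the caveat), with made-up data: a polynomial fit with one coefficient per data point interpolates the training set exactly (memorization), while a degree-1 fit can't express the target at all.

```python
# Over- vs under-parameterization in miniature: polynomial least squares.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 10)
y = np.sin(3 * x) + 0.1 * rng.normal(size=10)

for degree in (1, 9):                           # 2 vs 10 coefficients
    coeffs = np.polyfit(x, y, degree)
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: train MSE = {train_mse:.2e}")
# degree 1: large train error (too inflexible to express sin)
# degree 9: ~0 train error (one coefficient per point -- pure memorization)
```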
> [!question] What's the difference between [[dimensionality reduction]] and [[encoding|compression]]?
>
> The way I use them: Compression is a hardware concept; dimensionality reduction is a semantic concept. Dimensionality reduction is a form of compression but not the other way around: eg quantization is compression but not dimensionality reduction.
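A minimal numpy sketch of that distinction (made-up data): quantization keeps the shape and drops bits per entry, while projecting onto principal components drops dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8)).astype(np.float32)

# Compression without dimensionality reduction: same (100, 8) shape,
# but 8 bits per entry instead of 32.
X_quantized = np.round(X * 16).astype(np.int8)
assert X_quantized.shape == X.shape

# Dimensionality reduction: project onto the top-2 principal directions.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = Xc @ Vt[:2].T
assert X_reduced.shape == (100, 2)
```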
# learned or handcrafted representations
![[spectrum from deep learning to handcrafted kernels.png|400]]
From *[A blitz through classical statistical learning theory – Windows On Theory](https://windowsontheory.org/2021/01/31/a-blitz-through-classical-statistical-learning-theory/)*
autogenerated:
- [[kernel]]
- [[signal noise decomposition]]
- [[Gaussian process]]
- [[basis function]] and [[basis expansion]]
- [[dimensionality reduction]]
- [[redundancy reduction]]
- [[effective data dimension]]
- [[unsupervised]]
- [[1991KramerNonlinearPrincipalComponent|autoencoder paper]]
- [[tutorial on variational autoencoders]]
- [[deep learning]]
- [[2022ElhageEtAlToyModelsSuperposition|Toy models of superposition]]
- [[neural network inductive bias]]
- [[2023YangEtAlTheoryRepresentationLearning|A theory of representation learning gives a deep generalisation of kernel methods]]
- [[self supervised]]
- [[2023Ben-ShaulEtAlReverseEngineeringSelfsupervised|Reverse Engineering Self-Supervised Learning]]
- [[context-free text representations]]
- [[grokking]]
- [[neural representation alignment]]