#star/great
https://andyljones.com/posts/rl-debugging.html
great [[reinforcement learning debugging tips and implementation advice]].
links to lots of other resources.
# [[challenges of reinforcement learning]]
I really like the points under "simplifying is hard":
- there are few narrow interfaces
- there are few black boxes (i.e. [[natural abstraction]]s)
uh oh:
![[2025GrossGrugBrainedDeveloper#symmetry complexity]]
also recommends [[software test]]s.
use trivial probe [[sequential decision environment|environment]]s (e.g. horizon two, reward one every step).
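a minimal sketch of such a probe (gym-style `reset`/`step` assumed; names are mine):
```python
class TwoStepConstantReward:
    """Probe environment: one constant observation, any action,
    horizon two, reward +1 every step. An agent that can't learn a
    value near 2 here has a basic bug."""

    def reset(self):
        self.t = 0
        return 0.0  # constant observation

    def step(self, action):
        self.t += 1
        done = self.t >= 2
        return 0.0, 1.0, done, {}  # obs, reward, done, info
```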
and chase anomalies! piling on new components won't fix behaviour you don't understand.
# advice
keep the [[reward]] scale sensible (normalize or clip rewards)
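a sketch of one common scheme, scaling by a running standard deviation of the rewards (the class and constants are my assumptions, not from the post):
```python
import numpy as np

class RewardScaler:
    """Scale rewards by a running std estimate (Welford's algorithm).
    Scaling by the std of *returns* is also common; rewards kept here
    for simplicity."""

    def __init__(self, eps: float = 1e-8):
        self.count, self.mean, self.m2 = 0, 0.0, 0.0
        self.eps = eps

    def update(self, reward: float) -> float:
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)
        if self.count < 2:
            return reward  # too little data to estimate a std yet
        std = np.sqrt(self.m2 / (self.count - 1))
        return reward / (std + self.eps)
```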
big batch size ([[2018McCandlishEtAlEmpiricalModelLargebatch]])
decorrelate parallel environments at beginning
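a sketch of one way to do this, assuming classic gym-style envs (4-tuple `step`, `action_space.sample()`):
```python
import numpy as np

def decorrelate(envs, max_warmup=100, seed=0):
    """Step each parallel env through a different random number of
    random actions, so the batch doesn't start with every env in
    lockstep at the same episode phase."""
    rng = np.random.default_rng(seed)
    for env in envs:
        env.reset()
        for _ in range(rng.integers(0, max_warmup)):
            _, _, done, _ = env.step(env.action_space.sample())
            if done:
                env.reset()
```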
> _If you're new to reinforcement learning, writing things from scratch is the most catastrophically self-sabotaging thing you can do._
write [[oracle]] agents
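e.g. an agent that cheats by reading privileged environment internals, giving a sanity upper bound on achievable reward; a sketch for a hypothetical goal-reaching gridworld (all attribute names invented):
```python
def oracle_policy(env):
    """Greedily close the distance to the goal by reading env internals
    the learner never sees. If the learner beats this, something is off;
    if it never approaches it, the learner (or env) has a bug."""
    dx = env.goal_x - env.agent_x  # privileged state access
    dy = env.goal_y - env.agent_y
    if abs(dx) > abs(dy):
        return 0 if dx > 0 else 1  # right / left
    return 2 if dy > 0 else 3      # up / down
```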
# [[rl diagnostics]]
[[policy]] [[entropy]] divided by maximum possible entropy ($\log |\mathcal{A}|$ for a finite [[action]] space)
- should start near $1$, fall, and flatten out
- if near $1$: policy is basically acting [[discrete Uniform distribution|uniformly]] at random
- if near $0$: policy has collapsed; check for a policy [[entropy]] loss or other [[exploration-exploitation tradeoff|exploration]] incentives
- if oscillating: decrease the [[learning rate]]
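a sketch of computing the normalized entropy from a batch of policy logits (PyTorch; finite action space):
```python
import math
import torch.nn.functional as F

def normalized_entropy(logits):
    """Mean policy entropy divided by its maximum log|A|, so the
    result lies in [0, 1]. logits: (batch, num_actions)."""
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(-1).mean()
    return entropy / math.log(logits.shape[-1])
```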
[[Kullback-Leibler divergence|kld]] between the collection [[policy]] and the current learner [[policy]]
- high values indicate severely [[off policy]] experience
- (recall kld must be nonnegative, so negative values signal a bug in the computation)
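a sketch of this check from stored collection-time logits and the learner's current logits (PyTorch; names are mine):
```python
import torch.nn.functional as F

def behaviour_kl(collect_logits, learner_logits):
    """KL(collection policy || learner policy), averaged over the batch.
    Exact over the action distribution, so it is always >= 0; a negative
    number means the computation is buggy."""
    p_log = F.log_softmax(collect_logits, dim=-1)
    q_log = F.log_softmax(learner_logits, dim=-1)
    return (p_log.exp() * (p_log - q_log)).sum(-1).mean()
```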
[[variance]] of [[temporal difference error]]s divided by [[variance]] of [[value target]]s
- should start near $1$ and fall toward $0$ as the [[value function]] learns to explain the targets
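a sketch (NumPy), assuming you've logged value targets and value predictions:
```python
import numpy as np

def residual_variance(targets, preds):
    """Var(targets - preds) / Var(targets): near 1 means the value
    function explains none of the target variance; should fall as
    training progresses."""
    targets, preds = np.asarray(targets), np.asarray(preds)
    return (targets - preds).var() / (targets.var() + 1e-8)
```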
lots of good [[gpu profile|gpu profiling]] advice