#star/great https://andyljones.com/posts/rl-debugging.html great [[reinforcement learning debugging tips and implementation advice]]. links to lots of other resources.

# [[challenges of reinforcement learning]]

I really like the points under "simplifying is hard":

- there are few narrow interfaces
- there are few black boxes (i.e. [[natural abstraction]]s)

uh oh: ![[2025GrossGrugBrainedDeveloper#symmetry complexity]]

also recommends [[software test]]s. use trivial [[sequential decision environment|environment]]s (e.g. a two-horizon one with reward only at the end; sketch below). and chase anomalies! adding new stuff won't fix broken behaviour.

# advice

keep the [[reward]] scale sensible

big batch size ([[2018McCandlishEtAlEmpiricalModelLargebatch]])

decorrelate parallel environments at the beginning (sketch below)

> _If you're new to reinforcement learning, writing things from scratch is the most catastrophically self-sabotaging thing you can do._

write [[oracle]] agents

# [[rl diagnostics]]

[[policy]] [[entropy]] divided by the maximum possible entropy ($\log |\mathcal{A}|$ for a finite [[action]] space) should start near $1$, fall, and flatten out:

- if it stays near $1$: the policy is basically acting [[discrete Uniform distribution|uniformly]] at random
- if near $0$: the policy has collapsed; check the policy [[entropy]] loss or other [[exploration-exploitation tradeoff|exploration]] incentives
- if it oscillates: decrease the [[learning rate]]

[[Kullback-Leibler divergence|kld]] between the collection [[policy]] and the current learner [[policy]]: high values indicate severely [[off policy]] experience (recall it must be nonnegative)

[[variance]] of [[temporal difference error]]s divided by the variance of the [[value target]]s

lots of good [[gpu profile]] advice
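
# sketches

a minimal sketch of the trivial-environment idea: two steps, reward only at the horizon, so the correct values are known in closed form. the class name and gym-classic `step` API are my assumptions, not from the post.

```python
import numpy as np

class TwoHorizonEnv:
    """Two-step probe env: dummy actions, reward 1 only on the last step.

    The correct values are known exactly (V(s0) = gamma, V(s1) = 1), so an
    agent that can't learn them has a bug in the agent, not the task.
    """

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return np.array([0.0])  # dummy observation at step 0

    def step(self, action):
        self.t += 1
        done = self.t >= 2
        reward = 1.0 if done else 0.0  # reward only at the horizon
        return np.array([float(self.t)]), reward, done, {}
```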
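
one common way to decorrelate parallel environments: step each env a different random number of times with random actions before collecting any data. the 4-tuple `step` API and the `n_actions` parameter are assumptions; the post may have a different recipe in mind.

```python
import numpy as np

def decorrelate_envs(envs, n_actions: int, rng: np.random.Generator,
                     max_warmup: int = 100):
    """Warm up each env by a different random number of random-action steps
    so parallel rollouts start at uncorrelated phases of their episodes."""
    observations = []
    for env in envs:
        obs = env.reset()
        for _ in range(int(rng.integers(0, max_warmup))):
            obs, _, done, _ = env.step(int(rng.integers(n_actions)))
            if done:
                obs = env.reset()
        observations.append(obs)
    return observations
```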
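
a sketch of the normalized-entropy diagnostic, assuming `probs` holds the policy's action probabilities for a batch of states (rows summing to 1). logged every update, this is the curve that should start near $1$, fall, and flatten.

```python
import numpy as np

def normalized_entropy(probs: np.ndarray, eps: float = 1e-12) -> float:
    """Mean policy entropy divided by its maximum, log|A|.

    1.0 = uniformly random policy, 0.0 = fully collapsed policy.
    probs: (batch, |A|) array of action probabilities.
    """
    entropy = -(probs * np.log(probs + eps)).sum(axis=-1)  # per-state entropy
    return float(entropy.mean() / np.log(probs.shape[-1]))
```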
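
and the off-policy check, computed here from full action distributions; with full distributions the exact kld is nonnegative, which is what the reminder above guards against.

```python
import numpy as np

def policy_kld(collect_probs: np.ndarray, learner_probs: np.ndarray,
               eps: float = 1e-12) -> float:
    """Mean KL(collection policy || learner policy) over a batch of states.

    Large values: the learner has drifted far from the policy that gathered
    the data, i.e. the experience is severely off-policy.
    """
    kld = (collect_probs
           * (np.log(collect_probs + eps) - np.log(learner_probs + eps))
           ).sum(axis=-1)
    return float(kld.mean())
```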
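
finally the value-fit ratio. reading it as one minus explained variance is my gloss, not the post's, but it follows from the definitions (td error = target minus prediction).

```python
import numpy as np

def td_variance_ratio(td_errors: np.ndarray, value_targets: np.ndarray) -> float:
    """Var(TD errors) / Var(value targets), i.e. 1 - explained variance.

    Near 1: the value function explains almost none of the target variance.
    Near 0: the value function fits its targets well.
    """
    return float(td_errors.var() / (value_targets.var() + 1e-12))
```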