I came across this while reading through OpenAI’s Spinning Up Reinforcement Learning tutorial:

The Log-Derivative Trick. The log-derivative trick is based on a simple rule from calculus: the derivative of log x with respect to x is 1/x. When rearranged and combined with chain rule, we get:

\[\nabla_{\theta} P(\tau|\theta) = P(\tau|\theta) \nabla_{\theta}\log P(\tau|\theta)\]

For the calculus-challenged, like myself, this may not be obvious, so the following added details may help.

The first step is the chain rule from calculus, applied to the derivative of a logarithm:

\[\frac{d}{dx}\log f(x) = \frac{1}{f(x)} \cdot f'(x)\]
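As a quick sanity check, here is the rule verified numerically. The function $f(x) = x^2 + 1$ is an arbitrary positive function of my own choosing, not one from the tutorial:

```python
import math

# Check the chain rule for log: d/dx log f(x) = f'(x) / f(x),
# using f(x) = x**2 + 1 (an arbitrary positive function).
def f(x):
    return x**2 + 1

def f_prime(x):
    return 2 * x  # analytic derivative of f

x = 1.5
h = 1e-6
# Central-difference estimate of d/dx log f(x)
numeric = (math.log(f(x + h)) - math.log(f(x - h))) / (2 * h)
analytic = f_prime(x) / f(x)
print(abs(numeric - analytic) < 1e-8)  # True: the two agree
```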

Substituting $P(\tau|\theta)$ for $f(x)$, with the gradient taken with respect to $\theta$, we get:

\[\nabla_{\theta}\log P(\tau|\theta) = \frac{1}{P(\tau|\theta)} \cdot \nabla_{\theta}P(\tau|\theta)\]

Multiplying both sides by $P(\tau|\theta)$ and rearranging, this is

\[\nabla_{\theta} P(\tau|\theta) = P(\tau|\theta) \nabla_{\theta}\log P(\tau|\theta)\]
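The final identity can also be checked numerically. As a stand-in for $P(\tau|\theta)$, this sketch uses a sigmoid of $\theta$ (my own toy choice, not the trajectory probability from the tutorial) so that both sides are easy to differentiate:

```python
import math

# Toy stand-in for P(tau|theta): a sigmoid, so P is a smooth
# function of theta taking values in (0, 1).
def P(theta):
    return 1.0 / (1.0 + math.exp(-theta))

def grad(f, x, h=1e-6):
    # Central-difference numerical derivative.
    return (f(x + h) - f(x - h)) / (2 * h)

theta = 0.7
lhs = grad(P, theta)                                      # grad_theta P
rhs = P(theta) * grad(lambda t: math.log(P(t)), theta)    # P * grad_theta log P
print(abs(lhs - rhs) < 1e-8)  # True: both sides match
```

The same check works for any smooth, strictly positive function of $\theta$; the trick itself does not depend on the particular form of $P$.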