I came across this while reading through OpenAI’s Spinning Up Reinforcement Learning tutorial:

The Log-Derivative Trick. The log-derivative trick is based on a simple rule from calculus: the derivative of log x with respect to x is 1/x. When rearranged and combined with chain rule, we get:

\[\nabla_{\theta} P(\tau|\theta) = P(\tau|\theta) \nabla_{\theta}\log P(\tau|\theta)\]

For the calculus-challenged, like myself, this may not be obvious, so the following added details may help.

The first step is the chain rule from calculus, applied to the derivative of a logarithm:

\[\frac{d}{dx}\log f(x) = \frac{1}{f(x)} \cdot f'(x)\]
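As a quick sanity check, here is the rule verified numerically. The function $f(x) = x^2 + 1$ is an arbitrary positive function of my own choosing, not one from the tutorial:

```python
import math

# Check the chain rule for log: d/dx log f(x) = f'(x) / f(x),
# using f(x) = x**2 + 1 (an arbitrary positive function).
def f(x):
    return x**2 + 1

def f_prime(x):
    return 2 * x  # analytic derivative of f

x = 1.5
h = 1e-6
# Central-difference estimate of d/dx log f(x)
numeric = (math.log(f(x + h)) - math.log(f(x - h))) / (2 * h)
analytic = f_prime(x) / f(x)
print(abs(numeric - analytic) < 1e-8)  # True: the two agree
```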

Substituting $P(\tau|\theta)$ for $f(x)$, with the gradient taken with respect to $\theta$, we get:

\[\nabla_{\theta}\log P(\tau|\theta) = \frac{1}{P(\tau|\theta)} \cdot \nabla_{\theta}P(\tau|\theta)\]

Multiplying both sides by $P(\tau|\theta)$ and rearranging, this is

\[\nabla_{\theta} P(\tau|\theta) = P(\tau|\theta) \nabla_{\theta}\log P(\tau|\theta)\]
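The final identity can also be checked numerically. As a stand-in for $P(\tau|\theta)$, this sketch uses a sigmoid of $\theta$ (my own toy choice, not the trajectory probability from the tutorial) so that both sides are easy to differentiate:

```python
import math

# Toy stand-in for P(tau|theta): a sigmoid, so P is a smooth
# function of theta taking values in (0, 1).
def P(theta):
    return 1.0 / (1.0 + math.exp(-theta))

def grad(f, x, h=1e-6):
    # Central-difference numerical derivative.
    return (f(x + h) - f(x - h)) / (2 * h)

theta = 0.7
lhs = grad(P, theta)                                      # grad_theta P
rhs = P(theta) * grad(lambda t: math.log(P(t)), theta)    # P * grad_theta log P
print(abs(lhs - rhs) < 1e-8)  # True: both sides match
```

The same check works for any smooth, strictly positive function of $\theta$; the trick itself does not depend on the particular form of $P$.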