I came across this while reading through OpenAI’s Spinning Up Reinforcement Learning tutorial:

The Log-Derivative Trick. The log-derivative trick is based on a simple rule from calculus: the derivative of log x with respect to x is 1/x. When rearranged and combined with chain rule, we get:

\[\nabla_{\theta} P(\tau|\theta) = P(\tau|\theta) \log P(\tau|\theta)\]

For the calculus challenged, like myself, this may not be too obvious, so the following added details may help.

The first part is the chain rule from calculus:

\[\log f(x) = \frac{1}{f(x)} * f(x)'\]

If we replace $f(x)$, we get:

\[\log P(\tau|\theta) = \frac{1}{P(\tau|\theta)} * \nabla_{\theta}P(\tau|\theta)\]

When rearranged, this is

\[\nabla_{\theta} P(\tau|\theta) = P(\tau|\theta) \log P(\tau|\theta)\]

More