This notebook demonstrates the effect of high and low temperature settings on softmax. It builds up the code from scratch to develop an intuitive understanding of what is going on.

import math

Probabilities

Let’s start with something likely familiar: probabilities. Assume we have two possibilities, the first with a probability of 25% and the second with a probability of 75%.

probabilities = [0.25, 0.75]; probabilities
[0.25, 0.75]

What’s the deal with logits then?

From Stack Overflow:

In Math, Logit is a function that maps probabilities ([0, 1]) to R ((-inf, inf))

Logits are just the log of the probabilities, so we can take the log of each probability above to get the logits.

def logitsFrom(probabilities):
    # Take the natural log of each probability to get the logits.
    return [math.log(p) for p in probabilities]

logits = logitsFrom(probabilities); logits
[-1.3862943611198906, -0.2876820724517809]

In machine learning, the logits layer is a layer near the end of a model, typically a classifier, which contains the logit for each class.

What is softmax?

The logits layer is often followed by a softmax layer, which turns the logits back into probabilities (between 0 and 1). From Stack Overflow:

Softmax is a function that maps [-inf, +inf] to [0, 1] similar as Sigmoid. But Softmax also normalizes the sum of the values (output vector) to be 1.

We can implement softmax on our logits array like so:

def softmax(logits):
    # Exponentiate each logit, then divide by the total so the outputs sum to 1.
    denominator = sum(math.exp(x) for x in logits)
    return [math.exp(x) / denominator for x in logits]

softmax(logits), sum(softmax(logits))
([0.25, 0.75], 1.0)

As you can see, we have recovered our original probabilities.
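Softmax does not need logits that came from probabilities in the first place; it exponentiates whatever it is given and then normalizes. A quick check with an arbitrary three-value logits vector (the numbers here are made up purely for illustration):

arbitrary_logits = [2.0, -1.0, 0.5]
softmax(arbitrary_logits)  # roughly [0.786, 0.039, 0.175], which still sums to 1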

What is softmax with temperature?

Temperature is a hyperparameter applied to the logits before the softmax: each logit is divided by the temperature, which changes the final probabilities.

  • A low temperature (below 1) makes the model more confident.
  • A high temperature (above 1) makes the model less confident.

Let’s see both in turn.
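Mechanically, applying temperature just means dividing every logit by the temperature value before running the softmax. A minimal helper combining the two steps might look like this (softmax_with_temperature is our own name for the sketch, not a standard function):

def softmax_with_temperature(logits, temperature):
    # Scale each logit by 1/temperature, then apply the ordinary softmax from above.
    return softmax([x / temperature for x in logits])

The examples below do the same scaling by hand so each step stays visible.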

Low Temperature Example

low_temp = 0.5
logits_low_temp = [x/low_temp for x in logits]
logits_low_temp
[-2.772588722239781, -0.5753641449035618]

Now let’s see what happens when we send this through softmax again.

softmax(logits_low_temp), sum(softmax(logits_low_temp))
([0.1, 0.9], 1.0)

The higher of the two probabilities has risen from 0.75 to 0.9, while the lower has dropped from 0.25 to 0.1.
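Why does this come out so cleanly? Since these logits are just log-probabilities, dividing them by a temperature T and re-applying softmax is the same as raising each original probability to the power 1/T and renormalizing (because exp(log(p)/T) = p^(1/T)). With T = 0.5 that squares the probabilities, which is easy to check directly:

powered = [p ** (1 / low_temp) for p in probabilities]  # [0.0625, 0.5625]
[x / sum(powered) for x in powered]                     # back to [0.1, 0.9]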

High Temperature Example

# What happens if we apply a high temperature?
high_temp = 5
logits_high_temp = [x / high_temp for x in logits]
logits_high_temp
[-0.2772588722239781, -0.05753641449035618]
softmax(logits_high_temp), sum(softmax(logits_high_temp))
([0.44528931866219296, 0.5547106813378071], 1.0)

With a high temperature setting, our probabilities move closer together: roughly 0.45 and 0.55 instead of 0.25 and 0.75.
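Pushing the temperature to extremes makes the pattern clear: very low temperatures concentrate nearly all of the probability on the most likely option, while very high temperatures flatten the distribution towards uniform. The values 0.01 and 100 below are arbitrary picks for illustration:

softmax([x / 0.01 for x in logits])  # almost all probability on the second option
softmax([x / 100 for x in logits])   # close to uniform: roughly [0.497, 0.503]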