# How does temperature affect softmax in machine learning?

This notebook demonstrates the effects of high and low temperature settings on softmax. It builds the code from the ground up to give an intuitive understanding of what is going on.

```
import math
```

## Probabilities

Let’s start with something likely familiar: probabilities. Assume we have two possible outcomes, the first with a probability of 25% and the second with a probability of 75%.

```
probabilities = [0.25, 0.75]
probabilities
```

```
[0.25, 0.75]
```

## What’s the deal with logits then?

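From Stack Overflow:

In Math, Logit is a function that maps probabilities ([0, 1]) to R ((-inf, inf))

Strictly speaking, that mathematical logit is `log(p / (1 - p))`. In machine learning the term is used more loosely for the unnormalized scores a model produces before softmax. The `log` of each probability is one valid set of such scores, so we can take the log of each probability above to get the `logits`.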

```
def logitsFrom(probabilities):
    # Natural log of each probability
    return [math.log(x) for x in probabilities]

logits = logitsFrom(probabilities)
logits
```

```
[-1.3862943611198906, -0.2876820724517809]
```

In machine learning, the *logits layer* is a layer near the end of a model, typically a classifier, which outputs one logit per class.

## What is softmax?

The logits layer is often followed by a *softmax* layer, which turns the logits back into probabilities (each between 0 and 1). From Stack Overflow:

Softmax is a function that maps [-inf, +inf] to [0, 1] similar as Sigmoid. But Softmax also normalizes the sum of the values(output vector) to be 1.

We can implement softmax on our logits array like so:

```
def softmax(logits):
    # Normalizing constant: the sum of the exponentiated logits
    bottom = sum(math.exp(x) for x in logits)
    return [math.exp(x) / bottom for x in logits]

softmax(logits), sum(softmax(logits))
```

```
([0.25, 0.75], 1.0)
```

As you can see, we have our starting probability numbers back.
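
One practical caveat with this simple implementation: `math.exp` raises an `OverflowError` for large inputs, so very large logits will break it. A common remedy is to subtract the largest logit from every logit before exponentiating, which leaves the result unchanged because softmax depends only on the differences between logits. Below is a minimal sketch of that idea; the name `stable_softmax` is an assumption for illustration and is not part of the original notebook.

```python
import math

def stable_softmax(logits):
    # Hypothetical helper (not from the original notebook).
    # Subtracting the max logit keeps every exponent <= 0,
    # so math.exp cannot overflow.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# The log-probability logits from earlier give the same probabilities back
print(stable_softmax([math.log(0.25), math.log(0.75)]))

# Large logits, where a plain math.exp(x) would overflow, still work
print(stable_softmax([1000.0, 1001.0]))
```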

## What is softmax with temperature?

Temperature is a hyperparameter applied to the logits before softmax: each logit is divided by the temperature, which changes the final probabilities that softmax produces.

- A low temperature (below 1) makes the model more confident.
- A high temperature (above 1) makes the model less confident.
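
The scaling step can also be folded into softmax itself. The sketch below divides each logit by the temperature before normalizing; the function name `softmax_with_temperature` and the check for non-positive temperatures are assumptions added for illustration, not part of the original notebook.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Hypothetical helper: scale the logits by 1/temperature, then apply softmax.
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    bottom = sum(math.exp(x) for x in scaled)
    return [math.exp(x) / bottom for x in scaled]

logits = [math.log(0.25), math.log(0.75)]

# A temperature of 1 leaves the logits unchanged, so this is plain softmax
print(softmax_with_temperature(logits, temperature=1.0))
```

With a temperature of 1 the logits are untouched, so the starting probabilities come back (up to floating-point rounding).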

Let’s see both in turn.

## Low Temperature Example

```
low_temp = 0.5
logits_low_temp = [x/low_temp for x in logits]
logits_low_temp
```

```
[-2.772588722239781, -0.5753641449035618]
```

Now let’s see what happens when we send this through softmax again.

```
softmax(logits_low_temp), sum(softmax(logits_low_temp))
```

```
([0.1, 0.9], 1.0)
```

The higher of the two probabilities has risen from 0.75 to 0.9, and the lower has dropped from 0.25 to 0.1.
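
The round numbers are not a coincidence. Because our logits are the logs of the original probabilities, dividing them by a temperature and applying softmax is the same as raising each probability to the power 1/T and renormalizing. A short derivation follows; the symbols `z_i` (logits), `p_i` (original probabilities), and `T` (temperature) are notation introduced here, not in the original notebook.

```latex
\mathrm{softmax}(z / T)_i
  = \frac{e^{z_i / T}}{\sum_j e^{z_j / T}}
  = \frac{e^{\ln(p_i) / T}}{\sum_j e^{\ln(p_j) / T}}
  = \frac{p_i^{1/T}}{\sum_j p_j^{1/T}}

% With T = 0.5 the exponent 1/T is 2:
\frac{0.25^2}{0.25^2 + 0.75^2} = \frac{0.0625}{0.625} = 0.1,
\qquad
\frac{0.75^2}{0.25^2 + 0.75^2} = \frac{0.5625}{0.625} = 0.9
```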

## High Temperature Example

```
# What happens if we apply a high temperature?
high_temp = 5
logits_high_temp = [x/high_temp for x in logits]
logits_high_temp
```

```
[-0.2772588722239781, -0.05753641449035618]
```

```
softmax(logits_high_temp), sum(softmax(logits_high_temp))
```

```
([0.44528931866219296, 0.5547106813378071], 1.0)
```

With a high temperature setting, our probabilities move closer together: 0.25 and 0.75 become roughly 0.45 and 0.55.
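
To see the trend across a wider range, the sketch below sweeps several temperatures over the same two logits. The helper name `softmax_with_temperature` and the particular temperature values are assumptions chosen for illustration; the helper is defined inside the block so that it runs on its own.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Hypothetical helper: divide each logit by the temperature, then apply softmax.
    scaled = [x / temperature for x in logits]
    bottom = sum(math.exp(x) for x in scaled)
    return [math.exp(x) / bottom for x in scaled]

logits = [math.log(0.25), math.log(0.75)]

# Illustrative temperatures, from very low to very high
for temperature in (0.1, 0.5, 1.0, 5.0, 100.0):
    probs = softmax_with_temperature(logits, temperature)
    rounded = [round(p, 4) for p in probs]
    print(f"T={temperature}: {rounded}")
```

As the temperature approaches 0, nearly all of the probability goes to the largest logit. As the temperature grows, the output approaches a uniform distribution, which is 0.5 for each of our two outcomes.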
