Softmax

Softmax is a function that takes a list of numbers and turns them into probabilities. It’s how neural networks answer “which option is most likely?”

The Problem Softmax Solves

Imagine a neural network trying to classify an image as a cat, dog, or bird. The network outputs raw scores called logits:

  • Cat: 5.0
  • Dog: 2.0
  • Bird: 1.0

These numbers are hard to interpret. Is 5.0 good? How confident is the network? Softmax converts these into percentages that sum to 100%:

  • Cat: 93.6%
  • Dog: 4.7%
  • Bird: 1.7%

Now it’s clear: the network thinks it’s probably a cat.

How It Works

Softmax does two things:

  1. Makes everything positive using the exponential function (e^x)
  2. Normalises so everything adds up to 1 (100%)

Step by Step

Starting with scores [5.0, 2.0, 1.0]:

Animal   Score   e^score   Probability
Cat      5.0     148.4     148.4 ÷ 158.5 = 93.6%
Dog      2.0     7.4       7.4 ÷ 158.5 = 4.7%
Bird     1.0     2.7       2.7 ÷ 158.5 = 1.7%
Total            158.5     100%

The formula: softmax(xᵢ) = e^(xᵢ) / Σⱼ e^(xⱼ). In words: divide each score's exponential by the sum of all the exponentials.
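
A minimal Python sketch of the whole computation. It also subtracts the maximum score before exponentiating, a standard trick that leaves the result unchanged (the shift cancels in the division) but prevents overflow for large scores:

import math

def softmax(scores):
    # Subtract the max before exponentiating: the result is unchanged
    # because the shift cancels in the division, but large scores
    # can no longer overflow math.exp.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([5.0, 2.0, 1.0]))
# [0.936..., 0.046..., 0.017...]  -> cat, dog, bird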

Why Use Exponentials?

The exponential function (e^x) has useful properties:

  • Always positive: You can’t have negative probability
  • Amplifies differences: Bigger scores get much bigger after e^x. A score gap of 3 (Cat’s 5.0 vs Dog’s 2.0) becomes a factor of e^3 ≈ 20 after exponentiation, making the winner stand out
  • Smooth: Small changes in input create small changes in output (important for learning)

Softmax vs “Hard” Max

Think of softmax as a “soft” version of picking a winner:

Scores:      [5.0,  2.0,  1.0]

Hard max:    [1,    0,    0  ]    ← Winner takes all
Softmax:     [0.94, 0.05, 0.02]   ← Winner gets most, but others get some

Hard max says “definitely cat.” Softmax says “probably cat, but maybe dog or bird.”
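
The contrast in code, reusing the softmax sketch from above (hardmax here is just an illustrative helper, not a standard library function):

def hardmax(scores):
    # One-hot: all of the probability goes to the single highest score.
    winner = scores.index(max(scores))
    return [1.0 if i == winner else 0.0 for i in range(len(scores))]

scores = [5.0, 2.0, 1.0]
print(hardmax(scores))  # [1.0, 0.0, 0.0]        -> definitely cat
print(softmax(scores))  # [0.936, 0.047, 0.017]  -> probably cat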

Temperature: Adjusting Confidence

You can make softmax more or less confident using temperature:

Temperature    Result
Low (0.5)      More confident: [99.7%, 0.25%, 0.03%]
Normal (1.0)   Standard: [93.6%, 4.7%, 1.7%]
High (2.0)     Less confident: [73.6%, 16.4%, 10.0%]

  • Low temperature → More decisive (approaches hard max)
  • High temperature → More uncertain (approaches equal probabilities)

This is useful when you want AI to be more creative (high temp) or more predictable (low temp).
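
In code, temperature is just a divisor applied to the scores before the usual softmax (again reusing the earlier softmax sketch; softmax_with_temperature is an illustrative name):

def softmax_with_temperature(scores, temperature=1.0):
    # Divide the scores by the temperature before applying softmax:
    # T < 1 sharpens the distribution, T > 1 flattens it.
    return softmax([s / temperature for s in scores])

scores = [5.0, 2.0, 1.0]
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(scores, t)])
# 0.5 [0.997, 0.002, 0.0]
# 1.0 [0.936, 0.047, 0.017]
# 2.0 [0.736, 0.164, 0.1]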

Where Softmax Is Used

  • Image classification: “Is this a cat, dog, or bird?”
  • Language models: “What’s the next word in this sentence?”
  • Game AI: “Which move should I make?” (see Reinforcement Learning)
  • Recommendation systems: “Which video should I suggest?”

Key Takeaways

  1. Softmax converts raw scores into probabilities (0-100%)
  2. All probabilities sum to exactly 100%
  3. Higher scores get higher probabilities
  4. It’s “soft” because every option keeps some probability
  5. Temperature controls how confident the output is
