What is the role of temperature in token sampling?
This question tests your understanding of the final step in token generation: how to control the trade-off between output predictability and creativity. Interviewers want to see that you can connect a user-facing parameter to the underlying model mathematics. A strong answer first defines temperature as a divisor applied to the raw output scores (logits) *before* the softmax function, then explains the effect: a temperature below 1.0 sharpens the probability distribution, making high-probability tokens even more likely (more deterministic), while a temperature above 1.0 flattens it, increasing the chance of sampling lower-probability tokens (more creative). The key red flag is imprecision: vaguely saying temperature 'controls randomness' without connecting it to the logits and the softmax calculation.
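To make the mechanism concrete: the probability of token i becomes exp(z_i / T) / Σ_j exp(z_j / T), where z are the logits and T is the temperature. Below is a minimal sketch in Python (NumPy is an assumed library choice here, not something the card specifies); real inference stacks apply the same scaling on framework tensors, usually combined with top-k or top-p filtering.

```python
import numpy as np

def sample_with_temperature(logits: np.ndarray, temperature: float = 1.0) -> int:
    """Sample one token index from logits after temperature scaling."""
    # Temperature is a divisor applied to the logits before softmax.
    scaled = logits / temperature
    # Numerically stable softmax: subtract the max before exponentiating.
    scaled = scaled - scaled.max()
    probs = np.exp(scaled) / np.exp(scaled).sum()
    # Draw one token index from the resulting distribution.
    return int(np.random.choice(len(probs), p=probs))

# Toy logits to show the sharpening/flattening effect.
logits = np.array([2.0, 1.0, 0.5, -1.0])
for t in (0.5, 1.0, 2.0):
    scaled = logits / t
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    print(f"T={t}: {np.round(probs, 3)}")
```

Running the loop shows the trade-off directly: at T=0.5 nearly all the mass concentrates on the top token, while at T=2.0 the distribution is noticeably flatter and lower-probability tokens get sampled more often.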
Read the original → Wikipedia: Softmax function
- #llm
- #generative-ai
- #sampling
- #softmax