Step 6: Generation

The final station is the picker — it looks at all the probability meters and selects the winner. This is where the factory produces its output.
We now have probabilities for every word. How do we pick the final answer? There are a few strategies:
Strategy 1: Greedy (pick the highest)
The simplest approach: always pick the word with the highest probability.
"five" → 94.2% ← Pick this one!
"four" → 2.1%
"six" → 1.8%
...
Output: "five"

This is called greedy decoding. It's deterministic: the same input always gives the same output. Perfect for math, where there's only one right answer.
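To make the greedy picker concrete, here is a minimal Python sketch. The `probs` dictionary and the `greedy_pick` name are made up for illustration, and only the three words shown above are included rather than a full vocabulary.

```python
# Hypothetical probabilities taken from the example above.
# A real model would have one entry per word in its vocabulary.
probs = {"five": 0.942, "four": 0.021, "six": 0.018}


def greedy_pick(probs):
    """Return the word with the highest probability (greedy decoding)."""
    # max() compares the words by their probability value.
    return max(probs, key=probs.get)


print(greedy_pick(probs))
```

Because nothing here is random, calling `greedy_pick` on the same probabilities always returns the same word.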
Strategy 2: Sampling (add randomness)
Instead of always picking the top word, we randomly choose based on the probabilities. Higher probability = more likely to be chosen, but not guaranteed.
Run 1: "five" (94.2% chance → picked!)
Run 2: "five" (94.2% chance → picked!)
Run 3: "four" (2.1% chance → lucky pick!)
Run 4: "five" (94.2% chance → picked!)

This adds variety. When writing a story, you don't want the same words every time. Sampling makes the model more creative.
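Sampling can be sketched in a few lines of Python with the standard library's `random.choices`, which draws items in proportion to their weights. The `probs` dictionary and `sample_pick` helper are illustrative names, and the seed is fixed only so the demo is repeatable.

```python
import random

# Hypothetical probabilities from the example above (three words only).
probs = {"five": 0.942, "four": 0.021, "six": 0.018}


def sample_pick(probs, rng=random):
    """Draw one word at random, weighted by its probability."""
    words = list(probs)
    weights = list(probs.values())
    # random.choices treats weights as relative, so they need not sum to 1.
    return rng.choices(words, weights=weights, k=1)[0]


# Seeded generator so repeated runs of this demo give the same draws.
rng = random.Random(0)
picks = [sample_pick(probs, rng) for _ in range(1000)]

# Most draws should be "five", with the occasional "four" or "six".
for word in probs:
    print(word, picks.count(word))
```

Over many runs the counts track the probabilities: "five" dominates, but the unlikely words still show up now and then.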
Strategy 3: Temperature (control randomness)
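Temperature is a dial on top of sampling rather than a separate picker. The model's raw scores (logits) are divided by the temperature before they are turned into probabilities, which lets us adjust how "confident" the model is: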
- Low temperature (0.1): Makes high probabilities even higher. Model becomes very confident, less creative.
- Temperature = 1: Use probabilities as-is.
- High temperature (2.0): Flattens probabilities. Model becomes more random, more creative.
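The effect of temperature can be sketched in Python as follows. This version works directly on probabilities: raising each one to the power 1/T and renormalizing gives the same result as dividing the logits by T before softmax. The function name `apply_temperature` and the three-word `probs` dictionary are assumptions for illustration, so the printed values will differ from figures computed over a full vocabulary.

```python
import math

# Hypothetical probabilities (three words only, so they are renormalized).
probs = {"five": 0.942, "four": 0.021, "six": 0.018}


def apply_temperature(probs, temperature):
    """Rescale probabilities as p ** (1 / T), then renormalize to sum to 1.

    Assumes every probability is strictly positive.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Work in log space; log(p) differs from the logit only by a constant,
    # which cancels out when we renormalize.
    scaled_logs = {w: math.log(p) / temperature for w, p in probs.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scaled_logs.values())
    unnormalized = {w: math.exp(s - top) for w, s in scaled_logs.items()}
    total = sum(unnormalized.values())
    return {w: u / total for w, u in unnormalized.items()}


for t in (0.1, 1.0, 2.0):
    rescaled = apply_temperature(probs, t)
    print(t, {w: round(p, 4) for w, p in rescaled.items()})
```

At a low temperature nearly all of the probability lands on the top word, and at a high temperature the gap between words shrinks.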
Original: "five" 94.2%, "four" 2.1%, "six" 1.8%
Low temp: "five" 99.9%, "four" 0.05%, "six" 0.03% (almost certain)
High temp: "five" 60%, "four" 15%, "six" 12% (more random)

What Do Real Models Use?
| Model/Use Case | Strategy | Why |
|---|---|---|
| Chat assistants (e.g. ChatGPT) | Sampling + moderate temperature (~0.7) | Balanced creativity and coherence |
| Code generation (e.g. Copilot) | Low temperature (~0.2) | Code needs to be precise and correct |
| Creative writing | Higher temperature ~1.0+ | More surprising and varied outputs |
| Math/Reasoning | Greedy or very low temp | Only one right answer |
| Our calculator | Greedy | Math has no room for creativity! |
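The rows of the table can be thought of as different settings for one picker. Here is a rough sketch under two assumptions: the `pick_word` function and its `probs` input are hypothetical, and a temperature of 0 is treated as "be greedy", which is a common convention rather than something every system does.

```python
import math
import random

# Hypothetical probabilities from the running example (three words only).
probs = {"five": 0.942, "four": 0.021, "six": 0.018}


def pick_word(probs, temperature=1.0, rng=random):
    """Greedy when temperature is 0, otherwise temperature-scaled sampling."""
    if temperature < 0:
        raise ValueError("temperature must not be negative")
    if temperature == 0:
        # Assumed convention: temperature 0 means always take the top word.
        return max(probs, key=probs.get)
    words = list(probs)
    # p ** (1 / T) is proportional to softmax(logits / T).
    weights = [math.exp(math.log(probs[w]) / temperature) for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(42)
print("calculator (greedy):", pick_word(probs, temperature=0))
print("code-like (T=0.2):  ", pick_word(probs, temperature=0.2, rng=rng))
print("story-like (T=1.0): ", pick_word(probs, temperature=1.0, rng=rng))
```

For our calculator, the greedy setting is the one that matters: it returns the top word every time.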
And that's it! The model outputs "five", and we've successfully computed "two plus three" = "five".