Step 6: Generation

Generation factory illustration showing selection strategies to pick the final answer from probability scores
Picking the winner — the final answer emerges

The final station is the picker — it looks at all the probability meters and selects the winner. This is where the factory produces its output.

We now have probabilities for every word. How do we pick the final answer? There are a few strategies:

Strategy 1: Greedy (pick the highest)

The simplest approach: always pick the word with the highest probability.

"five"  → 94.2%  ← Pick this one!
"four"  → 2.1%
"six"   → 1.8%
...

Output: "five"

This is called greedy decoding. It's deterministic—the same input always gives the same output. Perfect for math where there's only one right answer.

Strategy 2: Sampling (add randomness)

Instead of always picking the top word, we randomly choose based on the probabilities. Higher probability = more likely to be chosen, but not guaranteed.

Run 1: "five"  (94.2% chance → picked!)
Run 2: "five"  (94.2% chance → picked!)
Run 3: "four"  (2.1% chance  → lucky pick!)
Run 4: "five"  (94.2% chance → picked!)

This adds variety. When writing a story, you don't want the same words every time. Sampling makes the model more creative.

Strategy 3: Temperature (control randomness)

We can adjust how "confident" the model is using a parameter called temperature:

  • Low temperature (0.1): Makes high probabilities even higher. Model becomes very confident, less creative.
  • Temperature = 1: Use probabilities as-is.
  • High temperature (2.0): Flattens probabilities. Model becomes more random, more creative.
Original:     "five" 94.2%, "four" 2.1%, "six" 1.8%
Low temp:     "five" 99.9%, "four" 0.05%, "six" 0.03%  (almost certain)
High temp:    "five" 60%, "four" 15%, "six" 12%        (more random)
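Mathematically, temperature divides the model's raw scores by T before the softmax; if you already have probabilities, that is equivalent to raising each one to the power 1/T and renormalizing. A sketch (the numbers are illustrative, so the outputs only approximate the table above):

```python
def apply_temperature(probs, temperature):
    # Raise each probability to the power 1/T, then renormalize.
    # (Equivalent to dividing the logits by T before the softmax.)
    scaled = {w: p ** (1.0 / temperature) for w, p in probs.items()}
    total = sum(scaled.values())
    return {w: s / total for w, s in scaled.items()}

probs = {"five": 0.942, "four": 0.021, "six": 0.018}
print(apply_temperature(probs, 0.5))  # "five" climbs toward 1.0
print(apply_temperature(probs, 2.0))  # distribution flattens out
```

Low temperature sharpens the distribution (the winner gets even more probability mass); high temperature flattens it (the underdogs get a real chance).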

What Do Real Models Use?

Model / Use case            Strategy                        Why
ChatGPT (default)           Sampling + temperature ~0.7     Balanced creativity and coherence
Code generation (Copilot)   Low temperature ~0.2            Code needs to be precise and correct
Creative writing            Higher temperature ~1.0+        More surprising and varied outputs
Math / Reasoning            Greedy or very low temp         Only one right answer
Our calculator              Greedy                          Math has no room for creativity!

For our calculator, we'll use greedy decoding (always pick the highest). Math has right and wrong answers—we don't want creativity here! With high temperature, even GPT-4 might confidently answer "2+2=5" just for variety. Deterministic tasks need deterministic settings.
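Putting the three strategies together, a single picker might look like this sketch (the function name and parameters are mine, not from any real library):

```python
import random

def pick_next_word(probs, strategy="greedy", temperature=1.0):
    # One picker covering all three strategies.
    if strategy == "greedy":
        # Deterministic: always the highest-probability word.
        return max(probs, key=probs.get)
    # Sampling path: reshape the distribution with temperature
    # (p ** (1/T), renormalized), then draw one word by weight.
    scaled = {w: p ** (1.0 / temperature) for w, p in probs.items()}
    total = sum(scaled.values())
    words = list(scaled)
    weights = [scaled[w] / total for w in words]
    return random.choices(words, weights=weights, k=1)[0]

probs = {"five": 0.942, "four": 0.021, "six": 0.018}
print(pick_next_word(probs))  # "five" -- our calculator's greedy setting
```

For the calculator we call it with the default greedy strategy, so the answer is always the same.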

And that's it! The model outputs "five", and we've successfully computed "two plus three" = "five".
