Build Your First LLM from Scratch · Part 1 · Section 5 of 9

Step 3: Positional Encoding

[Figure: Positional encoding illustration showing how word order matters: the same words in different positions give different results]

We have a problem. Look at these two inputs:

"five minus three" → answer: "two"
"three minus five" → answer: "negative two"

The words are the same, but the order matters. With just embeddings, the model sees the same three vectors in both cases—it doesn't know which word came first!
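Here is a minimal sketch of that blindness, using a hypothetical toy vocabulary with 2-dimensional embeddings (the numbers and names are illustrative, not from the tutorial's code). The two inputs produce the exact same set of vectors; nothing in the vectors records who came first:

```python
import numpy as np

# Toy embedding table (illustrative values only).
embeddings = {
    "five":  np.array([0.9, 0.1]),
    "minus": np.array([0.2, 0.7]),
    "three": np.array([0.85, 0.15]),
}

a = [embeddings[w] for w in "five minus three".split()]
b = [embeddings[w] for w in "three minus five".split()]

# Both inputs yield the same three vectors -- only their order differs,
# and the vectors themselves carry no trace of that order.
print(sorted(map(tuple, a)) == sorted(map(tuple, b)))  # True
```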

The solution: we add position information to each embedding. Think of it like seat numbers in a theater—each word gets a fixed position marker:

"two plus three"

Position 1: "two"   → [0.9, 0.1, ...] + [position 1 info] → [0.92, 0.15, ...]
Position 2: "plus"  → [0.1, 0.8, ...] + [position 2 info] → [0.13, 0.85, ...]
Position 3: "three" → [0.85, 0.15, ...] + [position 3 info] → [0.88, 0.21, ...]

Now each vector contains both what the word is AND where it appears. The same word at different positions will have slightly different vectors.

After this step, "three" at position 1 looks different from "three" at position 3. The model can now tell word order apart.
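To make this concrete, here is a minimal sketch of the idea, assuming toy 4-dimensional vectors (the names and numbers are illustrative, not the tutorial's actual code). Real models either learn the position vectors alongside the other weights or compute them with a fixed sinusoidal formula; small random values stand in for them here:

```python
import numpy as np

# Toy token embeddings (illustrative values only).
token_embeddings = {
    "two":   np.array([0.90, 0.10, 0.30, 0.70]),
    "plus":  np.array([0.10, 0.80, 0.60, 0.20]),
    "three": np.array([0.85, 0.15, 0.40, 0.65]),
}

# One position vector per slot in the sequence (placeholder random values).
rng = np.random.default_rng(0)
position_embeddings = rng.normal(scale=0.05, size=(3, 4))

tokens = "two plus three".split()
inputs = np.stack([
    token_embeddings[tok] + position_embeddings[pos]  # what the word is + where it is
    for pos, tok in enumerate(tokens)
])

# The same word at different positions now maps to different vectors:
print(token_embeddings["three"] + position_embeddings[0])  # "three" at position 1
print(token_embeddings["three"] + position_embeddings[2])  # "three" at position 3
```

Because the position vector is simply added element-wise, the result keeps the same dimensionality, so everything downstream (attention, feed-forward layers) works on these combined vectors without any changes.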