Build Your First LLM from Scratch · Part 1 · Section 9 of 9
Summary
Here's what we did to convert "two plus three" into "five":
- Tokenization — Split text into words, convert to IDs → [2, 12, 3]
- Embedding — Convert each ID to a vector of 64 numbers → 3 vectors
- Positional Encoding — Add position info to each vector → 3 position-aware vectors
- Transformer — Let vectors "talk" via attention → 3 context-enriched vectors
- Output Layer — Score every word, convert to probabilities → "five" = 94.2%
- Generation — Pick the highest-probability word → "five"
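The six steps above can be sketched end to end in plain Python. This is a minimal illustration, not the model built in this part: the vocabulary is hypothetical (chosen only so that "two plus three" maps to [2, 12, 3]), the weights are random rather than trained, the positional encoding is assumed to be sinusoidal, and the "transformer" is reduced to a single self-attention step with no learned projections. Because nothing here is trained, the predicted word is arbitrary; a trained model is what would put "five" on top.

```python
import math
import random

# Hypothetical vocabulary, ordered so "two plus three" becomes [2, 12, 3].
VOCAB = ["<pad>", "one", "two", "three", "four", "five", "six",
         "seven", "eight", "nine", "ten", "minus", "plus"]
WORD_TO_ID = {word: i for i, word in enumerate(VOCAB)}
DIM = 64  # each token becomes a vector of 64 numbers

# Assumption: random, untrained weights stand in for learned ones.
rng = random.Random(0)
EMBED = [[rng.gauss(0, 0.1) for _ in range(DIM)] for _ in VOCAB]
W_OUT = [[rng.gauss(0, 0.1) for _ in range(DIM)] for _ in VOCAB]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def tokenize(text):
    # 1. Tokenization: split into words, look up each word's ID.
    return [WORD_TO_ID[word] for word in text.lower().split()]


def embed(ids):
    # 2. Embedding: one 64-number vector per ID.
    return [list(EMBED[i]) for i in ids]


def add_positions(vectors):
    # 3. Positional encoding (assumed sinusoidal): sine on even
    #    dimensions, cosine on odd ones, added to each vector.
    out = []
    for pos, vec in enumerate(vectors):
        enc = []
        for k in range(DIM):
            angle = pos / (10000 ** ((k - k % 2) / DIM))
            enc.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        out.append([v + e for v, e in zip(vec, enc)])
    return out


def attention(vectors):
    # 4. Transformer, reduced to one self-attention step: each vector
    #    becomes a weighted average of all vectors, weighted by similarity.
    out = []
    for query in vectors:
        weights = softmax([dot(query, key) / math.sqrt(DIM) for key in vectors])
        out.append([sum(w * v[d] for w, v in zip(weights, vectors))
                    for d in range(DIM)])
    return out


def next_word_probs(vectors):
    # 5. Output layer: score every vocabulary word against the last
    #    vector, then turn the scores into probabilities.
    return softmax([dot(row, vectors[-1]) for row in W_OUT])


def generate(text):
    ids = tokenize(text)
    context = attention(add_positions(embed(ids)))
    probs = next_word_probs(context)
    # 6. Generation: pick the word with the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return ids, probs, VOCAB[best]


ids, probs, word = generate("two plus three")
print(ids)  # [2, 12, 3]
print(word, f"{max(probs):.1%}")  # arbitrary here, since the weights are untrained
```

Each function corresponds to one bullet in the list, so swapping any stage for a trained or more complete version (for example, attention with learned query, key, and value projections) leaves the rest of the pipeline unchanged.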