Build Your First LLM from Scratch · Part 1 · Section 9 of 9

Summary

Here's what we did to convert "two plus three" into "five":

  • Tokenization — Split text into words, convert to IDs → [2, 12, 3]
  • Embedding — Convert each ID to a vector of 64 numbers → 3 vectors
  • Positional Encoding — Add position info to each vector → 3 position-aware vectors
  • Transformer — Let vectors "talk" via attention → 3 context-enriched vectors
  • Output Layer — Score every word, convert to probabilities → "five" = 94.2%
  • Generation — Pick the highest probability word → "five"
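To see how the six steps chain together, here is a minimal sketch in plain Python. Several things in it are assumptions for illustration, not the code from earlier sections: the vocabulary (ordered only so that "two plus three" maps to [2, 12, 3]), the sinusoidal positional encoding, the stripped-down attention with no learned projections, and all function names. The weights are random and untrained, so the predicted word will almost certainly not be "five"; a trained model is what produces the 94.2% above.

```python
import math
import random

# Hypothetical vocabulary, ordered so that "two plus three" maps to [2, 12, 3].
VOCAB = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "minus", "plus", "times"]
WORD_TO_ID = {word: i for i, word in enumerate(VOCAB)}
DIM = 64  # vector size used in the summary

rng = random.Random(0)
# Untrained (random) weights; a real model learns these during training.
EMBEDDING = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in VOCAB]
OUTPUT_W = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in VOCAB]


def tokenize(text):
    """Step 1: split on whitespace and look up each word's ID."""
    return [WORD_TO_ID[word] for word in text.split()]


def embed(ids):
    """Step 2: look up one DIM-sized vector per token ID."""
    return [list(EMBEDDING[i]) for i in ids]


def add_positions(vectors):
    """Step 3: add sinusoidal position signals (one common choice, assumed here)."""
    out = []
    for pos, vec in enumerate(vectors):
        row = []
        for k, value in enumerate(vec):
            angle = pos / (10000 ** (2 * (k // 2) / DIM))
            row.append(value + (math.sin(angle) if k % 2 == 0 else math.cos(angle)))
        out.append(row)
    return out


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention(vectors):
    """Step 4: simplified self-attention (no learned projections, no masking)."""
    scale = math.sqrt(DIM)
    mixed = []
    for query in vectors:
        weights = softmax([dot(query, key) / scale for key in vectors])
        mixed.append([sum(w * v[k] for w, v in zip(weights, vectors))
                      for k in range(DIM)])
    return mixed


def next_word_probs(vectors):
    """Step 5: score every vocabulary word from the last position's vector."""
    return softmax([dot(row, vectors[-1]) for row in OUTPUT_W])


ids = tokenize("two plus three")
context = attention(add_positions(embed(ids)))
probs = next_word_probs(context)
# Step 6: pick the highest-probability word.
prediction = VOCAB[max(range(len(probs)), key=probs.__getitem__)]
print(ids, prediction)
```

Each function corresponds to one bullet above, so swapping in a trained embedding table, learned attention projections, or a different decoding rule only changes that one step.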