Build Your First LLM from Scratch
Part 1 · Section 1 of 9

What is an LLM?

An LLM (Large Language Model) does one thing: it predicts the next word.

Large because it has billions of parameters (the numbers the model adjusts during training to get better at predictions). Language because it works with text. Model because it's a mathematical function that learns patterns.

That's it. Everything else—chat, code generation, reasoning, translation—emerges from this single capability.

Important: The model doesn't "eat" the input and "spit out" an answer. It appends its prediction to the input. So "two plus three" becomes "two plus three five". This is called autoregressive generation—each new word is added to the sequence, then used to predict the next one.
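To make that append-and-repeat loop concrete, here is a minimal Python sketch. The predict_next_word function is a hypothetical stand-in for a trained model (here it is just a tiny lookup table), not real model code.

```python
# A minimal sketch of autoregressive generation. predict_next_word is a
# hypothetical stand-in for a trained model (here, a tiny lookup table).

def predict_next_word(words):
    patterns = {
        ("two", "plus", "three"): "five",
        ("two", "plus", "three", "five"): "<end>",
    }
    return patterns.get(tuple(words), "<end>")

def generate(prompt, max_new_words=10):
    words = prompt.split()
    for _ in range(max_new_words):
        next_word = predict_next_word(words)
        if next_word == "<end>":
            break
        words.append(next_word)  # the prediction is appended to the input
    return " ".join(words)

print(generate("two plus three"))  # -> "two plus three five"
```

Each pass through the loop does exactly one thing: predict a word, glue it onto the sequence, and go again.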

The Core Insight

When you ask an LLM "What is 2+2?", it doesn't "think" or "calculate". It predicts that after the sequence of words "What is 2+2?", the most likely next words are "2+2 equals 4" or simply "4".

It learned this by reading billions of examples where questions were followed by answers.
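A toy illustration of that idea, with made-up probabilities: the model scores every candidate next word, and the most likely one wins.

```python
# Made-up numbers for illustration: the model assigns a likelihood to every
# candidate next word and picks the most probable one.

next_word_probs = {
    "4": 0.62,         # seen countless times after "What is 2+2?"
    "four": 0.23,
    "5": 0.04,
    "banana": 0.0001,  # grammatically possible, statistically unlikely
}

prediction = max(next_word_probs, key=next_word_probs.get)
print(prediction)  # -> "4"
```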

Common Misconceptions

| Misconception | Reality |
| --- | --- |
| "It understands" | It predicts patterns |
| "It thinks" | It does matrix math |
| "It knows things" | It learned statistical relationships |
| "It remembers our chat" | Each input is processed fresh* |

*Chat history is included in each new prompt, giving the illusion of memory (see the sketch below).
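Here is a small sketch of that footnote. The model_complete function and the transcript format are hypothetical, but the point stands: every turn resends the full history as fresh input.

```python
# A sketch of why chat feels like memory. model_complete is a hypothetical
# stand-in for a real model call; every turn resends the whole transcript,
# so the model only "remembers" what is pasted back into its input.

def model_complete(prompt):
    # Stand-in: just report how much context the model was given this turn.
    return f"(a reply based on {len(prompt.splitlines())} lines of context)"

history = []

def chat(user_message):
    history.append(f"User: {user_message}")
    prompt = "\n".join(history) + "\nAssistant:"  # fresh input, every turn
    reply = model_complete(prompt)
    history.append(f"Assistant: {reply}")
    return reply

print(chat("What is 2+2?"))
print(chat("What did I just ask you?"))  # works only because history was resent
```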

The Analogy

Think of your phone's autocomplete, but trained on the entire internet and scaled up a million times. When you type "How are", your phone suggests "you" because that pattern appears frequently. An LLM does the same thing, just with much longer contexts and much more sophisticated pattern matching.
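A minimal sketch of that frequency idea: count which word most often follows "how are" in a small sample of text, then suggest it. The sample text is invented for illustration.

```python
# Frequency-based autocomplete in miniature: find the word that most often
# follows "how are" in a tiny corpus and suggest it.
from collections import Counter

text = "how are you how are you how are things how are you doing"
words = text.split()

followers = Counter(
    words[i + 2]
    for i in range(len(words) - 2)
    if (words[i], words[i + 1]) == ("how", "are")
)

print(followers.most_common(1)[0][0])  # -> "you"
```

An LLM replaces this simple counting with learned parameters and far longer contexts, but the job is the same: suggest the most likely continuation.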

Key takeaway: An LLM is a very sophisticated pattern-matching machine that predicts what text should come next.
