Part 2: The Project
Understand why we're building a calculator and what the final result looks like
Why a Calculator?
We could teach LLM concepts with any task—chatbot, code generation, translation. But a calculator is perfect for learning. Here's why:
| Criteria | Calculator | Text-to-SQL | Chatbot |
|---|---|---|---|
| Vocabulary | ~30 words | ~10,000 | ~50,000 |
| Training time | 5-10 min | 2-3 hours | Days |
| Data generation | Trivial | Need dataset | Complex |
| Verify correctness | Easy | Medium | Hard |
You might want to jump straight to code generation or chat. But:
- Complexity hides understanding — With a complex task, you can't tell if issues are from your model or your data
- Training time kills iteration — Real LLMs take days/weeks to train. Our calculator trains in minutes.
- The concepts are identical — Tokenization, attention, transformers—it's all the same, just smaller
Once you understand the calculator, scaling up is straightforward.
The Task
Our model will convert English math phrases into English number answers:

```
"two plus three"         → "five"
"seven minus four"       → "three"
"six times eight"        → "forty eight"
"twenty divided by five" → "four"
```

Notice: both input and output are words, not digits. The model never sees "2 + 3 = 5"; it only ever sees "two plus three" and learns to predict "five".
The Dataset
Unlike real LLMs that train on internet text (Wikipedia, books, websites), we'll generate our own dataset programmatically. This is one of the beauties of our calculator project—we don't need to find or download any corpus.
```python
# We write code to generate training examples:
import random

def generate_example():
    a = random.randint(0, 99)  # e.g., 23
    b = random.randint(0, 99)  # e.g., 15
    op = "plus"
    result = a + b             # 38
    # Convert to words (we'll write this helper function):
    #   to_words(23) → "twenty three"
    #   to_words(38) → "thirty eight"
    return to_words(a) + " " + op + " " + to_words(b), to_words(result)

# Generate 1000 examples in seconds!
```

Bonus insight: our random.randint(0, 99) creates a uniform distribution, so every number appears equally often. Real language follows a Zipfian (power-law) distribution: "1+1" appears millions of times while "97+6" appears once. Our uniform data actually makes training easier and keeps the model from just memorizing the most common cases.
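The snippet above leans on a to_words helper we haven't written yet (it will live in dataset.py later in the series). As a minimal sketch of what it might look like for the 0-99 range:

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def to_words(n):
    """Spell out 0-99 as English words, e.g. 38 → "thirty eight"."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens - 2] if ones == 0 else TENS[tens - 2] + " " + ONES[ones]
```

Note the space in "thirty eight" rather than a hyphen: each word stays a single token in our tiny vocabulary.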
Our Vocabulary
| Category | Tokens |
|---|---|
| Numbers 0-19 | zero, one, two, ... nineteen |
| Tens | twenty, thirty, forty, ... ninety |
| Operations | plus, minus, times, divided, by |
| Special | [PAD], [START], [END] |
That's roughly 30 tokens total. Compare this to GPT-4's ~100,000 tokens!
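To make the count concrete, here's a quick sketch enumerating the full vocabulary (the exact spellings of the special tokens are an assumption until we build tokenizer.py):

```python
ones = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]  # 20 tokens
tens = ["twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]                     # 8 tokens
operations = ["plus", "minus", "times", "divided", "by"]            # 5 tokens
special = ["[PAD]", "[START]", "[END]"]                             # 3 tokens

vocab = special + ones + tens + operations
print(len(vocab))  # 36: a few dozen tokens, versus ~100,000 for GPT-4
```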
We'll generate 500-1000 examples covering addition, subtraction, multiplication, and division with numbers 0-99.
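Here's one way the generator might extend to all four operations, reusing the to_words sketch above. The range constraints are an assumption on my part: keeping every result within 0-99 means the answers never outgrow our vocabulary, and building division problems backwards guarantees they divide evenly:

```python
import random

def generate_example():
    """Return one (question, answer) pair, both as English words."""
    op = random.choice(["plus", "minus", "times", "divided by"])
    if op == "plus":                         # keep sums within 0-99
        a = random.randint(0, 99)
        b = random.randint(0, 99 - a)
        result = a + b
    elif op == "minus":                      # keep differences non-negative
        a = random.randint(0, 99)
        b = random.randint(0, a)
        result = a - b
    elif op == "times":                      # keep products within 0-99
        a = random.randint(0, 9)
        b = random.randint(0, 99 // max(a, 1))
        result = a * b
    else:                                    # build division backwards so it's exact
        b = random.randint(1, 9)
        result = random.randint(0, 99 // b)
        a = b * result
    return f"{to_words(a)} {op} {to_words(b)}", to_words(result)
```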
Model Specifications
Here's how our tiny model compares to GPT-4:
| Spec | Our Model | GPT-4 |
|---|---|---|
| Parameters | ~1-2 million | ~1.7 trillion |
| Embedding dim | 64-128 | 12,288 |
| Layers | 2-4 | ~96 |
| Vocabulary | ~30 | ~100,000 |
| Max sequence length | ~10-20 tokens | 32k-128k tokens |
| Training time | 5-10 min | Months |
Our model is about a million times smaller—but it uses the exact same architecture!
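Where does "~1-2 million" come from? A back-of-envelope count, assuming the larger end of the table (embedding dim 128, 4 layers), a 36-token vocabulary, and a standard transformer block with a 4x feed-forward expansion:

```python
vocab, d, layers, max_len = 36, 128, 4, 20

embedding   = vocab * d + max_len * d  # token + positional embeddings
attention   = 4 * d * d                # Q, K, V, and output projections
feedforward = 2 * (d * 4 * d)          # up- and down-projections (4x expansion)
output_head = d * vocab                # final projection back to the vocabulary

total = embedding + layers * (attention + feedforward) + output_head
print(f"{total:,}")  # 798,208; biases and layer norms nudge this toward a million
```

Drop the embedding dim to 64 and the count falls to roughly 200,000, which is why the table gives a range.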
What It Can & Cannot Do
Can Do
- Basic arithmetic with numbers 0-99
- Single operations (one plus, minus, times, or divided by)
- Output results as English words
Cannot Do
- Numbers above 99
- Chained operations ("two plus three minus one")
- Decimals or fractions
- Parentheses or order of operations
What You'll Create
By the end of this series, you'll have built these files from scratch:
| File | Description |
|---|---|
| tokenizer.py | Converts text to token IDs |
| embeddings.py | Converts token IDs to vectors |
| attention.py | The attention mechanism |
| transformer.py | Transformer blocks |
| model.py | Complete model architecture |
| dataset.py | Calculator dataset generator |
| train.py | Training loop |
| generate.py | Text generation |
| app.py | Gradio demo for Hugging Face |
github.com/slahiri/small_calculator_model

End-to-End Preview
Here's what the final result looks like:
```python
from model import CalculatorLLM

model = CalculatorLLM.load("calculator-llm.pt")

model.calculate("two plus three")      # → "five"
model.calculate("nine times nine")     # → "eighty one"
model.calculate("fifty minus twelve")  # → "thirty eight"
```

The .calculate() method is a convenience wrapper. Under the hood, it's still doing autoregressive generation: feeding in "two plus three", appending each predicted token to the sequence, checking for [END], and returning just the answer portion.
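Hypothetically, that wrapper might look like the sketch below. The predict_next, encode, and decode names are placeholders; the real loop gets built in generate.py:

```python
def calculate(model, tokenizer, prompt, max_new_tokens=10):
    """Sketch of the generation loop behind model.calculate()."""
    ids = tokenizer.encode(prompt)         # "two plus three" → a list of token IDs
    answer_tokens = []
    for _ in range(max_new_tokens):
        next_id = model.predict_next(ids)  # one forward pass; take the most likely token
        token = tokenizer.decode([next_id])
        if token == "[END]":               # the model signals the answer is complete
            break
        answer_tokens.append(token)
        ids.append(next_id)                # autoregression: feed the prediction back in
    return " ".join(answer_tokens)         # just the answer portion, e.g. "five"
```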
Try It Live
At the end of this series, we'll deploy our model to Hugging Face Spaces—a free platform to host ML demos. You'll be able to share your working calculator LLM with anyone via a simple URL.
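For a taste of how little code that takes, a Spaces demo can be a few lines of Gradio. The app.py from the file list might look roughly like this (the CalculatorLLM API is assumed from the preview above):

```python
import gradio as gr
from model import CalculatorLLM

model = CalculatorLLM.load("calculator-llm.pt")

demo = gr.Interface(
    fn=model.calculate,  # "two plus three" → "five"
    inputs=gr.Textbox(label="English math phrase"),
    outputs=gr.Textbox(label="Answer"),
    title="Calculator LLM",
)

demo.launch()  # on Hugging Face Spaces, this serves the public demo
```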