The Vocabulary

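Our calculator needs 36 tokens: 4 special tokens, the 20 number words zero through nineteen, the 8 tens words twenty through ninety, and 4 operator words (plus, minus, times, equals). They are stored in config/vocab.json: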

```json
{
  "[PAD]": 0, "[START]": 1, "[END]": 2, "[UNK]": 3,
  "zero": 4, "one": 5, "two": 6, "three": 7, "four": 8,
  "five": 9, "six": 10, "seven": 11, "eight": 12, "nine": 13,
  "ten": 14, "eleven": 15, "twelve": 16, "thirteen": 17,
  "fourteen": 18, "fifteen": 19, "sixteen": 20, "seventeen": 21,
  "eighteen": 22, "nineteen": 23,
  "twenty": 24, "thirty": 25, "forty": 26, "fifty": 27,
  "sixty": 28, "seventy": 29, "eighty": 30, "ninety": 31,
  "plus": 32, "minus": 33, "times": 34, "equals": 35
}
```

Each word maps to a unique ID: "two" → 6, "plus" → 32.
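To make the word-to-ID mapping concrete, here is a minimal Python sketch of how a sentence could be turned into IDs and back. The helper names `load_vocab`, `encode`, and `decode` are hypothetical and not part of the project. Splitting on whitespace and falling back to `[UNK]` for unknown words are assumptions about how the tokenizer might behave. So that the snippet runs on its own, it rebuilds the same table in memory instead of reading config/vocab.json.

```python
import json

# Word lists in the same order as config/vocab.json, so the IDs match the file.
SPECIAL = ["[PAD]", "[START]", "[END]", "[UNK]"]
NUMBERS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
           "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
           "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
OPERATORS = ["plus", "minus", "times", "equals"]

# In-memory copy of the vocabulary: token -> ID, numbered from 0.
vocab = {tok: i for i, tok in enumerate(SPECIAL + NUMBERS + TENS + OPERATORS)}
# Reverse table for turning IDs back into words.
id_to_token = {i: tok for tok, i in vocab.items()}


def load_vocab(path):
    """Hypothetical helper: read the token -> ID table from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def encode(text, vocab):
    """Assumed behavior: lowercase, split on whitespace, unknown words -> [UNK]."""
    unk_id = vocab["[UNK]"]
    return [vocab.get(word, unk_id) for word in text.lower().split()]


def decode(ids, id_to_token):
    """Map each ID back to its word and join the words with spaces."""
    return " ".join(id_to_token[i] for i in ids)


print(len(vocab))                          # 36
print(encode("two plus two", vocab))       # [6, 32, 6]
print(encode("two plus banana", vocab))    # [6, 32, 3]
print(decode([6, 32, 6], id_to_token))     # two plus two
```

With the real file in place, `vocab = load_vocab("config/vocab.json")` would replace the in-memory table.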
