The Vocabulary
Our calculator needs a vocabulary of 36 tokens, stored in config/vocab.json:
```json
{
  "[PAD]": 0, "[START]": 1, "[END]": 2, "[UNK]": 3,
  "zero": 4, "one": 5, "two": 6, "three": 7, "four": 8,
  "five": 9, "six": 10, "seven": 11, "eight": 12, "nine": 13,
  "ten": 14, "eleven": 15, "twelve": 16, "thirteen": 17,
  "fourteen": 18, "fifteen": 19, "sixteen": 20, "seventeen": 21,
  "eighteen": 22, "nineteen": 23,
  "twenty": 24, "thirty": 25, "forty": 26, "fifty": 27,
  "sixty": 28, "seventy": 29, "eighty": 30, "ninety": 31,
  "plus": 32, "minus": 33, "times": 34, "equals": 35
}
```

Each word maps to a unique ID: "two" → 6, "plus" → 32.
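To make the word-to-ID mapping concrete, here is a minimal Python sketch of how such a vocabulary might be used. The helper names `encode` and `decode` are hypothetical, as are splitting on whitespace and falling back to `[UNK]` for unknown words; the actual tokenizer may differ. The mapping is inlined so the snippet runs on its own, whereas a real project would presumably read it from config/vocab.json.

```python
import json

# Same mapping as config/vocab.json, inlined so this sketch is self-contained.
VOCAB_JSON = """
{
  "[PAD]": 0, "[START]": 1, "[END]": 2, "[UNK]": 3,
  "zero": 4, "one": 5, "two": 6, "three": 7, "four": 8,
  "five": 9, "six": 10, "seven": 11, "eight": 12, "nine": 13,
  "ten": 14, "eleven": 15, "twelve": 16, "thirteen": 17,
  "fourteen": 18, "fifteen": 19, "sixteen": 20, "seventeen": 21,
  "eighteen": 22, "nineteen": 23,
  "twenty": 24, "thirty": 25, "forty": 26, "fifty": 27,
  "sixty": 28, "seventy": 29, "eighty": 30, "ninety": 31,
  "plus": 32, "minus": 33, "times": 34, "equals": 35
}
"""

vocab = json.loads(VOCAB_JSON)

# Reverse lookup for turning IDs back into words.
id_to_token = {token_id: token for token, token_id in vocab.items()}


def encode(text):
    """Map whitespace-separated words to IDs (hypothetical helper).

    Words missing from the vocabulary fall back to the [UNK] ID.
    """
    unk_id = vocab["[UNK]"]
    return [vocab.get(word, unk_id) for word in text.split()]


def decode(ids):
    """Map a list of IDs back to a space-separated string (hypothetical helper)."""
    return " ".join(id_to_token[i] for i in ids)


print(encode("two plus two equals four"))  # [6, 32, 6, 35, 8]
print(decode([6, 32, 6]))                  # two plus two
```

Because the IDs run from 0 to 35 without gaps, they can index directly into a 36-row embedding table.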