this post was submitted on 27 Jan 2024
119 points (91.0% liked)
Arch Linux
7175 readers
2 users here now
The beloved lightweight distro
founded 4 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
These models do not see letters but tokens. For the model, violet is probably two symbols viol and et. Apart from learning by heart the number of letters in each token, it is impossible for the model to know the number of letters in a word.
This is also why gpt family sucks at addition their tokenizer has symbols for common numbers like 14. This meant that to do 14 + 1 it could not use the knowledge 4 + 1 was 5 as it could not see the link between the token 4 and the token 14. The Llama tokenizer fixes this, and is thus much better at basic algebra even with much smaller models.