Is this a consequence of the fact that "multiplication tables" for kindergarteners are abundantly available online (and so in the training data) as plain text, typically up to the 12 or 13 times table?
I don't think it's just about the training material. It's also about keeping track of the precise number of tokens: you'd need a dedicated token for 1+1+1+1, another one for 1+1+1+1+1, and so on.
The internal representation is a multidimensional vector. With a typical 4096-dimensional vector at 4-bit quantization (q4), one could give every particle in the universe its own unique label and still have over 4000 dimensions left for other purposes.
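
For anyone who wants to check that arithmetic, here is a rough sketch. It assumes "q4" means 4 bits (16 levels) per dimension and uses the common order-of-magnitude estimate of 10^80 particles in the observable universe; that figure and the helper name `dims_needed` are my own assumptions, not something from the posts above.

```python
DIMS = 4096            # vector size mentioned in the post above
BITS_PER_DIM = 4       # assumption: "q4" = 4 bits, i.e. 16 levels per dimension
PARTICLES = 10 ** 80   # assumption: rough estimate for the observable universe

def dims_needed(count, bits_per_dim):
    """Smallest number of dimensions whose combined states give `count` distinct labels."""
    bits = (count - 1).bit_length()   # bits needed to tell `count` items apart
    return -(-bits // bits_per_dim)   # ceiling division

used = dims_needed(PARTICLES, BITS_PER_DIM)
left = DIMS - used
print(used, left)  # 67 4029
```

So under those assumptions about 67 dimensions are enough for the labels, which leaves 4029 free. This only counts distinct states; it says nothing about how a model actually uses its dimensions.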