> for instance "what is 2+2" or some numerical puzzles that needed algebraic thinking
there is only one algebraic approach to solving something like 2+2 and that is counting! 2+2 = ((((0 + 1) + 1) + 1) + 1). but LLMs are infamously bad at counting, which is why 2+2 isn't an algebraic problem to an LLM. it's pattern matching or linguistic reasoning, token by token.
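To make the "addition is just counting" point concrete, here is a rough Python sketch. The names `successor`, `add_by_counting` and `expand` are made up for illustration, and this only shows the counting view of addition, not anything an LLM does internally.

```python
def successor(n: int) -> int:
    """The only primitive we allow ourselves: count up by one."""
    return n + 1


def add_by_counting(a: int, b: int) -> int:
    """Add two non-negative integers by starting at 0 and counting a + b times."""
    result = 0
    for _ in range(a):
        result = successor(result)
    for _ in range(b):
        result = successor(result)
    return result


def expand(a: int, b: int) -> str:
    """Write a + b as nested '+ 1' steps starting from 0."""
    expr = "0"
    for _ in range(a + b):
        expr = f"({expr} + 1)"
    return expr


if __name__ == "__main__":
    print(expand(2, 2), "=", add_by_counting(2, 2))
```

Getting the right answer this way depends on performing exactly four increments, which is the kind of precise step-tracking being discussed here.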
Is this a consequence of the fact that "multiplication tables" for kindergarteners are abundantly available online (and so in the training data), typically up to the 12 or 13 times table, as plain text?
i don't think it's just about the training material. it's also about keeping track of the precise number of tokens. you'd have to have dedicated tokens for 1+1+1+1 and another one for 1+1+1+1+1 etc.
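The internal representation is a multidimensional vector. With a typical 4096-dimensional vector at q4 (4 bits per dimension), one can name every particle in the universe and still have over 4000 dimensions left for other purposes: roughly 10^80 particles need about 266 bits, which is only about 67 dimensions at 4 bits each.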
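A quick back-of-the-envelope check of that capacity claim, as a Python sketch. The 10^80 particle count is a commonly quoted order-of-magnitude estimate and an assumption here, as is reading "q4" as 4 bits per dimension; `dims_needed` is a hypothetical helper. This only counts raw bit capacity, not what a trained model actually encodes in its vectors.

```python
def dims_needed(n_items: int, bits_per_dim: int) -> int:
    """Dimensions required to give each of n_items a unique bit pattern."""
    # Bits needed to index n_items distinct values (0 .. n_items - 1).
    bits = max(1, (n_items - 1).bit_length())
    # Round up to a whole number of dimensions.
    return -(-bits // bits_per_dim)


PARTICLES = 10**80   # assumption: rough estimate of particles in the universe
DIMS = 4096          # vector width mentioned in the post
BITS_PER_DIM = 4     # assumption: "q4" means 4-bit quantization

used = dims_needed(PARTICLES, BITS_PER_DIM)
left = DIMS - used

if __name__ == "__main__":
    print(f"dimensions used: {used}, dimensions left: {left}")
```

Under those assumptions the labels take up 67 dimensions and 4029 remain, which matches the "over 4000 left" figure.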