That’s just showing the tests are measuring specific things that LLMs can game particularly well.
Computers have been able to smash high school algebra tests since the 1970’s, but that doesn’t make them as smart as a 16 year old (or even a three year old).
Computers have been able to smash high school algebra tests since the 1970’s, but that doesn’t make them as smart as a 16 year old (or even a three year old).