No ELO 1400 player will have that rate of illegal moves, so saying that it plays with an ELO 1400 rating is disingenuous.
Reinterpreting illegal moves as resignation is absurd when an LLM is formally capable of expressing statements such as "I resign" or "I cannot conceive of a winning move from here" just as well as any human player. It just doesn't do so because it's not actually playing chess the way we think of an ELO 1400 player playing chess.
Fine, just wrap the LLM in a simple function that detects illegal moves and replaces them with "I resign" or "I cannot conceive of a winning move from here". Then you aren't "reinterpreting" anymore.
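A minimal sketch of such a wrapper, assuming a hypothetical `ask_model` callable that returns the model's move as text, and a list of legal moves supplied by whatever chess library is in use (neither name comes from the article):

```python
from typing import Callable, Iterable

RESIGN = "I resign"

def move_or_resign(ask_model: Callable[[str], str],
                   position: str,
                   legal_moves: Iterable[str]) -> str:
    """Return the model's move if it is legal, otherwise a resignation.

    `ask_model` and the format of `position` are hypothetical placeholders;
    `legal_moves` is assumed to come from an external move generator.
    """
    proposed = ask_model(position).strip()
    # An illegal move is replaced by an explicit resignation.
    return proposed if proposed in set(legal_moves) else RESIGN
```

For example, `move_or_resign(lambda p: "Ke3", "start", ["e4", "d4"])` returns `"I resign"`, since `Ke3` is not in the supplied list.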
My point is, it sounds like Elo doesn't measure what we want it to measure. If we care about the way an agent wins a game and not just whether it wins a game, then we need an instrument that measures strategy, not outcome.
> Fine, just wrap the LLM in a simple function that detects illegal moves and replaces them with "I resign" or "I cannot conceive of a winning move from here". Then you aren't "reinterpreting" anymore.
Under FIDE rules, a game is only forfeited on the second illegal move, so if anything the interpretation used by the article's author would seem to underestimate its ELO rating.
Nope, still not even close to what the author claims. If I understand it correctly, it made illegal moves in 3 out of 19 games. That's probably a few orders of magnitude more illegal moves than even a 1400 ELO player would make over their entire lifetime.
The author claims: chatGPT has a 1400 chess ELO based on games played.
You appear to think the author claims: chatGPT plays chess like a human rated 1400.
Your observations do not contradict the author's claim that, based on games won and lost against opponents of a specific strength, the estimated ELO is 1400.
A non-human player can make illegal moves at a much higher rate and make up for that by being stronger when it does not make illegal moves, and so achieve the same rating as a human player who plays the game in a completely different way.
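A rough worked example of that trade-off, using the standard Elo expected-score formula. The 3-in-19 forfeit rate is the figure quoted upthread; the 1400-rated opponent and the function names are assumptions for illustration, and this is only a sketch, not how the article computed anything:

```python
import math

def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score of `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))

def rating_for_score(score: float, opponent: float) -> float:
    """Invert expected_score: the rating that yields `score` (0 < score < 1)."""
    return opponent + 400.0 * math.log10(score / (1.0 - score))

def strength_when_legal(target_rating: float, opponent: float,
                        forfeit_rate: float) -> float:
    """Strength needed in the non-forfeited games to net `target_rating`.

    Assumes forfeited games score 0 and the rest score like a player of the
    returned rating; the required score must stay below 1.
    """
    overall = expected_score(target_rating, opponent)
    needed = overall / (1.0 - forfeit_rate)
    return rating_for_score(needed, opponent)

# Forfeit rate of 3 games in 19, against an assumed 1400-rated opponent.
legal_strength = strength_when_legal(1400, 1400, 3 / 19)
print(round(legal_strength))
```

Under these assumptions, the player only has to be some tens of rating points stronger in the games it finishes legally to net out at 1400.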
There's the "it" which has no post-processing, and there's the "it" where the output is post-processed to announce a resignation when it attempts an illegal move.
Some things about the two "it"s:
- They differ trivially.
- They enable new capabilities, such as the ability to explain why a move got made. Current chess AIs are not good at this.
So I think you're making too big a deal of a comparative triviality.
[edit]
We might be talking past each other. And some people above have come to doubt the article's results even with the right prompt engineering.
The ranking takes into account wins and losses, not illegal moves, and so the fact that it plays in a way where a higher proportion of its losses are down to illegal moves than a human player's is not relevant to its ranking. It may suggest that the ranking ought to take that into account, but that's a separate issue.
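To make that concrete, here is a sketch of the usual Elo update rule. The K-factor of 20 and the sample games are assumptions for illustration; the point is only that the reason for a loss is never an input:

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score of `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))

def updated_rating(rating: float, opponent: float, score: float,
                   k: float = 20.0) -> float:
    """New rating after one game; score is 1 (win), 0.5 (draw) or 0 (loss)."""
    return rating + k * (score - expected_score(rating, opponent))

# Hypothetical results: (opponent rating, score, how the game ended).
games = [
    (1400, 1.0, "checkmate"),
    (1400, 0.0, "illegal move"),
    (1400, 0.0, "checkmate"),
]

rating = 1400.0
for opponent, score, _reason in games:
    # `_reason` is deliberately unused: the update sees only the result.
    rating = updated_rating(rating, opponent, score)
```

A loss by illegal move and a loss by checkmate move the rating by exactly the same amount.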
That no human ELO 1400 player will have that rate of illegal moves may be true, but if anything treating the very first illegal move as a forfeit appears to be stricter than most rules.
Does that matter? Seems weird to me to make that argument. I’m honestly quite confused by it.
A bowling bot that threw strikes 9 out of 10 throws and a gutter ball one time out of ten would still be a great bowler even though no human with the ability to make strikes that often would pretty much ever throw a gutter ball.
This is a weird kind of alien intelligence that does not have to behave like humans.
Note that the claim is not that it's an ELO 1400 human equivalent player but that it can play chess at a level that gives it an ELO of 1400, which is not nitpicking: that's a completely different thing. We're not testing whether it plays like a player with ELO x, we're proving that "it can't play chess" is fallacious. It can, and when prompted properly, it can achieve an ELO of 1400.
ELO allows for illegal moves: as per the rules of chess, you lose the game if you make an illegal move. The end. ELO doesn't care why you lost a game, whether on purpose or not.
I personally find that makes it more astonishing, that it would slip up on knowing the most basic elements of the game, yet still be able to play better than most humans. Highly smart people sometimes say or do little things when foraying into other fields that cause domain experts to think they're not one of them. But that usually doesn't stop smart people from having an impact and making a contribution with their insights. The question of illegal moves is superficial, since most online systems have guardrails in place that prevent them. At worst it's just an embarrassment and I don't think machines care about being embarrassed.
> Highly smart people sometimes say or do little things when foraying into other fields that cause domain experts to think they're not one of them
This is the opposite of that: a highly trained but dumb entity that has seen many lifetimes' worth of games but is still tripping up on basics. But since it is so highly trained you can mistake it for a master if you squint and don't look into what it is doing.
> But since it is so highly trained you can mistake it for a master if you squint and don't look into what it is doing.
But it is a master, as has been pointed out repeatedly. If you replace all illegal moves with resignations, and use the same style of prompt as the OP did, then it plays like an expert. I'm objecting because you're making it sound like it's a trivial result.
That isn't relevant to my comment: an idiot human is still a human, so your comment here doesn't make sense. The comment I responded to likened it to a genius entering a new field. I objected to that, that is all.
I'd be interested if it could be coaxed into legal moves after making an illegal one. "That is an illegal move. Can you do something legal with this board?"
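One way to try this would be a small retry loop. This is only a sketch: `ask_model`, the prompt format, and the attempt limit are hypothetical, and the legal move list is assumed to come from an external chess library:

```python
from typing import Callable, Iterable, Optional

def coax_legal_move(ask_model: Callable[[str], str],
                    position: str,
                    legal_moves: Iterable[str],
                    max_attempts: int = 3) -> Optional[str]:
    """Re-prompt after each illegal move; return None if every attempt fails."""
    legal = set(legal_moves)
    prompt = position
    for _ in range(max_attempts):
        proposed = ask_model(prompt).strip()
        if proposed in legal:
            return proposed
        # Tell the model what went wrong and ask again.
        prompt = (f"{position}\n{proposed} is an illegal move. "
                  "Can you do something legal with this board?")
    return None
```

Counting how often the second or third attempt succeeds would show whether the illegal moves are recoverable slips or a sign that the model has lost track of the board.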