
No ELO 1400 player will have that rate of illegal moves, so saying that it plays with an ELO 1400 rating is disingenuous.

Reinterpreting illegal moves as resignation is absurd when an LLM is formally capable of expressing statements "I resign" or "I cannot conceive of a winning move from here" just as well as any human player. It just doesn't do so because it's not actually playing chess the way we think of an ELO 1400 player playing chess.



Fine, just wrap the LLM in a simple function that detects illegal moves and replaces them with "I resign" or "I cannot conceive of a winning move from here". Then you aren't "reinterpreting" anymore.
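A minimal sketch of such a wrapper, in Python. Everything named here is hypothetical: `propose_move` stands in for whatever call returns the LLM's move as text, and `legal_moves` is assumed to come from a separate rules engine that already knows the position.

```python
from typing import Callable, Iterable

RESIGN = "I resign"


def guarded_move(propose_move: Callable[[], str], legal_moves: Iterable[str]) -> str:
    """Return the proposed move if it is legal, otherwise a resignation.

    propose_move: hypothetical callable returning the LLM's move as text.
    legal_moves: legal moves for the current position, supplied by a rules
    engine that lives outside this sketch.
    """
    legal = {m.strip() for m in legal_moves}
    move = propose_move().strip()
    return move if move in legal else RESIGN


if __name__ == "__main__":
    # Stand-in "models" that always answer with a fixed string.
    print(guarded_move(lambda: "Nf3", ["e4", "d4", "Nf3"]))
    print(guarded_move(lambda: "Ke9", ["e4", "d4", "Nf3"]))
```

The wrapper only compares strings, so it assumes the model and the rules engine use the same move notation.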

My point is, it sounds like Elo doesn't measure what we want it to measure. If we care about the way an agent wins a game and not just whether it wins a game, then we need an instrument that measures strategy, not outcome.


> Fine, just wrap the LLM in a simple function that detects illegal moves and replaces them with "I resign" or "I cannot conceive of a winning move from here". Then you aren't "reinterpreting" anymore.

Then it still isn't anywhere near ELO 1400.


Under FIDE rules it's only a forfeit after the second illegal move, so if anything it would seem that the interpretation used by the article author underestimates its ELO rating.


Nope, still not even close to what the author claims. If I understand it correctly, it made illegal moves in 3 out of 19 games. That's probably a few orders of magnitude more illegal moves than even a 1400 ELO player would make over their entire lifetime.


Repeating what others have said in this thread:

The author claims: chatGPT has a 1400 chess ELO based on games played.

You appear to think the author claims: chatGPT plays chess like a human rated 1400.

Your observations do not contradict the author's claim that, based on games won and lost against opponents of a specific strength, the estimated ELO is 1400.

A non-human player can make illegal moves at a much higher rate and make up for that by being stronger when it does not make illegal moves to achieve the same rating as a human player who plays the game in a completely different way.
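A back-of-the-envelope check of that argument, using the standard Elo expected-score formula. The numbers are illustrative assumptions, not figures established by the article: a 1400-rated opponent, the 3-in-19 forfeit rate treated as a per-game probability, and forfeits assumed to be independent of how the game would otherwise have gone.

```python
import math


def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score (logistic curve on a 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def rating_for_score(score: float, opponent: float) -> float:
    """Invert expected_score: the rating that yields `score` against `opponent`."""
    return opponent + 400.0 * math.log10(score / (1.0 - score))


def underlying_rating(target: float, opponent: float, forfeit_rate: float) -> float:
    """Strength needed in non-forfeited games to still average out at `target`.

    Assumes each game is forfeited (scored 0) with probability forfeit_rate,
    independently of the rest of the play.
    """
    needed = expected_score(target, opponent) / (1.0 - forfeit_rate)
    if needed >= 1.0:
        raise ValueError("forfeit rate too high to reach the target rating")
    return rating_for_score(needed, opponent)


if __name__ == "__main__":
    # Illustrative only: 1400 target, 1400 opponent, 3 forfeits in 19 games.
    print(round(underlying_rating(1400, 1400, 3 / 19)))
```

Under these assumptions the player only needs to be in the region of 65 Elo points stronger in the games it does not forfeit, which is the sense in which the two kinds of player can share a rating.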


There's the "it" which has no post-processing, and there's the "it" where the output is post-processed to announce a resignation when it attempts an illegal move.

Some things about the two "it"s:

- They differ trivially.

- They enable new capabilities, such as the ability to explain why a move got made. Current chess AIs are not good at this.

So I think you're making too big a deal of a comparative triviality.

[edit]

We might be talking past each other. And some people above have come to doubt the article's results even with the right prompt engineering.


The ranking takes into account wins and losses, not illegal moves, and so the fact that it plays in a way where a higher proportion of its losses is down to illegal moves than a human player is not relevant to its ranking. It may suggest that the ranking ought to take that into account, but that's a separate issue.


That no human ELO 1400 player will have that rate of illegal moves may be true, but if anything, treating the very first illegal move as a forfeit appears to be stricter than most rules.


Does that matter? Seems weird to me to make that argument. I’m honestly quite confused by it.

A bowling bot that threw strikes on 9 out of 10 throws and a gutter ball on the tenth would still be a great bowler, even though almost no human who strikes that often would ever throw a gutter ball.

This is a weird kind of alien intelligence that does not have to behave like humans.


Note that the claim is not that it's an ELO 1400 human equivalent player but that it can play chess at a level that gives it an ELO of 1400, which is not nitpicking: that's a completely different thing. We're not testing whether it plays like a player with ELO x, we're proving that "it can't play chess" is fallacious. It can, and when prompted properly, it can achieve an ELO of 1400.

ELO allows for illegal moves: as per the rules of chess, you lose the game if you make an illegal move. The end. ELO, on purpose, doesn't care about why you lost a game.


I personally find that makes it more astonishing, that it would slip up on knowing the most basic elements of the game, yet still be able to play better than most humans. Highly smart people sometimes say or do little things when foraying into other fields that cause domain experts to think they're not one of them. But that usually doesn't stop smart people from having an impact or making a contribution with their insights. The question of illegal moves is superficial, since most online systems have guardrails in place that prevent them. At worst it's just an embarrassment and I don't think machines care about being embarrassed.


> Highly smart people sometimes say or do little things when foraying into other fields that cause domain experts to think they're not one of them

This is the opposite of that, a highly trained but dumb entity that has seen many lifetimes worth of games but is still tripping up on basics. But since it is so highly trained you can mistake it for a master if you squint and don't look into what it is doing.


> But since it is so highly trained you can mistake it for a master if you squint and don't look into what it is doing.

But it is a master, as has been pointed out repeatedly. If you replace all illegal moves with resignations, and use the same style of prompt as the OP did, then it plays like an expert. I'm objecting because you're making it sound like it's a trivial result.


> you're making this sound like it's a trivial result

I don't think this is a trivial result, emulating a highly trained idiot is still very impressive. But it is very different from an untrained genius.


You seem to have very rigid and boring definitions of the words "idiot" and "genius". The "AI effect" is real: https://en.wikipedia.org/wiki/AI_effect

Tbh, I don't even know what you're saying.

[edit] OK, I might have misunderstood you. It's not always clear what people mean.


> The "AI effect" is real: https://en.wikipedia.org/wiki/AI_effect

That isn't relevant to my comment, an idiot human is still a human. Your comment here therefore doesn't make sense. The comment I responded to likened it to a genius entering a new field, I objected to that, that is all.


ELO is based off who you win and lose against. The rate of illegal moves has nothing to do with ELO.
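To make that concrete, here is a rough performance-rating style estimate in Python. It is a sketch under stated assumptions, not necessarily the method the article used: the `estimate_rating` helper and its search bounds are hypothetical, and the only inputs are opponent ratings and scores. There is no field for how a game was lost.

```python
from typing import List, Tuple


def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score (logistic curve on a 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def estimate_rating(results: List[Tuple[float, float]],
                    lo: float = 0.0, hi: float = 4000.0,
                    tol: float = 0.01) -> float:
    """Find the rating whose expected total score matches the actual total.

    results: (opponent_rating, score) pairs, score being 1, 0.5 or 0.
    Uses bisection, which works because the expected total grows
    monotonically with the rating.
    """
    actual = sum(score for _, score in results)
    if actual <= 0 or actual >= len(results):
        raise ValueError("no finite estimate for an all-loss or all-win record")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        expected = sum(expected_score(mid, opp) for opp, _ in results)
        if expected < actual:
            lo = mid  # rating too low to explain the results
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # A loss counts as 0 whether it came from checkmate or an illegal move.
    games = [(1400, 1), (1400, 0), (1500, 0.5), (1300, 0.5)]
    print(round(estimate_rating(games)))
```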


I'd be interested if it could be coaxed into legal moves after making an illegal one. "That is an illegal move. Can you do something legal with this board?"
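A sketch of what that experiment could look like in Python. The `ask` callable is a hypothetical stand-in for the LLM call, `legal_moves` is assumed to come from a separate rules engine, and the retry limit is an arbitrary choice.

```python
from typing import Callable, Iterable, Optional

RETRY_PROMPT = "That is an illegal move. Can you do something legal with this board?"


def move_with_retries(ask: Callable[[str], str],
                      legal_moves: Iterable[str],
                      first_prompt: str,
                      max_retries: int = 2) -> Optional[str]:
    """Ask for a move, re-prompting after each illegal answer.

    ask: hypothetical callable that sends a prompt and returns the reply text.
    Returns the first legal move, or None if every attempt was illegal.
    """
    legal = {m.strip() for m in legal_moves}
    prompt = first_prompt
    for _ in range(max_retries + 1):
        move = ask(prompt).strip()
        if move in legal:
            return move
        prompt = RETRY_PROMPT  # tell the model its last answer was illegal
    return None


if __name__ == "__main__":
    # Canned replies standing in for a model: one illegal move, then a legal one.
    replies = iter(["Ke9", "e4"])
    print(move_with_retries(lambda prompt: next(replies), ["e4", "d4"], "Your move?"))
```

Counting how often the second or third attempt is legal would answer the question directly; a `None` result could then be scored as a forfeit.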



