No ELO 1400 player will have that rate of illegal moves, so saying it that it pl...

JellyBeanThief · on March 17, 2023

Fine, just wrap the LLM in a simple function that detects illegal moves and replaces them with "I resign" or "I cannot conceive of a winning move from here". Then you aren't "reinterpreting" anymore.

My point is, it sounds like Elo doesn't measure what we want it to measure. If we care about the way an agent wins a game and not just whether it wins a game, then we need an instrument that measures strategy, not outcome.
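The wrapper described above can be sketched in a few lines. This is only an illustration: `propose_move` stands in for whatever call produces the LLM's move, and `legal_moves` is assumed to come from a separate chess library or engine. Both names are hypothetical and do not come from the article.

```python
from typing import Callable, Iterable

RESIGN = "I resign"


def guarded_move(propose_move: Callable[[str], str],
                 position: str,
                 legal_moves: Iterable[str]) -> str:
    """Return the proposed move if it is legal, otherwise a resignation."""
    move = propose_move(position).strip()
    return move if move in set(legal_moves) else RESIGN


# Stand-in for the LLM (hypothetical): always answers with the same move.
def fake_llm(position: str) -> str:
    return "Ke3"


# "Ke3" is not in the legal-move list supplied here, so the wrapper resigns.
print(guarded_move(fake_llm, "startpos", ["e4", "d4", "Nf3"]))  # prints "I resign"
```

With this in place the rating is computed from ordinary game results, with no reinterpretation of illegal moves after the fact.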

illiarian · on March 17, 2023

> Fine, just wrap the LLM in a simple function that detects illegal moves and replaces them with "I resign" or "I cannot conceive of a winning move from here". Then you aren't "reinterpreting" anymore.

Then it still isn't anywhere near ELO 1400.

vidarh · on March 17, 2023

Under FIDE rules, a game is only forfeited after the second illegal move, so if anything it would seem that the interpretation used by the article author underestimates its ELO ranking.
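The difference between the two forfeit policies is easy to make concrete. A minimal sketch, assuming the rule as stated in this comment (loss on the second illegal move) against the stricter first-offence policy; the class name and the `limit` parameter are made up for illustration.

```python
from collections import Counter


class IllegalMoveTracker:
    """Counts illegal moves per player and reports when a forfeit is due."""

    def __init__(self, limit: int = 2):
        # limit=2: forfeit on the second illegal move (the rule described above).
        # limit=1: forfeit on the very first illegal move (the stricter policy).
        self.limit = limit
        self.counts = Counter()

    def record(self, player: str) -> bool:
        """Record one illegal move; return True if the player now forfeits."""
        self.counts[player] += 1
        return self.counts[player] >= self.limit


tracker = IllegalMoveTracker()
print(tracker.record("bot"))  # prints "False": first offence, play continues
print(tracker.record("bot"))  # prints "True": second offence, game lost
```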

illiarian · on March 17, 2023

Nope, still not even close to what the author claims. If I understand it correctly, it made illegal moves in 3 out of 19 games. That's probably a few orders of magnitude more illegal moves than even a 1400 ELO player would make over their entire lifetime.

pedrosorio · on March 17, 2023

Repeating what others have said in this thread:

The author claims: chatGPT has a 1400 chess ELO based on games played.

You appear to think the author claims: chatGPT plays chess like a human rated 1400.

Your observations do not contradict the author's claim that, based on games won and lost against opponents of a specific strength, the estimated ELO is 1400.

A non-human player can make illegal moves at a much higher rate and make up for that by being stronger when it does not make illegal moves to achieve the same rating as a human player who plays the game in a completely different way.
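The trade-off can be worked through with the standard Elo expected-score formula. This is a rough sketch under a simplifying assumption that is not in the article: a fixed fraction of games is forfeited regardless of opponent, and the remaining games are scored like those of a player with some "clean" rating. The function names and the example numbers are illustrative only.

```python
import math


def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score (win = 1, draw = 0.5, loss = 0)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def rating_from_score(score: float, opponent: float) -> float:
    """Invert expected_score: the rating that yields `score` against `opponent`."""
    return opponent - 400.0 * math.log10(1.0 / score - 1.0)


def effective_rating(clean_rating: float, forfeit_rate: float,
                     opponent: float) -> float:
    """Rating implied by results alone when some games are forfeited."""
    score = (1.0 - forfeit_rate) * expected_score(clean_rating, opponent)
    return rating_from_score(score, opponent)


# Illustrative numbers: forfeits in 3 of 19 games, opponents rated 1400.
for clean in (1400, 1500, 1600):
    print(clean, round(effective_rating(clean, 3 / 19, 1400)))
```

The effective rating always comes out below the clean rating, which is the point made above: a player who throws away some games on illegal moves has to be stronger in the rest to land on the same number.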

ogogmad · on March 17, 2023

There's the "it" which has no post-processing, and there's the "it" where the output is post-processed to announce a resignation when it attempts an illegal move.

Some things about the two "it"s:

- They differ trivially.

- They enable new capabilities, such as the ability to explain why a move got made. Current chess AIs are not good at this.

So I think you're making too big a deal of a comparative triviality.

[edit]

We might be talking past each other. And some people above have come to doubt the article's results even with the right prompt engineering.

vidarh · on March 17, 2023

The ranking takes into account wins and losses, not illegal moves, and so the fact that a higher proportion of its losses is down to illegal moves than a human player's would be is not relevant to its ranking. It may suggest that the ranking ought to take that into account, but that's a separate issue.

vidarh · on March 17, 2023

That no human ELO 1400 player will have that rate of illegal moves may be true, but if anything, treating the very first illegal move as a forfeit appears to be stricter than most rules.

arrrg · on March 17, 2023

Does that matter? Seems weird to me to make that argument. I’m honestly quite confused by it.

A bowling bot that threw strikes 9 out of 10 throws and a gutter ball one time out of ten would still be a great bowler even though no human with the ability to make strikes that often would pretty much ever throw a gutter ball.

This is a weird kind of alien intelligence that does not have to behave like humans.

TheRealPomax · on March 17, 2023

Note that the claim is not that it's an ELO 1400 human equivalent player but that it can play chess at a level that gives it an ELO of 1400, which is not nitpicking: that's a completely different thing. We're not testing whether it plays like a player with ELO x, we're proving that "it can't play chess" is fallacious. It can, and when prompted properly, it can achieve an ELO of 1400.

ELO allows for illegal moves: as per the rules of chess, you lose the game if you make an illegal move. The end. ELO, by design, doesn't care why you lost a game.

jart · on March 17, 2023

I personally find that makes it more astonishing, that it would slip up on knowing the most basic elements of the game, yet still be able to play better than most humans. Highly smart people sometimes say or do little things when foraying into other fields that cause domain experts to think they're not one of them. But that usually doesn't stop smart people from having an impact and making a contribution with their insights. The question of illegal moves is superficial, since most online systems have guardrails in place that prevent them. At worst it's just an embarrassment and I don't think machines care about being embarrassed.

Jensson · on March 17, 2023

> Highly smart people sometimes say or do little things when foraying into other fields that cause domain experts to think they're not one of them

This is the opposite of that, a highly trained but dumb entity that has seen many lifetimes worth of games but is still tripping up on basics. But since it is so highly trained you can mistake it for a master if you squint and don't look into what it is doing.

ogogmad · on March 17, 2023

> But since it is so highly trained you can mistake it for a master if you squint and don't look into what it is doing.

But it is a master, as has been pointed out repeatedly. If you replace all illegal moves with resignations, and use the same style of prompt as the OP did, then it plays like an expert. I'm objecting because you're making it sound like it's a trivial result.

Jensson · on March 17, 2023

> you're making this sound like it's a trivial result

I don't think this is a trivial result, emulating a highly trained idiot is still very impressive. But it is very different from an untrained genius.

ogogmad · on March 17, 2023

You seem to have very rigid and boring definitions of the words "idiot" and "genius". The "AI effect" is real: https://en.wikipedia.org/wiki/AI_effect

Tbh, I don't even know what you're saying.

[edit] OK, I might have misunderstood you. It's not always clear what people mean.

Jensson · on March 17, 2023

> The "AI effect" is real: https://en.wikipedia.org/wiki/AI_effect

That isn't relevant to my comment, an idiot human is still a human. Your comment here therefore doesn't make sense. The comment I responded to likened it to a genius entering a new field, I objected to that, that is all.

charcircuit · on March 17, 2023

ELO is based off who you win and lose against. The rate of illegal moves has nothing to do with ELO.

Pxtl · on March 17, 2023

I'd be interested if it could be coaxed into legal moves after making an illegal one. "That is an illegal move. Can you do something legal with this board?"
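That experiment is straightforward to set up. A minimal sketch, assuming a hypothetical `ask` callable that sends a prompt to the model and returns its reply, and a `legal_moves` set supplied by a chess library; the prompt wording and the retry limit are arbitrary choices, not something the article tested.

```python
from typing import Callable, Optional, Set


def move_with_retries(ask: Callable[[str], str],
                      board_text: str,
                      legal_moves: Set[str],
                      max_retries: int = 2) -> Optional[str]:
    """Ask for a move and re-prompt after each illegal reply.

    Returns a legal move, or None if every attempt was illegal.
    """
    prompt = f"Position: {board_text}\nYour move?"
    for _ in range(max_retries + 1):
        move = ask(prompt).strip()
        if move in legal_moves:
            return move
        # Tell the model what went wrong and ask again.
        prompt = (f"{move} is an illegal move. "
                  f"Can you do something legal with this board?\n{board_text}")
    return None


# Scripted stand-in for the model: one illegal reply, then a legal one.
replies = iter(["Qh9", "e4"])
print(move_with_retries(lambda prompt: next(replies), "startpos", {"e4", "d4"}))  # prints "e4"
```

Counting how often the second or third attempt succeeds would show whether the illegal moves are recoverable slips or a sign that the model has lost track of the board.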