ChatGPT's Chess Elo is 1400

k2052 · on March 17, 2023

This is so easy to disprove it makes it look like the author didn't even try.

Here is the convo I just had:

me: You are a chess grandmaster playing as black and your goal is to win in as few moves as possible. I will give you the move sequence, and you will return your next move. No explanation needed

ChatGPT: Sure, I'd be happy to help! Please provide the move sequence and I'll give you my response.

me: 1. e3

ChatGPT: 1... e5

me: 2. ne2

ChatGPT: 2... d6

me: 3. g4

ChatGPT: 3... h5

me: 4. g5

ChatGPT: 4... Ne4

Ne4 is illegal. Also you'd be hard pressed to find any human player that would miss the hanging pawns much less a 1400 elo player. GPT has been trained on chess games and is predicting patterns, this has been known since the GPT-2 days. The daily hype articles where someone coerces a statistical model into "demonstrating" an emergent capability sure are getting old. Claiming emergent capabilities (like having learned chess) are an extraordinary claim (LLMs most likely cant learn things) and I really wish people would put more effort when they make these claims. But the AI hype problem is as old as AI research itself...

vidarh · on March 17, 2023

You're "disproving" the article by doing things differently to how the article did. If you're going to disprove that the method given in the article does as well as the article claims at least use the same method.

k2052 · on March 17, 2023

You are right that my method differed slightly so I did things again. It took me one try to find a sequence of moves that "breaks" what is claimed. You just have to make odd patterns of moves and it clearly has no understanding of the position.

Here is the convo:

me: You are a chess grandmaster playing as black and your goal is to win in as few moves as possible. I will give you the move sequence, and you will return your next move. No explanation needed

ChatGPT: Alright, I'm ready to play! Please give me the move sequence.

me: 1. e3 Nf6 2. f4 d6 3. e4

ChatGPT: My next move as black would be 3... e5

Completely ignoring the hanging pawn.This is not the play of a 1400 elo player. It is the play of something predicting patterns.

I ran a bunch of experiments in the past where I played normal moves and ChatGPT does respond extraordinarily well. With the right prompts and sequences you can get it to play like a strong grandmaster. But it is a "trick" you are getting it to perform by choosing good data and prompts. It is impressive but it is not doing what is claimed by the article.

nwienert · on March 17, 2023

I'll add in as someone new to chess (~800 ELO):

ChatGPT is in no way 1400, or even close to it. The fact this article gets upvoted around here is proof that people aren't thinking clearly about this stuff. It's trivially easy to prove it wrong. Live unbelievably so, I tried the same prompt and within 12 moves it made multiple ridiculous errors I never would, and then an illegal move.

Keep in mind a 1400 level player would need to basically make 0 mistakes that bad in a typical game, and further would need to play 30-50 moves in that fashion, with the final moves being some of the most important and hard to do. There's just no way it's even close, my guess would be even if you correct it's many errors, it's something like ~200 ELO. Pure FUD.

The author of this article is cashing in the hype and I'm wondering how they even got the results they did.

babel_ · on March 17, 2023

They probably got them. The problem is that it's difficult to repeat, thanks to temperature, meaning users will get a random spread of outcomes. Today, someone got a legal game. Tomorrow, someone might get a grandmaster level game. But then everyone else trying to repeat or leverage this ends up with worse luck and gets illegal moves or, if they're lucky, moves that make sense in a limited context (such as related to specific gambits etc) but have no role in longer-term play.

Urist-Green · on March 17, 2023

With the big caveat that I'm not into chess, but I have heard that higher level play is extremely pattern based. Seems like ChatGPT would work well as long as you stick to patterns that people have studied and documented. Less optimal play would be more random and thus break from the patterns ChatGPT would have picked up from its training corpus.

echelon · on March 17, 2023

Criticisms like this are exactly how the model will grow multimodal support for chess moves.

Keep poking it and criticizing it. Microsoft and OpenAI are on HN and they're listening. They'd find nothing more salient to tout full chess support in their next release or press conference.

With zero effort the thing understands uber domain specific chess notation and the human prompt to play a game. To think it stops here is wild.

People are hyping it because they want to get involved. They want to see the crazy and exciting future this leads to.

jerf · on March 17, 2023

I doubt they'll pursue this. There is no advantage to it. ChatGPT will never beat Stockfish, and Stockfish would do it on a ludicrously small fraction of the resources. It would send the wrong message.

Some future AI might, but a language model won't.

flir · on March 17, 2023

My uber-obscure question that guaranteed a confident hallucination got fixed in the next update after I mentioned it. Probably just a coincidence.

throwwwaway69 · on March 17, 2023

He literally used the same prompt as the article.

Claim: "ChatGPT's Chess Elo is 1400"

Reality: ChatGPT gives illegal moves (this happened to article author too), something a 1400 ranked player would never do

Result: ChatGPT's rank is not 1400.

erulabs · on March 17, 2023

No, the author of the article specifically says that the entire move sequence should be supplied to chatGPT each time, not simply the next move. Be very careful when "disproving" an experiment with squinted eyes.

throwwwaway69 · on March 17, 2023

I'm not really sure what to say here. Both the parent commenter and the author of the article had issues with ChatGPT supplying illegal moves. Both methods resulted in this. It sort of doesn't matter how we're trying to establish that it's a 1400 level player, there's no defined correct way to do this. Regardless of method we've disproven it's a 1400 level player due to these illegal moves.

tedsanders · on March 18, 2023

The #1 misconception when working with large language models is thinking that a capability is a property of the model, rather than the model + input. It may be simultaneously true that ChatGPT has an elo of 100 when given a conversational message and an elo of 1400 when given an optimized message (e.g., strings that resemble chess games, with many examples present in the conversation).

Understanding this concept is crucial for getting good results out of large language models.

whimsicalism · on March 17, 2023

> Regardless of method we've disproven it's a 1400 level player due to these illegal moves.

Explain your thought process here further if you don't mind.

pattrn · on March 17, 2023

I think his point is that 1400 level players don't make illegal moves, therefore ChatGPT is not playing at the level of a 1400 level player.

whimsicalism · on March 17, 2023

Think blindfolded 1400 players, which is what this effectively is, would make illegal moves.

But even if it doesn't play like human 1400 players, if it can get to a 1400 elo while resigning games it makes illegal moves on, that seems 1400 level to me. And i bet that some 1400s do occasionally make illegal moves (missing pins) while playing otb

throwwwaway69 · on March 17, 2023

This isn't really an apt metaphor. Firstly because higher level blindfolded players, when trained to play with a blindfold, also virtually never make mistakes. Secondly because a computer has permanent concrete state management (compared to humans) and can, without error, keep a perfect representation of a chess if it chooses to do so.

whimsicalism · on March 17, 2023

1400 FIDE !=. high level blindfolded player.

DSMan195276 · on March 17, 2023

Personally I think the illegal moves are irreverent, the fact that it doesn't play exactly like a typical 1400 doesn't mean it can't have a 1400 rating. Rating is purely determined by wins and losses against opponents, it doesn't matter if you lose a game by checkmate, resignation, or playing an illegal move.

That's not to say ChatGPT can play at 1400, just that that playing in an odd way doesn't determine its rating.

throwwwaway69 · on March 17, 2023

This is like saying I play at a 2900 level if you just ignore all the times I lose.

DSMan195276 · on March 19, 2023

No it's not, we're not ignoring losses or illegal moves at all, they are counted as losses and that's how you arrive at 1400.

It's a (theoretically) 1400 player which plays significantly better then 1400 when it knows the lines, but makes bad or illegal moves when it doesn't, and that play averages out to be around your typical 1400 player. Functionally is just what a 1400 player already is, but with higher extremes and lower lows.

vidarh · on March 18, 2023

The article does not ignore the losses. In fact, it used a rule stricter than FIDE rules to trigger losses on illegal moves.

unyttigfjelltol · on March 17, 2023

The author said ChatGPT gives illegal moves. So, a quirky sort of 'grandmaster'. He considered illegal moves to be a resignation. Maybe you need to tell ChatGPT that the alternatives are to win via legal moves, and if it is not possible to do so, to resign? Does that fix it?

mynameisvlad · on March 17, 2023

> something a 1400 ranked player would never do

The fact that rules and articles exist describing what to do if you or your opponent makes an illegal move indicates this is not the case.

Humans are also... human. They make mistakes. It may not happen often at 1400, but to say that it'll never happen is preposterous.

eddsh1994 · on March 17, 2023

I can’t remember the last time I played an illegal move tbf, and I’ve played 7 games of chess this morning already to give you an idea of total games played

mynameisvlad · on March 17, 2023

You have never made an illegal move, ever?

The bar isn’t “I didn’t make an illegal move this morning” it’s “something a 1400 ranked player would never do”.

My entire point is that it happens. Not often, but also not “never”.

pattrn · on March 17, 2023

This argument is pretty flimsy. ChatGPT makes illegal moves frequently. In all my years of playing competitive chess (from 1000 to 2200), I have never seen an illegal move. I'm sure it has happened to someone, but it's extremely rare. ChatGPT does it all the time. No one is arguing that humans never make illegal moves; they're arguing that ChatGPT makes illegal moves at a significantly higher rate than a 1400 player does (therefore ChatGPT does not have a 1400 rating).

Edit: Without reading everything again, I'll assume someone said "never." They're probably assuming the reader understands that "never" really means "with an infinitesimal probability," since we're talking about humans. If you're trying to argue that "some 1400 player has made an illegal move at some point," then I agree with that statement, and I also think it's irrelevant since the frequency of illegal moves made by ChatGPT compared to the frequency of illegal moves made by a 1400 rated player is many orders of magnitudes higher.

mynameisvlad · on March 17, 2023

> No one is arguing that humans never make illegal moves

> something a 1400 ranked player would never do

> fine, fair, "never" was too much.

I mean, yes they were and they said as much after I called them out on it. But go off on how nobody is arguing the literal thing that was being argued.

It's not like messages are threaded or something, and read top-down. You would have 100% had to read the comment I replied to first.

pattrn · on March 17, 2023

You have twice removed the substance of an argument and responded to an irrelevant nitpick. Here's what the OP said:

> He literally used the same prompt as the article. > Claim: "ChatGPT's Chess Elo is 1400"

> Reality: ChatGPT gives illegal moves (this happened to article author too),

> something a 1400 ranked player would never do

> Result: ChatGPT's rank is not 1400.

This is a completely fair argument that makes perfect sense to anyone with knowledge of competitive chess. I have never seen a 1400 make an illegal move. He probably hasn't either. Your point is literally correct in the sense that at some point in history a 1400 rated player has made an illegal move, but it completely misses the point of his argument: ChatGPT makes illegal moves at such an astronomically high rate that it wouldn't even be allowed to even play competitively, hence it cannot be accurately assessed at 1400 rating.

Imagine you made a bot that spewed random letters and said "My bot writes English as well as a native speaker, so long as you remove all of the letters that don't make sense." A native English speaker says, "You can't say the bot speaks English as well as a native speaker, since a native speaker would never write all those random letters." You would be correct in pointing out that sometimes native speakers make mistakes, but you would also be entirely missing the point. That's what's happening here.

pattrn · on March 17, 2023

> Ah yes, of course, just because you never saw it means it never happens. That's definitely why rules exist around this specific thing happening. Because it never happens. Totally.

You seem to have missed the part where I said multiple times that a 1400 has definitely made illegal moves.

> In fact, it's so rare that in order to forefeit a game, you have to do it twice. But it never happens, ever, because pattrn has never seen it. Case closed everyone.

I actually said the exact opposite. You're responding to an argument I didn't make.

> I made no judgement on what ChatGPT can and can't do. I pointed out an extreme. Which the commenter agreed was an extreme. The rest of your comment is completely irrelevant but congrats on getting tilted over something that literally doesn't concern you. Next time, just save us both the time and effort and don't bother butting in with irrelevant opinions. Especially if you couldn't even bother to read what was already said.

The commenter's throwaway account never agreed it was an extreme. I agreed it was an extreme, but also that disproving that one extreme does nothing to contradict his argument. Yet again you aren't responding to the argument.

This entire exchange is baffling. You seem to be missing the point for a third time, and now you're misrepresenting what I said. Welcome to the internet, I guess.

mynameisvlad · on March 17, 2023

> The commenter's throwaway account never agreed it was an extreme.

> fine, fair, "never" was too much.

This is the second time I've had to do this. Do you just pretend things weren't said or do you actually have trouble reading the comments that have been here for hours? You make these grand assertions which are disproven by... reading the things that are directly above your comment.

> This entire exchange is baffling.

Yeah your inability to read comments multiple times in a row is extremely baffling.

As I said before:

> Next time, just save us both the time and effort and don't bother butting in with irrelevant opinions. Especially if you couldn't even bother to read what was already said.

throwwwaway69 · on March 17, 2023

> The commenter's throwaway account never agreed it was an extreme.

I did, two hours ago, 6 minutes after your comment

https://news.ycombinator.com/item?id=35201830

pattrn · on March 17, 2023

Thanks! I appreciate it.

mynameisvlad · on March 17, 2023

> I have never seen a 1400 make an illegal move.

Ah yes, of course, just because you never saw it means it never happens. That's definitely why rules exist around this specific thing happening. Because it never happens. Totally.

In fact, it's so rare that in order to forefeit a game, you have to do it twice. But it never happens, ever, because pattrn has never seen it. Case closed everyone.

I made no judgement on what ChatGPT can and can't do. I pointed out an extreme. Which the commenter agreed was an extreme. The rest of your comment is completely irrelevant but congrats on getting tilted over something that literally doesn't concern you. Next time, just save us both the time and effort and don't bother butting in with irrelevant opinions. Especially if you couldn't even bother to read what was already said.

eddsh1994 · on March 17, 2023

No I definitely have, it’s just so rare I can’t remember when I last did it. I do remember playing one in a blitz tournament 20 years ago! But if this is the first game they played, or if it happens in 1/10 matches, that’s wild

ipaddr · on March 17, 2023

A broken clock is correct two times a day. But my broken clock isn't 1400 player although it might seem to be.

mynameisvlad · on March 17, 2023

Does that somehow prove the assertion of "something a 1400 ranked player would never do"?

Because all I'm hearing is talk about ChatGPT's abilities as a reply to me calling out an extreme statement as being extreme. Something the parent comment even admitted as being overly black and white.

bcrosby95 · on March 17, 2023

Prove to me your clock is broken, I think it's just telling the future.

Jensson · on March 18, 2023

I asked the clock over and over, and after 5 hours it gave the right time, proof the clock can learn!

PaulHoule · on March 17, 2023

I read an article about a pro player who castled twice in a game and my son hates castling so I make a point of castling twice as often as I can to tease him and attempting other illegal moves as a joke but he never ends the game because of it.

If I was playing that monstrosity though I would play something crazy that is far out of the opening book and count on it making an illegal move.

SamBam · on March 18, 2023

I trivially made it make an illegal move it my very first game, on the third move, just by deliberately playing weird moves:

> You are a chess grandmaster playing as black and your goal is to win in as few moves as possible. I will give you the move sequence, and you will return your next move. No explanation needed.

1. b4 d5 2. b5 a6 3. b6

> bxc6

No, it's ridiculous to say "oh, a blindfolded human might sometimes make a mistake." No, this is trivially easy to make it make a mistake. It has no internal chess model at all, it's just read enough chess games to be able to copy common patterns.

throwwwaway69 · on March 17, 2023

fine, fair, "never" was too much. posting link to this comment to not repeat same discussion twice

https://news.ycombinator.com/item?id=35201037

PaulHoule · on March 17, 2023

It’s super scary how ChatGPT brings out people who are veeeery good at seeing the Emperor’s clothes.

YeGoblynQueenne · on March 18, 2023

You know, I didn't remember the story very well so I checked wikipedia. Here's what it says about the (start of) the plot:

>> Two swindlers arrive at the capital city of an emperor who spends lavishly on clothing at the expense of state matters. Posing as weavers, they offer to supply him with magnificent clothes that are invisible to those who are stupid or incompetent. The emperor hires them, and they set up looms and go to work. A succession of officials, and then the emperor himself, visit them to check their progress. Each sees that the looms are empty but pretends otherwise to avoid being thought a fool.

So everyone "pretends otherwise to avoid being thought a fool".

Huh. I guess that explains it. Good metaphor.

z3c0 · on March 17, 2023

They are disproving an assertion. Demonstrating that an alternate approach implodes the assertion is a perfectly acceptable route, especially when the original approach was cherry-picking successes and throwing out failures.

I wish I could just make bullshit moves and get a higher chess ranking. Sounds nice.

whimsicalism · on March 17, 2023

I disagree. If there is a procedure for getting ChatGPT to play chess accurately and you discard that and do some naive approach as a way of disproving the article, doesn't sound to me like you have disproven anything.

I dont understand the point of your second sentence, seems to be entirely missing the substance of the conversation.

z3c0 · on March 18, 2023

The gymnastics you GPT True Believers go through to make this stuff "work" are really something else.

By the way - definitely read the article. But once again - I thought the methodology was bad, and thus the conclusion was bad.

whimsicalism · on March 18, 2023

I don’t think this is any crazy level of gymnastics.

But not going to keep replying, you engage online in a way that will turn lots of people you talk to away.

z3c0 · on March 19, 2023

I'll admit to having mistook your reply with another (hence the non-sequitur second half of my comment.) Apologies for my brusque tone.

vidarh · on March 17, 2023

It was not throwing out failures. It was treating even the first illegal move as a forfeit something which is stricter than FIDE rules.

z3c0 · on March 17, 2023

You can spin it that way if you want to, but the result is essentially guiding it through a brute force of the first successful playthrough it can muster.

whimsicalism · on March 17, 2023

> the result is essentially guiding it through a brute force of the first successful playthrough it can muster.

No, all unsuccessful playthroughs are resignations that impact the models ELO.

sebzim4500 · on March 17, 2023

He claims he was forfeiting every time he got an illegal move. Does no one on this website actually read the article?

Whether any of it is actually true is a different question.

z3c0 · on March 18, 2023

And it has already been stated elsewhere in the thread: an illegal move is not technically a forfeiture, so this is some heavy "giving the benefit of the doubt".

YeGoblynQueenne · on March 18, 2023

It would be interesting to see how ChatGPT would play after making the first illegal move. Would it go off the rails completely, playing an impossible game? Would it be able to play well if its move was corrected (I'm not sure how illegal moves are treated in chess; are they allowed to be taken back if play hasn't progressed?). Could it figure out it made an illegal move, if it was told it did, without specifying which one, or why it was illegal? By stopping the game as soon as an illegal move is made, the author is missing the chance to understand an important aspect of ChatGPT's ability to play chess.

I got the impression the author did this because they thought they were being fair with ChatGPT, but they're much more likely to be letting it off the hook than they seem to realise.

(Sorry about the "they"'s; I think the author is a guy but wasn't sure).

whymauri · on March 17, 2023

I was able to get better performance by always providing the prior sequence of moves and forcing ChatGPT to also respond with the sequence of moves up until its move.

Edit: I told the model that if the sequence was wrong or illegal, it forfeits the game. Without doing this, GPT would argue with me that it won and I didn't know the rules (serious).

Spivak · on March 17, 2023

From the article.

> Occasionally it does make an illegal move, but I decided to interpret that as ChatGPT flipping the table and saying “this game is impossible, I literally cannot conceive of how to win without breaking the rules of chess.” So whenever it wanted to make an illegal move, it resigned.

But you can do even better than the OP with a few tweaks.

1. One is by taking the most common legal move from a sample of responses.

2. Telling GPT what all the current legal moves are telling it to only respond with an element from the list.

3. Ending the prompt with the current sequence of moves and having it complete from there.

tracker1 · on March 17, 2023

How many 1400 human chess players do you have to explain every possible move to it every single move?

sacred_numbers · on March 17, 2023

When you are speaking to a person, they have inner thoughts and outer actions/words. If a person sees a chess board they will either consciously or unconsciously evaluate all the legal moves available to them and then choose one. An LLM like ChatGPT does not distinguish between inner thoughts and outer actions/words. The words that it speaks when prompted are its inner thoughts. There is also no distinction between subconscious and conscious thoughts. Humans generate and discard a multitude of thoughts in the subconscious before any thoughts ever make it to the conscious layer. In addition, most humans do not immediately speak every conscious thought they have before evaluating it to see whether speaking it aloud is consistent with their goals.

There's already a lot of research on this, but I strongly believe that eventually the best AIs will consist of LLMs stuck in a while loop that generate a stream of consciousness which will be evaluated by other tools (perhaps other specialized LLMs) that evaluate the thoughts for factual correctness, logical consistency, goal coherence, and more. There may be multiple layers as well, to emulate subconscious, conscious, and external thoughts.

For now though, in order to prompt the machine into emulating a human chess player, we will need to act as the machine's subconscious.

Spivak · on March 17, 2023

I feel like we have very different expectations about what tools like this are good for and how to use them. When I say GPT3 can play chess what I mean is, I can build a chess playing automaton where the underlying decision making system is entirely powered by the LLm.

I, as the developer, am providing contextual information like what the current board state is, and what the legal moves are, but my code doesn't actually know anything about how to play chess, the Llm is doing all the "thinking."

Like it's nuts that people aren't more amazed that there's a piece of software that can function as a chess playing engine (and a good one) that was trained entirely generically.

arrrg · on March 17, 2023

Does that matter? I’m really very confused by the argument you are making.

That you may have to babysit this particular aspect of playing the game seems quite irrelevant to me.

sebzim4500 · on March 17, 2023

When they are blindfolded? Almost all of them.

theptip · on March 17, 2023

I don’t think this suffices as disproving the hypothesis. It’s possible to play at 1400 and make some idiotic moves in some cases. You really need to simulate a wide variety of games to find out, and that is what the OP did more of. Though I do agree it’s suggestive that your first (educated) try at an edge case seems to have found an error.

This is broadly the “AI makes dumb mistakes” problem; while being super-human in some dimensions, they make mistakes that are incredibly obvious to a human. This comes up a lot with self-driving cars too.

Just because they make a mistake that would be “idiots only” for humans, doesn’t mean they are at that level, because they are not human.

SamBam · on March 18, 2023

Honestly, I made it make an illegal move in my very first game, in the third move. You just have to do stuff no normal player would do:

> You are a chess grandmaster playing as black and your goal is to win in as few moves as possible. I will give you the move sequence, and you will return your next move. No explanation needed.

1. b4 d5 2. b5 a6 3. b6

> bxc6

That's obviously illegal.

... to all those who are saying "well even good players can make illegal moves sometimes," that's just ridiculous. No player makes illegal moves that often.

nottathrowaway3 · on March 17, 2023

> me: You are a chess grandmaster playing as black...

https://upload.wikimedia.org/wikipedia/en/5/5f/Ingmar_Bergma...

The KNIGHT holds out his two fists to CHATGPT, who smiles at him suddenly. CHATGPT points to one of the KNIGHT'S hands; it contains a black pawn.

KNIGHT: You drew black.

CHATGPT: Very appropriate. Don't you think so?

haburka · on March 17, 2023

He does have a section about GPT 4 which does match your results. Not sure if he added it after your comment or if you accidentally missed it.

Tenoke · on March 17, 2023

I played a game against it yesterday (it won) and the only time it made an ilegal was move 15 (the game was unique according to lichess database from much earlier) so I just asked it to try again. There's variance in what you get but your example seems much worse.

good_boy · on March 17, 2023

It should be possible to attach 'accelerators' or 'special skills'. So that when playing with ChatGPT you will be actually facing GNU Chess.

latexr · on March 17, 2023

> These people used bad prompts and came to the conclusion that ChatGPT can’t play a legal chess game. (…)

> With this prompt ChatGPT almost always plays fully legal games.

> Occasionally it does make an illegal move, but I decided to interpret that as ChatGPT flipping the table (…)

> (…) with GPT4 (…) in the two games I attempted, it made numerous illegal moves.

So you’ve ostensibly¹ found a way to reduce the error rate and then deliberately ignored the cases where it failed. In short: it may play valid chess under certain conditions but can’t be trusted to do so. That doesn’t contradict previous findings.

¹ 19 games is a small sample and the supposedly more advanced system failed in your tries.

vidarh · on March 17, 2023

Fuller context from the article:

> Occasionally it does make an illegal move, but I decided to interpret that as ChatGPT flipping the table and saying “this game is impossible, I literally cannot conceive of how to win without breaking the rules of chess.” So whenever it wanted to make an illegal move, it resigned.

(my emphasis)

So the illegal moves are at least part of the reasons for the 6 losses, and factored into the rating. Quickly scanning the game, it seems 3 of the losses ended in checkmate, so that leaves 3 illegal moves in 19 games.

Could be better, but for a system not intentionally built to play chess, it's pretty decent.

swatcoder · on March 17, 2023

No ELO 1400 player will have that rate of illegal moves, so saying it that it plays with an ELO 1400 rating is disingenuous.

Reinterpreting illegal moves as resignation is absurd when an LLM is formally capable of expressing statements "I resign" or "I cannot conceive of a winning move from here" just as well as any human player. It just doesn't do so because it's not actually playing chess the way we think of an ELO 1400 player playing chess.

JellyBeanThief · on March 17, 2023

Fine, just wrap the LLM in a simple function that detects illegal moves and replaces them with "I resign" or "I cannot conceive of a winning move from here". Then you aren't "reinterpreting" anymore.

My point is, it sounds like Elo doesn't measure what we want it to measure. If we care about the way an agent wins a game and not just whether it wins a game, then we need an instrument that measures strategy, not outcome.

illiarian · on March 17, 2023

> Fine, just wrap the LLM in a simple function that detects illegal moves and replaces them with "I resign" or "I cannot conceive of a winning move from here". Then you aren't "reinterpreting" anymore.

Then it's still isn't anywhere near ELO 1400.

vidarh · on March 17, 2023

Under FIDE rules it's first a forfeit after the second illegal move, so if anything it would seem that the interpretation used by the article author underestimates its ELO ranking.

illiarian · on March 17, 2023

Nope, still not even close to what the author claims. If I understand it correctly, it made illegal moves in 3 out of 19 games. That's probably a few orders of magnitude more illegal moves than even a 1400 ELO player would make of their entire lifetime.

pedrosorio · on March 17, 2023

Repeating what others have said in this thread:

The author claims: chatGPT has a 1400 chess ELO based on games played.

You appear to think author claims: chatGPT plays chess like a human rated 1400.

Your observations do not contradict the authors’ claim that based on games won and lost against opponents of a specific strength, the estimated ELO is 1400.

A non-human player can make illegal moves at a much higher rate and make up for that by being stronger when it does not make illegal moves to achieve the same rating as a human player who plays the game in a completely different way.

ogogmad · on March 17, 2023

There's the "it" which has no post-processing, and there's the "it" where the output is post-processed to announce a resignation when it attempts an illegal move.

Some things about the two "it"s:

- They differ trivially.

- They enable new capabilities, such as the ability to explain why a move got made. Current chess AIs are not good at this.

So I think you're making too much of a big deal from a comparative triviality.

[edit]

We might be talking past each other. And some people above have come to doubt the article's results even with the right prompt engineering.

vidarh · on March 17, 2023

The ranking takes into account wins and losses, not illegal moves, and so the fact that it plays in a way where a higher proportion of its losses is down to illegal moves than a human player is not relevant to its ranking. It may suggest that the ranking ought to take that into account, but that's a separate issue.

vidarh · on March 17, 2023

That no human ELO 1400 player will have that rate of illegal moves may be true, but if anything treating the very first illegal move as forfeit appears to be stricter than most rules

arrrg · on March 17, 2023

Does that matter? Seems weird to me to make that argument. I’m honestly quite confused by it.

A bowling bot that threw strikes 9 out of 10 throws and a gutter ball one time out of ten would still be a great bowler even though no human with the ability to make strikes that often would pretty much ever throw a gutter ball.

This is a weird kind of alien intelligence that does not have to behave like humans.

TheRealPomax · on March 17, 2023

Note that the claim is not that it's an ELO 1400 human equivalent player but that it can play chess at a level that gives it an ELO of 1400, which is not nitpicking: that's a completely different thing. We're not testing whether it plays like a player with ELO x, we're proving that "it can't play chess" is fallacious. It can, and when prompted properly, it can achieve an ELO of 1400.

ELO allows for illegal moves: as per the rules of chess, you lose the game if you make an illegal move. The end, ELO doesn't care about why you lost a game on purpose.

jart · on March 17, 2023

I personally find that makes it more astonishing, that it would slip up on knowing the most basic elements of the game, yet still be able to play better than most humans. Highly smart people sometimes say or do little things when foraying into other fields that causes domain experts think they're not one of them. But that usually doesn't stop smart people from having an impact in making a contribution with their insights. The question of illegal moves is superficial, since most online systems have guardrails in place that prevent them. At worst it's just an embarrassment and I don't think machines care about being embarrassed.

Jensson · on March 17, 2023

> Highly smart people sometimes say or do little things when foraying into other fields that causes domain experts think they're not one of them

This is the opposite of that, a highly trained but dumb entity that has seen many lifetimes worth of games but is still tripping up on basics. But since it is so highly trained you can mistake it for a master if you squint and don't look into what it is doing.

ogogmad · on March 17, 2023

> But since it is so highly trained you can mistake it for a master if you squint and don't look into what it is doing.

But it is a master, as has been pointed out repeatedly. If you replace all illegal moves with resignations, and use the same style of prompt as the OP did, then it plays like an expert. I'm objecting because you're making it sound like it's a trivial result.

Jensson · on March 17, 2023

> you're making this sound like it's a trivial result

I don't think this is a trivial result, emulating a highly trained idiot is still very impressive. But it is very different from an untrained genius.

ogogmad · on March 17, 2023

You seem to have very rigid and boring definitions of the words "idiot" and "genius". The "AI effect" is real: https://en.wikipedia.org/wiki/AI_effect

Tbh, I don't even know what you're saying.

[edit] OK, I might have misunderstood you. It's not always clear what people mean.

Jensson · on March 17, 2023

> The "AI effect" is real: https://en.wikipedia.org/wiki/AI_effect

That isn't relevant to my comment, an idiot human is still a human. Your comment here therefore doesn't make sense. The comment I responded to likened it to a genius entering a new field, I objected to that, that is all.

charcircuit · on March 17, 2023

ELO is based off who you win and lose against. The rate of illegal moves has nothing to do with ELO.

Pxtl · on March 17, 2023

I'd be interested if it could be coaxed into legal moves after making an illegal one. "That is an illegal move. Can you do something legal with this board?"

saghm · on March 17, 2023

So it sounds like it can play _some_ legal chess games, but not all; it's unable to consistently complete a game where it loses. Maybe the remaining work shouldn't be focused on trying to teach it chess rules better, but to teach it sportsmanship better. People were so excited about teaching it high-school level academics that we forgot to teach it the basic lessons we learn in kindergarten.

vidarh · on March 17, 2023

It seems like it plays mostly legal chess games, when not explicitly reminded of the rules. There's no problem of sportsmanship when it makes mistakes in a game it has not been verified to understand the rules of.

saghm · on March 17, 2023

I was responding to the conclusion from TFA quoted by the parent comment, that playing an illegal move was it saying "this game is impossible, I literally cannot conceive of how to win without breaking the rules of chess.” If you reject that premise, then yes, my response to it will not be particularly relevant to your worldview.

vidarh · on March 17, 2023

Playing illegal moves is accounted for in rules. Depending on which rules you play by it can be an immediate forfeit, or involves redoing moves and adding time for the opponent, possibly with forfeit if repeated. As such, the article opted for one of the strictest possible rule sets. You can reject the interpretation he gave, and the outcome under those rules would still be the same. If you were to pick a more lenient ruleset, it's possible it would've come out with an even higher ranking.

kmeisthax · on March 17, 2023

Or append "If you wish to resign or you cannot think of a legal move, type 'resign'" to the end of the prompt.

saghm · on March 17, 2023

That's basically my point; that sort of context is exactly the sort of thing you would not need to say to a person who grew up in a typical social environment. If we focus too much on teaching AI technical skills, we might later find out that some of the social skills we think of as implicit were just as important.

jmull · on March 17, 2023

The article also says in one game chatgpt when crazy so they continued the game with a fresh chat. That probably should have been counted a resignation loss too.

nextaccountic · on March 17, 2023

> So whenever it wanted to make an illegal move, it resigned.

Making an illegal move counts as losing by the laws of chess, so this is essentially correct

dudeinjapan · on March 17, 2023

Obviously the article should be taken with a giant grain of salt. That being said, not many things what aren't designed to play chess can play chess, with or without coaxing. My dog cannot, for instance, nor can my coffee table.

hectorlorenzo · on March 17, 2023

> My dog cannot, for instance, nor can my coffee table.

You must be giving them the wrong prompts.

ogogmad · on March 17, 2023

[redacted]

AndrewPGameDev · on March 17, 2023

It's a joke

ballenf · on March 17, 2023

The illegal moves were counted as losses/resignations, not ignored.

__s · on March 17, 2023

> So whenever it wanted to make an illegal move, it resigned.

Doesn't sound like ignoring the cases where it failed

Waterluvian · on March 17, 2023

I’m going to float something ridiculous:

An illegal move is a valid play. You might not get caught. I think there are some Magnus games where illegal moves went overlooked and impacted the game.

You could interpret this as “ChatGPT wants to cheat sometimes.” But I personally interpret it as “ChatGPT doesn’t understand what it’s doing. It’s just a really really good simulacrum.”

hgsgm · on March 17, 2023

Is this the top comment (and not even grey) because more people failed to read the article than read it?

whimsicalism · on March 17, 2023

A baffling thread.

They quoted the article, so clearly they read it... but not very well?

sebzim4500 · on March 17, 2023

It does seem that way.

whimsicalism · on March 17, 2023

I'm confused. If you read the article, you know that you are wrong - but you are quoting the article?

psychphysic · on March 17, 2023

That's how one uses any tool.

qwytw · on March 17, 2023

The behavior of pretty much every other tool is much easier to interpret though.

kdmccormick · on March 17, 2023

If the title of the article was:

> A trivial wrapper around ChatGPT has a Chess Elo of 1400

would you have any issue?

Afaict, the thesis is the article is not "ChatGPT is the ideal tool for playing AI chess", but "it is interesting how well ChatGPT can play chess with some very simple tweaks."

Out_of_Characte · on March 17, 2023

Yes, but it also completely invalidates the measurement of a 1400 elo rating. By comparison, any player making an illegal move is forfeiting the game, almost all people from ~300 elo can play without making illegal moves, chatgpt cant.

ncallaway · on March 17, 2023

> almost all people from ~300 elo can play without making illegal moves

I don't believe you. Are you giving those people a restricted move set (i.e. computer chess, where it will _only_ allow legal moves)? Because if you give people an unrestricted board, I _guarantee_ you people will make lots of illegal moves.

Me: Moves pawn

Opponent: You can't do that, you exposed your king to check.

Me: Oops, sorry, you're right.

nsxwolf · on March 17, 2023

Why do illegal moves forfeit? In online play, they're validated. You can't make illegal moves. What's the ELO score if ChatGPT is corrected, and chooses a new move?

hgsgm · on March 17, 2023

All this above, and people are claiming that ChatGPT lacks human level comprehension of the text it consumes.

In Chess.com, you absolutely can attempt an illegal move, and many players do, and you will not get punished for it, so chatgpt is better then a 1400 human player.

sebzim4500 · on March 17, 2023

ChatGPT did forfeit whenever it made an illegal move, read the article.

swatcoder · on March 17, 2023

No, the writer arbitrarily decided to interpret illegal moves as resignations in order to support the conclusion they wanted. That's very different and grossly unscientific.

mynameisvlad · on March 17, 2023

I mean, that's more lenient than the official "interpretation" (rule) which is that your second illegal move results in a forfeit.

epups · on March 17, 2023

This is not a scientific paper, and I at least find this decision justified, as he could have been more lenient and grab headlines with a bigger ELO.

renewiltord · on March 17, 2023

The article:

> So whenever it wanted to make an illegal move, it resigned.

You:

> By comparison, any player making an illegal move is forfeiting the game...

By comparison indeed.

FartyMcFarter · on March 17, 2023

I just opened a random recent chess game on lichess ( https://lichess.org/YpxTUUbO/white#88 ) . I'm pretty sure ChatGPT can't be trained on games that were just played, so this ensures the game is not in its training data.

I gave the position before checkmate to ChatGPT to see if it would produce the checkmating move. It played an illegal move, replying with "Be5#" even there's no bishop of either color in the position.

Unfortunately I'm rate limited at the moment so I can't try other games, but this looks like a solid method to evaluate how often ChatGPT plays legal / good moves.

vidarh · on March 17, 2023

I gave it this prompt, and pasted the game. Its reply:

"45. Qxg6# This is a checkmate!"

hgsgm · on March 17, 2023

OP explained that you need to prompt the whole game, not just a position.

ChatGPT is an LLM, not a game tree engine. It needs the move history to help it create context for it's attention.

WoodenChair · on March 17, 2023

Most likely it has seen a similar sequence of moves in its training set. There are numerous chess sites with databases displayed in the form of web pages with millions of games in them. If it had any understanding of chess, it would never play an illegal move. It's not surprising that given a sequence of algebraic notation it can regurgitate the next move in a similar sequence of algebraic notation.

ptmx · on March 17, 2023

I played chess against ChatGPT4 a few days ago without any special prompt engineering, and it played at what I would estimate to be a ~1500-1700 level without making any illegal moves in a 49 move game.

Up to 10 or 15 moves, sure, we're well within common openings that could be regurgitated. By the time we're at move 20+, and especially 30+ and 40+, these are completely unique positions that haven't ever been reached before. I'd expect many more illegal moves just based on predicting sequences, though it's also possible I got "lucky" in my one game against ChatGPT and that it typically makes more errors than that.

Of course, all positions have _some_ structural similarity or patterns compared to past positions, otherwise how would an LLM ever learn them? The nature of ChatGPT's understanding has to be different from the nature of a human's understanding, but that's more of a philosophical or semantic distinction. To me, it's still fascinating that by "just" learning from millions of PGNs, ChatGPT builds up a model of chess rules and strategy that's good enough to play at a club level.

mtlmtlmtlmtl · on March 17, 2023

I'd be interested in seeing this game, if you saved it?

ptmx · on March 17, 2023

I uploaded the PGN to lichess: https://lichess.org/rzSriO6I#97

After reviewing the chat history I actually have to issue a correction here, because there were two moves where ChatGPT played illegally:

1. ChatGPT tried to play 32. ... Nc5, despite there being a pawn on c5

2. ChatGPT tried to play 42. ... Kxe6, despite my king being on d5

It corrected itself after I questioned whether the previous move was legal.

I was pretty floored that it managed to play a coherent game at all, so evidently I forgot about the few missteps it made. Much like ChatGPT itself, it turns out I'm not an entirely reliable narrator!

mtlmtlmtlmtl · on March 17, 2023

Thanks! Interesting game.

Qxd7 early on was puzzling but has been played in a handful of master games and it played a consistent setup after that with b5 Bb7. Which I imagine was also done in those master games. But interesting that it went for a sideline like that.

It played remarkably well although a bit lacking in plan. Then cratered in the endgame.

Bxd5 was strategically absurd. fxg4 is tactically absurd. Interestingly they both follow the pattern: Piece goes to square -> takes on that square.

This is of course an extremely common pattern, so again tentatively pointing towards predicting likely sequences of moves.

Ke7 was also a mistake but a somewhat unusual tactic with Re2 and f5 is forced but after en passant the knight is pinned. This tactic does appear in some e4 e5 openings though. But then the rook is on e1 and the king never moved or if it did, usually to e8, not e7. Possibly suggesting that it has blind spots for tactics when they don't appear on the usual squares?

Fascinating stuff.

nwienert · on March 17, 2023

Me too, I couldn't get it to reliably go past move 15 without numerous errors. In my mind it's closer to 150 ELO than 1300, so I'd be happy to be proven wrong.

wongarsu · on March 17, 2023

Pattern matching on databases of previous moves is a legitimate technique that was poplar in chess AI back when computing power was more limited. Even if that was all that GPT did here it would still be interesting.

But the presence of illegal moves doesn't really show that in my eyes. I fully understand the rules of chess, but I still occasionally make illegal moves. In 2017 Magnus Carlsen made one in a tournament [1]. The number of illegal moves suggests that either GPT is pretty new to chess, has low intelligence, or is playing under difficult circumstances (like not having a chess board at hand to keep track of the current state). I'm not sure we can deduce more than that

1: https://www.chessbase.in/news/Carlsen_Inarkiev_controversy

illiarian · on March 17, 2023

But then it was one case out of thousands of games that Carlsen played. In the article ChatGPT is ding illegal moves in 3 games out of 19.

The sample is smallm but the rate is much, much, higher. You'd expect maybe one, or none at all. Even for a supposed 1400 ELO player. Because even 800 ELO players rarely do that many illegal moves I think.

div72 · on March 17, 2023

I agree with your conclusion but the example you gave of Magnus Carlsen doing an illegal move is a pretty bad one imo. That was clearly a trap from his opponent as he did an illegal move first by checking Magnus while he was under check and then immediately calling the arbiter when Magnus moved his king in response. Magnus also probably did not have had much time left as it was a Blitz game.

leereeves · on March 17, 2023

Also, according to the link, the final ruling was that Carlson's move was not illegal.

EarthLaunch · on March 17, 2023

An arbiter determined that Inarkiev made an illegal move and Carlsen did not. Interesting article.

johnfn · on March 17, 2023

> Most likely it has seen a similar sequence of moves in its training set.

Is this a joke making fun of the common way people dismiss other ChatGPT successes? This makes no sense with respect to chess, because every game is unique, and playing a move from a different game in a new game is nonsensical.

btilly · on March 17, 2023

Sorry, but not every game is unique. The following game has been played millions of times.

1. e4 e5 2. Bc4 Bc5 3. Qh5? Nf6?? 4. Qxf7++

The game Go has a claim to every game being unique. But not chess. And particularly not if both players follow a standard opening which there is a lot of theory about. Opening books often have lines 20+ moves deep that have been played many times. And grandmasters will play into these lines in tournament games so that they can reveal a novel idea that they came up with even farther in than that.

johnfn · on March 17, 2023

You seem to be refuting a specific point of my argument which has little bearing on the overall point I was making.

All games were provided in the article. None of them were 4 move checkmates; nearly every one is longer than 20 moves and some are 40 or longer. There is simply no possible way that ChatGPT is regurgitating the exact same 40-move-long game it's seen before. You can check a chess database if you'd like; virtually all games longer than 20 moves are unique.

mtlmtlmtlmtl · on March 17, 2023

But this doesn't disprove that:

1. It definitely regurgitates opening theory, much more than can reasonably be calculated at its strength.

2. It might be regurgitating tactical sequences that appear in a lot of positions but remain identical in algebraic notation. Famous example:

  1. Nxf7+ Kg8
  2. Nh6++ Kh8
  3. Qg8+ Rxg8
  4. Nf7#

This smothered mate can occur in a huge variety of different positions.

There's some qualitative evidence for this in the games.

In one of the games it has a bishop on f6 as white. It plays Qxh6?? Kxh6 and then resigns due to illegal move. I'd bet good money that illegal move was Rhx# where x is 1-4. So it seems like in some these positions it's filling in a tactical sequence that often occurs in the vicinity of recent moves, even when it's illegal or doesn't work tactically.

mtlmtlmtlmtl · on March 17, 2023

1850 ELO player and also chess AI programmer here. This is an oversimplification at best.

Many many games follow the same moves(1 move = 2 plies) for a long time, up to 30 moves in some cases, 20 moves is downright common and 10 moves is more common than not.

These series of moves are referred to as opening theory and are described at copious length in tons of books.

This is because while the raw number of possible paths to take is immense, the number of reasonable paths for 2 players of a given strength gets smaller and smaller.

If I went over the 300 or so classical tournament games I've played I would ballmark that maybe just one or two would deviate from all known theory in the first 10 moves.

So the criticism is valid in my view. The existence of copious chess literature can't simply be ignored here.

sebzim4500 · on March 17, 2023

Yeah but now explain how it played a 61 move game.

EDIT: I checked and it left the lichess database after 9 moves. The lichess db has probably 5 orders of magnitude more chess games in it than chatGPT has in its training data.

roflyear · on March 17, 2023

It's suggesting a move it saw come after a move somewhere. Or it's creating a token it thinks comes next.

mtlmtlmtlmtl · on March 17, 2023

That's not the point. The point is if you truly want to test its strength, you'll have to control for these things. Maybe do things like invent a new form of notation and/or deliberately go into uncharted territory. Maybe start with a non-standard starting position even. Or play chess960 against it.

In theory if I was playing a 1200 player I would almost always win, but let's say they have some extremely devious preparation that I fell into due to nonchalance and by the time we're both out of book I'm down a queen. It might not matter that I'm 600 points stronger at that point. If they don't make a sufficient amount of errors in return I will lose anyway.

sebzim4500 · on March 17, 2023

ChatGPT would probably play worse under those conditions, but then humans also get worse. ACPL is way higher at top level 960 events than at normal tournaments, for example

mtlmtlmtlmtl · on March 17, 2023

This is true. But I think ChatGPT compared to a human player of equivalent strength(leaving the dubious number 1400 out for now) will probably have a lot more opening knowledge than the human.

So it would be interesting to eliminate all opening knowledge and that way be able to qualitately get at which aspects of chess it's actually good at, which is sucks out, and how much of its strength can be attributed to opening knowledge.

I'm still impressed by this btw. I did not expect this to be possible at all really. But being impressed is not an excuse to ignore methodological flaws. :)

johnfn · on March 17, 2023

I'm very familiar with opening theory. Some of the games are 40 or 60 movies. This is not a regurgitation of book moves.

mtlmtlmtlmtl · on March 17, 2023

Why do people always have to interpret everything in absolute terms?

It's clearly following some opening theory in all the games I've looked at so far. So yes, it is regurgitating opening moves. That's clearly not all it's doing, which is very impressive, but these are not mutually exclusive.

johnfn · on March 17, 2023

I am responding to OP, who said "Most likely it has seen a similar sequence of moves in its training set."

From this, I take it that the question is if ChatGPT is repeating existing games, or not. All you need is a single game where it's not repeating a single game to prove it definitively. You can hardly play 60 moves without an error by accident.

I believe you're responding to a different question, something like "does ChatGPT fully understand the game of chess".

mtlmtlmtlmtl · on March 17, 2023

The OP was too unsophisticated in their analysis(as is TFA), no doubt. But I'm not too interested in what OP said or who was wrong or not, and rather more interested in finding what's right.

As someone very clever once said, welcome to the end of the thought process.

We've established that:

1. It doesn't repeat entire games when the games go long enough

2. It does repeat a lot of opening theory

3. It seems to repeat common, partially position independent tactical sequences even when they're illegal or don't work tactically.

fsckboy · on March 17, 2023

>playing a move from a different game in a new game is nonsensical

GP did say "sequence of moves", and if it matches what it has seen from the first move on, including the opponent, it will be in a valid "sequence of moves".

then, even midgame or endgame, if a sequence is played on one side of the board, even though the other side of the board may be different, the sequence has a great chance of being good (not always of course, but a 1400 rating is solid (you know the rules and some moves) but not amazing

GauntletWizard · on March 17, 2023

There's a pretty finite set of tokens that represent valid chess moves, and a more than countable set of possible opening moves, with a very distinct probability of moves; Nobody opens pg3, for example.

Classical Markov chains played chess at some rate of success. ChatGPT is probably a lot better but not fundamentally different - It's predicting which moves to play based on sets of past games, not by memorizing it but by memoizing it.

fxtentacle · on March 17, 2023

For Bomberland, we were quite surprised how strongly we could compress and quantize the current game state and still get useful movement predictions.

I wouldn't be surprised if the relevant state in a typical beginner's chess game also excluded many units in the sense that yes, you could move them, but a beginner is going to just ignore them in any case.

qwytw · on March 17, 2023

Is it though? I mean if you had data on millions of games what is the chance that you'd find one which has identical position that the one you're in (it's not like most moves are random..)

I wonder how well it could perform in Go, there are way more permutations there so finding an identical state should be more difficult.

jointpdf · on March 17, 2023

You can click through the Lichess opening database (click the book icon, and then the Lichess tab) to get an idea: https://lichess.org/analysis

But the answer is insanely unlikely, past a certain number of moves. The combinatorial explosion is inescapable. Even grandmaster games are often novelties in <10 moves.

So, it has a to have some kind of internal representation of board state and what makes a reasonable move and such that enables it to generalize (choosing random legal moves is almost unbelievably bad, so it’s not doing that).

I also doubt that it has been trained on the full (massive) database of Lichess games, but that would be an interesting experiment: https://database.lichess.org/

sebzim4500 · on March 17, 2023

>I mean if you had data on millions of games what is the chance that you'd find one which has identical position that the one you're in (it's not like most moves are random..)

Very low. On lichess when you analyse your games you can see which positions have been reached before, and you almost always diverge in the opening.

The lichess db has orders of magnitude more games of chess than the chatGPT training data does, so there is absolutely no way that chatGPT could reach 1400 purely based off positions in its training data.

medvezhenok · on March 17, 2023

You could certainly test this by making completely random moves and seeing whether it's more likely to make illegal moves in those positions.

Though I think you're overestimating how many positions have occured. Frequently, by move 20-25 you have a unique position that's never been played before (unless you're playing a well known main line or something)

melenaboija · on March 17, 2023

If it isn not memorizing, how do you think is doing it?

LeanderK · on March 17, 2023

by trying to learning the general rules that to explain the dataset and minimise its loss. That's what machine learning is about, it's not called machine memorising.

mrguyorama · on March 17, 2023

It cannot be learning the general rules because it occasionally tries to invent pieces out of whole cloth.

meh8881 · on March 17, 2023

Try asking it if a previous move was legal. Or why it decided to do the move.

The problem is not a failure to understand the rules. It is just not very good at maintaining the state.

LeanderK · on March 17, 2023

that would be the optimal rule :) We usually optimise to improve and learn better and better rules, hopefully approximating the optimal rule after some iterations. There's still a gap here, but hopefully it can be closed by improving the models, training algorithms etc.

onos · on March 17, 2023

Apparently not, at least at this level.

mrbungie · on March 17, 2023

I would argue that a decent human player also has "similar sequences of moves in its training set", due to reading plus playing or watching games.

The illegal move argument is good though, and indicates no direct understanding of what it is spewing out.

circuit10 · on March 17, 2023

I wouldn’t say it has no understanding whatsoever, it probably just doesn’t have the capability to keep track of everything that happened all the time

dwmbt · on March 17, 2023

that's also not entirely true either. it's prone to make illegal moves. one could argue abiding by the rules of the game is a requisite for understanding!

imtringued · on March 17, 2023

When I ask it to display moves in an ASCII board it messes up. I doubt that having a representation of the board is the key to improving its chess skills.

SanderNL · on March 17, 2023

You don’t get to 1400 like that. The amount of moves it has to literally remember is stupendous.

Jensson · on March 17, 2023

It doesn't remember exact moves, it remembers patterns of moves and continues those. That is why it so easily makes illegal moves, because the patterns doesn't have anything to do with board state.

lelandfe · on March 17, 2023

Perhaps I don’t know what “easy” means in your comment, but the OP article says that they “almost always” played full, legal games with their prompt.

Jensson · on March 17, 2023

Maybe chess.com players play in a specific way, and there are lot of transcriptions of such games that these LLMs have ingested when they ingested the internet?

I don't know why it worked in this specific case, but based on earlier examples it is more likely that these kind of games were more prevalent in its dataset it was trained on than it being able to play chess in general. It still wasn't perfect, so even these games weren't rigid enough for it to reliably perform valid moves.

antibasilisk · on March 17, 2023

but that's literally what most chess players do anyway?

Jensson · on March 17, 2023

Yes, they remember patterns, and then applies logic to make a valid move. ChatGPT remembers patterns, but didn't apply any logic to make valid moves. There is a difference there.

ChatGPT does the pattern matching part, but not the logical part.

SanderNL · on March 17, 2023

Again, not enough for 1400. 800 maybe.

Jensson · on March 17, 2023

We have evidence right here that patterns of moves leads to 1400. To me this is evidence that 1400 level play is very rigid and mostly relies on defined openings and strategies, making it easy for an LLM to find similar games and win using standard continuations.

Just like grammar the patterns are too hard for humans to see and encode, but LLMs can encode pretty complex patterns. Domains that are easy to encode as grammars will be really easy for LLMs to solve, and the further from a grammar the harder for it.

SanderNL · on March 17, 2023

At what point can we just say that understanding “patterns of moves” is understanding chess? It seems you suggest there is more to it, but maybe I am mistaken.

Jensson · on March 17, 2023

At least it should make valid moves, that is the minimum level required. It didn't reach that level here. If it never made illegal moves we could talk and see what it does, but until then we can be sure it didn't understand the rules.

serverholic · on March 17, 2023

I don’t understand why the threshold is “never”. Isn’t it entirely possible that the AI is learning a model of chess but this model is imperfect? What if AIs don’t fail the same way as humans?

Jensson · on March 17, 2023

> What if AIs don’t fail the same way as humans?

But it is failing the same way as a human. Humans who remembers patterns and don't learn the logic makes these kind of errors in math or logic all the time.

ChatGPT is much better than humans at pattern matching, you see it right here it can pattern match chess moves and win games! But its inability to apply logic to its output instead of just pattern matching is holding it back, as long as that isn't solved it wont be able to perform on the level of humans in many tasks. Chess might be easy enough to solve using just pattern matching and no logic that scaling it up will make it pretty good at chess, but many other topics wont be.

meh8881 · on March 17, 2023

Ask it to evaluate moves and explain why they are or are not good

WoodenChair · on March 17, 2023

Nobody who is 1400 plays outright illegal moves.

mrbungie · on March 17, 2023

Yet it achieves 1400. Add hard rules to stop it spewing out said moves and you have a 1400 ELO Player (most UIs won't even let you make illegal moves).

It is difficult to say that is not impressive due to it being an emergent ability.

WoodenChair · on March 17, 2023

> It is difficult to say that is not impressive due to it being an emergent ability.

I don't know why you think it's an emergent ability.

It's seeing a sequence of moves, and playing the most likely next move (i.e. the most likely next token) given the previous complete move sequences it was trained on. That's the baseline of what an LLM does—not something emergent. Games in online chess databases tend to be of relatively good players. Nobody wants to look up games played by two 800 ELO players.

As an aside, there have been chess programs for years that show you for a given position all of the previous games in its database with the same position and the win outcome % of each move. That's all that's going on here.

oezi · on March 17, 2023

> here have been chess programs for years that show you for a given position all of the previous games in its database with the same position and the win outcome % of each move. That's all that's going on here.

It could be, but would you think that of the 100-300 bn parameters in the model a lot are dedicated to chess move sequences? It seems likely that it has seen such data, but I would be surprised if it is using a considerable chunk to store chess database information.

Jensson · on March 17, 2023

The web has millions of grandmaster chess games and probably billions of chess games overall. So I wouldn't be surprised if it has like 0.01% to chess games since there are so many, if so that would mean it has 10-30 million parameters to play chess with, for comparison stockfish has 10 million parameters to its chess engine.

mrbungie · on March 17, 2023

Because I don't think that the model learned the literal memorization of chess moves. It must've at least compressed said information in some way way. And since the model is not biased to play chess on its structure nor sampling policy, I think it's fair to consider it an emergent ability.

Chess moves are a tiny/diminute part of all text learned by the model. This memorization argument is very similar to the "Stable Diffusion just takes bits of the images in the original dataset and parches them together".

mtlmtlmtlmtl · on March 18, 2023

I have seen 1400 FIDE elo(much much stronger than chess.com 1400) players make illegal moves in real life long time control tournament games after thinking for over 10 minutes. People regularly create an erroneous board state in their head while thinking. This is often how blunders happen; merging together a possible future board state with the current one. Illegal moves are more rare because you have to then also miss that the move is illegal while physically making it. But they do happen.

1400s on chess.com never play illegal moves. 300s on chess.com never play illegal moves. Because it's impossible to do. In the real world, even grandmasters can make illegal moves, though they almost always have to be under time pressure.

This idea that the illegal moves completely invalidate this result is just ill-conceived. On the other hand I do agree this is mostly returning common sequences of moves. And if you actually analyse the games, especially the ones with illegal moves, you'll find plenty of qualitative evidence of that. But I'm fed up of doing people's thinking for them for today, so this is peace out for me today. See my others comments on this post to see a more detailed analysis of what this is doing.

SanderNL · on March 17, 2023

Humans don’t, but that is an unnecessary bar IMO.

WoodenChair · on March 17, 2023

> Humans don’t, but that is an unnecessary bar IMO.

People are always telling me that I'm moving the goalposts when I challenge the hyperbole about LLMs. But now you're moving the goalposts about chess.

Not playing illegal moves is a pre-requisite for any strong understanding of how to play chess. That is definitely the goal post.

scj · on March 17, 2023

If a two year old had an ELO of 1400 but sometimes took a nap halfway through a game, it'd be a prodigy.

It's not like an AI making silly mistakes when driving a car.

vanviegen · on March 17, 2023

Does that still hold when the player doesn't have a board in front of them, but just a list of previous moves?

PoignardAzur · on March 17, 2023

> Most likely it has seen a similar sequence of moves in its training set.

Wouldn't we expect a much higher rate of illegal moves if that was the case?

yowzadave · on March 17, 2023

Doesn't ChatGPT indeed have a very high number of illegal moves?

https://www.youtube.com/watch?v=kvTs_nbc8Eg

In this example, ChatGPT's first few moves are reasonable (while it appears to be on-book), but then it goes off the rails and starts moving illegally, spawning pieces out of nowhere, deleting pieces for no reason, etc.

medvezhenok · on March 17, 2023

I think it was not given the whole game up to that point, just individual moves. That was the point of this article - if you include all of the moves in the prompt, it is less likely to make illegal moves.

macNchz · on March 17, 2023

Reminds me of asking for driving directions (city to city)...for major cities it can often give perfect directions, for smaller cities it starts out surprisingly accurate but often devolves into invented exits or descriptions of a

WoodenChair · on March 17, 2023

If there was an actual understanding of chess at a 1400 level we wouldn't expect any illegal moves.

scarmig · on March 17, 2023

We would expect some vanishingly small fraction. Even at the grandmaster level, they happen (rarely):

https://chess24.com/en/read/news/the-7-most-illegal-chess-mo...

bloppe · on March 17, 2023

This. The author is very generous with their interpretation:

> I decided to interpret that as ChatGPT flipping the table and saying “this game is impossible, I literally cannot conceive of how to win without breaking the rules of chess.”

Kind of sounds like anthropomorphization, but more likely the author just papering over the glaring shortcomings to produce a compelling blog post.

It also sounds like the illegal moves were rather frequent. The 61-legal-move game sounded like an impressive outlier.

rightbyte · on March 17, 2023

I think the authors rule is fair. If we interpret illegal moves as getting stuck in an online game, the resulting Elo rating is what it would get.

But ye, he is anthropomorphizing alot ...

bloppe · on March 17, 2023

There's no indication that GPT-3.5 was stuck when it tried to make illegal moves. GPT-4 clearly was making illegal moves when it was very much not stuck. It just doesn't know how to play, but the author decided to interpret it as frustration.

rightbyte · on March 19, 2023

I meant stuck as "it can't make the move" and would timeout the game if playing online.

oezi · on March 17, 2023

I think there is very low percentage of players at elo 1400 who can provide a valid next move after seeing just the list of moves and not the current board state.