In this example, ChatGPT's first few moves are reasonable (while it appears to be on-book), but then it goes off the rails and starts moving illegally, spawning pieces out of nowhere, deleting pieces for no reason, etc.
I think it was not given the whole game up to that point, just individual moves. That was the point of this article - if you include all of the moves in the prompt, it is less likely to make illegal moves.
Reminds me of asking for driving directions (city to city)...for major cities it can often give perfect directions, for smaller cities it starts out surprisingly accurate but often devolves into invented exits or descriptions of a
This. The author is very generous with their interpretation:
> I decided to interpret that as ChatGPT flipping the table and saying “this game is impossible, I literally cannot conceive of how to win without breaking the rules of chess.”
Kind of sounds like anthropomorphization, but more likely the author is just papering over the glaring shortcomings to produce a compelling blog post.
It also sounds like the illegal moves were rather frequent. The 61-legal-move game sounded like an impressive outlier.
There's no indication that GPT-3.5 was stuck when it tried to make illegal moves. GPT-4 clearly was making illegal moves when it was very much not stuck. It just doesn't know how to play, but the author decided to interpret it as frustration.
I think there is a very low percentage of players at Elo 1400 who can provide a valid next move after seeing just the list of moves and not the current board state.
I'm Elo 1400 and can beat literally everyone I know in the real world. I need to go online to find players at my skill level, or find tournament/competitive settings for a challenge.
Yeah, I'm "class C", a weak amateur chess player, but I think you're grossly underestimating the amount of study I put into this game. I'm not going to make an illegal move.
> ChatGPT: Yes, that’s a good move for you. My next move is: Bc3, developing my pieces and attacking your pawn on c3.
I am 1400 Elo and can tell you that from a near-opening position, it's impossible to move a Bishop to c3 for either Black or White in the first, say, 10 moves, under traditional openings.
We're talking about pieces that don't exist, reappearing pieces, pieces moving completely wrong (Knight takes as if it's a Pawn), etc.
---------
People are taking these example games and saying ChatGPT is 1400 strength. I don't think so. This isn't a case of "oops, I castled even though I moved my king 15 turns ago".
The article points out that the way that game was conducted was bad. (Here's the original transcript: https://pastebin.com/X6kBRTa9)
You need to give ChatGPT the full state (every move) on every prompt to make it play closer to 1400. In the game you linked, the user was giving one move at a time.
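For anyone who wants to try this, here is a rough sketch of what "full state on every prompt" could look like. The function name `build_prompt` and the prompt wording are my own assumptions, not what the article used; the only point is that the whole numbered move list gets resent each turn instead of just the latest move.

```python
def build_prompt(moves):
    """Build a prompt containing every move played so far.

    `moves` is a flat list of moves in algebraic notation, alternating
    White and Black, e.g. ["e4", "e5", "Nf3"].
    The prompt wording is hypothetical, not taken from the article.
    """
    parts = []
    for i in range(0, len(moves), 2):
        # Group moves into numbered pairs: "1. e4 e5", "2. Nf3", ...
        pair = " ".join(moves[i:i + 2])
        parts.append(f"{i // 2 + 1}. {pair}")
    history = " ".join(parts) if parts else "(none)"

    # An even number of moves played means it is White's turn.
    side = "White" if len(moves) % 2 == 0 else "Black"
    return (
        f"We are playing chess. Moves so far: {history}\n"
        f"It is {side}'s turn. Reply with one legal move in algebraic notation."
    )


print(build_prompt(["e4", "e5", "Nf3"]))
```

You would call this again after every move with the list one entry longer, so the model never has to rely on earlier turns of the conversation to reconstruct the position.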
If I've been given the full state every move, I will _never_ make an illegal move as a 1400 chess player.
-----------
> O-O
> I'll play O-O as well. Your move.
Do you really think that this error would have been made at 1400 Elo? Even in blind chess? This is the 5th move of the game. I can still track the game at this point mentally.
I recognize that you're 1900 and think that all the chess players below you are n00bs, but... come on. 1400 players are stronger than this.
And yet kids who are gaining rating quickly can and do still occasionally (albeit rarely) make illegal moves at 1400. I know because I've played them (and was one, many years ago).
Not if training is unsupervised. If you've never been explicitly told the rules of the game, you can never be 100% sure of all possible illegal moves.
anyway the 3.5 series can't play chess but GPT-4 certainly can.
Ah, yes, that is more or less my understanding of it as well. Though I would like to see how it would perform if given the state of the board as input to predict the next move, rather than a sequence of moves, since that is how we humans normally determine the next move. I believe the move history is only relevant when it comes to en passant and certain draw scenarios (like repetition and the 50-move rule). Needless to say, it would first have to be trained on those types of inputs, which it probably is not.
Edit: move history can also be relevant when it comes to castling.
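For what it's worth, the standard FEN notation already bundles most of that history-dependent information into the board state: besides piece placement it has fields for side to move, castling rights, the en passant target square, and the halfmove clock used for the 50-move rule. Repetition is the one thing it does not capture, so that still needs the move history. A minimal sketch of splitting a FEN string into those fields (the `Position` class and `parse_fen` helper are names I made up for illustration, and there is no validation):

```python
from dataclasses import dataclass


@dataclass
class Position:
    placement: str        # piece placement, rank 8 down to rank 1
    side_to_move: str     # "w" or "b"
    castling: str         # e.g. "KQkq", or "-" if no rights remain
    en_passant: str       # target square such as "e3", or "-"
    halfmove_clock: int   # halfmoves since last capture or pawn move
    fullmove_number: int  # starts at 1, incremented after Black moves


def parse_fen(fen):
    """Split a FEN string into its six space-separated fields (no validation)."""
    placement, side, castling, ep, half, full = fen.split()
    return Position(placement, side, castling, ep, int(half), int(full))


# FEN for the standard starting position.
START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
pos = parse_fen(START)
print(pos.castling, pos.en_passant, pos.halfmove_clock)
```

So a "board state only" prompt could in principle carry castling and en passant information without the full move list, assuming the model had seen enough FEN-style input to make use of it.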
Wouldn't we expect a much higher rate of illegal moves if that was the case?