I had some success refactoring one instance of a pattern in our codebase, along with all of the class's call sites, and then having Codex identify all the other instances of that pattern and refactor them in parallel, following my initial refactor.
Similarly, I had it successfully migrate a third (so far) of our tests from an old testing framework to a new one, one test suite at a time.
We also had a race condition. Given both the unsymbolicated trace and the build’s symbols, Claude Code successfully symbolicated the trace and identified the cause. When prompted, it also found most of the similar instances of the responsible pattern in our codebase (the one it missed was an indirect one).
I didn’t care much about the suggested fixes on that last one, but consider it a success too, especially since I could just keep working on other stuff while it chugged along.
The problem with all of these, even the most recent one, is that they have the "AI look". People have tired of this look already, even for short adverts; if they don't want five minutes of it, they really won't like two hours of it. There is no doubt the quality has vastly improved over time, but I see no sign of progress in removing the "AI look" from these things.
My feeling is that the definition of the "AI look" has evolved as these models have progressed.
It used to mean psychedelic weird things worthy of the strangest dreams or an acid trip.
Then it meant strangely blurry, with warped alien script and fifteen fingers, including one coming out of another finger’s second phalanx.
Now it means something odd, off, somehow both hard to place and obvious, like the CGI "transparent" car in Die Another Day (is it that the 3D model is too simple, looks like a bad glass sculpture, and refracts light in squares?) or its ice cliffs (I think the lighting is completely off, and the colours are wrong).
And if that’s the case, then these models have covered far more ground in far less time than it took computer graphics and CGI.
> I feel like it has picked up on certain keywords and then just rolled with its own stereotypes of what those keywords represent, rather than actually taking a good look at what I think. A roast works because the roaster has clearly spent time and effort and care understanding the person roasted. This is way too shallow for that.
Yeah. It picks one random thing from one comment and turns it into a lifestyle.
I remember reading that one of the issues regarding the EU and its institutions' exposure to lobbyists was that a big part of the population is uninterested in the EU and in EU elections.
Which may or may not be true, or perhaps only partially true, and is perhaps simplistic, but it does kind of make sense. EU elections do have a particularly low turnout, and if people themselves don’t care enough, then who will?
Also, the DOE having to figure out how to make Fogbank again (a classified material used in weapons whose manufacturing documentation they lost).