Claude is really good at specific analysis, but really terrible at open-ended problems.
"Hey claude, I get this error message: <X>", and it'll often find the root cause quicker than I could.
"Hey claude, anything I could do to improve Y?", and it'll struggle beyond the basics that a linter might suggest.
It enthusiastically suggested a library for <work domain> and was all "Recommended" about it, but when I pointed out that the library had already been considered and rejected because of <issue>, it understood and wrote up why that library suffered from that issue and was therefore unsuitable.
There's a significant blind spot in current LLMs related to blue-sky thinking and creative problem solving. They can do structured problems very well, and they can transform unstructured data very well, but they can't deal with unstructured problems very well.
That may well change, so I don't want to embed that thought too deeply into my own priors; the LLM space evolves rapidly, and I wouldn't want to find myself blind to progress because I'd written LLMs off for a whole class of problems.
But right now, the best way to help an LLM is to have a deep understanding of the problem domain yourself, and leverage it to do the grunt-work you'd find boring.
I didn't pull this argument out of nowhere, please read the direct comment I was replying to. Your position is also completely untenable: this benchmark was obsoleted by its creators 29 years ago, who very clearly say it is obsolete, and you're arguing that it isn't because it "still runs."
I'm guessing that this discussion would be more productive if you would please say who you are and the company you work for. I'm Brendan Gregg, I work for Intel, and I'm well known in the performance space. Who are you?
> I should strive to never upgrade anything unless I absolutely must.
It seems that the choice is whether to live on the slightly-bleeding edge (as determined by “stable” releases, etc), or to live on the edge of end-of-life, always scrambling to rewrite things when the latest dependency library is being officially obsoleted. I advocate doing the former, while you seem to prefer the latter.
The problems with the former approach are obvious (and widely seen), but there are two problems with the latter approach, too. Firstly, you are always using very old software that is not using the latest techniques, or even reasonable techniques. This can even amount to bugs – take MD5 for example: while it was better than what preceded it, much software used MD5 as a be-all-and-end-all hashing algorithm, which later turned out to be a mistake. The other problem is more subtle (and was more common in older times): it’s too easy to be seduced into freezing your own dependencies, even though they are officially unsupported and end-of-lifed. The rationalizations are numerous: “It’s stable, well-tested software”, “We can backport fixes ourselves, since there won’t be many bugs.” But of course, in doing this, you condemn your own software to a slow death.
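To make the MD5 point concrete, here is a rough, hypothetical sketch (my own illustration, not from the comment) of the gap between what a lot of frozen code still does and what you'd reach for today:

    import hashlib

    # What a lot of old code did: treat MD5 as a general-purpose, "good enough" hash.
    legacy_digest = hashlib.md5(b"hunter2").hexdigest()

    # What you'd reach for today depends on the job:
    #  - integrity checks / fingerprinting: SHA-256 (or BLAKE2)
    #  - passwords: a deliberately slow KDF (scrypt here; bcrypt/argon2 are also common)
    fingerprint = hashlib.sha256(b"some file contents").hexdigest()
    password_hash = hashlib.scrypt(b"hunter2",
                                   salt=b"use-a-random-per-user-salt",  # illustrative only
                                   n=2**14, r=8, p=1)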
One might think that doing the latter approach is the hard-nosed, pragmatic and responsible approach, but I think this is confusing something painful with something useful. I think that doing the former approach is more work and more pain from integration, and the latter approach is almost no work, since saying “no” to upgrades is easy. It feels like it’s good since working with an old system is painful, but I think one is fooling oneself into doing the easy thing while thinking it is the hard thing.
The other reason one might prefer the former approach is that it speeds up software development in general through faster feedback cycles. It’s not a direct benefit; it’s more of an environmental thing that benefits the ecosystem. The latter approach instead slows down feedback cycles in all the affected software packages.
Of course, having good test coverage will also help enormously with doing the former approach.
A point I like to make in discussions like this is that software and hardware specifications are very different. We think of software as the thing we're building, but it's really just a spec that gets turned into the thing we actually run; it's just that the building process is fully automated. What we do when we create software is create a specification in source code form.
Compared to what an architect does when they create a blueprint for a building, creating blueprints for software source code is not a thing.
What in waterfall is considered the design phase is the equivalent of an architect doing sketches, prototypes, and other stuff very early in the project. It's not creating the actual blueprint. The building blueprint is the equivalent of source code here. It's a complete plan for actually constructing the building down to every nut and bolt.
The big difference here is that building construction is not automated; it's costly and risky. So architects try to get their blueprint to a level where they can minimize all of that cost and risk. And you only build the bridge once, so iterating is not really a thing either.
Software is very different: compiling and deploying is relatively cheap, risk-free, and typically fully automated. All the effort and risk is contained in the specification process itself, which is why iteration works.
Architects abandon their sketches and drafts after they've served their purpose. The same is true in waterfall development. The early designs (whiteboard, napkin, UML, brainfart on a wiki, etc.) don't matter once development kicks off. As iterations happen, they fall behind and just stop mattering. Many projects don't have a design phase at all.
The fallacy that software is imperfect as an engineering discipline because we are sloppy with our designs doesn't hold up once you realize that essentially all the effort goes into creating hyper detailed specifications, i.e. the source code.
Having design specifications for your specifications just isn't a thing. Not for buildings, not for software.
One of the biggest productivity improvements I've had as a developer was making a habit of planning all my work upfront. Specifically, when I pick up a ticket, I break it down into a big bullet-point list of TODOs. Doing it this way leads to a better design, to dealing with inter-ticket dependencies upfront, and to clarifying the spec upfront (which, yes, is part of your job as a senior developer); most valuable of all, it lets me get into flow state much more regularly when I'm programming.
It's not a surprise to me that this approach also helps AI coding agents work more effectively, as in-depth planning essentially moves the thinking upfront.
Corporations aren’t people in the literal sense which the 13th amendment uses; nobody ever said they were. They just have the ability to do some people things. They can have a bank account or sign a contract. They cannot vote or enlist or do lots of things people can do. (The technical name is ‘juridical persons’, and what they can or cannot do is spelled out in law quite well.)
Money isn’t speech, and no court ever said it was. The ads you buy with money are speech. What’s the difference between a Fox News editorial show and a right-leaning ad on Fox News? (The answer: who pays for it.) If news organizations are just things owned by people, what makes them more worthy of expressing opinions than other things owned by people? Just because they have “news” in their name?
You just think they’re half-assed because you have a cartoony idea of what they are, the one expressed by media that doesn’t like them. They’re quite sensible.
The Supreme Court was going to decide whatever they wanted, regardless of which linguistic terms were used to describe the underlying legal concepts which remain the same.
If you look at the text of the first amendment, the word "person" doesn't appear in that part. It says "Congress shall make no law... abridging the freedom of speech." It doesn't say that the speech has to come from "persons". So I'd say you're the one misunderstanding here.
I think it was a dumb Supreme Court decision, but I'm not going to pretend it had anything to do with the fact that corporations are called a "legal person" instead of a "legal entity" or some other term that ends up meaning the exact same thing. Disagree with their decision, great. But arguing over legal terminology is a waste of breath.
Since giving up my cell phone entirely over 5 years ago, my productivity, memory, and overall happiness are at the highest levels they have ever been, in my late 30s. I no longer apologize to anyone for this lifestyle choice, since the benefits are something everyone deserves but almost all opt out of today for made-up reasons.
I take photos with a pocket mirrorless, and take notes with a notebook. I tell time with a self winding mechanical watch. I pay for things at stores with cash instead of tap to pay. Like a cave man, I know.
I am reachable by internet when I am at my desk, and by landline when I am at home. In an actual emergency dial 911, not me. Otherwise it can probably wait until I am at my desktop or a laptop.
I was already sold on raising kids without smartphones based on intuition and lived experience, but study after study points to having access to all humans, all knowledge, and all entertainment at all times as leading to generally bad mental health and cognitive function outcomes. Our brains simply did not evolve for it.
Whenever I see parents scrolling, and handing a kid a phone as well to pacify them, I wish I could report them for child abuse. I feel like I am watching them be given whiskey or cigarettes, except it is socially acceptable and no one cares.
I hate when people bring up this “billions of years of evolution” idea. It’s completely wrong and deluded in my opinion.
Firstly humans have not been evolving for “billions” of years.
Homo sapiens have been around for maybe 300,000 years, and the “Homo” genus for 2-3 million years. Before that we split from the chimpanzee lineage, about 6-7 million years ago.
If you want to look at the entire span of brain development, i.e. from mouse-like creatures through to apes and then humans, that’s about 200M years.
If you want to think about generations, it’s only 50-75M of them, i.e. “training loops”.
That’s really not very many.
Also, the bigger point is this: for 99.9999% of that time we had no writing, and no complex thinking of that kind was required.
So our ability to reason about maths, writing, science etc. has only existed for the last 2000-2500 years! I.e. only roughly 100 generations.
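A quick back-of-the-envelope check of those generation counts (the generation times here are my own assumptions, not from the comment):

    # ~200M years of brain evolution at a few years per generation:
    print(200e6 / 4)   # ~50M generations; shorter generation times push this toward 75M
    # ~2,500 years of maths/science/writing at ~25 years per human generation:
    print(2500 / 25)   # ~100 generations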
Our brain was not “evolved” to do science, maths etc.
Most of evolution was us running around just killing stuff and eating and having sex. It’s only a tiny tiny amount of time that we’ve been working on maths, science, literature, philosophy.
So actually, these models have had a massive, massive amount more training than humans had, to do roughly the same thing, while using insane amounts of computing power and energy.
Our brains evolved for a completely different world, environment and daily life than the life we lead now.
So yes, LLMs are good, but they have been exposed to more data and training time than any human could be unless we lived for 100,000 years, and they still perform worse than we do on most problems!
Interesting day. I've been on an incident bridge since 3AM. Our systems have mostly recovered now with a few back office stragglers fighting for compute.
The biggest miss on our side is that, although we designed a multi-region capable application, we could not run the failover process because our security org migrated us to Identity Center and only put it in us-east-1, hard locking the entire company out of the AWS control plane. By the time we'd gotten the root credentials out of the vault, things were coming back up.
Good reminder that you are only as strong as your weakest link.
Whenever I watch Claude Code or Codex get stuck trying to force a square peg into a round hole, failing over and over, it makes me wish they could feel the creeping sense of uncertainty and dread a human would in that situation after failure after failure.
Which eventually forces you to take a step back and start questioning basic assumptions until (hopefully) you get a spark of realization of the flaws in your original plan, and then recalibrate based on that new understanding and tackle it totally differently.
But instead I watch Claude struggling to find a directory it expects to see and running random npm commands until it comes to the conclusion that, somehow, node_modules was mysteriously corrupted and it therefore needs to wipe everything node-related and rebuild the project config from vague memory.
Because no big deal, if it’s wrong it’s the human's problem to untangle and Anthropic gets paid either way so why not try?
But it's worse than that. Even if in theory the system could be fixed, we don't actually know how to fix it for real, the way we can fix a normal computer program.
The reason we can't fix them is because we have no idea how they work; and the reason we have no idea how they work is this:
1. The "normal" computer program, which we do understand, implement a neural network
2. This neural network is essentially a different kind of processor. The "actual" computer program for modern deep learning systems is the weights. That is, weights : neural net :: machine language : normal CPU.
3. We don't program these weights; we literally summon them out of the mathematical aether by the magic of back-propagation and gradient descent.
This summoning is possible because the "processor" (the neural network architecture) has been designed to be differentiable: for every weight we can calculate the slope of the final output with respect to that weight, so we know "The final output for this particular bit was 0.7, but we wanted it to be 1. If this weight in the middle of the network were just a little bit lower, then that particular output would have been a little bit higher, so we'll bump it down a bit."
And that's fundamentally why we can't verify their properties or "fix" them the way we can fix normal computer programs: Because what we program is the neural network; the real program, which runs on top of that network, is summoned and not written.
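Here is a minimal sketch of that "summoning" loop as a toy Python/numpy example (the network, data, and learning rate are all made up for illustration; real systems are vastly larger, but the mechanism is the same):

    import numpy as np

    # Toy "processor": a single sigmoid neuron. We write the architecture and the
    # training loop; the weights themselves are nudged into place by gradient descent.
    rng = np.random.default_rng(0)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([0., 1., 1., 1.])          # target: a simple OR function

    w = rng.normal(size=2)                   # the "real program": the weights
    b = 0.0
    lr = 1.0

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for step in range(2000):
        out = sigmoid(X @ w + b)             # forward pass
        err = out - y                        # how far each output is from what we wanted
        grad_out = err * out * (1.0 - out)   # slope of squared error w.r.t. pre-activation
        w -= lr * (X.T @ grad_out)           # "bump it down a bit"
        b -= lr * grad_out.sum()

    print(np.round(sigmoid(X @ w + b), 2))   # approaches [0, 1, 1, 1]

Nothing in that loop tells you why the final weights encode OR; you can only observe that they do, which is the verification problem in miniature.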
I've seen senior engineers get fired and the business suffer a setback because they didn't have any way to scale beyond a single low spec VPS from a budget provider, and their system crashed when a hall full of students tried to sign up together during a demo and each triggered 200ms of bcrypt CPU activity.
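The arithmetic behind that failure mode is brutal on a single small box. Illustrative numbers only, assuming one CPU core and roughly 200ms per bcrypt hash as described above:

    # Rough capacity estimate for CPU-bound bcrypt on a single core (assumed numbers).
    bcrypt_ms = 200                             # one hash at the chosen work factor
    cores = 1                                   # low-spec VPS
    signups = 300                               # a hall full of students at once

    hashes_per_sec = cores * 1000 / bcrypt_ms   # = 5 sign-ups/second, best case
    queue_seconds = signups / hashes_per_sec    # = 60 seconds of pure hashing
    print(hashes_per_sec, queue_seconds)

Long before that minute of queued hashing drains, requests start timing out and retries pile on top, which is the crash described.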
A summer afternoon, static on the radio, the low hum of an announcer calling balls and strikes like he’s reading scripture in a Midwest church. Baseball used to be stitched together with silence. You heard the game as much in the pauses as in the plays.
Then the voice came in. Once the game hit the airwaves, it slowed. Had to. The ball waited for the broadcast.
Out of the dead-ball fog came the home run. No more bunting, no more clever thefts of second. Now it was swing, admire, trot. Alongside the homers came the walks and the strikeouts. Fewer balls in play. More staring, less running. Time thickened, and games trended longer.
World War II shaved minutes from the clock. With so many players overseas, the talent pool shrank. The games got shorter because they became simpler. When the talent came back, the games got longer, largely because, after 1947, the game was flooded with previously segregated talent and players who were returning from overseas.
In the 60s, pitchers took over. Dominance from the mound. ERAs dropped. Batting averages plummeted. In 1968 they called it the Year of the Pitcher, then called the rulebook to fix it. Scoring came back, and with it, longer games.
Television followed with commercial breaks and camera angles. The game had to pause for sponsors. The seventh-inning stretch now came with a soft drink.
In the 70s, the bullpen became a revolving door. Specialists. Situational matchups. Every pitching change added minutes. Coaches walked the mound like they were heading to confession.
And the game kept expanding. OPS rose. More runners meant more pitches. More strikeouts meant more throws. Every batter became a saga.
If you look at the graph, you can see a trend that matches up well with changes in baseball. We could probably break down every high and low and attribute the shift to rules, personnel changes, etc.
Then came the pitch clock. No more dawdling. No more meditative pacing between pitches. And now a reliever has to face at least three batters in an inning. No more one-pitch exits.
It’s not that baseball got lazy. It got layered, commercialized, optimized, and strategized, but it forgot about time management.
The graph gives an outline, with each trend representing a chapter in baseball history, which is very cool.
I get why people say it's boring, but I love it as well. I don't really follow it anymore, and if I tune in randomly I feel similarly - it seems boring. It just takes some exposure before you can appreciate it. The emergent narratives within games, series, and seasons are really special.
As Richard Restak postulates in his book “The Naked Brain”[0]: the limbic system provides a gut feeling (usually from comfort) and we rationalise our way backwards from that without being able to really pinpoint “why”; usually the “why” is secondary and only added as justification for the feeling post-decision.
I built a package which I use for large codebase work[0].
It starts with /feature, and takes a description. Then it analyzes the codebase and asks questions.
Once I’ve answered the questions, it writes a plan in markdown. There will be 8-10 markdown files with descriptions of what it wants to do and full code samples.
Then it does a “code critic” step where it looks for errors. Importantly, this code critic is wrong about 60% of the time. I review its critique and erase a bunch of dumb issues it’s invented.
By that point, I have a concise folder of changes along with my original description, and it’s been checked over. Then all I do is say “go” to Claude Code and it’s off to the races doing each specific task.
This helps keep it from going off the rails, and I’m usually confident that the changes it made were the changes I wanted.
I use this workflow a few times per day for the bigger tasks, and use regular Claude Code when I can be pretty specific about what I want done. It’s proven to be a pretty efficient workflow.
It's the internet. When you talk to people online, it often descends into pettiness. When you talk to people in the real world, that rarely happens. But it's much easier to talk online, so people get the wrong impression.
You should talk to strangers. It's never gone wrong for me. Most people have a warmth and agreeableness that comes out when you are there with them, talking about stuff. There's also the interesting effect that people will give you their innermost secrets, knowing you won't tell anyone (I actually met a serial killer who did this, heh). For instance I was on a long haul flight earlier this year, and my neighbour told me everything about her divorce. Like a kind of therapy.
I also find when I have a real disagreement with someone, it's a lot easier when you're face-to-face. For instance, I have friends who are religious, in a real way, ie they actually think there's a god who created the earth and wants us to live a certain way. Being there in person keeps me from ridiculing them like I might on an internet forum, but it also keeps them from condemning me to hell.
So folks, practice talking to people. Much of what's wrong in the current world is actually loneliness, having no outlet for your expressions.
> I’ve often heard, with decent reason, an LLM compared to a junior colleague.
No, they're like an extremely experienced and knowledgeable senior colleague – who drinks heavily on the job. Overconfident, forgetful, sloppy, easily distracted. But you can hire so many of them, so cheaply, and they don't get mad when you fire them!
> We are headed in a direction where written code is no longer a time sink.
Written code has never been a time sink. The actual time that software developers have spent actually writing code has always been a very low percentage of total time.
Figuring out what code to write is a bigger deal. LLMs can help with part of this. Figuring out what's wrong with written code, and figuring out how to change and fix the code, is also a big deal. LLMs can help with a smaller part of this.
> Juniors can onboard faster and more independently with LLMs,
Color me very, very skeptical of this. Juniors previously spent a lot more of their time writing code, and they don't have to do that anymore. On the other hand, that's how they became not-juniors; the feedback loop from writing code and seeing what happened as a result is the point. Skipping part of that breaks the loop. "What the computer wrote didn't work" or "what the computer wrote is too slow" or even to some extent "what the computer wrote was the wrong thing" is so much harder to learn from.
Juniors are screwed.
> LLMs have the ability to lighten cognitive loads and increase productivity,
I'm fascinated to find out where this is true and where it's false. I think it'll be very unevenly distributed. I've seen a lot of silver bullets fired and disintegrate mid-flight, and I'm very doubtful of the latest one in the form of LLMs. I'm guessing LLMs will ratchet forward part of the software world, will remove support for other parts that will fall back, and it'll take us way too long to recognize which part is which and how to build a new system atop the shifted foundation.
No doubt the math checks out, but I wonder if developer productivity can be quantified that easily. I believe there's a lot of research pointing to people having a somewhat fixed amount of cognitive capacity available per day, and that aligns well with my personal experience. A lot of times, waiting for the computer to finish feels like a micro-break that saves up energy for my next deep thought process.
Just for context, as this article only mentions LASIK and not other options such as (Trans-)PRK and SMILE: the majority of negative side effects one experiences post-LASIK are not linked to the ablation/"carving" of the cornea, as they call it, but rather are a result of the need to sever the subbasal nerve plexus in the anterior stroma, which tends to regenerate in a less comprehensive manner and significantly more slowly around the margins of the flap compared to other methods.
Flaps aren't inherently dangerous either (flap detachments are very rare, even more so with modern systems that essentially create a cavity for the flap to rest in), but the difference in healing post-op is a leading cause of heightened dry eye after LASIK. Both PRK and SMILE, due to the way they work, are less likely to suffer from this, but every procedure naturally has trade-offs.
With PRK, the epithelium in the area is removed and has to regrow, a process that takes a few days (for the initial part; full regrowth takes far longer but isn't generally noticeable). This regrowth can be rather painful and also robs you of the "instantly perfect sight" effect many people desire from laser eye surgery. But because the epithelium regrows naturally, it is less likely (both in theory and in the medical literature) to lead to dry eye and other side effects in the short and long term, making it the preferred choice of many ophthalmologists when choosing such surgery for themselves.
SMILE, on paper, might offer the best of both worlds, but it is considerably more expensive than either, and there is not yet enough long-term research to say definitively that the amount and severity of side effects is comparable to PRK, simply because it is rather new. What research is out there is promising, though.
Overall, each option is very well tolerated and leads to major QOL improvements, and we need to keep in mind that even the more common side effects one may face with LASIK don't affect everyone and are still comparatively small when set against elective procedures in other medical fields.
In this context, I'm very excited to see whether this method might have even fewer short and long term side effects than PRK, but like with SMILE, it may take decades to have a conclusive answer.
Edit: Another thing I missed, and which was not covered in the article, is the potential for this new method to be applicable to people who, because of a variety of factors, are not eligible for any ablative eye surgery. I myself was at the upper limit for Trans-PRK in regard to the severity of my myopia and the thickness (or lack thereof) of my epithelium. In that regard, I see far more potential than just reducing already low side-effect risks further.
Node is the biggest impediment to performant Node services. The entire value proposition is "What if you could hire people who write code in the most popular programming language in the world?" Well, guess what
> If you were an engineer would you want to be assigned this project?
If you're high-flying, trying to be the next Urs or Jeff Dean or Ian Goodfellow, you wouldn't, but I'm sure there are many thousands of people who are able to do the job and would just love to work for Google, collect a paycheck on a $150k/yr job, and do that for the rest of their lives.
Good for them! I used to walk my dog past their (office? warehouse?) in Emeryville, and when the weather was warm they'd have the doors open and the giant server stacks just sitting there, looking awesome. I guess it's not really a concern that someone will steal something that looks like it'd take a forklift to move.
A decade of "the best smartphone camera competitions" by mkbhd have clearly highlighted what is happening here.
1: In A/B testing, nearly everyone, including pixel peepers, prefers a more vibrant photo.
2: The traditional perspective of "a photo should look as close as possible to what my eyes see when I drop the viewfinder" is increasingly uncommon and pursued by almost nobody in the digital age.
3: Phone companies know the above, and basically all of them engage in varying degrees of "crank vibrance until people start to look like clowns, then apply a skin correction so you can keep the rest mega vibrant", with an extra dash of "if culturally accepted by the primary audience, add additional face filtering to improve how people look, including air-brushing and thinning of the face".
This is rightfully compared to the loudness wars and I think that's accurate. It really became a race to the bottom once we collectively decided that "accurate" photos were not interesting and we want "best" photos.
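As a rough illustration of point 3 above (a toy sketch with made-up numbers, not any vendor's actual pipeline), a vibrance boost that backs off on a crude skin-tone mask might look like:

    import numpy as np
    from PIL import Image

    # Toy "crank vibrance, but protect skin tones" pass; assumes an RGB JPEG.
    img = np.asarray(Image.open("photo.jpg")).astype(np.float32) / 255.0
    r, g, b = img[..., 0], img[..., 1], img[..., 2]

    # Very rough skin-tone mask: reddish pixels with moderate saturation.
    mx, mn = img.max(axis=-1), img.min(axis=-1)
    sat = (mx - mn) / (mx + 1e-6)
    skin = (r > g) & (g > b) & (sat < 0.5)

    # Vibrance: push saturation hardest where it is currently low, and only
    # gently on the skin mask so faces don't turn clown-coloured.
    boost = 1.0 + 0.6 * (1.0 - sat)
    boost = np.where(skin, 1.0 + 0.1 * (1.0 - sat), boost)
    mean = img.mean(axis=-1, keepdims=True)
    out = np.clip(mean + (img - mean) * boost[..., None], 0.0, 1.0)

    Image.fromarray((out * 255).astype(np.uint8)).save("photo_vibrant.jpg")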
"Hey claude, I get this error message: <X>", and it'll often find the root cause quicker than I could.
"Hey claude, anything I could do to improve Y?", and it'll struggle beyond the basics that a linter might suggest.
It suggested enthusiastically a library for <work domain> and it was all "Recommended" about it, but when I pointed out that the library had been considered and rejected because <issue>, it understood and wrote up why that library suffered from that issue and why it was therefore unsuitable.
There's a significant blind-spot in current LLMs related to blue-sky thinking and creative problem solving. It can do structured problems very well, and it can transform unstructured data very well, but it can't deal with unstructured problems very well.
That may well change, so I don't want to embed that thought too deeply into my own priors, because the LLM space seems to evolve rapidly. I wouldn't want to find myself blind to the progress because I write it off from a class of problems.
But right now, the best way to help an LLM is have a deep understanding of the problem domain yourself, and just leverage it to do the grunt-work that you'd find boring.