
The best code I've written with an LLM has been where I architect it, I guide the LLM through the scaffolding and initial proofs of different components, and then I guide it through adding features. Along the way it makes mistakes and I guide it through fixing them. Then when it is slow, I profile and guide it through optimizations.

So in the end, it's code that I know very, very well. I could have written it but it would have taken me about 3x longer when all is said and done. Maybe longer. There are usually parts with difficult functions, but the inputs and outputs of those functions are testable, so it doesn't matter so much that you know every detail of the implementation, as long as it is validated.

This is just not junior stuff.



> I could have written it but it would have taken me about 3x longer when all is said and done.

Really does not sound like that from your description. It sounds like coaching a noob, which is a lot of work in itself.

Wasn’t there a study that said that using LLMs makes people feel more productive while they actually are not?


True, but the n00b is very fast. A lot of coaching is waiting for the n00b to perform tasks, plus meta work around motivation. These LLMs are extremely fast and eager to work.

I don't need a study to tell me about the five projects that have been stuck plodding along for nearly ten years, waiting for me to ever have the time or resources. These are now nearing completion after only two months of picking up Claude Code, and with high-quality implementations that used to be fever dreams.

My background is academic science, not professional programming, but the output quality and speed of Claude Code is vastly better than what grad students generate. Then again, you don't trust grad student code either. The major difference here is that the suggest-and-improve loop takes minutes rather than weeks or months. Claude will get the science wrong, but so do grad students.

(But sure technically they are not finished yet ... but yeah)


100% this. The AI misunderstands and makes a mistake? No problem. Clarify and the AI will come back with a rewrite in 30 sec.


A rewrite with another, more subtle mistake. That you must spend energy discovering and diagnosing.


And potentially tricks you into going down a rabbit hole as you try to steer it, when it would have been faster to edit the code yourself. The best use is editing code with it instead of purely relying on prompting. Also, fresh contexts for different changes make a huge difference. A lot of devs seem to get stuck in a single chat stream, and it starts to break down as the context gets fuzzy.


This can go both ways though. I've had instances where after a long somewhat tedious back and forth with Claude Code what I'm looking for finally "clicks" and it starts one-shotting requests left and right. But then after coming up to the end of the context window, I compact or ask it to update CLAUDE.md with what it learned and it loses the nuanced understanding it built up through trial and error and starts screwing up in the exact same ways again with fresh context.

And then I waste 20 minutes bashing my head against the wall trying to write three paragraphs meticulously documenting all the key gotchas and lessons from the "good" context window with the magic combination of words needed to get Claude back in the right head space to one-shot again.

At least if I pull that off I can usually ask Claude to use it as documentation for the project CLAUDE.md with a pretty good success rate. But man it leaves a bad taste in my mouth and makes me question whether I'm actually saving time or well into sunk cost territory...


You’d think that with more context it would get better, but it doesn’t.

My hunches are that the to-and-fro of ideas as you discuss options or make corrections leads to a context with competing “intentions”, and that the model can’t tell the difference between positive and negative experiences as it works through each successive token in the context.

But I don’t make LLMs, so this is pure guesswork.


Quite honestly after using Copilot, Claude Code, Cursor, and Codex extensively for months, I very rarely hand edit code.

The rabbit hole problem is pretty rare. Usually it happens when the model flips into "stupid mode", some kind of context poisoning. If you are experienced, you know to purge the context when that happens.

In personal projects I avoid manual editing as a form of deliberate practice. At work, I only edit when it is a very small edit. I can usually explain what I want more concisely and quickly than hand editing code.

I would probably hand-edit more if I had classic refactoring tools like those in IntelliJ/PyCharm. Though the CLI-based tools were a pleasant surprise once I actively started using them.


> The best use is editing code with it instead of purely relying on prompting.

What does this look like in practice?


How is that different from working with a n00b except that it only took 30sec to get to the next bug rather than a week?


The junior engineer will grow into a senior engineer


Not as fast as models get better.

And not at all for you, because you're unlikely to retain them for long. Which makes this immaterial - AI or human, you're only going to delegate to n00bs.

(This is distinct from the question of which benefits society more, which is a separate discussion.)


> The junior engineer will grow into a senior engineer

And then quit after accepting a new job that pays them their new market value, because tech companies are particularly bad at proactive retention.


Which has nothing to do with the argument you're responding to.


Only if you take the argument entirely out of context.


.. and because the job and environment weren't pleasant or rewarding enough to offset that delta in income offered elsewhere at an equally drab employer


And then they get their own junior engineers and you get fresh new junior engineers.


> another, more subtle mistake. That you must spend energy discovering and diagnosing

But this is literally what senior engineers do most of the time? Have juniors write code with direction and review that it isn't buggy?


Except that most of the code seniors review was written with intention, not just the statistically most likely response to a given query. As a senior engineer, I find the kinds of mistakes that AI makes much more bizarre than the mistakes junior engineers make.


I've worked with many interns and juniors in my life and they've made very bizarre mistakes and had subtle bugs, so the difference in kind hasn't changed much about the work I've had to do to review. Whether or not there was intention behind it didn't make a difference.


>not just the statistically most likely response to a given query.

Some people really are going to hang on until the bitter end (and beyond), eh?

"AI can't code like me!" people are going to get crushed.


I’ve definitely seen absolutely bizarre creations from junior devs decades ago (so, well before modern AI). I can also think back to some cringey code I wrote by hand when I was a junior as well.

I mentor high-school students and watch them live write code that takes a completely bizarre path. It might technically be intentional, but that doesn’t mean it’s good or useful.


> Except that most of the code seniors review was written with intention, not just the statistically most likely response to a given query.

Given the nature of the statistics in question, the line between the two is extremely blurry at this point.


My experience in the UK has been that seniors and principals just write more/better code and occasionally deal with “the business”, with (justifiably) greater attention paid to their input. I’d love to see a proper culture of mentoring.


LLMs might make you feel faster (which helps with motivation!) and help with some of the very easy stuff but the critical part of your anecdote is that you haven't actually completed the extra work. The projects are only "NEARING" completion. I think that's very telling.


Congratulations! You repeated my joke? lol

But in all seriousness, completion is not the only metric of productivity. I could easily break it down into a mountain of subtasks that have been fully completed for the bean counters. In the meantime, the code that did not exist 2 months ago does exist.


If the easy things are done faster you can spend more time on the hard stuff. No need to spend 2 hours on making the UI for the MVP when an AI can make a decent UI in 2 min. That means you have 2 hours more to spend on the hard stuff.


Unless, as is often the case in my experience, the hard stuff consists largely of fixing bugs and edge cases in your implementation of the easy stuff. I've seen multiple people already end up forced back to the drawing board because their "easy stuff" AI implementation had critical flaws they only realized after they thought they were done. It's hard to prove counterfactuals, but I'm pretty confident they would have gotten it right the first time if they hadn't used AI, they're not bad engineers.


> I don't need a study to tell me about the five projects that have been stuck plodding along for nearly ten years, waiting for me to ever have the time or resources.

that's the issue in the argument though. it could be that those projects would also have been completed in the same time if you had simply started working on them. but honestly, if it makes you feel productive to the point you're doing more work than you would do without the drug, I'd say keep taking it. watch out for side effects and habituation though.


You've added an implicit assumption that this person spends more time programming now than they used to, rather than continuing to commit time at the same rate but now leading to projects being completed when they previously got bogged down and abandoned.

There are any number of things you could add to get you to any conclusion. Better to discuss what is there.

I've had the same experience of being able to finish tons of old abandoned projects with AI assistance, and I am not spending any more time than usual working on programming or design projects. It's just that the most boring things that would have taken weeks to figure out and do (instead, let me switch to the other project I have that is not like that, yet) have been reduced to hours. The parts that were tough in a creative fun way are still tough, and AI barely helps with them because it is extremely stupid, but those are the funnest, most substantive parts.


> if you had simply started working on them.

The magic of Claude is that you can simply start.


You can also simply start without Claude too!


Yes, but less often.


Yes, because Claude makes you feel more productive and thus you are more eager. I feel like we've covered this further up.


I don't think that's correct. That could be true if I were primarily a programmer, but I am not. I'm mostly a certified medical physicist working in a hospital. Programming is a skill that is helpful, and I have spent my programming time building other tools that I need. But that list is gigantic, the software that is available for purchase is all complete crap, the market is too small for investment, etc. That's all to say the things I am building are desperately needed, but my time for programming is limited, it's not what brings home the bacon, and there's no money to be made (beyond consulting; essentially these things might possibly work as tools for consultants). I don't have resources for professional programming staff, but I have worked with them in the past and (no offense to most of HN) the lack of domain knowledge tends to waste even more of my time.


You are very fortunately in the perfect slot where LLMs have a lot of bang for the buck.


It is in many ways much like coaching a n00b, but a n00b that can do 10 hours of n00b work in 10 minutes (or, 2 minutes).

That's a significant difference. There are a lot of tasks that can be done by a n00b with some advice, especially when you can say "copy the pattern when I did this same basic thing here and here".

And there are a lot of things a n00b, or an LLM, can't do.

The study you reference was real, and I am not surprised — because accurately gauging the productivity win, or loss, obtained by using LLMs in real production coding workflows is also not junior stuff.


"Really does not sound like that from your description. It sounds like coaching a noob, which is a lot of work in itself."

And if this is true, you will have to coach AI each time whereas a person should advance over time.


At least you can ask AI to summarize an AGENT.md or something and it will read it diligently next time.

As for humans, they might not have the motivation or technical writing skill to document what they learnt. And even if they did, the next person might not have the patience to actually read it.


"Read diligently" - that’s a very optimistic statement. I can not count how many times Claude (LLM I am most familiar with, I had it write probably about 100KLOC in the past few months) explicitly disobeyed what was written in the instructions.

Also, a good few times, if it were a human doing the task, I would have said they both failed to follow the instructions and lied about it and attempted to pretend they didn’t. Luckily their lying abilities today are primitive, so it’s easy to catch.


Psychopathic behavior seems to be a major problem for these (of course it doesn't think, so it can't really be called that, but that's the closest term that fits). They are trained to arrive at the result, and if the most likely path to it is faking it and lying about it, then that's what you are getting. And if you find it, it will cheerfully admit it and try to make a better lie that you'd believe.


So true. I have some non-typical preferences for code style. One example is that I don’t like nested error checks in Go. It’s not a correctness issue, it’s just a readability preference. Claude and Copilot continually ignore this no matter how much emphasis I give it in the instructions. I recently found a linter for this, and the agent will fix it when the linter points out the issue.

This is probably because the LLM is trained on millions of lines of Go with nested error checks vs a few lines of contrary guidance in the instructions file.

I keep fighting this because I want to understand my tools, not because I care that much about this one preference.
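
For anyone who hasn't hit this particular style war, here is a toy sketch of the two shapes I mean (load and parse are made-up stand-ins, not code from my project):

    // Toy example only; load and parse are hypothetical helpers.
    package main

    import (
        "errors"
        "fmt"
    )

    func load(path string) (string, error) {
        if path == "" {
            return "", errors.New("empty path")
        }
        return "raw:" + path, nil
    }

    func parse(data string) (string, error) {
        return "parsed:" + data, nil
    }

    // Nested error checks: the shape the models keep producing.
    func nested(path string) (string, error) {
        if data, err := load(path); err == nil {
            if parsed, err := parse(data); err == nil {
                return parsed, nil
            } else {
                return "", err
            }
        } else {
            return "", err
        }
    }

    // Flat early returns: the shape I actually want.
    func flat(path string) (string, error) {
        data, err := load(path)
        if err != nil {
            return "", err
        }
        parsed, err := parse(data)
        if err != nil {
            return "", err
        }
        return parsed, nil
    }

    func main() {
        fmt.Println(nested("x"))
        fmt.Println(flat("x"))
    }

Both compile and behave the same; the difference is purely how much indentation you wade through to find the happy path.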


Claude has really gone downhill in the last month or so. They made a change to move the CLAUDE.md from the system prompt to being occasionally read in, and it really deprioritizes the instructions to the same attention level as the code it's working on.

I've been trying out Codex the last couple of days and it's much more adherent and much less prone to lying and laziness. Anthropic says they're working on a significant release of Claude Code, but I'd much rather have them just revert to the system as it was ~a month ago.


Claude is cooked. GPT5 codex is a much stronger model, and the codex cli is much more performant/robust than cc (even if it has fewer features).

I've never had a model lie to me as much as Claude. It's insane.


True, I was using Cline/Roocode for almost a year and it always made sure to read things from the memory-bank, which I really liked. Claude has gone downhill since mid-August for me, and it often doesn't follow instructions from claude.md or forgets things midway.


> Also, a good few times, if it were a human doing the task, I would have said they both failed to follow the instructions and lied about it and attempted to pretend they didn’t.

It's funny. Just yesterday I had the experience of attending a concert under the strong — yet entirely mistaken — belief that I had already been to a previous performance of the same musician. It was only on the way back from the show, talking with my partner who attended with me (and who had seen this musician live before), trying to figure out what time exactly "we" had last seen them, with me exhaustively listing out recollections that turned out to be other (confusingly similar) musicians we had seen live together... that I finally realized I had never actually been to one of this particular musician's concerts before.

I think this is precisely the "experience" of being one of these LLMs. Except that, where I had a phantom "interpolated" memory of seeing a musician I had never actually seen, these LLMs have phantom actually-interpolated memories of performing skills they have never actually themselves performed.

Coding LLMs are trained to replicate pair-programming-esque conversations between people who actually do have these skills, and are performing them... but where those conversations don't lay out the thinking involved in all the many implicit (thinking, probing, checking, recalling) micro-skills involved in actually performing those skills. Instead, all you get in such a conversation thread is the conclusion each person reaches after applying those micro-skills.

And this leads to the LLM thinking it "has" a given skill... even though it doesn't actually know anything about "how" to execute that skill, in terms of the micro-skills that are used "off-screen" to come up with the final response given in the conversation. Instead, it just comes up with a prediction for "what someone using the skill" looks like... and thinks that that means it has used the skill.

Even after a hole is poked in its use of the skill, and it realizes it made a mistake, that doesn't dissuade it from the belief that it has the given skill. Just like, even after I asked my partner about the show I recall us attending, and she told me that that was a show for a different (but similar) musician, I still thought I had gone to the show.

It took me exhausting all possibilities for times I could have seen this musician before, to get me to even hypothesize that maybe I hadn't.

And it would likely take similarly exhaustive disproof (over hundreds of exchanges) to get an LLM to truly "internalize" that it doesn't actually have a skill it believed itself to have, and so stop trying to use it. (If that meta-skill is even a thing that LLMs have ever learned from their training data — which I doubt. And even if they did, you'd be wasting 90% of a Transformer's context window on this. Maybe something that's worth keeping in mind if we ever switch back to basing our LLMs on RNNs with true runtime weight updates, though!)


I find the summaries to be helpful. However, some of the detailed points lack a deep understanding of the technical issues and their importance.


And then they skip to another job for more money, and you start again with a new hire.


Thankfully after many generations of human interactions and complex analysis of group dynamics, we've found a solution. It's called 'don't be an asshole' and 'pay people competitively'.

edit: because people are stupid, 'competitively' in this sense isn't some theoretical number pulled from an average, it's 'does this person feel better off financially working with you than others around them who don't work with you, and is this person meeting their own personal financial goals through working with you'?


The common corporate policy of making it harder to give raises than to increase starting salaries for new hires is insane.


Is it insane? Makes perfect sense. Employee has way less leverage at raise time. It’s all about leverage. It sucks, but that is the reality


The elephant in this particular room is that there are a tiny handful of employers that have so much money that they can and do just pay whatever amount is more than any of their competitors can possibly afford.


That shouldn't be a big deal since they're a finite portion of the market. You should have a robust enough model to handle people leaving, including unavoidable scenarios like retirement and death.


They do have a point. Why waste time on a person who will always need more money over time, rather than invest in AI? Not only do you not need to please every hire, your seniors will be more thankful too, because they will get linearly faster with time.


Outside of working for Anthropic etc., there's no way you can make an LLM better at anything. You can train a junior though.


You absolutely can. It's a skill like operating an IDE, a CLI, etc.

A junior is a person, not a personal assistant like an LLM.


You can def provide better context etc.


The person paying and the one responsible for coaching others usually aren't the same.


A mentor usually has no power over the mentee's compensation.

Also, it is never the policy to pay existing employees competitively, only new hires.


That's not a bad thing. It means you've added one more senior to the societal pool. A lot of the talent problems today are due to companies not wanting to train and focusing on cheap shortcut options like outsourcing or H1B


The AI in this example is 1/100 the cost.


that is absolutely false - the capital and resources used to create these things are at societal scale. An individual consumer is not paying that cost at this time.


You can make the same argument about humans. The employer doesn't pay the full cost and time to create the worker from an embryo to a senior dev.


Unless you are advocating for executing developers when they are no longer capable of working, that’s a bit of a non sequitur.

Humans aren’t tools.


That only proves the point. If something increases the value of someone’s time by 5% and 500,000,000 people are affected by it, the cost will collapse.

These models are only going to get better and cheaper per watt.


> These models are only going to get better and cheaper per watt.

What do you base this claim on? They have only gotten exponentially more expensive for decreasing gain so far - quite the opposite of what you say.


For now, not including externalities.


> Really does not sound like that from your description. It sounds like coaching a noob, which is a lot of work in itself.

Even if you do it yourself, you still need to go through the same thinking and iterative process. You just get the code almost instantly and mostly correct, if you are good at defining the initial specification.


This. You _have_ to write the spec. The result is that instead of spending x units of time on the spec and THEN y units of time on coding, you get the whole thing in x units of time AND you have a spec.

The trick is knowing where the particular LLM sucks. I expect there is no productivity gain for a short while, but once you start to understand the limitations and strengths - holy moly.


> The result is that instead of spending x units of time on the spec and THEN y units of time on coding, you get the whole thing in x units of time AND you have a spec.

It's more like x units of time thinking and y units of time coding, whereas I see people spend x/2 thinking, x typing the specs, y correcting the specs, and y giving up and correcting the code.


Sure! That's inefficient. I know just how I work and I've been writing the type of programs I do for quite a few years. And I know that what would normally take me a week takes me a few days at best.


But if you can’t define the specs, you can’t write the code without an LLM either?


Unless you realize no LLM is good at what you need and you just wasted weeks of time walking in circles.


If you only notice after two weeks that the project is off-kilter, I would guess the chance of that happening without an LLM is not low either.

These are not _tools_ - they are like cool demos. Once you have a certain mass of functional code in place, intuition - which for me required decades of programming to develop - kicks in and you get these spider-sense tinglings: ”ahh umm this does not feel right, something’s wrong”.

My advice would be don’t use LLM until you have the ”spider-sense” level intuition.


> Wasn’t there a study that said that using LLMs makes people feel more productive while they actually are not?

On a tangent: that study is brought up a lot. There are some issues with it, but I agree with the main takeaway to be wary of the feeling of productivity vs actual productivity.

But most of the time it's brought up by AI skeptics, who conveniently gloss over the fact that it's about averages.

Which, while organizationally interesting, is far less interesting than discovering what is and isn't currently possible at the tail end by the most skillful users.


The key insight from the study is that even the users that did see an increase in productivity overestimated that increase.

Taken along with the dozens of other studies that show that humans are terrible at estimating how long it will take them to complete a task, you should be very skeptical when someone says an LLM makes them x% more productive.

There’s no reason to think that the most skillful LLM users are not overestimating productivity benefits as well.


Engineers have always been terrible at measuring productivity. Building a new internal tool or writing a bunch of code is not necessarily productive.

Productivity is something that creates business value. In that sense an engineer who writes 10 lines of code but that code solves a $10M business problem or allows the company to sign 100 new customers may be the most productive engineer in your organization.


Not to mention the study doesn't really show a lack of productivity, and it includes some key caveats outlining where the authors think LLMs do increase productivity.


It's this, but 1000 times faster — that's the difference. It's sending a noob away to follow your exact instructions and getting results back in 10 seconds instead of 10 hours.

I don't have to worry about managing the noob's emotions or their availability, I can tell the LLM to try 3 different approaches and it only takes a few minutes... I can get mad at it and say "fuck it I'll do this part myself", the LLM doesn't have to be reminded of our workflow or formatting (I just tell the LLM once)

I can tell it that I see a code smell and it will usually have an idea of what I'm talking about and attempt to correct, little explanation needed

The LLM can also: do tons of research in a short amount of time, traverse the codebase and answer questions for me, etc

it's a noob savant

It's no replacement for a competent person, but it's a very useful assistant


I built a new service recently.

It has about a dozen or so endpoints, facilitating real time messaging.

It took me about 4 hours to build it out, fully tested with documentation and examples and readme.

About two hours were spent setting up the architecture and tests. About 45 min to an hour setting up a few of the endpoints. The rest were generated by CC. FWIW it is using layers and SRP to the max. Everything is simple and isolated, easy to test.

I think if I had a contractor or employee do this they would have coasted for a week and still fucked it up. Adding ridiculous complexity or just fucking up.

The nice thing about AI tools is you need fewer people. Most people are awful at their jobs; anyone can survive a few years and call themselves senior. Most teams are only successful because of the 1 or 2 guys who pull 150% while the others are barely doing 80%.


Anecdotally, on greenfield projects where you are exploring a new domain, it’s an insanely productive experience. On mundane day-to-day tasks it probably takes more time, but it feels like less mental bandwidth.

Coding at full throttle is a very intensive task that requires deep focus. There are many days that I simply don’t have that in me.


There was one study that said that in a specific setting and was amplified heavily on forums by anti-AI people.

There have been many more studies showing productivity gains across a variety of tasks that preceded that one.

That study wasn't necessarily wrong about the specific methodology they had for onboarding people to use AI. But if I remember correctly it was funded by an organization that was slightly skeptical of AI.


I don't understand why anyone would believe a study on anything AI at this point. I don't believe anyone can quantify software development productivity much less measure the impact from AI


If anyone actually reads the study they'll see that even the authors of that study admit LLMs will increase productivity and there's a lot more to come.


which studies show this?


Here are some from the last few months:

AI coding assistant trial: UK public sector findings report: https://www.gov.uk/government/publications/ai-coding-assista... - UK government. "GDS ran a trial of AI coding assistants (AICAs) across government from November 2024 to February 2025. [...] Trial participants saved an average of 56 minutes a working day when using AICAs"

Human + AI in Accounting: Early Evidence from the Field: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5240924 - "We document significant productivity gains among AI adopters, including a 55% increase in weekly client support and a reallocation of approximately 8.5% of accountant time from routine data entry toward high-value tasks such as business communication and quality assurance."

OECD: The effects of generative AI on productivity, innovation and entrepreneurship: https://www.oecd.org/en/publications/the-effects-of-generati... - "Generative AI has proven particularly effective in automating tasks that are well-defined and have clear objectives, notably including some writing and coding tasks. It can also play a critical role for skill development and business model transformation, where it can serve as a catalyst for personalised learning and organisational efficiency gains, respectively [...] However, these potential gains are not without challenges. Trust in AI-generated outputs and a deep understanding of its limitations are crucial to leverage the potential of the technology. The reviewed experiments highlight the ongoing need for human expertise and oversight to ensure that generative AI remains a valuable tool in creative, operational and technical processes rather than a substitute for authentic human creativity and knowledge, especially in the longer term.".


That was a treat to explore. All of those are based on self-assessment surveys or toy problems. The UK report reads:

> On average, users reported time savings of 56 minutes per working day [...] It is also possible that survey respondents overestimated time saved due to optimism bias.

Yet in conclusion, this self-reported figure is stated as an independently observed fact. When people without ADHD take stimulants they also self-report increased productivity, higher accuracy, and faster task completion but all objective measurements are negatively affected.

The OECD paper supports their programming-related findings with the following gems:

- A study that measures productivity by the time needed to implement a "hello world" of HTTP servers [27]

- A study that measures productivity by the number of lines of code produced [28]

- A study co-authored by Microsoft that measures productivity of Microsoft employees using Microsoft Copilot by the number of pull requests they create. Then the code is reviewed by their Microsoft coworkers and the quality of those PRs is judged by the acceptance rate of those PRs. Unbelievably, the code quality doesn't only remain the same, it goes up! [30]

- An inspirational pro-AI paper co-authored by GitHub and Microsoft that's "shining a light on the importance of AI" aimed at "managers and policy-makers". [31]


> Yet in conclusion, this self-reported figure is stated as an independently observed fact. When people without ADHD take stimulants they also self-report increased productivity, higher accuracy, and faster task completion but all objective measurements are negatively affected.

Interesting analogy, because all those studies with objective measurements are defied by US students year after year, come finals season.


Yeah, they take them because they get high. Believing in things that are unsupported by empirical evidence is in the domain of religion, not science.


You can't really get high much on prescription-level dosages - that quickly gets tricky logistically and prohibitively expensive. People who look for highs go to the street for a reason.


You pointed out that students abuse stimulants for finals in spite of the evidence. Were you just expanding on what I originally said, or is that meant to serve as counter-evidence implying that the research is wrong?

Regardless, I'm not saying it's cheap or practical to get high this way, especially over the long term. People probably try stimulants because folk wisdom tells them that they'll get better grades. Then they get high and feel like a superman from the dopamine rush, so they keep using them because they think it's materially improving their grades, but really they're just getting high.


Have you seen any studies on this topic that you find credible?


I haven't seen any but I also don't follow the research that closely.


The study you are alluding to is this one by METR (Model Evaluation & Threat Research):

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity

- https://arxiv.org/abs/2507.09089

""" Before starting tasks, developers forecast that allowing AI will reduce completion time by 24%. After completing the study, developers estimate that allowing AI reduced completion time by 20%. Surprisingly, we find that allowing AI actually increases completion time by 19%—AI tooling slowed developers down. This slowdown also contradicts predictions from experts in economics (39% shorter) and ML (38% shorter). """


Sure it’s a lot of work, but the noob in question has all the knowledge of the internet and can write multiple times faster than a human for a fraction of the cost. This is not about an individual being more productive, this is about business costs. Long term we should still hire and train juniors obviously, but short term there is a lot of pressure not to, as it makes no sense financially. Study or not, the reality is there is not much difference in productivity between a senior with a Cursor license and a senior plus a junior who needs heavy guidance.


Code is a liability. You always want less of it. Typing faster does not particularly help. Unless the tool is verbose, then you fix the tool.


> Wasn’t there a study that said that using LLMs makes people feel more productive while they actually are not?

Curious about specifics of this study. Because in general, how one feels is critical to productivity. It's hard to become more productive when the work is less and less rewarding. The infamous "zone" / "flow state" involves, by its very definition, feeling of increasing productivity being continuously reinforced on a minutes-by-minutes level. Etc.


perceived result: 20% faster

actual result: 20% slower

link: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...


Personally this might explain why I feel so demoralized by LLMs

I hate the experience of trying to write code with them. I like to just type my thoughts into the files.

I hate trying to involve the LLM, even as a search. I want my search to feel like looking up references not having a conversation with a robot

Overall, for me, the whole experience of trying to code with LLMs is both frustrating and unrewarding

And it definitely doesn't seem more efficient or faster either


LLMs make two people more productive, the person that uses the LLM, and then the person that cleans up the mess.


My buggy executive function frequently gets in the way of putting code to screen. You know how hacker news has that lil timeout setting to pseudo force you to disengage from it? AI made it so I don't need anything like that. It is digital Adderall.


You aren't wrong about the coaching bit, but the feedback loops are orders of magnitude faster.

It takes an LLM 2-20 minutes to give me the next stage of output, not 1-2 days (or a week?). As a result, I have higher context the entire time, so my side of the iteration is maybe 10x faster too.


Everyone using that study to prove LLMs are bad hasn't actually read the study.


I don’t think anyone said anything about “good VS bad”, just about actual VS reported productivity impacts.


I am so tired of this style of "don't believe your lying eyes" conjecture.

I'm a career coder and I used LLMs primarily to rapidly produce code for domains that I don't have deep experience in. Instead of spending days or weeks getting up to speed on an SDK I might need once, I have a pair programmer that doesn't check their phone or need to pick up their kids at 4:30pm.

If you don't want to use LLMs, nobody is forcing you. Burning energy trying to convince people to whom the benefits of LLMs are self-evident many times over that they are imagining things is insulting the intelligence of everyone in the conversation.


> used LLMs primarily to rapidly produce code for domains that I don't have deep experience in

You’re either trusting the LLM or you still have to pay the cost of getting the experience you don’t have. So in either case you’re not going too much faster - the former’s cost not being apparent until it’s much more expensive later on.

Edit: assuming you don’t struggle with typing speed, basic syntax, APIs etc. These are not significant cost reductions for experts, though they are for juniors.


Correct. In areas where you yourself are a junior engineer, you’ll maybe be more effective at tackling that area with an LLM. It’s also surprisingly effective at executing refactors.


I'm not sure which one of us is ultimately more hung up on titles in this context, but I would push back and say that when someone with 30+ years experience tackling software problems delegates navigating the details of an API to an LLM, that is roughly the most "senior developer" moment of the day.

Conflating experience and instinct with knowing everything isn't just false equivalency, it's backwards.


I really don’t know what I said that was such an emotional trigger for you. All I said is that it’s an accelerant for you when you leave your domain. For example, I’m a systems engineer. I hate coding UIs, but with the LLM I can pump out a UI quickly, and this was true both for web code and a GUI I built with dioxus. The UI code was also cleaner because I had some sense of how it should be structured and asked the AI to clean up that structure. But ultimately it did most of the work in response to high-level prompts, and I picked and chose what to review line by line vs what to vibe code.

That’s what I mean - by myself it would have taken me easily 10x longer if not worse because UI coding for me is a slog + there’s nuances about reactive coding + getting started is also a hurdle. The output of the code was still high quality because I knew when the LLM wasn’t making the choices I wanted it to make.


I can tell you exactly: it's your framing of relying on an LLM (or any outside assistance, including humans) as temporarily becoming "junior".

I feel strongly that delegation to strengths is one of the most obvious signs of experience.

Apologies for getting hung up on what might seem like trivial details, but when discussing on a text forum, word choices matter.


An experienced UI developer probably would have still been faster than me. That puts me closer to the junior camp when I’m by myself (e.g. I wouldn’t really know where to start and would just stumble around), but an LLM lets me get back closer to my level of expertise and velocity.


We might just have to agree to disagree. I believe that an experienced developer brings instincts and stacked skills even to domains where they have never touched.

In other words, I don't think that you temporarily regress to "junior" just because you're working on something new. You still have a profound fundamental understanding of how technology works and what to expect in different situations.

This reminds me of the classic "what happens when you type google.com into a web browser" question, with its nearly infinite layers of abstraction from keyboard switches to rendering a document with calls to a display driver and photons hitting your visual receptors.

We might just be quibbling over terminology, however.


> If you don't want to use LLMs, nobody is forcing you. Burning energy trying to convince people to whom the benefits of LLMs are self-evident many times over that they are imagining things is insulting the intelligence of everyone in the conversation.

Hey man, I don't bother trying to convince them because it's just going to increase my job security.

Refusing to use LLMs or thinking they're bad is just FUD. It's the same as people who prefer nano/vim over an IDE, or people who say "hur dur cloud is just somebody else's computer".

It's best to ignore and just leave them in the dust.


I’ve found LLMs are most useful when I know what I want to do but just don’t want to type it all out. My best success so far was an LLM saving me about 1,000 lines of typing and fixing syntax mistakes on a web component plus backend in a proprietary framework.


Yep, and the productivity of LLMs means that experienced developers can go from idea to implementation way faster. But first someone has to understand and define a solid structure. And later someone needs to review, test, and integrate the code into this framework. This is hard stuff. Arguably harder than writing code in the first place!

It's no wonder inexperienced developers don't get as much out of it. They define a vague structure, full of problems, but the sycophantic AI will spew out conformant code anyways. Garbage in, garbage out. Bad ideas + fast code gen is just not very productive in the long term - LLMs have made the quality of ideas matter again!


It takes experience to be able to push back.


I can see how this workflow made the senior developer faster. At the same time, work mentoring the AI strikes me as less valuable than the same time spent mentoring a junior developer. If this ends up encouraging an ever-widening gap between the skill levels of juniors and seniors, I think that would be bad for the field overall.

Getting that kind of data is difficult; right now it's just something I worry about.


I don't think it replaces a junior, but it raises the bar for the potential that a junior would need to show early, for exactly the reason you mention. A junior will now need to be a potential senior.

The juniors that are in trouble are the low-potential workhorse folks who really aren't motivated but happened to get skilled up in a workshop or technical school. They hopped on the coding wagon as a lucrative career change, not because they loved it.

Those folks are in trouble and should move on to the next trend... which ironically is probably saying you can wrangle AI.


I would spend time mentoring a junior, but I don't have one so I work with AI. It was the company's decision, but when they asked me "who can continue developing and supporting system X" the answer is "the nobody that you provided". When you cut corners on growing juniors, you reap what you sow.


> work mentoring the AI strikes me as less valuable than the same time spent mentoring a junior developer

But where can you just "mentor" a junior? Hiring people is not so easy, especially not ones that are worth mentoring. Not every junior will be a willing, good recipient of mentoring, and that's if you manage to get one, given budget constraints and long lead times on hiring. And at best you end up with one or two; with parallel LLMs, you can have almost entire teams of people working for you.

I'm not arguing for replacing juniors - I worry about the same thing you do - but I can see why companies are so eager to use AI, especially smaller startups that don't have the budgets and manpower to hire people.


If a junior is not willing to learn and grow, there is no future for that person in the organization. "Forever junior" is not a valid job title. Better not to hire someone who is not good enough than to have to deal with the consequences; I learned that from my past mistakes.


Of course, and that's why it's not a simple choice between using AI and hiring a junior. Hiring and mentoring a junior is a lot more work for an uncertain payoff.


The junior could use the LLM as a crutch to learn what to learn. Whatever output the LLM gave them, they could examine or ask the LLM to explain. Don't put into production anything you don't understand.

Though I'm extremely well versed in Python, I'm right now writing a Python Qt application with Claude. Every single Qt function or object that I use, I read the documentation for.


It's a classic short-term gain outlook for these companies.


Ya, the early "studies" that said AI would benefit low-skill workers more than seniors never seemed grounded in reality.

Coding with AI is like having a team of juniors that can complete their assignments in a few minutes instead of days. The more clear your instructions, the closer it is to what you wanted, but there are almost always changes needed.

Unfortunately it really does make the junior dev position redundant (although this may prove to be very short-sighted when all the SR devs retire).


I think the idea was that LLMs can allow someone who has no idea how to code to write a prompt that can in fact output some working code. This greatly raises their skill floor, as opposed to a senior, where at best it’s doing something they already can do, just faster.

The elephant in the room being that if you aren’t senior enough to have written the code you’ll probably run into a catastrophic bug that you are incapable of fixing (or prompting the LLM to fix) very very quickly.

Really it’s just the next iteration of no-code hype, where people dream of building apps without code, but then reality always comes back to the fact that the essential skill of programmers is to understand and design highly structured and rigid logical systems. Code is just a means of specification. LLMs make it easier to leverage code patterns that have been written over and over by the hundreds of thousands of programmers who have contributed to the training corpus, but they cannot replace the precision of thought needed to turn a hand-wavy idea into a concrete system that actually behaves in a way that humans find useful.


I've never worked anywhere where the role of a Sr was to glue together a bunch of small pieces written by a team of Jr devs.

I've only worked places where Jr's were given roughly the same scope of work as a mid-level dev but on non-critical projects where they could take as much time as necessary and where mistakes would have a very small blast radius.

That type of Jr work has not been made redundant - although I suppose now it's possible for a PM or designer to do that work instead (but if your PMs are providing more value by vibe coding non-critical features than by doing their PM work, maybe you don't really need a PM?)


I've worked with many companies that have only 1 SR dev per team. The SR typically spends at least half their time overseeing the work of the rest of the team... Not saying this is a good structure, but it is common.


>... it doesn't matter so much that you know every detail of the implementation, as long as it is validated.

What makes me nervous is when we generate both the implementation and the test cases. In what sense is this validation?


My last attempt had passing tests. It even exercised the code that it wrote! But, upon careful inspection, the test assertions were essentially checking that true equalled true, and the errors in the code didn't fail the tests at all.

Attempting to guide it to fixing the errors just introduced novel errors that it didn't make the first time around.

This is not what I signed up for.
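
For the curious, the failure mode looked roughly like this (a reconstructed Go sketch, not the actual generated code; Add is a made-up stand-in):

    // Reconstructed illustration, not the real code the LLM produced.
    package example

    import "testing"

    // Deliberately buggy: should be a + b.
    func Add(a, b int) int { return a - b }

    // Roughly what the model wrote: it exercises Add but asserts nothing
    // about the result, so the bug above never fails the test.
    func TestAddRuns(t *testing.T) {
        _ = Add(2, 3)
        if 1+1 != 2 { // effectively asserting that true equals true
            t.Fatal("unreachable")
        }
    }

    // What a validating test looks like; this one fails until Add is fixed.
    func TestAddCorrect(t *testing.T) {
        if got := Add(2, 3); got != 5 {
            t.Fatalf("Add(2, 3) = %d, want 5", got)
        }
    }

Coverage tools still report the buggy line as covered, which is exactly why "the tests pass" told me nothing here.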


Byzantine Incompleteness enters the chat.

Either you go formal, or you test the tests, and then test those ...


Would it have actually taken you 3x longer?

I am surprising myself these days with how fast I am using AI as a glorified Stack Overflow.

We are also seeing studies and posts come out saying that, when actually tried side by side, the AI-writes-the-code route is slower, though the developer perceives it as faster.


I am not the biggest fan of LLMs but I have to admit that, as long as you understand what the technology is and how it works, it is a very powerful tool.

I think the mixed reports on utility have a lot to do with the very different ways the tool is used and how much 'magic' the end-user expects versus how much the end-user expects to guide the tool to do the work.

To get the best out of it, you do have to provide significant amount of scaffolding (though it can help with that too). If you're just pointing it at a codebase and expecting it to figure it out, you're going to have mixed results at best. If you guide it well, it can save a significant amount of manual effort and time.


> (though it can help with that too)

Yeah, this is a big thing I'm noticing a lot of people miss.

I have tons of people ask me "how do I get claude to do <whatever>?"

"Ask claude" is the only response I can give.

You can get the LLM to help you figure out how to get to your goal and write the right prompt before you even ask the LLM to get to your goal.


Yeah, every few months I try to have it “just do magic” again and I re-learn the lesson. Like, I’ll just say “optimize this shader!” and plug it in blind.

It doesn’t work. The only way it could is if the LLM has a testing loop itself. I guess in web world it could, but in my world of game dev, not so much.

So I stick with the method I outlined in OP and it is sometimes useful.


I can imagine it often being the case that if you measure a concise moderately difficult task over half a day or a few days, coding by hand might be faster.

But I think, and this is just conjecture, that if you measure over a longer timespan, the ai assisted route will be consistently faster.

And for me, this is down to momentum and stamina. Paired with the ai, I’m much more forward looking, always anticipating the next architectural challenge and filling in upcoming knowledge and resource gaps. Without the ai, I would be expending much more energy on managing people and writing code myself. I would be much more stop-and-start as I pause, take stock, deal with human and team issues, and rebuild my capacity for difficult abstract thinking.

Paired with a good ai agent and if I consistently avoid the well known pitfalls of said agent, development feels like it has the pace of cross country skiing, a long pleasant steady and satisfying burn.


> the AI-writes-the-code route is slower, though the developer perceives it as faster.

I have this pattern while driving.

Using the main roads, when there is little to no traffic, the commute is objectively, measurably the fastest.

However, during peak hours, I find myself in traffic jams, so I divert to squiggly country roads which are both slower and longer, but at least I’m moving all the time.

The thing is, when I did have to take the main road during the peak traffic, the difference between it and squiggly country roads was like two to three minutes at worst, and not half an hour like I was afraid it would be. Sure, ten minutes crawling or standing felt like an hour.

Maybe coding with LLMs makes you think you are doing something productive the whole time, but the actual output is little different from the old way? But hey, at least it’s not like you’re twiddling your thumbs for hours, and the bossware measuring your productivity by your keyboard and mouse activity is happy!


the best use of an LLM is to help you program a Digispark USB dongle with Arduino to make it emulate mouse movement and scrolling, to appease the idiotic spyware the boomers use to monitor you, as if it were somehow relevant, on their corporate spy box.

Meanwhile we adults can do real work on a separate, real computer. Never use their laptop more than the absolute minimum necessary.


Well, you might have several obstacles here:

1) the bossware might take screenshots too

2) the bosses pay for the whole LLM so they expect you to use the whole LLM

3) you may not want to contaminate your spare computer with whatever shit you're working on at your job, and indeed it may be considered a breach of security (as if feeding OpenAI/Anthropic isn't, lol, but that's beside the point).

So you continue to feel miserable, but you get your paycheck, and it's better than unemployment, and your kids are fed and clothed, so there's that.


The truth of it is that when I code with an LLM I scope the work up to include parts that would be a stretch for me to implement. I know what I want them to do, I know where I could find the info to write the code, but the LLM can just spit it out and if it's validate-able, then great, on to the next.

If I were to attack the same system myself without any LLM assist, I'd make a lot of choices to optimize for my speed and knowledge base. The code would end up much simpler. For something that would be handed off to another person (including future me) that can be a win. But if the system is self contained then going bigger and fancier in that moment can be a win. It all depends on the exact goals.

All in all, there's a lot of nuance to this stuff and it's probably not really replacing anyone except people who A) aren't that skilled to start with and B) spend more time yelling about how bad AI is than actually digging in and trying stuff.


A junior would see the solution works and create a PR. A senior knows it works, why it works and what can be improved, then they open a PR.

AI is great at a first draft of anything, code, images, text, but the real skill is turning that first draft into something good.


I don't see this as a problem of seniority but one of mindset. I've met enough "senior devs" who will push just about anything, and curious juniors who are much more selective about their working process.


In the age of high interest rates everyone is pushing quantity over quality


I fail to see the causality.


High interest rates bring layoffs. Layoffs require performance, or at least perceived performance


I believe senior here means experienced, not older.


IMHO, not really, if you know what you want.

There will always be small things to fix, but if there needs to be a second draft, I would hazard that the PR was too big all along: a problem whether an AI is involved or not.


I usually ask it to build a feature based on a specification I wrote. If it is not exactly right, it is often the case that editing it myself is faster than iterating with the AI, which has sometimes put me in an infinite loop of correction requests. Have you encountered this too?


For me, I only use it as a second opinion. I have a pretty good idea of what I want and how to do it, and I can ask for input on what I have written. This gives me the best results so far.


Have you tried a more granular strategy - smaller chunks and more iterative cycles?


At that point, you might as well write it yourself. Instead of writing 300 lines of code, you are writing 300 lines of prompts. What benefit would you get?


It's not. "Add this table, write the dto" takes 10 seconds to do. It would probably take me a few minutes, assuming I'm familiar with the language, and much longer if I'm not.

But it's a lot better than that.

"Write this table. from here store it into table. Write endpoint to return all from the table"

I also had good luck with stuff like "scrape this page, collect x and y, download link pointed at y, store in this directory".


This only happens if you want it to one-shot stuff, or if you fall under the false belief that "it is so close, we just need to correct these three things!".

Yes, I have encountered it. Narrowing the focus, putting constraints in place, and guiding it more closely made the LLM agent much better at producing what I need.

It boils down to me not writing the code really. Using LLMs actually sharpened my architectural and software design skills. Made me think harder and deeper at an earlier stage.


Yes in that case I just paste it back in. Sometimes I start a whole new chat after that.


Yes. Juniors lack the knowledge of how to build coherent mental models of problems whose solutions will ultimately be implemented in code, whereas seasoned engineers have it.

Seniors can make this explicit to models and use them to automate “the code they would have written,” whereas a junior doesn’t know what they would have written nor how they would have solved it absent an LLM.

Same applies to all fields: LLMs can be either huge leverage on top of existing knowledge or a crutch for a lack of understanding.


Aha, for me it was exactly the other way around!

I had a very complex piece of logic with many, many moving parts. So I implemented many paths of that logic by hand, each with its own specifics. Each path was something like 200-400 lines of code.

When it was done and correct, it was difficult to see the moving parts through the forest. Some code was similar but still slightly different, spread across many places, and hard to reason about.

I put everything into an LLM and asked about isolation, architecture, and refactoring.

It actually gave me pretty good abstractions and a good architecture. It didn't include every possible path, but it was easy enough to continue by hand.

It's not that I wouldn't have thought of it myself, but it was easier this way, and my handcrafted solution would probably have been very similar (+ headache).

Of course, I reviewed it extensively, reimplemented every missing path by hand, and corrected the ones that were now buggy.

As an experiment, I let agents fill in the missing parts; it was a disaster :)
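
To give a feel for the shape of the refactor (not its actual content), here's a tiny, purely hypothetical sketch of the kind of consolidation the LLM suggested: one shared skeleton, with each former hand-written path reduced to a small bundle of its specifics. The names and steps are invented, not my real code:

    # Hypothetical sketch: near-duplicate code paths collapsed into one shared
    # skeleton plus small per-path "specifics" objects. All names are invented.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class PathSpec:
        name: str
        prepare: Callable[[dict], dict]    # path-specific preprocessing
        compute: Callable[[dict], float]   # the part that genuinely differs
        validate: Callable[[float], bool]  # path-specific sanity check

    def run_path(spec: PathSpec, raw: dict) -> float:
        """The shared skeleton that every path used to duplicate."""
        data = spec.prepare(raw)
        result = spec.compute(data)
        if not spec.validate(result):
            raise ValueError(f"{spec.name}: result failed validation")
        return result

    # Each former 200-400 line path becomes one small spec:
    fast_path = PathSpec(
        name="fast",
        prepare=lambda raw: {**raw, "scaled": raw["value"] * 2},
        compute=lambda d: d["scaled"] + 1,
        validate=lambda r: r >= 0,
    )

    print(run_path(fast_path, {"value": 10}))  # prints 21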


Ah, this is great, and I can see how that would be useful. In a way, it's "there's a clear spec" arrived at by another path: the spec is defined by all the code you already wrote.


My success and experience generally match yours (and the authors'). Based on my experience over the last 6 months, nothing here about more senior developers getting more productivity, or why, is remotely controversial.

It's fascinating how a report like yours or theirs acts as a lightning rod for those who either haven't been able to make it work, or who have rigid mental models of how AI doesn't work and want to disprove the experience of those who choose to share their success.

A couple of points I'd add to these observations: Even if AI didn't speed anything up... even if it slowed me down by 20%, what I find is that the mental load of coding is reduced in a way that allows me to code for far more hours in a day. I can multitask, attend meetings, get 15 minutes to work on a coding task, and push it forward with minimal coding context reload tax.

Just the ability to context switch in and out of coding, combined with the reduced cognitive effort, would still increase my productivity because it allows me to code productively for many more hours per week with less mental fatigue.

But on top of that, I also anecdotally experience the 2-5x speedup, depending on the project. Occasionally things get difficult and maybe I only get a 1.2-1.5x speedup. But it's far easier to slot many more coding hours into the week as an experienced tech lead. I'm leaning far more on skills that are fast, intuitive abilities built up from natural talent and decades of experience: system design, technical design, design review, code review, sequencing dependencies, parsing and organizing work. Get all of these to a high degree of correctness and the coding goes much smoother, AI or no AI. AI gets me through all of them faster, outputs clear artifacts curated by me, and does the coding faster.

What doesn't get discussed enough is that effective AI-assisted coding has a very high skill ceiling, and there are meta-skills that make you better from the jump:

- knowing what you want, while also having the cognitive flexibility to admit when you're wrong;
- having the thing you want generally be pretty close to solid/decent/workable/correct (some mixture of good judgement & wisdom);
- communicating well;
- understanding the cognitive capabilities of humans and human-like entities;
- understanding what kind of work this particular human or human-like entity can and should do;
- understanding how to sequence and break down work;
- having a feel for what's right and wrong in design and code;
- having an instinct for well-formed requirements, and being able to articulate why a requirement isn't well-formed and what is needed to make it so.

These are medium and soft skills that often build up in experienced tech leads and senior developers. This is why it seems that experienced tech leads and senior developers embracing this technology are coming out of the gate with the most productivity gains.

I see the same thing with young developers who have a talent for system design, good people-reading skills, and communication. Those with cognitive flexibility and the ability to be creative in design, planning and parsing of work. This isn't your average developer, but those with these skills have much more initial success with AI whether they are young or old.

And when you have real success with AI, you get quite excited to build on that success. Momentum builds, which starts stacking up those skill-building hours.

Do you need all these meta-skills to be successful with AI? No, but if you don't have many of them, it will take much longer to build sufficient skill in AI coding for it to gain momentum—unless we find the right general process that folks who don't have a natural talent for it can use to be successful.

There's a lot going on here with the folks who take to AI coding and the folks who don't. But it's not terribly surprising that it's the senior devs and old tech leads who tend to take to it faster.


Great post, and thanks for the perspective. You have to be open-minded to even try, and that selects for only some devs. Then among the open-minded, you need to be skeptical and careful, which selects the group down again. So the devs having positive experiences are likely a minority.

Balance that against the threat AI poses to livelihoods, and it's not a shock that overall sentiment is negative. But I would guess it will shake out in the direction we are pushing, at least in the near term (3 years).


>> LLMs were supposed to help juniors

Lol, what? Who came up with this? They were never supposed to do anything in particular; they just turned out to be useful in experienced hands, as expected.


Sounds like it's faster to just write the code by hand.


Once you get a sense for the LLM workflow, you learn that some tasks are not appropriate for it, and you write those by hand. In fact, most code I write is by hand.

But if I want a new system, the specs are clear, it can be built up in testable stages, and there are bits that would take some research but are well documented… then it can be a win.

The studies that say devs are slower with LLMs are fair, because on average devs don't know how to optimize for them. Some do, though.


The massive productivity gains I've seen come from multidisciplinary approaches, where you're applying science and engineering from fields like chemistry, physics, thermodynamics, and fluid dynamics to fast compiled languages. The output is immediately verifiable with a bit of trial and error and visualization, and you're saved literally months of up-front textbook and white-paper research before you can start prototyping anything.


Even if that's true today (it's not), it becomes less true over time as tools and models improve.

If you have someone who knows what they're doing with the latest and greatest coding agents, it's just not comparable. You can have a dev open up four or more terminals with multiple prompts running at the same time. Someone working manually just can't keep up.


Yeah, but then all that code is unaudited? It's too much to review...

I generate code, but I still place it there by hand; vibe code is just too full of bugs, and I find it a bit boring to wait for code to generate all the time.


The best way I've come to describe LLMs is as an ambitious, occasionally bewilderingly stupid, but always incredibly hard-working junior employee.

You have to watch what it's doing, and you can't let it head out into territory you don't understand, because it will fuck up off leash. But it will thanklessly iterate through revision after revision, the way you would ordinarily do with a team but no longer need to for tasks that would bore them.


My mental model is a very smart and interesting stranger at a bar.

Sometimes detail accuracy is sacrificed in service of a good story. Sometimes they're simply full of shit and double down when pushed.


> So in the end, it's code that I know very, very well. I could have written it but it would have taken me about 3x longer when all is said and done.

What about the bugs? Would you have introduced the same bugs, or different ones?



