
More often than not is not good enough for me. But worse than not following instructions is simply being wrong, and then doubling down on it.

This is something that happens to me a lot. When I ask it to do something moderately complex, or to analyse an error I get, quite often it leaps to the wrong conclusion and keeps doubling down on it when it doesn't work, until I tell it what the actual problem is.

It's great with simple boilerplate code, and I have actually used it to implement a new library successfully with a little bit of feedback from me, but it gets stuff wrong so often that I really don't want it doing anything beyond that. That said, when I'm stuck, I still use it to spew out ideas; even if they're wrong, they can help me get going.



i'm genuinely curious - what do your prompts look like? do you have agent instructions for your repo that you've spent a little time pruning or maintaining? when you execute tasks, are you planning them out or one-shotting them?

i ask because your comment sounds like, more often than not, you're actually getting *worse* results (if i'm reading that right). i want to understand whether it's a perception problem (do you just have a higher bar for what you expect from a coding agent?), or whether it's actually producing bad results, in which case i want to understand what's different in the ways we're using these agents.

(also can you provide a concrete example of how it got something wrong? like what were you asking in that moment and what did it do, and how did it double down.)


A few weeks ago, I was testing the difference between two libraries for drawing graphs: cytoscape and NVL. I started with NVL and told it to implement a very basic case: just some simple mock data, drawn on the screen. (That's a paraphrase; I didn't save my actual prompts.) I also took the advice of using a context prompt file to lay down coding standards.

It didn't get it right in one go, but after 7 rounds of analysing the error it suddenly worked, and I was quite amazed. Adding more features step by step went terribly, however; it kept changing the names of functions and properties, and it turned out that was because I was asking for features the library didn't support, or at least didn't document. I ended up rejecting NVL because it was very immature, poorly documented, and there weren't many code examples available; I suspect that's what crippled the AI's ability to use it. But at no point did it tell me I was asking for the impossible; it happily kept trying things and inventing nonexistent APIs until I was the one to discover it couldn't be done.
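
For what it's worth, this is roughly the scale of "basic case" I was asking for. The sketch below is reconstructed from the NVL docs rather than copied from my project, so treat the import path and the field names (captions, from, to) as approximate:

    // Rough sketch of the basic case, assuming NVL's documented constructor
    // shape: new NVL(container, nodes, relationships). Names are approximate.
    import { NVL } from '@neo4j-nvl/base';

    const nodes = [
      { id: '0', captions: [{ value: 'Alice' }] },
      { id: '1', captions: [{ value: 'Bob' }] },
    ];

    const relationships = [
      { id: '10', from: '0', to: '1', captions: [{ value: 'KNOWS' }] },
    ];

    // 'graph' is a placeholder id for a plain div on the page
    const container = document.getElementById('graph') as HTMLElement;
    new NVL(container, nodes, relationships);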

I'm currently using it to connect a neo4j backend to a cytoscape frontend; both are well established and well documented, but it still struggles to get details right. It can whip up something that almost works really quickly, but then I spend a lot of time hunting down the little errors it made, which often turn out to stem from a lack of understanding of how these systems work. Just yesterday it offered 4 different approaches to a problem: one blatantly didn't work, one used a library that doesn't exist, one used a library that doesn't do what I needed, and only one was a step in the right direction, though still not quite correct in its handling of certain node properties.
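
To be concrete about what "almost works" means here, the plumbing itself is simple: run a Cypher query, then map the returned nodes and relationships into cytoscape elements. A cleaned-up sketch is below; the connection details, the query, and the id/label mapping are placeholders rather than my actual code:

    // Sketch of the neo4j -> cytoscape plumbing (placeholders, not my real code).
    import neo4j from 'neo4j-driver';
    import cytoscape from 'cytoscape';

    const driver = neo4j.driver(
      'bolt://localhost:7687',
      neo4j.auth.basic('neo4j', 'password')
    );

    async function loadGraph(container: HTMLElement) {
      const session = driver.session();
      try {
        const result = await session.run(
          'MATCH (n)-[r]->(m) RETURN n, r, m LIMIT 100'
        );

        const elements: any[] = [];
        const seen = new Set<string>();

        for (const record of result.records) {
          // nodes: the internal id becomes the cytoscape id, first label the caption
          for (const key of ['n', 'm']) {
            const node = record.get(key);
            const id = node.identity.toString();
            if (!seen.has(id)) {
              seen.add(id);
              elements.push({ data: { id, label: node.labels[0] } });
            }
          }
          // relationships: prefix the id so it can't collide with a node id
          const rel = record.get('r');
          elements.push({
            data: {
              id: 'r' + rel.identity.toString(),
              source: rel.start.toString(),
              target: rel.end.toString(),
              label: rel.type,
            },
          });
        }

        return cytoscape({ container, elements });
      } finally {
        await session.close();
      }
    }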

I'm using Claude 3.7 thinking and Claude 4 for this.



