
More often than not is not good enough for me. But worse than not following instructions is simply being wrong, and then doubling down on it.

This is something that happens to me a lot. When I ask it to do something moderately complex, or to analyse an error I get, quite often it leaps to the wrong conclusion and keeps doubling down on it when it doesn't work, until I tell it what the actual problem is.

It's great with simple boilerplate code, and I have actually used it to implement a new library successfully with a little bit of feedback from me, but it gets stuff wrong so often that I really don't want it doing anything beyond that. That said, when I'm stuck, I still use it to spew out ideas; even if they're wrong, they can help me get going.



i'm genuinely curious - what do your prompts look like? do you have agent instructions for your repo that you've spent a little time pruning or maintaining? when you execute tasks, are you planning them out or one-shotting them?

i ask because your comment sounds like, more often than not, you're actually getting *worse* results (if i'm reading that right). i want to understand whether it's a perception problem (do you just have a higher bar for what you expect from a coding agent?), or whether it's actually producing bad results, in which case i want to understand what's different in the ways we're using these agents.

(also can you provide a concrete example of how it got something wrong? like what were you asking in that moment and what did it do, and how did it double down.)


A few weeks ago, I was testing the difference between two libraries for drawing graphs: cytoscape and NVL. I started with NVL and told it to implement a very basic case: just some simple mock data, drawn on the screen. (That's a paraphrase; I didn't save my actual prompts.) I also took the advice of using a context prompt file to lay down coding standards.

It didn't get it right in one go, but after 7 rounds of analysing the error it suddenly worked, and I was quite amazed. Adding more features step by step went terribly, however; it kept changing the names of functions and properties, and it turned out that was because I was asking for features the library didn't support, or at least didn't document. I ended up rejecting NVL because it was very immature, poorly documented, and there weren't many code examples available; I suspect that's what crippled the AI's ability to use it. But at no point did it tell me I was asking for the impossible; it happily kept trying things and inventing nonexistent APIs until I was the one to discover it couldn't be done.
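
For what it's worth, this is roughly the scale of "basic case" I was asking for. The sketch below is reconstructed from the NVL docs rather than copied from my project, so treat the import path and the field names (captions, from, to) as approximate:

    // Rough sketch of the basic case, assuming NVL's documented constructor
    // shape: new NVL(container, nodes, relationships). Names are approximate.
    import { NVL } from '@neo4j-nvl/base';

    const nodes = [
      { id: '0', captions: [{ value: 'Alice' }] },
      { id: '1', captions: [{ value: 'Bob' }] },
    ];

    const relationships = [
      { id: '10', from: '0', to: '1', captions: [{ value: 'KNOWS' }] },
    ];

    // 'graph' is a placeholder id for a plain div on the page
    const container = document.getElementById('graph') as HTMLElement;
    new NVL(container, nodes, relationships);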

I'm currently using it to connect a neo4j backend to a cytoscape frontend; both are well established and well documented, but it still struggles to get details right. It can whip up something that almost works really quickly, but then I spend a lot of time hunting down the little errors it made, which often turn out to stem from a lack of understanding of how these systems work. Just yesterday it offered 4 different approaches to a problem: one blatantly didn't work, one used a library that doesn't exist, one used a library that doesn't do what I needed, and only one was a step in the right direction, though still not quite correct in its handling of certain node properties.
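
To be concrete about what "almost works" means here, the plumbing itself is simple: run a Cypher query, then map the returned nodes and relationships into cytoscape elements. A cleaned-up sketch is below; the connection details, the query, and the id/label mapping are placeholders rather than my actual code:

    // Sketch of the neo4j -> cytoscape plumbing (placeholders, not my real code).
    import neo4j from 'neo4j-driver';
    import cytoscape from 'cytoscape';

    const driver = neo4j.driver(
      'bolt://localhost:7687',
      neo4j.auth.basic('neo4j', 'password')
    );

    async function loadGraph(container: HTMLElement) {
      const session = driver.session();
      try {
        const result = await session.run(
          'MATCH (n)-[r]->(m) RETURN n, r, m LIMIT 100'
        );

        const elements: any[] = [];
        const seen = new Set<string>();

        for (const record of result.records) {
          // nodes: the internal id becomes the cytoscape id, first label the caption
          for (const key of ['n', 'm']) {
            const node = record.get(key);
            const id = node.identity.toString();
            if (!seen.has(id)) {
              seen.add(id);
              elements.push({ data: { id, label: node.labels[0] } });
            }
          }
          // relationships: prefix the id so it can't collide with a node id
          const rel = record.get('r');
          elements.push({
            data: {
              id: 'r' + rel.identity.toString(),
              source: rel.start.toString(),
              target: rel.end.toString(),
              label: rel.type,
            },
          });
        }

        return cytoscape({ container, elements });
      } finally {
        await session.close();
      }
    }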

I'm using Claude 3.7 thinking and Claude 4 for this.



