That is true, so don't give it entirely free reign with that. I let Claude gener...

That is true, so don't give it entirely free reign with that. I let Claude generate as many additional tests as it'd like, but I either produce high level tests, or review a set generated by Claude first, before I let it fill in the blanks, and it's instructed very firmly to see a specific set of test cases as critical, and then increasingly "boxed in" with more validated test cases as we go along.

E.g. for my compiler, I had it build scaffolding to make it possible to run rubyspecs. Then I've had it systematically attack the crashes and failures mostly by itself once the test suite ran.