That is true, so don't give it entirely free reign with that. I let Claude generate as many additional tests as it'd like, but I either produce high level tests, or review a set generated by Claude first, before I let it fill in the blanks, and it's instructed very firmly to see a specific set of test cases as critical, and then increasingly "boxed in" with more validated test cases as we go along.
E.g. for my compiler, I had it build scaffolding to make it possible to run rubyspecs. Then I've had it systematically attack the crashes and failures mostly by itself once the test suite ran.
E.g. for my compiler, I had it build scaffolding to make it possible to run rubyspecs. Then I've had it systematically attack the crashes and failures mostly by itself once the test suite ran.