It's 100% the models. Terminal bench is a good indication for this. There the ag...

It's 100% the models. Terminal bench is a good indication for this. There the agents get "just a terminal tool", and yet they still can solve lots and lots of tasks. Last year you needed lots of glue, and two years ago you needed monstrosities like langchain that worked maybe once in a blue moon, if you didn't look funny at it.

Check out the exercise from the swe-agent people who released a mini agent that's "terminal in a loop" and that started to get close to the engineered agents this year.

https://github.com/SWE-agent/mini-swe-agent