This is the first flash/mini model that doesn't make a complete ass of itself wh...

amunozo · 2025-12-17T17:30:41 1765992641

I tried the same with my father's little village (Zarza Capilla, in Spain), and it gave a surprisingly good answer in a couple of seconds. Amazing.

peterldowns · 2025-12-17T19:26:06 1765999566

That's a really cool prompt idea. I just tried it with my neighborhood and it nailed it. Very impressive.

kingstnap · 2025-12-17T17:38:40 1765993120

You are effectively describing SimpleQA, but with a single question instead of a comprehensive benchmark, and you can note the dramatic increase in performance there.

jtrn · 2025-12-17T23:16:59 1766013419

I tested it for coding in Cursor, and the disappointment is real. It's completely INSANE when it comes to just doing anything agentic. I asked it to give me an option for how to best solve a problem, and within 1 second it was NPM installing into my local environment without ANY thinking. It's like working with a manic patient. It's like it thinks: I just HAVE TO DO SOMETHING, ANYTHING! RIGHT NOW! DO IT DO IT! I HEARD TEST!?!?!? LET'S INSTALL PLAYWRIGHT RIGHT NOW LET'S GOOOOOO.

This might be fun for vibe coding, where you just let it go crazy and not stop until an MVP is working, but I'm actually afraid to turn on agent mode with this now.

If it were just over-eager, that would be fine, but it's also not LISTENING to my instructions. As in the previous example, I didn't ask it to install a testing framework; I asked it for options fitting my project. And this happened many times. It feels like it treats user prompts/instructions as "suggestions for topics that you can work on."