
https://www.youtube.com/watch?v=cUbGVH1r_1U

Everyone is talking about the release of Gemini 3, and the benchmark scores are impressive. But as we know in the AI world, benchmark numbers don't always translate into production performance on every task.

We decided to put Gemini 3 through its paces on some standard Vision Language Model (VLM) tasks, specifically simple image detection and processing.
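
For context, a probe like this can be as simple as sending an image plus a detection prompt through the API and checking the returned boxes. Here's a minimal sketch using the google-genai Python SDK; the model id, test image, and prompt are illustrative assumptions, not our exact harness:

    # Minimal sketch: probing a VLM on simple object detection.
    # Model id, image path, and prompt are assumptions for illustration.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")  # assumes a valid API key

    with open("street_scene.jpg", "rb") as f:  # hypothetical test image
        image_bytes = f.read()

    response = client.models.generate_content(
        model="gemini-3-pro-preview",  # assumed model id; substitute the real one
        contents=[
            types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
            "Detect every object in this image. Return a JSON list of "
            "{label, box_2d} objects, where box_2d is [ymin, xmin, ymax, xmax] "
            "normalized to 0-1000.",  # Gemini's documented box convention
        ],
    )
    print(response.text)  # compare returned boxes against ground truth

Scoring then comes down to comparing the returned boxes and labels against ground-truth annotations.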

The result? It struggled where we didn't expect it to.

Surprisingly, VLM Run's Orion (https://chat.vlm.run/) significantly outperformed Gemini 3 on these specific visual tasks. While the industry chases the "biggest" model, it's a good reminder that specialized agents like Orion often punch well above their weight class in practical applications.

Has anyone else noticed a gap between Gemini 3's benchmarks and its VLM capabilities?



Don't self-promote without disclosure.



