Mkengin · 2025-12-21T13:28:43 1766323723

I would assume that if a tool is available and the alternative is too costly, they would use the tool instead of burying their project. Just today I stumbled upon this example, where they use GenAI as well: https://reddit.com/comments/1prqfsu

Mkengin · 2025-12-21T13:26:50 1766323610

Not for coding, but today I stumbled upon these two people building their passion project using GenAI, which might otherwise not have been possible: https://reddit.com/comments/1prqfsu

Mkengin · 2025-12-21T13:25:17 1766323517

It doesn't have to be hyped to be used. For example, today I found these two people building their passion project using GenAI, which might otherwise not have been possible, who knows: https://reddit.com/comments/1prqfsu

Mkengin · 2025-12-21T13:12:21 1766322741

This is just one example, but today I found this project where two people built their passion project using GenAI for image generation (+ Photoshop); otherwise this project might not even have been possible: https://reddit.com/comments/1prqfsu

Mkengin · 2025-12-18T22:49:10 1766098150

Though this Codex version isn't on the leaderboard, GPT-5.2-Medium already seems to be a bit better than Opus 4.5: https://swe-rebench.com/

gizmodo59 · 2025-12-18T23:36:50 1766101010

Is that your website or something? You keep promoting it

Mkengin · 2025-12-21T13:48:02 1766324882

No, I am not affiliated with the website. I just want to see more discussions based on uncontaminated benchmarks, and I feel that people rely too much on benchmarks that companies can run themselves; in that case, I don't feel I can trust the results. For general LLM capabilities, for example, I would also tend to rely on dubesor [1] rather than Artificial Analysis or similar leaderboards.

[1] https://dubesor.de/benchtable

Mkengin · 2025-12-18T22:45:22 1766097922

At least on swe-rebench it does pretty well: https://swe-rebench.com/

Mkengin · 2025-12-18T22:41:33 1766097693

Your experience seems to match the recent results from swe-rebench: https://swe-rebench.com/

Mkengin · 2025-12-18T22:39:12 1766097552

According to SWE-rebench, Anthropic and OpenAI are very close in performance, while GPT-5.2 costs less than half as much per problem as CC.

https://swe-rebench.com/

Mkengin · 2025-11-26T13:05:59 1764162359

Interesting. So similar to the vision encoder + projector in VLMs?

Mkengin · 2025-11-24T23:30:09 1764027009

I am eagerly awaiting swe-rebench results for November with all the new models: https://swe-rebench.com/