
https://lmarena.ai/leaderboard/webdev

LM Arena shows Claude Opus 4.5 on top



I wonder how model competence and/or user preference on web development (that leaderboard) carries over to larger, more complex projects, or more generally to anything other than web development?

In addition to whatever they are exposed to during pre-training, it'd be interesting to know what kinds of coding tasks these models are being RL-trained for. Are things like web development and maybe Python/ML coding overemphasized, or are they also being trained on things like Linux/Windows/embedded development, etc., in different languages?



