
It can be relatively cheap too under the constraints imposed by typical AI workloads, at least when it comes to getting to 1 TB/s or so. All you need is high-spec DDR5 and _a ton_ of memory channels in your SoC. During transformer inference you can easily make use of those parallel, multichannel reads. I get why you'd need HBM and several TB/s of memory bandwidth for extremely memory-intensive training workloads. But for inference, 1 TB/s gives you a lot to work with (especially if your model is a MoE), and it doesn't have to be ultra expensive.
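To make that concrete, here's a rough back-of-the-envelope sketch (Python, illustrative numbers only: DDR5-6400, a hypothetical MoE with ~40B active parameters quantized to 8 bits; these are assumptions, not any particular SoC or model):

    # How many 64-bit DDR5 channels get you to ~1 TB/s, and what decode
    # speed that roughly buys when inference is memory-bandwidth bound.
    GBPS = 1e9

    # DDR5-6400: 6400 MT/s * 8 bytes per 64-bit channel = 51.2 GB/s per channel.
    per_channel = 6400e6 * 8 / GBPS       # GB/s per channel

    target_bw = 1000                      # GB/s, the ~1 TB/s target
    channels = target_bw / per_channel
    print(f"~{channels:.0f} channels of DDR5-6400 for {target_bw} GB/s")

    # Bandwidth-bound decode: each generated token streams the active weights
    # once, so tokens/s ~= bandwidth / bytes of active parameters.
    active_params = 40e9                  # hypothetical MoE: 40B active params
    bytes_per_param = 1                   # 8-bit weights (assumption)
    gb_per_token = active_params * bytes_per_param / GBPS
    print(f"~{target_bw / gb_per_token:.0f} tokens/s at {target_bw} GB/s "
          f"(single stream, ignoring KV cache and batching)")

Under those assumptions you land at roughly 20 channels and ~25 tokens/s per stream, which is why "lots of DDR5 channels" is a plausible budget alternative to HBM for inference.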

