
It can be relatively cheap too under the constraints imposed by typical AI workloads, at least when it comes to getting to 1 TB/s or so. All you need is high-spec DDR5 and _a ton_ of memory channels in your SoC. During transformer inference you can easily make use of those parallel, multichannel reads. I get why you'd need HBM and several TB/s of memory bandwidth for extremely memory-intensive training workloads. But for inference, 1 TB/s gives you a lot to work with (especially if your model is a MoE), and it doesn't have to be ultra expensive.
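To make that concrete, here's a rough back-of-the-envelope sketch (Python, illustrative numbers only: DDR5-6400, a hypothetical MoE with ~40B active parameters quantized to 8 bits; these are assumptions, not any particular SoC or model):

    # How many 64-bit DDR5 channels get you to ~1 TB/s, and what decode
    # speed that roughly buys when inference is memory-bandwidth bound.
    GBPS = 1e9

    # DDR5-6400: 6400 MT/s * 8 bytes per 64-bit channel = 51.2 GB/s per channel.
    per_channel = 6400e6 * 8 / GBPS       # GB/s per channel

    target_bw = 1000                      # GB/s, the ~1 TB/s target
    channels = target_bw / per_channel
    print(f"~{channels:.0f} channels of DDR5-6400 for {target_bw} GB/s")

    # Bandwidth-bound decode: each generated token streams the active weights
    # once, so tokens/s ~= bandwidth / bytes of active parameters.
    active_params = 40e9                  # hypothetical MoE: 40B active params
    bytes_per_param = 1                   # 8-bit weights (assumption)
    gb_per_token = active_params * bytes_per_param / GBPS
    print(f"~{target_bw / gb_per_token:.0f} tokens/s at {target_bw} GB/s "
          f"(single stream, ignoring KV cache and batching)")

Under those assumptions you land at roughly 20 channels and ~25 tokens/s per stream, which is why "lots of DDR5 channels" is a plausible budget alternative to HBM for inference.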

