Hacker News

Again, memory bandwidth is pretty much all that matters here. During inference or training, the CUDA cores of retail GPUs sit at something like 15% utilization.
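The bandwidth-bound claim comes from the fact that generating each token requires streaming the full model weights from memory. A minimal back-of-envelope sketch, with illustrative (not measured) numbers for model size and bandwidth:

```python
# Back-of-envelope: token generation (decode) is memory-bandwidth-bound
# because every decoded token must stream the full set of model weights.
# All figures below are illustrative assumptions, not measurements.

def decode_tokens_per_sec(weight_bytes: float, bandwidth_bytes_per_sec: float) -> float:
    """Upper bound on decode speed: one full weight read per token."""
    return bandwidth_bytes_per_sec / weight_bytes

# Assumed example: a 7B-parameter model at 8-bit quantization (~7 GB of
# weights) on a GPU with ~1 TB/s memory bandwidth (roughly RTX 4090 class).
weights = 7e9        # bytes
bandwidth = 1.0e12   # bytes/sec

print(f"~{decode_tokens_per_sec(weights, bandwidth):.0f} tokens/sec upper bound")
# → ~143 tokens/sec upper bound
```

At that limit the compute units spend most of each step waiting on memory, which is consistent with the low CUDA-core utilization mentioned above.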




Not for prompt processing. Current Macs are really not great at long contexts.
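Prompt processing (prefill) is the exception because it is compute-bound: the weights are read once per batch of prompt tokens, while the FLOP count scales with prompt length. A rough sketch under assumed numbers (the sustained-throughput figure is hypothetical, not a measured Apple-silicon spec):

```python
# Why prefill stresses compute rather than bandwidth: processing N prompt
# tokens in a dense transformer takes roughly 2 * params * N FLOPs, while
# the weights are streamed once per batch of tokens, not once per token.
# All numbers are illustrative assumptions.

def prefill_seconds(params: float, prompt_tokens: int, flops_per_sec: float) -> float:
    """Rough lower bound on prefill time for a dense transformer."""
    return 2 * params * prompt_tokens / flops_per_sec

# Assumed: 7B params, a 32k-token prompt, ~30 TFLOP/s sustained
# (hypothetical figure for an Apple-silicon-class GPU).
t = prefill_seconds(7e9, 32_000, 30e12)
print(f"~{t:.0f} s to prefill")
# → ~15 s to prefill
```

With a comparatively low compute ceiling, long prompts take many seconds before the first token appears, even when decode speed is fine.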


