Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

They need to compile the model for their chips. Standard transformers are easier, so GPT-OSS, Qwen, GLM, etc if there is demand, they will deploy it.

Nemotron on the other hand is a hybrid (Transformer + Mamba-2) so it will be more challenging to compile it on Cerebras/Groq chips.

(Me thinks Nvidia is purposefully picking architecture+FP4 that is easy to ship on Nvidia chips, but harder for TPU or Cerebras/Groq to deploy)





Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: