Hacker News

Very interesting. Also I find these training parameters quite elegant:

- Diversity: This term encourages the model to generate a diverse set of samples, preventing mode collapse.

- Fidelity: This term rewards the model for making predictions that are close to the ground truth.
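The two terms above can be sketched as a single combined objective. This is only an illustrative interpretation, not the paper's actual loss: the function name, the squared-error fidelity term, the pairwise-distance diversity term, and the `lam` weight are all assumptions for the sketch.

```python
import torch

def fidelity_diversity_loss(samples: torch.Tensor,
                            target: torch.Tensor,
                            lam: float = 1.0) -> torch.Tensor:
    """Hypothetical combined objective.

    samples: (K, D) tensor of K model draws for one position.
    target:  (D,) ground-truth vector for that position.
    """
    # Fidelity: pull every sample toward the ground-truth vector
    # (mean squared error over the K draws).
    fidelity = ((samples - target) ** 2).sum(dim=-1).mean()

    # Diversity: mean pairwise distance between the K draws,
    # negated so that minimizing the loss *increases* spread
    # and discourages mode collapse.
    k = samples.shape[0]
    pairwise = torch.cdist(samples, samples)  # (K, K)
    diversity = pairwise.sum() / (k * (k - 1))

    return fidelity - lam * diversity
```

If all K draws collapse to a single point, the diversity term is zero and only the fidelity penalty remains, which is exactly the failure mode the diversity term is meant to penalize.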

I'm wondering if a continuous next-vector generative approach would also increase the innate "reasoning" capabilities of the model, since it could potentially capture more of the semantics of the data than tokens alone.



And maybe it's even better suited to some sorts of RL finetuning?


They say this technique isn't yet compatible with RL because you can't adjust the logits, so no GRPO, I guess. That's going to be the biggest issue: an LLM with no RL applied isn't going to be that useful.



