Very interesting. I also find these two training loss terms quite elegant:
- Diversity: This term encourages the model to generate a diverse set of samples, preventing mode collapse.
- Fidelity: This term rewards the model for making predictions that are close to the ground truth. (A rough sketch of how the two terms might combine follows below.)
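Here's a minimal PyTorch sketch of how two such terms could be combined for a continuous next-vector head. The shapes, the pairwise-repulsion kernel, and the `lambda_div` weight are my own assumptions for illustration, not the paper's actual formulation:

```python
import torch

def combined_loss(samples: torch.Tensor, target: torch.Tensor,
                  lambda_div: float = 0.1) -> torch.Tensor:
    """Illustrative fidelity + diversity objective (not the paper's exact loss).

    samples: (K, D) -- K continuous next-vector candidates from the model
    target:  (D,)   -- the ground-truth next vector
    """
    # Fidelity: penalize the distance between each candidate and the target.
    fidelity = ((samples - target) ** 2).sum(dim=-1).mean()

    # Diversity: penalize candidates that cluster together (mode collapse).
    # Here: mean exp(-pairwise distance) over off-diagonal pairs, which is
    # large when the candidates are nearly identical.
    k = samples.shape[0]
    dists = torch.cdist(samples, samples)            # (K, K) pairwise distances
    off_diag = dists[~torch.eye(k, dtype=torch.bool)]
    diversity_penalty = torch.exp(-off_diag).mean()

    return fidelity + lambda_div * diversity_penalty

# Example: 4 candidate vectors of dimension 8
loss = combined_loss(torch.randn(4, 8), torch.randn(8))
```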
I'm wondering if a continuous next-vector generative approach would also increase the innate "reasoning" capabilities of the model, since it could potentially capture more of the semantics of the data than tokens alone.
They say this technique isn't compatible with RL yet because you can't adjust the logits. So no GRPO, I guess, which is going to be the biggest issue: an LLM with no RL applied isn't going to be that useful.
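For context, the clipped policy-gradient surrogate that GRPO (like PPO) optimizes is built from per-token log-probabilities, which is why a head that doesn't expose adjustable logits is hard to plug in. A rough sketch of the token-level term (names and signature are mine; GRPO's group-relative advantage normalization is omitted):

```python
import torch

def clipped_surrogate(logp_new: torch.Tensor,    # (T,) log-probs of sampled tokens under the current policy
                      logp_old: torch.Tensor,    # (T,) log-probs under the policy that generated the rollout
                      advantages: torch.Tensor,  # (T,) advantage estimates
                      clip_eps: float = 0.2) -> torch.Tensor:
    # The importance ratio is exp(logp_new - logp_old): without per-token
    # logits you cannot compute logp_new, so the update has nothing to act on.
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```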