EagleX 1.7T: Soaring past LLaMA 7B 2T in both English and Multi-lang evals (recursal.ai)
36 points by lhofer on March 16, 2024 | 9 comments


To be clear, this is a 7B model. It's just trained on 1.7 trillion tokens. At first I was confused about why they were making such a big deal of a massive 1.7T model outperforming a 7B model.

By the way, GPT-4 has been rumored to be around 1.7T parameters, although OpenAI has not confirmed this to my knowledge.


The most interesting bit is that this is an RWKV model, meaning a constant-size state (no quadratic attention). AFAIK it's the biggest open-weights non-transformer model.
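
For anyone unfamiliar, here's a toy sketch of what "constant-size state" buys you (deliberately simplified, not the actual RWKV time-mix equations): each token gets folded into a fixed-size vector, instead of appending to a KV cache that grows with sequence length.

  # Toy sketch of a constant-size recurrent state (NOT the real RWKV math).
  import numpy as np

  d = 8                      # hidden size; the state never grows beyond this
  state = np.zeros(d)        # fixed-size rolling state
  decay = 0.9                # made-up decay factor standing in for a learned time decay

  def step(state, token_embedding):
      # Fold one token into the state; cost is O(d) regardless of sequence length.
      return decay * state + (1 - decay) * token_embedding

  tokens = np.random.randn(1000, d)   # a 1000-token "sequence"
  for t in tokens:
      state = step(state, t)          # memory stays O(d), unlike a growing KV cache

  print(state.shape)                  # (8,) -- constant, no matter how long the input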


> All evals are 0 shot

My bet is that this is the reason they are scoring high in "their" benchmarks. For models that are just trained on completely unlabelled data, like LLaMA, 0-shot won't work well.

e.g., for LLaMA, HellaSwag accuracy is 57.13% in their benchmark compared to 78.59% in [1].

[1]: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...


I think this is simply the default in lm-evaluation-harness. They said they ran every benchmark they could out of the box.
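
If you want to check how much the shot count matters yourself, something like this should do it with the harness's Python API (going from memory on the v0.4 API, so double-check the argument names; the model id is just a placeholder, and I believe the HF leaderboard used 10-shot for HellaSwag):

  # Compare 0-shot vs few-shot HellaSwag with EleutherAI's lm-evaluation-harness.
  # Argument names are from memory (v0.4-style API); model id is a placeholder.
  import lm_eval

  for shots in (0, 10):
      out = lm_eval.simple_evaluate(
          model="hf",
          model_args="pretrained=huggyllama/llama-7b",   # placeholder model id
          tasks=["hellaswag"],
          num_fewshot=shots,
      )
      print(shots, out["results"]["hellaswag"])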


> [June 2024] v6 MoE model

How does that work for the RWKV architecture? Wouldn't you have to feed the same data through all the experts, regardless of whether they're currently active, to keep the rolling state consistent? Or am I misunderstanding that architecture?
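
To make the worry concrete, here's a made-up toy (not the actual RWKV MoE design, which I haven't seen described): if each expert carried its own rolling state, a token-level router would leave the unchosen experts' states stale. If only a stateless part (e.g. the channel-mix/FFN) is routed, there's no such issue.

  # Toy illustration of the concern only; not how the real model works.
  import numpy as np

  class StatefulExpert:
      def __init__(self, d):
          self.state = np.zeros(d)          # per-expert rolling state
      def __call__(self, x):
          self.state = 0.9 * self.state + 0.1 * x   # updates only when routed to
          return self.state

  d, n_experts = 4, 2
  experts = [StatefulExpert(d) for _ in range(n_experts)]

  for t, x in enumerate(np.random.randn(6, d)):
      chosen = t % n_experts                # pretend router: alternate experts
      experts[chosen](x)                    # the other expert never sees this token

  # Each expert's state now reflects only half the sequence -- the
  # consistency problem the question above is about.
  print([e.state.round(2) for e in experts])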


>Let’s look what went really badly: Math.

>We dug through the dataset we used for training, and realized we missed out the entire math dataset (along with a few others) due to an error. Oops.

This is kinda hilarious.


Makes me excited for the next model though!


> All data shown here is made available in the Google Sheet over here:

Over where?


Sounds dreamy. Anyone know how I can install this on my M1 Mac?
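
Not sure what the official route will be, but if/when the weights land on Hugging Face, something like this should run on an M1 via PyTorch's MPS backend. The repo id below is a guess/placeholder -- check recursal.ai or the RWKV org on Hugging Face for the actual release.

  # Hedged sketch for an M1 Mac; repo id is a placeholder, not the real release name.
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  repo = "recursal/EagleX-1_7T"            # hypothetical repo id
  device = "mps" if torch.backends.mps.is_available() else "cpu"

  tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
  model = AutoModelForCausalLM.from_pretrained(
      repo, trust_remote_code=True, torch_dtype=torch.float16
  ).to(device)

  inputs = tok("The RWKV architecture is", return_tensors="pt").to(device)
  out = model.generate(**inputs, max_new_tokens=50)
  print(tok.decode(out[0], skip_special_tokens=True))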



