Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It’s impossible to answer that question without knowing what content/query domain you are embedding. Checkout MTEB leaderboard, dig into the retrieval benchmark, and look for analogous datasets.


So we're talking maximizing embedding model per use case? Medical dats would require differnet model than say sales data? Sounds very fragmented approach.


The answer lies with a validation dataset that you create for testing.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: