
Chinese, Japanese, Korean, etc. don't work like this either.
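A quick illustration of that point (my own sketch, not part of the original comment): naive whitespace splitting produces word-level tokens for English, but Chinese text has no spaces between words, so the same approach returns the whole string as a single "token".

```python
# Whitespace tokenization works for English but not for Chinese,
# which does not separate words with spaces.
english = "full text search"
chinese = "全文搜索"  # roughly "full-text search"

print(english.split())  # ['full', 'text', 'search']
print(chinese.split())  # ['全文搜索'] — one giant token, no word boundaries found
```

This is why language-specific tokenizers (dictionary- or model-based segmenters for CJK) exist in the first place.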

However, even though the approach is "old-fashioned", it's still widely used for English. I'm not sure there is a universal approach that semantic search could use that would be both fast and accurate.

At the end of the day, people choose a tokenizer that matches their language.

I will update the article to make all this clearer though!



