Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I learned about Open Addressing a long time ago when I was optimizing Dnsmasq. The scenario involved looking up a very large list of domain names to block or combine with ipset in order to customize routing based on domain names. Open addressing worked very well on these large immutable lists.


If the lists are immutable, could you not just construct a special hash function a priori that will guarantee no collisions?


Perfect Hash Functions are possible, but in my limited experience they rarely out-perform the naive thing where you pre-size the hash table to fit all your data (avoid grow-on-insert overhead).

PHFs have two downsides: Finding a PHF for your data can take a bunch of CPU time so it's frustrating to have in your build, and the resulting hash function can be much more costly than a simpler hash function. That is, the cost of hashing can more than dominate an extra lookup in a tiny fraction of cases.

I'd say the use case for PHF is more when you're absolutely memory-bound, e.g. in an embedded ROM. For anything else, just bump up your hash table size a little, to not have those collisions.


Yes. The term of art is “perfect hash”.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: