
would like to see work like this, but for datasets in the hundreds of TB or single-digit PB

but i definitely agree about this point

> Cluster fatigue is real

imo, the concept of “extremely ephemeral query workers” is under-explored

stateless, maintenance-free, burstable fleets of query workers are what I would like to see more of in the future.

it’s how we do it, and it gives us full-text search on multi-hundred terabyte data sets in S3, where queries finish in a handful of seconds. our approach: https://docs.scanner.dev/scanner/what-and-why/how-it-works/h...

anyone else doing ephemeral query worker fleets?



Yes, it's called Snowflake? They're exactly that, and that's why they work so well. I know you're asking for an OSS option, but what Snowflake offers is a fleet of servers that can build your cluster in a second, as opposed to the minutes you need if you spin it up yourself.


> extremely ephemeral query workers

Reading data from S3 can really add up, so this isn't as straightforward as it seems.
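
To put rough numbers on that, here is a back-of-the-envelope sketch, assuming "add up" refers to per-request charges. The function name, the 8 MB read size, and the price per 1,000 GETs are all placeholders I made up for illustration, not quoted pricing; plug in your own provider's numbers.

```python
def s3_scan_request_cost(bytes_scanned: int, bytes_per_get: int,
                         price_per_1000_gets: float) -> float:
    """Estimate GET-request charges for one query that scans `bytes_scanned`.

    Hypothetical helper: it ignores data transfer, LIST calls, and retries.
    """
    # Ceiling division: a partial final chunk still costs one request.
    num_gets = -(-bytes_scanned // bytes_per_get)
    return num_gets / 1000 * price_per_1000_gets


TB = 10**12

# Placeholder inputs (assumptions, not real prices):
# scan 100 TB in 8 MB ranged reads at 0.0004 currency units per 1,000 GETs.
cost = s3_scan_request_cost(
    bytes_scanned=100 * TB,
    bytes_per_get=8 * 10**6,
    price_per_1000_gets=0.0004,
)
print(f"estimated request cost per full scan: {cost:.2f}")
```

The per-query figure matters less than the multiplier: smaller reads or many queries per hour scale the request count linearly, which is why indexing to skip most of the data changes the picture.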



