Although I welcome a parquet successor, I am not particularly interested in a more complicated format. Random access time improvements are nice, but really what I would like is just storing multiple tables in a single parquet file.

When I read "possible extension through embedded wasm encoders" I can already imagine the C++ linker hell required to get this thing included in my project.

I also don't think a lot of people need "ai scale".



Storing multiple tables in a single file would be trivially solvable by storing multiple Parquet files in the most basic plain uncompressed tarball (to retain the ability to access any part of any file without downloading the whole thing). Or maybe ar or cpio, since tar has too many features (such as support for links) that are unnecessary here. Basically, anything well-standardized that implements a very basic directory structure, with a simple index located at a predictable offset.

If only any tools supported that.
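
A rough sketch of the idea, using only Python's standard `tarfile` module. The payloads are placeholder bytes standing in for real Parquet files, and the function names (`pack_tables`, `build_index`, `read_table`) are made up for illustration. Note that tar has no central index, so this version builds one by scanning the member headers; a format with an index at a predictable offset would skip that scan.

```python
import io
import tarfile


def pack_tables(tables: dict[str, bytes]) -> bytes:
    """Pack named byte payloads into an uncompressed in-memory tar archive."""
    buf = io.BytesIO()
    # Mode "w" writes a plain tar with no compression, so member data
    # stays byte-for-byte addressable inside the archive.
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in tables.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def build_index(archive: bytes) -> dict[str, tuple[int, int]]:
    """Map each member name to the (offset, size) of its data in the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        return {m.name: (m.offset_data, m.size) for m in tar.getmembers()}


def read_table(archive: bytes, index: dict[str, tuple[int, int]], name: str) -> bytes:
    """Fetch one member by slicing; over HTTP this would be a range request."""
    offset, size = index[name]
    return archive[offset:offset + size]


# Placeholder payloads (assumption: real code would write actual Parquet bytes).
archive = pack_tables({
    "orders.parquet": b"PAR1-orders",
    "users.parquet": b"PAR1-users",
})
index = build_index(archive)
users_bytes = read_table(archive, index, "users.parquet")
```

With the index in hand, each table is just a byte range, which is what makes partial downloads possible.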


Couldn't agree more. If tooling would just settle on an arbitrary archive format our lives would be better.


Lance already exists to solve Parquet's problems, with drastically faster random access time.


Lance is pretty far from a lingua franca. For instance, the SDKs are only Rust/Python/Java, none of which I use.


Sounds like we need more SDKs, not a new format


If you want "several tables and database-like semantics in one file," then what you want is DuckDB.

If you want modern parquet, then you want the Lance format (or LanceDB for DB-like CRUD semantics).


also what does "ai scale" even mean?


I think it's a bit markety, but they explain it rather well: because of AI your data needs to be consumed by machines on an unprecedented scale, which requires new solutions to problems. Historically we mostly did large input -> small output, now we're doing large input -> large output. The existing tools are (supposedly) not ready.


no, I read that. It doesn't really add any more practical detail.


It’s obviously a jab at mongo’s “web scale”. https://youtube.com/watch?v=b2F-DItXtZs



