Although I welcome a Parquet successor, I am not particularly interested in a more complicated format. Random access time improvements are nice, but what I would really like is the ability to store multiple tables in a single Parquet file.
When I read "possible extension through embedded wasm encoders" I can already imagine the C++ linker hell required to get this thing included in my project.
I also don't think a lot of people need "AI scale".
Storing multiple tables in a single file would be trivially solvable by storing multiple Parquet files in the most basic plain uncompressed tarball (to retain the ability to access any part of any file without downloading the whole thing). Or maybe ar or cpio - tar has too many features (such as support for links) that are unnecessary here. Basically, anything well-standardized that implements a very basic directory structure, with a simple index located at a predictable offset.
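A minimal sketch of that idea, assuming Python with pyarrow, a local uncompressed archive named tables.tar, and a member called foo.parquet (all hypothetical names): because the tar is not compressed, each member's data sits at a fixed byte offset, so a reader only has to fetch that member's byte range, which would translate directly to range requests against object storage.

```python
# Sketch: several Parquet files packed into one plain uncompressed tar,
# each member still readable on its own via its byte offset.
import io
import tarfile

import pyarrow.parquet as pq

ARCHIVE = "tables.tar"  # hypothetical archive holding foo.parquet, bar.parquet, ...

# Build a tiny index: member name -> (absolute offset of its data, size in bytes).
# Since the archive is uncompressed, these offsets stay valid for remote range reads.
with tarfile.open(ARCHIVE, "r:") as tar:
    index = {m.name: (m.offset_data, m.size) for m in tar.getmembers() if m.isfile()}

def read_table(name):
    """Read one Parquet member by seeking straight to its data, skipping the rest."""
    offset, size = index[name]
    with open(ARCHIVE, "rb") as f:
        f.seek(offset)
        return pq.read_table(io.BytesIO(f.read(size)))

print(read_table("foo.parquet").num_rows)
```

In practice the index itself could be written as the final member (or at another predictable offset) so a reader never has to scan the tar headers at all.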
I think it's a bit markety, but they explain it rather well: because of AI, your data needs to be consumed by machines at an unprecedented scale, which requires new solutions. Historically we mostly did large input -> small output; now we're doing large input -> large output. The existing tools are (supposedly) not ready for that.