Although I welcome a Parquet successor, I am not particularly interested in a more complicated format. Random access time improvements are nice, but what I would really like is the ability to store multiple tables in a single Parquet file.
When I read "possible extension through embedded wasm encoders" I can already imagine the C++ linker hell required to get this thing included in my project.
I also don't think a lot of people need "AI scale".
Storing multiple tables in a single file would be trivially solvable by storing multiple Parquet files in the most basic plain uncompressed tarball (to retain the ability to access any part of any file without downloading the whole thing). Or maybe ar or cpio - tar has too many features (such as support for links) that are unnecessary here. Basically, anything well-standardized that implements a very basic directory structure, with a simple index located at a predictable offset.
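A minimal sketch of that idea, assuming Python with pyarrow, a local uncompressed archive named tables.tar, and a member called foo.parquet (all hypothetical names): because the tar is not compressed, each member's data sits at a fixed byte offset, so a reader only has to fetch that member's byte range, which would translate directly to range requests against object storage.

```python
# Sketch: several Parquet files packed into one plain uncompressed tar,
# each member still readable on its own via its byte offset.
import io
import tarfile

import pyarrow.parquet as pq

ARCHIVE = "tables.tar"  # hypothetical archive holding foo.parquet, bar.parquet, ...

# Build a tiny index: member name -> (absolute offset of its data, size in bytes).
# Since the archive is uncompressed, these offsets stay valid for remote range reads.
with tarfile.open(ARCHIVE, "r:") as tar:
    index = {m.name: (m.offset_data, m.size) for m in tar.getmembers() if m.isfile()}

def read_table(name):
    """Read one Parquet member by seeking straight to its data, skipping the rest."""
    offset, size = index[name]
    with open(ARCHIVE, "rb") as f:
        f.seek(offset)
        return pq.read_table(io.BytesIO(f.read(size)))

print(read_table("foo.parquet").num_rows)
```

In practice the index itself could be written as the final member (or at another predictable offset) so a reader never has to scan the tar headers at all.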
I think it's a bit markety, but they explain it rather well: because of AI, your data needs to be consumed by machines at an unprecedented scale, which requires new solutions. Historically we mostly did large input -> small output; now we're doing large input -> large output. The existing tools are (supposedly) not ready for that.