> Data files are useless without a program that knows how to utilize the data.
As I see it, the point is that the exact details of how the bits are encoded are not really interesting from the perspective of the program reading the data.
Consider a program that reads CSV files and processes the data in them. The first column contains a timestamp, the second a filename, and the third a size.
As long as there's a well-defined interface the program can use to extract rows from a file, where each row contains one or more columns of data values with the correct data types, the program doesn't really care whether the data comes from a CSV file. It could just as easily be a 7zip-compressed JSON file, or something else entirely.
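Something like this sketch is what I have in mind; the names (`Row`, `RowSource`, `CsvRowSource`) are made up for illustration and aren't from any real library:

```python
# A minimal sketch of a format-agnostic row interface, assuming the
# three-column layout described above. Names here are hypothetical.
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, Protocol
import csv


@dataclass
class Row:
    timestamp: datetime
    filename: str
    size: int


class RowSource(Protocol):
    """Anything that can yield typed rows, regardless of on-disk format."""
    def rows(self) -> Iterator[Row]: ...


class CsvRowSource:
    """One possible implementation: plain CSV on disk."""

    def __init__(self, path: str) -> None:
        self._path = path

    def rows(self) -> Iterator[Row]:
        with open(self._path, newline="") as f:
            for ts, name, size in csv.reader(f):
                yield Row(datetime.fromisoformat(ts), name, int(size))


def total_size(source: RowSource) -> int:
    # The consumer only sees typed rows; it can't tell (and doesn't care)
    # whether they came from CSV, compressed JSON, or something else.
    return sum(row.size for row in source.rows())
```

A `JsonRowSource` backed by a 7zip-compressed file would satisfy the same `RowSource` protocol, and `total_size` wouldn't change at all.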
Now, granted, this file format isn't well-suited as a generic file format. After all, the decoding API they specify returns data as Apache Arrow arrays, which probably isn't a good fit for all uses.
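To make that concrete, here's roughly what a column-oriented result looks like with pyarrow; this is just an illustration of the shape of the data, not this format's actual decoding API:

```python
# Illustrative only: the shape of a column-oriented (Arrow) result,
# not the format's actual decoding API.
import pyarrow as pa

timestamps = pa.array(["2024-01-01T00:00:00", "2024-01-01T00:05:00"])
filenames = pa.array(["a.log", "b.log"])
sizes = pa.array([1024, 2048])

batch = pa.record_batch([timestamps, filenames, sizes],
                        names=["timestamp", "filename", "size"])

# Columnar consumers get the arrays directly; row-oriented consumers
# have to convert before they can iterate record by record.
print(batch.column(2))     # the whole "size" column as one array
print(batch.to_pydict())   # converted into plain Python lists per column
```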
I think the counter argument here is that you're now including a CSV decoder in every CSV data file. At the data sizes we're talking about, this is negligible overhead, but it seems overly complicated to me. Almost like it's trying too hard to be clever.
How many different storage format implementations will there realistically be?
It does open up the possibility for specialized compressors for the data in the file, which might be interesting for archiving where improved compression ratio is worth a lot.
That makes sense. I think fundamentally you’re trading off space between the compressed data and the lookup tables stored in your decompression code. I can see that amortizing well if the compressed payloads are large or if there are a lot of payloads with the same distribution of sequences though.
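As a rough illustration of that amortization, zstd's trained dictionaries do essentially this: you pay for the shared tables once, and they only earn their keep if enough payloads share the same distribution. A sketch using the Python `zstandard` package (the payloads are made up; nothing here is from the format under discussion):

```python
# Sketch of "shared tables amortize across payloads" using zstd's
# trained dictionaries. Payload contents below are invented.
import zstandard as zstd

# Many small payloads drawn from the same distribution of byte sequences.
payloads = [
    f'{{"timestamp": "2024-01-01T00:00:{i % 60:02d}", '
    f'"filename": "log-{i}.txt", "size": {i * 37}}}'.encode()
    for i in range(2000)
]

# Train the shared dictionary once; its size is paid once, not per payload.
dictionary = zstd.train_dictionary(1024, payloads)

plain = zstd.ZstdCompressor(level=19)
shared = zstd.ZstdCompressor(level=19, dict_data=dictionary)

plain_total = sum(len(plain.compress(p)) for p in payloads)
shared_total = sum(len(shared.compress(p)) for p in payloads)

# The dictionary only pays off if enough payloads reuse it.
print("no dictionary:  ", plain_total)
print("with dictionary:", shared_total + len(dictionary.as_bytes()))
```

A specialized compressor baked into the file is the same trade-off taken further: the distribution knowledge lives in code instead of a dictionary, and the break-even point depends on how much payload reuses it.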