Couldn't you have used an already existing format for storage, e.g. Apache ORC?

mytherin · on Sept 20, 2020

Those formats have a different use case from a database system: they are mostly designed for write-once, read-many workloads. As far as I’m aware, you cannot update them in an ACID-compliant manner, for example, which makes it difficult to use them as the backend for an ACID database system.

hfmuehleisen · on Sept 20, 2020

Another DuckDB developer here: we support reading table data from Parquet files directly.

StreamBright · on Sept 20, 2020

I am a big fan of those formats, but decoupling the actual storage features from the ecosystem is not a trivial task. I haven't looked at the C++ version of ORC for a while, but it used to be incomplete. Other than that, the solutions ORC uses to compress data are pretty amazing.