
OLAP databases can be (and usually are) still relational databases. The difference is that they're optimised for different workloads.

SQLite/MySQL/Postgres/MSSQL etc. are all OLTP databases, optimised around operations on single (or a few) rows at a time.

OLAP databases like ClickHouse, DuckDB, MonetDB, Redshift, etc. are optimised for operating on columns: bulk aggregations, group-bys, pivots, and similar operations over large subsets of a table or whole tables.

If I was recording user purchases/transactions: SQLite.

If I was aggregating and analysing a batch of data on a machine: DuckDB.
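
Roughly, the two shapes look like this (a minimal Python sketch; the schema and numbers are made up for illustration):

    import sqlite3
    import duckdb

    # OLTP shape: record one purchase at a time -- SQLite territory.
    tx = sqlite3.connect("shop.db")
    tx.execute("CREATE TABLE IF NOT EXISTS purchases (user_id INT, amount REAL)")
    with tx:  # commits on success
        tx.execute("INSERT INTO purchases VALUES (?, ?)", (42, 9.99))

    # OLAP shape: scan a million rows and aggregate by group -- DuckDB territory.
    con = duckdb.connect()
    con.execute("CREATE TABLE events AS SELECT * FROM range(1000000) t(i)")
    print(con.execute(
        "SELECT i % 10 AS bucket, count(*) AS n, avg(i) AS mean "
        "FROM events GROUP BY bucket ORDER BY bucket"
    ).fetchall())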

I read an interesting engineering blog post from Spotify(?) where, instead of passing CSVs or rows of JSON around their data pipeline, they passed around SQLite databases. DuckDB would probably be a good fit there.
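
DuckDB can even query a SQLite file in place via its sqlite extension, so a pipeline stage could receive a .db file and run analytical SQL straight over it. A rough sketch (the file, table, and column names are hypothetical):

    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL sqlite; LOAD sqlite;")  # the extension ships separately
    daily = con.execute(
        "SELECT CAST(ts AS DATE) AS day, SUM(amount) AS revenue "
        "FROM sqlite_scan('stage_input.db', 'transactions') "
        "GROUP BY day ORDER BY day"
    ).df()  # hand the result to Pandas for the next stage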



I gather there may be some differences in the functionality offered, but probably not much. So I presume performance is the biggest area of difference. In that regard: how big are the performance differences on each category's representative workloads?


Before the latest optimization, and using only 1 core, we were seeing 133x the performance of SQLite on a basic group-by or join, and about 4x on a pretty complex query. It was roughly even with Pandas in performance, but it can scale to larger-than-memory data, and now it can use multiple cores! As an example, I could build views from 2 Pandas DataFrames with 2 columns and 1 million rows each, join them, and return the 1-million-row result back to Pandas in 2 seconds, vs. 40 seconds with SQLite/SQLAlchemy... Pretty sweet. I bet DuckDB is going to be even faster now!
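
For anyone curious, that round trip is only a few lines with DuckDB's Python API. A sketch, assuming made-up frame and column names:

    import duckdb
    import numpy as np
    import pandas as pd

    n = 1_000_000
    left = pd.DataFrame({"id": np.arange(n), "x": np.random.rand(n)})
    right = pd.DataFrame({"id": np.arange(n), "y": np.random.rand(n)})

    con = duckdb.connect()
    con.register("left_df", left)    # registers a view over the frame
    con.register("right_df", right)
    joined = con.execute(
        "SELECT l.id, l.x, r.y FROM left_df l JOIN right_df r USING (id)"
    ).df()  # back to Pandas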


Have you compared this to dask from a performance standpoint? That is the larger than memory solution we're currently using for analytics.


How about the other way? When will SQLite perform better than DuckDB?


If your workload is lots of singular INSERTs or UPDATEs.
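
That pattern in a nutshell, with Python's built-in sqlite3 (hypothetical schema):

    import sqlite3

    con = sqlite3.connect("orders.db")
    con.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, paid INT)"
    )

    def place_order() -> int:
        # One small transactional write -- the row-oriented B-tree
        # storage behind SQLite is built for exactly this.
        with con:  # commits on success, rolls back on error
            cur = con.execute("INSERT INTO orders (paid) VALUES (0)")
        return cur.lastrowid

    def mark_paid(order_id: int) -> None:
        with con:
            con.execute("UPDATE orders SET paid = 1 WHERE id = ?", (order_id,))

    mark_paid(place_order())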


I routinely see order-of-magnitude or greater improvements moving analytical workloads from rowstore engines to compressed columnstore engines.

The shortest useful description I can give of what constitutes an analytical workload is that it is table-scan-heavy and often concerned with a small subset of the columns in a table.
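
Concretely, the shape is something like this: a hypothetical wide table stored as Parquet, queried with DuckDB. The query scans every row but touches only two of the columns, which is exactly where a columnstore skips the bytes a rowstore would have to read:

    import duckdb

    print(duckdb.sql("""
        SELECT region, SUM(revenue) AS total
        FROM read_parquet('sales.parquet')  -- columnar: only 2 columns read
        GROUP BY region
    """).df())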


To add to the other comment (albeit with different DBs, because I haven't used DuckDB yet), comparing MSSQL and ClickHouse on similar hardware and the same dataset: for simpler queries, ClickHouse responds in ~30-60 ms vs. 300-400+ ms for MSSQL; for more complex queries, 90-180 ms vs. several seconds, up to about 30 s.

I could add more indices to MSSQL and do all sorts of query optimisation, but out of the box the OLAP database wins hands down for that workload.



