OLAP databases can still be (and usually are) relational databases; the difference is that they're optimised for different workloads.
SQLite/MySQL/Postgres/MSSQL etc. are all OLTP databases, built primarily around operations on single (or a few) rows.
OLAP databases like ClickHouse/DuckDB, MonetDB, Redshift, etc. are optimised for operating on columns: bulk aggregations, group-bys, pivots, and similar operations over large subsets of a table, or whole tables.
If I was recording user purchases/transactions: SQLite.
If I was aggregating and analysing a batch of data on a machine: DuckDB.
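To make the contrast concrete, here's a minimal sketch of the two access patterns (using Python's stdlib sqlite3 and the duckdb package; the table and column names are made up):

```python
import sqlite3
import duckdb

# OLTP shape: touch one row, found via an index (SQLite).
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE purchases (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)")
oltp.execute("INSERT INTO purchases VALUES (1, 42, 9.99)")
row = oltp.execute("SELECT * FROM purchases WHERE id = ?", (1,)).fetchone()

# OLAP shape: scan two columns of every row and aggregate (DuckDB).
olap = duckdb.connect()
olap.execute("""
    CREATE TABLE purchases AS
    SELECT (random() * 1000)::INT AS user_id,
           random() * 100         AS amount
    FROM range(1000000)
""")
totals = olap.execute(
    "SELECT user_id, sum(amount) FROM purchases GROUP BY user_id"
).fetchall()
```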
I read an interesting engineering blog from Spotify(?) where, instead of passing around CSVs or rows of JSON in their data pipeline, they passed around SQLite databases: DuckDB would probably be a good fit there.
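A rough sketch of what that hand-off might look like with DuckDB, where each pipeline stage reads and writes a single database file instead of a CSV (the filenames here are hypothetical):

```python
import duckdb

# Stage 1: ingest raw data and hand off a single-file database.
con = duckdb.connect("stage1_output.duckdb")
con.execute("CREATE TABLE events AS SELECT * FROM read_csv_auto('raw_events.csv')")
con.close()

# Stage 2 (possibly another process or machine): open the same file and keep working.
con = duckdb.connect("stage1_output.duckdb")
n_events = con.execute("SELECT count(*) FROM events").fetchone()[0]
con.close()
```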
I gather that there may be some differences in the functionality offered, but probably not much, so I presume performance is the biggest area of difference. In that regard: how big are the performance differences for each category's representative workloads?
Before the latest optimization, and using only one core, we were seeing 133x SQLite's performance on a basic group-by or join, and about 4x on a pretty complex query. It was roughly even with Pandas in performance, but it can scale to larger-than-memory data, and now it can use multiple cores!
As an example, I could build views from 2 Pandas DataFrames with 2 columns and 1 million rows each, join them, and return the 1 million row dataset back to Pandas in 2 seconds vs. 40 seconds with SQLite/SQLAlchemy... Pretty sweet. DuckDB is going to be even faster now I bet!
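For anyone curious, a sketch of roughly what the DuckDB side of that looks like (the column names and thread count are my own; DuckDB's Python client can scan local DataFrames by variable name):

```python
import duckdb
import numpy as np
import pandas as pd

n = 1_000_000
df1 = pd.DataFrame({"id": np.arange(n), "x": np.random.rand(n)})
df2 = pd.DataFrame({"id": np.arange(n), "y": np.random.rand(n)})

con = duckdb.connect()
con.execute("SET threads TO 4")  # the multi-core support mentioned above

# DuckDB finds df1/df2 as local variables (replacement scans), joins them
# in its vectorised engine, and returns the result as a new DataFrame.
joined = con.execute("""
    SELECT df1.id, df1.x, df2.y
    FROM df1 JOIN df2 USING (id)
""").df()
```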
I routinely see order-of-magnitude or greater improvements moving analytical workloads from rowstore engines to compressed columnstore engines.
The shortest useful description I can give of what constitutes an analytical workload is that it is table-scan-heavy and often concerned with only a small subset of a table's columns.
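For instance (a hypothetical query; the Parquet file and column names are made up), a columnar engine can answer this by reading only the two referenced columns and never touching the rest of the table:

```python
import duckdb

# Only 'country' and 'amount' are read from the (hypothetical) file;
# a rowstore would have to pull every column of every row off disk.
top10 = duckdb.sql("""
    SELECT country, sum(amount) AS revenue
    FROM 'sales.parquet'
    GROUP BY country
    ORDER BY revenue DESC
    LIMIT 10
""").df()
```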
To add to the other comment (albeit with different DBs, because I haven't used DuckDB yet), comparing MSSQL and ClickHouse on similar hardware with the same dataset: for simpler queries, ClickHouse responds in ~30-60 ms vs. 300-400+ ms for MSSQL; for more complex queries, it's 90-180 ms vs. several seconds, up to about 30 s.
I could add more indices to MSSQL and do all sorts of query optimisations, but out of the box the OLAP database wins hands down for that workload.