Hacker News | andersmurphy's comments

With the trend towards immutable, single-writer databases, mmap seems like a massive win.

Andy is very critical of using mmap in database implementations.

Andy's critiques are only valid on dedicated database servers.

https://www.symas.com/post/are-you-sure-you-want-to-use-mmap...

LMDB uses mmap and Andy recommends LMDB, in the very article this thread is about.


Why? SQLite and LMDB make fantastic use of it. For anyone doing a single-writer db it's a no-brainer. It does so much for you and it does it very well. All the things you don't have to implement because it does it for you:

- Reading the data from disk

- Concurrency between different threads reading the same data

- Caching and buffer management

- Eviction of pages from memory

- Playing nice with other processes in the machine

Why would you not leverage it? It's such a great fit for scaling reads.
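
A minimal sketch of the read path described above, using Python's standard `mmap` module. The file contents and the `read_slice` helper are made up for illustration; the point is only that the program never issues an explicit read or manages a buffer pool, and the OS pages data in (and evicts it) on demand.

```python
import mmap
import os
import tempfile

def read_slice(path, offset, length):
    """Return `length` bytes at `offset` via a read-only memory mapping.

    Hypothetical helper for illustration: the kernel decides which pages
    to load and when to evict them; this code only indexes into the map.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps it read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]

# Demo file: an 8-byte header, 4096 filler bytes, then an 8-byte payload.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"header--" + b"x" * 4096 + b"payload!")

result = read_slice(demo_path, 8 + 4096, 8)
os.remove(demo_path)
```

Several reader threads or processes can map the same file this way and share the same cached pages, which is the "scaling reads" part.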


Fun footnote: SQLite only got on board with mmap after I demonstrated how slow their code was without it. I.e., getting a 22x speedup by replacing SQLite's btree code with LMDB https://github.com/LMDB/sqlightning

The strongest argument as far as I can see it is... the problem is you now lose control over all those things. It's a black box with effectively no knobs.

Anyways, read for yourself, Pavlo & Leis get into it in detail, and there's benchmarks:

https://db.cs.cmu.edu/papers/2022/cidr2022-p13-crotty.pdf

https://db.cs.cmu.edu/mmap-cidr2022/


What am I missing? The transactional safety problem (the bulk of the paper) is solved simply with a single writer. Which is where you want to be anyway for efficient batching throughput (and isolation).
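
A rough sketch of the single-writer-with-batching idea, assuming SQLite via Python's `sqlite3` module. The `events` table, the `writer_loop` function and the `None` shutdown sentinel are all hypothetical names chosen for the example, not anything from LMDB or the paper. One thread owns the write connection, drains a queue, and commits each batch as one transaction.

```python
import os
import queue
import sqlite3
import tempfile
import threading

def writer_loop(db_path, jobs, batch_size=100):
    """Single writer: drain `jobs` and commit each batch in one transaction."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS events (body TEXT)")
    running = True
    while running:
        batch = []
        item = jobs.get()  # block until there is something to write
        while True:
            if item is None:  # sentinel: flush what we have, then stop
                running = False
                break
            batch.append(item)
            if len(batch) >= batch_size:
                break
            try:
                item = jobs.get_nowait()  # take more only if already queued
            except queue.Empty:
                break
        if batch:
            with conn:  # commits on success, rolls back on error
                conn.executemany(
                    "INSERT INTO events (body) VALUES (?)",
                    [(body,) for body in batch],
                )
    conn.close()

# Demo: producers only enqueue; the writer thread is the only one writing.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
jobs = queue.Queue()
writer = threading.Thread(target=writer_loop, args=(db_path, jobs))
writer.start()
for i in range(250):
    jobs.put(f"event-{i}")
jobs.put(None)
writer.join()

reader = sqlite3.connect(db_path)
row_count = reader.execute("SELECT COUNT(*) FROM events").fetchone()[0]
reader.close()
```

Because only one thread ever writes, there is no write-write conflict to resolve, and the batch size becomes the main knob for trading latency against throughput.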

The other concerns seem to assume there are no other programs running on the same machine as the database. The minute that's not true (is it ever true?), the OS will do a better job (as seen with LMDB etc).

I think it's telling that the paper focuses on MongoDB, not LMDB.


“It's such a great fit for scaling reads.”

And losing them.


How so? LMDB, boltdb/bbolt and SQLite (with mmap) are all rock solid. Just because MongoDB used mmap badly does not make it any less valuable.

From what I remember, if AWS loses your data they basically give you some credits and that's it.


Yup, often orders of magnitude better.


100% this. Directly connected NVMe is a massive win, often by several orders of magnitude.

You can take it even further in some contexts if you use SQLite.

I think one of the craziest ideas of the cloud decade was to move storage away from compute. It's even worse with things like AWS lambda or vercel.

Now vercel et al are charging you extra to have your data next to your compute. We're basically back to VMs at 100-1000x the cost.


Even easier with sqlite thanks to litestream.


datasette and datasette-lite (WASM w/pyodide) are web UIs for SQLite with sqlite-utils.

For read only applications, it's possible to host datasette-lite and the SQLite database as static files on a redundant CDN. Datasette-lite + URL redirect API + litestream would probably work well, maybe with read-write; though also electric-sql has a sync engine (with optional partial replication) too, and there's PGlite (Postgres in WebAssembly)


Yup, another trick is to only serve br compressed resources and serve nothing to clients that don't support brotli. A lot of HTTP clients don't support brotli out of the box.

I take it further and only stream content to clients that have a cookie, support js and br. Otherwise all you get is a minimal static pre br compressed shim. Seems to work well enough.
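
A small sketch of the brotli gate, assuming the server can see the raw `Accept-Encoding` request header. `accepts_brotli` is a hypothetical helper name; the cookie and JS checks mentioned above are not covered, and the `*` wildcard is deliberately ignored so that only clients that name `br` explicitly pass.

```python
def accepts_brotli(accept_encoding):
    """Return True if the Accept-Encoding value lists `br` with q > 0."""
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        if token.strip().lower() != "br":
            continue
        quality = 1.0  # no q parameter means full preference
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q as "not accepted"
        return quality > 0
    return False

# Typical browser header vs. a bare HTTP client.
browser_ok = accepts_brotli("gzip, deflate, br")
bare_client_ok = accepts_brotli("gzip, deflate")
```

A handler would then serve the real content only when this returns True and fall back to the static shim otherwise.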


Somehow doubt this. It would mean most react websites that serve static content without paywalls for SEO would get banned by the indexes too.

Which for better or worse is a large portion of the modern internet.


Do we? I feel the layers of abstraction are quite extensive now. They are anything but simple.


(Good) Abstraction is there to hide complexity. I don't think it's controversial to say that software has become extremely complex. You need to support more spoken languages, more backends, more complex devices, etc.


The most complex thing to support is people's resumes. If carpenters were incentivized like software devs are, we'd quickly start seeing multi-story garden sheds in reinforced concrete because every carpenter's dream job at Bunkers Inc. pays 10x more.


Handles billions of rows just fine. Can take you unreasonably far on a single server.


It's got crazy write throughput too if you hold it right.

