WAL-G – fast archival and restoration for PostgreSQL (github.com/wal-g)
104 points by bjoko on Feb 26, 2019 | 16 comments


Has anyone here used WAL-G? How does it compare to WAL-E in practice?


I sort of commissioned the project at Citus, under the auspices of figuring out how much memory copies were costing WAL-E. The answer was: a lot. Our main goal was to touch as many risk points as possible in the design of a WAL-E replacement, with an emphasis on performance. It was a prototype to that end, and I rather planned to rewrite it, borrowing heavily as I saw fit: the fate of all prototypes. Various priorities got in the way of that.

It was also an opportunity to build competency in management and definition of an intern-sized project. Katie did a very good job, surpassing, by far, the amount of information I thought we could gather during her tenure.

However, unexpectedly, and for a while now, Yandex staff have put a lot of work into the code base, bringing it beyond merely practical. They seem to use it under conditions similar to those WAL-E was designed for: en-masse deployment.

That said, I don't think it has the end-user polish that WAL-E had (insomuch as it did) at its peak of maintenance attention. I would consider it acceptable for a programmer to use, but don't expect a sealed project. It might be suitable for those running a large operation who are willing to get into the implementation.

You can see some plots here: https://www.citusdata.com/blog/2017/08/18/introducing-wal-g-... Since that time, the erratum about parallel recovery has been lifted, courtesy of Andrey.


If your DB is under a few TB, you can probably go with WAL-E. We have thousands of clusters holding around 1 PB of data in total. WAL-G is simple to set up, but the current CLI experience is far from perfect.

Our goal is to make the most performant PostgreSQL backup system for cloud deployments. WAL-G is not just a fast compression tool: we parallelize the serial archive/restore interface and provide very cheap delta backups. In PostgreSQL, you usually get PITR through WAL. If your backups are infrequent, restores are slow, because WAL is applied serially. With WAL-G you can take delta backups often; they are applied in parallel and much faster than WAL. This is important for us, because we have a bunch of distributed datacenters and from time to time we need to repair HA clusters from backups as fast as we can.
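To make the serial-versus-parallel point concrete, here is a toy sketch in Python. It is not WAL-G's implementation or on-disk format: the data model (a database as a dict of files, each a list of pages; a delta as a dict of changed pages) and every function name are assumptions made up for illustration. It only shows why page-level deltas can be patched per file in parallel, while WAL-style records have to be replayed in order.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy model, not WAL-G's real format:
#   database: {file_name: [page_bytes, ...]}
#   delta:    {file_name: {page_number: new_page_bytes}}
#   WAL:      [(file_name, page_number, new_page_bytes), ...] in commit order


def replay_wal_serially(base, wal_records):
    """Apply WAL-like records one at a time, strictly in order."""
    db = {name: list(pages) for name, pages in base.items()}
    for file_name, page_no, page in wal_records:
        db[file_name][page_no] = page
    return db


def merge_deltas(deltas):
    """Collapse a chain of deltas (oldest first) so the newest page version wins."""
    merged = {}
    for delta in deltas:
        for file_name, pages in delta.items():
            merged.setdefault(file_name, {}).update(pages)
    return merged


def apply_deltas_in_parallel(base, deltas, workers=4):
    """Patch every file independently; no page is shared between files.

    Simplification: files that appear only in a delta (newly created
    relations) are ignored in this sketch.
    """
    merged = merge_deltas(deltas)

    def patch_file(item):
        file_name, pages = item
        patched = list(pages)  # copy, so the base backup stays untouched
        for page_no, page in merged.get(file_name, {}).items():
            patched[page_no] = page
        return file_name, patched

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(patch_file, base.items()))


base = {"rel_a": [b"a0", b"a1"], "rel_b": [b"b0", b"b1"]}
wal = [("rel_a", 0, b"a0'"), ("rel_b", 1, b"b1'"), ("rel_a", 0, b"a0''")]
deltas = [{"rel_a": {0: b"a0'"}, "rel_b": {1: b"b1'"}}, {"rel_a": {0: b"a0''"}}]

# Both restore paths reach the same final state.
assert replay_wal_serially(base, wal) == apply_deltas_in_parallel(base, deltas)
```

In this model the WAL path touches page 0 of rel_a twice, while the delta path writes only the final version of each page and can spread files across workers. How much that helps in a real restore depends on storage and network throughput.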

Best regards, Andrey.


> 1 pb of data

Since you mention your real name, would you mind telling us who you are?

Thank you!


My name is Andrey, I'm an engineer. I'm contributing to PostgreSQL on behalf of Yandex.Cloud. Also, I'm working on WAL-G. I like jogging and quake[2,3]. Not sure what else defines me...



I switched the production backups for ctadvisor from WAL-E to WAL-G largely because of the whole workflow I blogged about here:

https://www.lolware.net/2017/02/02/continuous-backup-tests-w...

I'm not a Python person, and wrangling pip and the various deps was painful.

It's not the developer's fault - I write plenty of Ruby and I expect non-rubyists would have the same issues with some of my code.

But having a Go binary make all these problems go away is a dream.


We are using WAL-G in production and testing it every day by copying/restoring the prod DB to staging. The CLI is spartan, but it works and it's fast.

Andrey is also responsive if you encounter any issues, and he went to a lot of trouble to fix a hard-to-reproduce recovery issue we encountered.


WAL-G only supports AWS, so a lot of us are still using WAL-E. If anyone is looking for a reasonably simple Go project to contribute to, have a look at adding Azure, OpenStack or GCE support to help bring it up to parity.


We have GCP. There's a PR https://github.com/wal-g/wal-g/pull/189/commits/17363c3fb6a5...

I'm on call this week, but I'm planning to work on merging this soon.


Would you accept PRs implementing SSH backups?


FWIW, PGHoard (https://github.com/aiven/pghoard/) is another PostgreSQL backup daemon that works with S3, GCS, Azure and Swift.

Implementing an SFTP backend in PGHoard using e.g. paramiko should be pretty simple.


Sounds cool, but I doubt it is implementable... Let's create an issue for discussion of this feature?


Hi,

will do, thanks! I guess in_memory_storage_folder.go is a good starting point.


Excellent news!


It has great GCP support now.



