Hacker News
Parsing the Postgres protocol – logging executed statements (kviklet.dev)
88 points by jascha_eng on March 14, 2024 | 19 comments


I built a Postgres proxy for https://github.com/kviklet/kviklet and documented how I did it, because I found the byte-level parsing quite interesting compared to the usual higher-level REST API calls that most of my dev work consists of these days. Thought it might interest some of you as well!
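For a feel of what "byte level" means here: after the startup phase, every frontend message on the wire is a one-byte type tag, a big-endian int32 length (which counts itself but not the tag), and then the payload. A simple query, for example, arrives as a 'Q' message whose payload is a NUL-terminated SQL string. A minimal Python sketch (illustrative only, not the Kviklet code itself):

```python
import struct

def parse_message(buf: bytes):
    """Parse one regular (post-startup) frontend message.

    Wire layout: 1-byte type tag, then an int32 big-endian length
    that includes itself but not the tag, then the payload.
    """
    tag = chr(buf[0])
    (length,) = struct.unpack_from("!i", buf, 1)
    payload = buf[5:1 + length]  # skip tag + length; length covers its own 4 bytes
    return tag, payload

# A simple query message: tag 'Q', payload is a NUL-terminated SQL string.
body = b"SELECT 1\x00"
raw = b"Q" + struct.pack("!i", 4 + len(body)) + body

tag, payload = parse_message(raw)
sql = payload.rstrip(b"\x00").decode()  # "SELECT 1"
```

A proxy that wants to log executed statements can sit in the middle, pick out the 'Q' (and Parse/Bind, for the extended protocol) messages like this, and forward everything else untouched.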


Cool! At Neon[0], I work full time on our custom Postgres proxy[1]. It's a very nice protocol to work with, although our use case is quite a bit more complex than the ideas presented in the post.

Neon databases scale to zero, so databases need to be spun up on the fly. The proxy doesn't do that itself, but it knows whether a database is running and asks our control plane to schedule it if it isn't. It's a fun service to maintain.

The biggest pain is error handling. Postgres's error messages and codes are really limited here. The only code available to us is usually "protocol violation"...

[0]: https://neon.tech/ [1]: https://github.com/neondatabase/neon/tree/main/proxy


Yeah, I have tinkered with the PG protocol to de-identify data as it's transmitted, and the error handling is pretty bare bones. I think it's reflected pretty well in how connection errors and even query errors come back as "you have an error" from Postgres most times lol.


Is there a way to log data leaks? (Large queries, queries returning a large amount of data, etc.)

Many years ago there was the suspicion we were hacked (we weren't), and we found it surprisingly difficult to find out whether data had been leaked.
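One proxy-level approach (an illustrative sketch, not a claim about what any of the tools in this thread actually do): backend messages use the same tag + int32-length framing, and result rows come back as DataRow ('D') messages, so a proxy can tally rows and bytes per query and flag outliers.

```python
import struct

def frame(tag: bytes, payload: bytes) -> bytes:
    """Frame a backend message: tag byte + int32 length (incl. itself) + payload."""
    return tag + struct.pack("!i", 4 + len(payload)) + payload

def tally_result_bytes(stream: bytes) -> dict:
    """Walk a buffer of backend messages, counting DataRow ('D') rows and
    payload bytes. A real proxy would have to handle partial reads across
    TCP segments; this assumes a complete buffer for simplicity."""
    offset, rows, data_bytes = 0, 0, 0
    while offset + 5 <= len(stream):
        tag = chr(stream[offset])
        (length,) = struct.unpack_from("!i", stream, offset + 1)
        if tag == "D":
            rows += 1
            data_bytes += length - 4  # payload size, excluding the length field
        offset += 1 + length  # advance past tag + length-inclusive body
    return {"rows": rows, "bytes": data_bytes}

# Two DataRow messages (1 column of 7 bytes each), then CommandComplete ('C').
row = struct.pack("!h", 1) + struct.pack("!i", 7) + b"row-one"
stream = frame(b"D", row) + frame(b"D", row) + frame(b"C", b"SELECT 2\x00")
stats = tally_result_bytes(stream)  # {'rows': 2, 'bytes': 26}
```

Alerting on "this connection just pulled 10 GB of DataRow traffic" is cheap to bolt onto a proxy that already parses the framing, even without decoding row contents.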


I have seen organizations solve that via database activity monitoring. At Adaptive (https://adaptive.live) we solve this problem along with database access and other related problem statements.


Reminds me of a project I made to de-identify PHI as it was streaming across PG.

The documentation on the Postgres protocol is very well done and pretty straightforward to understand; you can find it here for anyone who is interested: https://www.postgresql.org/docs/16/protocol-message-formats....

The real pain is encryption nowadays. It was easy to make something in a day that could proxy requests, but getting it to handle all the different connection errors and SQL errors was a nightmare.
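On the encryption point: the TLS negotiation happens before the first startup message, which is one of the first things a proxy has to special-case. The client may open with an SSLRequest packet (int32 length 8, then the magic code 80877103), and the server answers with a single 'S' or 'N' byte before any TLS handshake begins. A rough sketch of the detection step:

```python
import struct

SSL_REQUEST_CODE = 80877103  # magic code in the SSLRequest startup packet

def is_ssl_request(first_packet: bytes) -> bool:
    """Check whether a client's first packet is an SSLRequest:
    int32 length (always 8) followed by int32 code 80877103."""
    if len(first_packet) < 8:
        return False
    length, code = struct.unpack_from("!ii", first_packet, 0)
    return length == 8 and code == SSL_REQUEST_CODE

# A proxy that doesn't terminate TLS can reply with a single b"N",
# and the client falls back to plaintext (if its sslmode allows it).
ssl_packet = struct.pack("!ii", 8, SSL_REQUEST_CODE)
reply = b"N" if is_ssl_request(ssl_packet) else b""
```

A proxy that does want encrypted traffic instead replies b"S" and then wraps the socket in TLS before continuing with the normal startup message, which is where most of the fiddly error cases live.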


While I enjoyed the read and protocol parsing is always an interesting exercise, I'm still left wondering why a proxy was deemed necessary. Why not parse the logs instead?


You can parse the logs to figure out executed statements. But then you only know the Postgres user that executed them.

In my experience there are often shared users, e.g. a read-only user, or even a shared admin user for maintenance/operational tasks ("I need to fix prod real quick").

The idea of Kviklet as a tool is to not share the password for such a user but instead use single sign-on for authentication. That means multiple users can use the same DB user, but the execution of statements is still linked to their individual accounts (e.g. Google accounts) in the audit log.

Hope this makes sense?


I ordinarily would have said you reinvented Teleport <https://github.com/gravitational/teleport/tree/v14.3.7#readm...>, but now that they've gone AGPL with v15, I'm guessing there's a market for MIT-licensed stuff. Although, for sure, since Teleport has been around for so long, it has encountered more edge cases and undergone more security reviews. I was surprised while digging up the link that Gravitational is still releasing v13 and v14 updates under Apache 2, so maybe even Teleport will continue to have legs for those who cannot deploy AGPL stuff.


> I was surprised while digging up the link that Gravitational is still releasing v13 and v14 updates under Apache 2, so maybe even Teleport will continue to have legs for those who cannot deploy AGPL stuff

Teleport puts out 3 major releases a year (every 4 months) and supports versions back to N-2. So v13 will be updated until May (v16's release) and v14 until September-ish (v17's release). Using v14 and prior is not a viable strategy for AGPL-averse companies in the long run... unless they want to fork.

After September 2024, the Teleport options that will get updates are:

1. Compile Teleport yourself under the terms of the AGPL

2. Use the pre-compiled Community Edition under its new commercial license (<100 employees and <$10MM)

3. Purchase a license (or Teleport Cloud tenant) under enterprise terms

The recent Teleport licensing changes are designed to:

1. Push business users in categories 1 and 2 into category 3, and

2. Preempt having Teleport's value resold by a big cloud player, like the AWS Elasticsearch/OpenSearch kerfuffle a while back.

Source: I work at Teleport, and while I had no say in the license change, I did keep an ear out as I care about our open source stance. It is part of what brought me to the company.


This is also quite relevant news regarding Teleport: https://goteleport.com/blog/teleport-community-license/

Starting with v16, they are switching to a commercial license.

It’s quite interesting to see how the market around access tooling is evolving, especially now with the platform shift toward AI.


Yes, it's similar to Teleport, but I always found Teleport way too expensive and complex to set up. Also, as you say, it's not really open source.

Of course Kviklet isn't as mature yet, but since I just want to target databases, I think it has a bit of its own niche.


What's not open source about AGPL?


It’s not that AGPL is not open source, but many enterprises are not allowed to use AGPL-licensed software because of legal risks.


I think the license is fine? I don't know much about OSS licenses. But Teleport has a paid offering and hides a bunch of features behind a hefty paywall, so the actual open source part has the limbs cut off.


That's great that they're moving to AGPL. Respecting user freedoms is important.


I got the link wrong in the OG post and can't seem to edit or even delete it :(

https://kviklet.dev/blog/parsing-the-postgres-protocol/

This is the correct one if someone wants to take a look.


You submitted the right one, but our software follows redirects and canonicals and eventually hit https://kviklet.dev/landing-page/blog/parsing-the-postgres-p....

I've fixed it above now.


Oh yeah, something is off with my CloudFront setup. I'll try to fix it.

Thank you so much!



