Cool, building in resilience seems to have worked. Our static site has origins in multiple regions via CloudFront and didn’t seem to be impacted (not sure if it would have been anyway).
My control plane is native multi-region, so while it depends on many impacted services it stayed available. Each region runs in isolation. There is data replication at play but failing to replicate to us-east-1 had no impact on other regions.
The service itself is also native multi-region and has multiple layers where failover happens (DNS, routing, destination selection).
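To make that concrete, the last layer (destination selection) is conceptually something like this - an illustrative Python sketch, not our actual code, with made-up region ordering and helper names:

    # Illustrative only: pick the first destination in a healthy region,
    # in a fixed preference order. DNS and routing failover sit in front of this.
    REGION_PREFERENCE = ["us-west-2", "us-east-2", "eu-west-1", "us-east-1"]

    def pick_destination(destinations_by_region, is_healthy):
        """destinations_by_region: {region: destination ARN}; is_healthy: region -> bool."""
        for region in REGION_PREFERENCE:
            dest = destinations_by_region.get(region)
            if dest and is_healthy(region):
                return dest
        raise RuntimeError("no healthy destination available")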
Nothing’s perfect and there are many ways this setup could fail. It’s just cool that it worked this time - great to see.
Nothing I’ve done is rocket science or expensive, but it does require doing things differently. Happy to answer questions about it.
Yeah, it's not clear how resilient CloudFront is, but it seems good. Since content is copied to the points of presence and cached, it's the lightly used stuff that can break (we don't do writes through CloudFront, which IMHO is an anti-pattern). We set up multiple "origins" for the content so hopefully that provides some resiliency -- not sure if it contributed positively in this case since CF is such a black box. I might set up some metadata for the different origins so we can tell which is in use.
There is always more than one way to do things with AWS. But CloudFront origin groups can’t use HTTP POST; they’re limited to read requests. Without origin groups you opt out of some resiliency. IMHO that’s a bad trade-off. To each their own.
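For reference, an origin group in the distribution config is roughly this shape (boto3-style dict; the origin IDs are placeholders, not our actual setup):

    # Sketch of the OriginGroups section of a CloudFront DistributionConfig.
    # Failover only applies to read requests (GET/HEAD/OPTIONS).
    origin_groups = {
        "Quantity": 1,
        "Items": [{
            "Id": "static-site-origin-group",
            "FailoverCriteria": {
                "StatusCodes": {"Quantity": 3, "Items": [500, 502, 503]}
            },
            "Members": {
                "Quantity": 2,
                "Items": [
                    {"OriginId": "s3-origin-us-west-2"},  # primary
                    {"OriginId": "s3-origin-us-east-2"},  # failover
                ],
            },
        }],
    }
    # This sits under DistributionConfig["OriginGroups"] alongside matching
    # entries in DistributionConfig["Origins"], e.g. via cloudfront.update_distribution().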
Yep, if you write Lambda@Edge functions, which are part of CloudFront and can be used for authentication among other things, they can only be deployed to us-east-1.
I was under the impression it's similar to IAM where the control plane is in us-east-1 and the config gets replicated to other regions. In that case, existing stuff would likely continue to work but updates may fail
True for certs but not the log bucket (but it’s still going to be in a single region, just doesn’t have to be Virginia). I’m guessing those certs are cached where needed, but I can also imagine a perfect storm where I’m unable to rotate them due to an outage.
I prefer the API Gateway model where I can create regional endpoints and sew them together in DNS.
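Roughly like this (boto3 sketch; the hosted zone ID and domain are placeholders - the regional domain name and zone ID come from the API Gateway custom domain in each region):

    import boto3

    route53 = boto3.client("route53")

    def add_regional_api(region, regional_domain, regional_zone_id):
        """Latency-based alias record pointing api.example.com at one region's endpoint."""
        route53.change_resource_record_sets(
            HostedZoneId="ZEXAMPLE123",  # placeholder public hosted zone
            ChangeBatch={"Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "api.example.com",
                    "Type": "A",
                    "SetIdentifier": f"api-{region}",
                    "Region": region,  # latency-based routing
                    "AliasTarget": {
                        "HostedZoneId": regional_zone_id,
                        "DNSName": regional_domain,
                        "EvaluateTargetHealth": True,
                    },
                },
            }]},
        )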
We use AWS for keys and certs, with aliases for keys so they resolve properly to the specific resources in each region. For any given HTTP endpoint there is a cert that is part of the stack in that region (different regions use different certs).
The hardest part is that our customers' resources aren't always available in multiple regions. When they are, we fall back to the next-closest region (by latency, courtesy of https://www.cloudping.co/) where they exist.
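The fallback itself is simple - something like this sketch, where the latency numbers are illustrative placeholders (ours come from cloudping-style measurements):

    # Pick the next-closest region (by inter-region latency) where the
    # customer's resource actually exists.
    LATENCY_MS = {
        "us-west-2": {"us-east-2": 50, "us-east-1": 65, "eu-west-1": 130},
        "us-east-2": {"us-west-2": 50, "us-east-1": 12, "eu-west-1": 85},
    }

    def fallback_region(home_region, regions_with_resource):
        candidates = [
            (LATENCY_MS[home_region].get(r, float("inf")), r)
            for r in regions_with_resource
            if r != home_region
        ]
        return min(candidates)[1] if candidates else None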
That’s what I’d expect a basic setup to look like - region/space specific
So you’re minimally hydrating everyone’s data everywhere so that you can have some failover. Seems smart and a good middle ground to maximize HA. I’m curious what your retention window for the failover data redundancy is. Days/weeks? Or just a FIFO with a total data cap?
Just config information, not really much customer data. Customer data stays in their own AWS accounts with our service. All we hold is the ARNs of the resources serving as destinations.
We’ve gone to great lengths to minimize the amount of information we hold. We don’t even collect an email address upon sign-up, just the information passed to us by AWS Marketplace, which is very minimal (the account number is basically all we use).
The data layer is DynamoDB with Global Tables providing replication between regions, so we can write to any region. It's not easy to get this right, but our use case is narrow enough and the rate of change low enough (intentionally) that it works well. That said, it still isn't clear that replication to us-east-1 would be perfect, so we did "diff" the tables just to be sure (it has been for us).
There is some S3 replication as well in the CI/CD pipeline, but that doesn't impact our customers directly. If we'd seen errors there it would have meant manually taking Virginia out of the pipeline so we could deploy everywhere else.
Our stacks in us-east-1 stopped getting traffic when the errors started and we’ve kept them out of service for now, so those tables aren’t being used. When we manually checked around noon (Pacific) they were fine (data matched) but we may have just gotten lucky.
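The check was nothing fancy - conceptually something like this (boto3 sketch; table and key names are placeholders):

    import boto3

    def table_items(region, table_name, key_attr="pk"):
        """Scan one region's copy of the table and index items by key."""
        table = boto3.resource("dynamodb", region_name=region).Table(table_name)
        items, kwargs = {}, {}
        while True:
            page = table.scan(**kwargs)
            for item in page["Items"]:
                items[item[key_attr]] = item
            if "LastEvaluatedKey" not in page:
                return items
            kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

    def diff_regions(table_name, region_a="us-west-2", region_b="us-east-1"):
        a, b = table_items(region_a, table_name), table_items(region_b, table_name)
        only_in_one = set(a) ^ set(b)
        mismatched = {k for k in set(a) & set(b) if a[k] != b[k]}
        return only_in_one, mismatched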
Cool, thanks. We've been considering DynamoDB Global Tables for the same. We have S3 replication set up for cold storage data. For the primary/hot DB there don't seem to be many other options for doing local writes.