
I don't think that's necessarily true. The outage updates later identified failing network load balancers as the cause. I think DNS was just a symptom of the root cause.

I suppose it's possible DNS broke health checks but it seems more likely to be the other way around imo



I don’t work for AWS but for a different cloud provider, so this is not a description of this incident, just an example of the kind of thing that can happen.

One particular “DNS” issue that caused an outage was actually a bug in the software that monitors health checks.

It would actively monitor all servers for a particular service (keeping its own server list up to date based on what was deployed) and update DNS based on those checks.

So when the health check monitors failed, servers would get removed from dns within a few milliseconds.

A bug gets deployed to the health check service. All of a sudden users can’t resolve DNS names because everything is marked as unhealthy and removed from DNS.

So not really a “dns” issue, but it looks like one to users
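To make that failure mode concrete, here is a toy sketch in Python. Everything in it is made up for illustration (`reconcile_dns`, `working_check`, `buggy_check`, `resolve`, and the IPs are hypothetical, not any provider's real system). It only shows how a bug in the checker, with perfectly healthy servers, leaves users with nothing to resolve.

```python
from typing import Callable, List, Set


def reconcile_dns(servers: Set[str], check: Callable[[str], bool]) -> Set[str]:
    """Keep only the servers whose health check passes (hypothetical updater)."""
    return {server for server in servers if check(server)}


def working_check(server: str) -> bool:
    # Assumption for the sketch: every server is actually healthy.
    return True


def buggy_check(server: str) -> bool:
    # Hypothetical bad deploy: the checker reports failure for every server,
    # even though nothing is wrong with the servers themselves.
    return False


def resolve(records: Set[str]) -> List[str]:
    """What a user sees: an address list, or a lookup failure if none remain."""
    if not records:
        raise LookupError("no records for this name")
    return sorted(records)


servers = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}

dns_before = reconcile_dns(servers, working_check)  # all servers stay in DNS
dns_after = reconcile_dns(servers, buggy_check)     # every record is withdrawn
```

Calling `resolve(dns_after)` raises `LookupError`, which is the part users experience as "DNS is down" even though the DNS layer did exactly what it was told.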



