> Indeed, in my own career what I've seen is that if one microservice goes down ... | Hacker News


geodel on Feb 9, 2024 | on: Rebuilding Netflix's video processing pipeline wit...

> Indeed, in my own career what I've seen is that if one microservice goes down the user won't be seeing 500 errors or friends

Exactly. What it does instead is that the first few hours of the triage call go to people claiming "well, my service is up and the issue is somewhere else". So just finding which service failed takes crucial hours, instead of those hours going to fixing the failing service.

But in a world where Micro Service Incident Commanders can pinpoint a failing service among 1000 microservices within seconds on their vast 80 inch monitoring consoles and direct resolution admirals to fix it in the next 15 mins, it might just all work fine.

fragmede on Feb 9, 2024

The problem comes when it's a distributed system and the failure is in the interaction between multiple systems, not in one specific microservice being down. Something got upgraded and the message size changed in an unexpected and incompatible way that worked fine in testing.
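To make that failure mode concrete, here is a minimal sketch of how it can slip through. Everything in it is hypothetical and not from the thread: the 256-byte consumer limit, the `encode_v1`/`encode_v2` producers, and the `details` field are assumptions chosen for illustration. Both services are "up" the whole time; only the combination of the upgraded producer and a production-sized payload fails.

```python
import json

# Hypothetical limit enforced by the downstream consumer (an assumption).
CONSUMER_MAX_BYTES = 256


def encode_v1(event: dict) -> bytes:
    """Producer before the upgrade: sends only the id and status fields."""
    slim = {"id": event["id"], "status": event["status"]}
    return json.dumps(slim).encode("utf-8")


def encode_v2(event: dict) -> bytes:
    """Producer after the upgrade: forwards the whole event, details included."""
    return json.dumps(event).encode("utf-8")


def consumer_accepts(payload: bytes) -> bool:
    """Consumer side: rejects any message larger than its configured limit."""
    return len(payload) <= CONSUMER_MAX_BYTES


# Test fixtures tend to be small; production events may not be.
test_event = {"id": 1, "status": "ok", "details": "x" * 10}
prod_event = {"id": 2, "status": "ok", "details": "x" * 1000}

results = {
    "v1_test": consumer_accepts(encode_v1(test_event)),
    "v1_prod": consumer_accepts(encode_v1(prod_event)),
    "v2_test": consumer_accepts(encode_v2(test_event)),
    "v2_prod": consumer_accepts(encode_v2(prod_event)),
}

for name, accepted in results.items():
    print(f"{name}: {'accepted' if accepted else 'rejected'}")
```

In this sketch only `v2_prod` is rejected, which is why the upgrade looks fine in testing. A contract test that runs the new encoder against production-sized payloads and the consumer's real limit is one way such a mismatch might be caught earlier.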
