
Not the OP, not a consultant. The biggest ES system I work on is an order management system at one of the biggest investment banks in APAC, which processes orders from clients to 13 stock exchanges.

Roughly 40 people work on it in Asia, maybe around 100 globally at the raw dev level.

I feel it's a sort of false good idea for our particular problems. Clients trade either at ~10ms latency for high-value orders or sub-ms for latency-sensitive ones. They conceptualise what they want to buy plus a bunch of changes they want to apply to that bulk (for instance changes in quantity, price limits, time expiry), and they try to make money by matching a target price, either by very closely following a prediction curve or by spending as little time on it as possible (giving us only a constraint and asking us to fit it with our own curve-fitting algos).

The tech stack is pure Java with a kernel of C++ for networking, with as few external dependencies as possible, and no GC beyond the first few minutes after startup (which are spent preallocating all the caches). Everything is self-built.
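To give a rough idea of that preallocation pattern (a simplified sketch with made-up names, not our actual code): everything on the hot path recycles objects from pools that are filled once at startup, so steady-state processing allocates nothing and the GC has nothing left to do.

    // Hypothetical sketch: preallocate fixed-size order events at startup and
    // recycle them, so the hot path never allocates (and never triggers GC).
    import java.util.concurrent.ArrayBlockingQueue;

    final class OrderEvent {
        final byte[] payload = new byte[1024];   // 1kB max, per the comment above
        int length;
        void reset() { length = 0; }
    }

    final class EventPool {
        private final ArrayBlockingQueue<OrderEvent> free;

        EventPool(int size) {                    // called once at startup
            free = new ArrayBlockingQueue<>(size);
            for (int i = 0; i < size; i++) free.add(new OrderEvent());
        }

        OrderEvent acquire() { return free.poll(); }       // null if exhausted
        void release(OrderEvent e) { e.reset(); free.offer(e); }
    }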

How does it solve the problem? It simply is the only way we can think of to over-optimize processing. If you want to hit a millisecond or sub-millisecond target (we use FPGAs for that), you must cut the fat as much as possible. Our events are 1kB max, sent over raw TCP; the queue is the network switch's send queue; the ordering has to be centrally managed per exchange since we have only one final output stream, but we can sort of scale out the intermediary processing (basically, for one input event, how to slice it into multiple output events in the right order; a rough sketch follows below).
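A toy sketch of that slicing idea (again, made-up names, nothing like the real thing): intermediary workers can build the child slices in parallel, but each exchange has a single sequencer that stamps the outgoing events, so the one final output stream per exchange stays strictly ordered.

    // Hypothetical sketch: slice a parent order into per-exchange child events,
    // stamping each with a number from the single per-exchange sequencer so the
    // one final output stream stays strictly ordered.
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicLong;

    record ChildEvent(String exchange, long seq, long quantity) {}

    final class ExchangeSequencer {
        private final AtomicLong seq = new AtomicLong();   // one instance per exchange
        long next() { return seq.incrementAndGet(); }
    }

    final class Slicer {
        // The slicing itself can run on any worker; only the sequence stamp
        // (and the final send) is centralised per exchange.
        List<ChildEvent> slice(String exchange, long totalQty, int slices,
                               ExchangeSequencer sequencer) {
            List<ChildEvent> out = new ArrayList<>(slices);
            long per = totalQty / slices, rem = totalQty % slices;
            for (int i = 0; i < slices; i++) {
                long qty = per + (i < rem ? 1 : 0);
                out.add(new ChildEvent(exchange, sequencer.next(), qty));
            }
            return out;
        }
    }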

I'd say it doesn't work: we run into the terrifying issues the author of the main link pointed out so well (the UI, God, it's hard; the impossible replays nobody can do; the useless noise; solving event-stream problems more than business problems, etc.), but I'd also say I can't imagine any heavier system fitting the constraints better. We need to count the cycles of processing - we can't have a vendor library we can't just change arbitrarily. We can't use a database, and we can't use anything heavier than TCP or multicast. I'll definitely try another bank one day to see how others do it, because I'm so curious.



Are you really using event sourcing, or just message queues?

I worked for a large US bank and had close contact with those who worked on the asset settlement system. I don't have as deep an insight as you do, obviously, but the general architecture you describe sounds very similar. Except they clearly used message queues and treated them as message queues, not as event streams / sourcing / whatever. They used a specific commercial message broker that specializes in low latency.



