> Either way, if you have more than 20k connections open to backends you are doing something seriously wrong
I don't see how that is a fringe or rare case. With a load balancer (using no pipelining or multiplexing), the number of simultaneous outgoing HTTP connections to backend systems is at least the number of simultaneously open incoming HTTP connections. Having more than 20k simultaneous incoming HTTP requests is not a lot for a busy load balancer.
Now with pipelining (or when limiting yourself to 20k outgoing connections), the load balancer has to queue requests and multiplex them onto the backends as connections become available. Pipelining suffers from head-of-line blocking, which further increases the latency the load balancer can introduce. Either way, queuing adds latency for the end user. With HTTP/2 multiplexing, you can go past those 20k incoming connections without queuing on the load balancer side.
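To make the queuing point concrete, here's a toy Go sketch (the pool size, request count, and timings are all made up): with one request per connection at a time, any incoming request beyond the pool size just sits in a queue, and that wait is latency the end user sees.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const backendConns = 2 // hypothetical: backend pool capped at 2 connections

func main() {
	pool := make(chan struct{}, backendConns) // counting semaphore over the pool

	var wg sync.WaitGroup
	for i := 0; i < 6; i++ { // 6 incoming requests compete for 2 connections
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			queued := time.Now()
			pool <- struct{}{}        // block (queue) until a connection frees up
			defer func() { <-pool }() // return the connection to the pool
			fmt.Printf("req %d waited %v in the LB queue\n",
				id, time.Since(queued).Round(time.Millisecond))
			time.Sleep(50 * time.Millisecond) // simulated backend round trip
		}(i)
	}
	wg.Wait()
}
```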
> the number of simultaneous outgoing HTTP connections to backend systems is at least the number of simultaneously open incoming HTTP connections
No, it isn't. You establish a pool of long-lived connections per backend. The load balancer should be coalescing in-flight requests. At that traffic volume you should also be using basic in-memory caches to absorb things like favicon requests.
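For what it's worth, both techniques are only a few lines in Go. This is just a sketch, assuming the standard library's connection pooling in http.Transport plus golang.org/x/sync/singleflight for coalescing; the backend URL and pool limits are invented:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
	"time"

	"golang.org/x/sync/singleflight"
)

var (
	// Long-lived pooled connections per backend host (limits are illustrative).
	client = &http.Client{Transport: &http.Transport{
		MaxIdleConns:        100,
		MaxIdleConnsPerHost: 100,
		IdleConnTimeout:     90 * time.Second,
	}}
	group singleflight.Group // coalesces concurrent identical requests
)

// fetch lets only one in-flight request per URL reach the backend;
// concurrent duplicates wait for and share its result.
func fetch(url string) ([]byte, error) {
	v, err, _ := group.Do(url, func() (interface{}, error) {
		resp, err := client.Get(url)
		if err != nil {
			return nil, err
		}
		defer resp.Body.Close()
		return io.ReadAll(resp.Body)
	})
	if err != nil {
		return nil, err
	}
	return v.([]byte), nil
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ { // ten concurrent identical requests, one backend hit
		wg.Add(1)
		go func() {
			defer wg.Done()
			body, err := fetch("https://backend.example/favicon.ico") // hypothetical backend
			fmt.Println(len(body), err)
		}()
	}
	wg.Wait()
}
```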
I am not going to respond further, as this chain is getting quite off topic. There are plenty of good resources a relevant Google search away, but if you really still have questions about how load balancers work, my email is in my profile.
> You establish a pool of long lived connections per backend
Yes, and you would do the same with HTTP/2. But you haven't addressed the head-of-line blocking caused by HTTP/1.1 pipelining, which HTTP/2 completely solves. Head-of-line blocking becomes an increasing problem the longer your HTTP connections live, such as with websockets, large media transfers, or streaming.
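Concretely, as a sketch (assuming Go and the golang.org/x/net/http2 package; the backend URL is hypothetical), switching the backend transport to HTTP/2 is about all it takes to multiplex many request/response streams over each pooled connection instead of pipelining them:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"

	"golang.org/x/net/http2"
)

// newBackendClient returns a client whose connections are upgraded to
// HTTP/2 via ALPN, so many streams share one long-lived connection with
// no pipelining-style head-of-line blocking at the HTTP layer.
func newBackendClient() (*http.Client, error) {
	t := &http.Transport{TLSClientConfig: &tls.Config{MinVersion: tls.VersionTLS12}}
	if err := http2.ConfigureTransport(t); err != nil {
		return nil, err
	}
	return &http.Client{Transport: t}, nil
}

func main() {
	client, err := newBackendClient()
	if err != nil {
		panic(err)
	}
	resp, err := client.Get("https://backend.example/") // hypothetical h2 backend
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Proto) // "HTTP/2.0" if the backend negotiated h2
}
```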