Reverse Proxy Deep Dive: Why Load Balancing at Scale Is Hard

(startwithawhy.com)

75 points | by miggy 3 days ago

4 comments

  • betaby 15 hours ago
    On this subject I can recommend Google's original Maglev paper: https://static.googleusercontent.com/media/research.google.c...

    and a subsequent enhancement from the Yandex folks: https://github.com/kndrvt/mhs

    The explanation (in Russian) is at https://habr.com/ru/companies/yandex/articles/858662/; use your favorite translation site.
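
    For a flavor of the paper's core idea: Maglev hashing gives each backend a preference permutation over a prime-sized lookup table and fills slots round-robin, so a lookup is a single array index and only a small fraction of flows move when a backend is added or removed. A compact Go sketch (the hash choice is a stand-in; the paper's evaluation uses table sizes like 65537):

        // Compact sketch of the Maglev hashing scheme from the paper:
        // each backend gets a permutation over a prime-sized table, and
        // slots are claimed round-robin by preference order.
        package main

        import (
            "fmt"
            "hash/fnv"
        )

        const M = 65537 // prime lookup-table size, as in the paper's evaluation

        func hashStr(s string, seed byte) uint64 {
            h := fnv.New64a() // FNV stands in for the paper's hash functions
            h.Write([]byte(s))
            h.Write([]byte{seed})
            return h.Sum64()
        }

        // permutation returns backend b's preference order over the M slots.
        func permutation(b string) []uint64 {
            offset := hashStr(b, 0) % M
            skip := hashStr(b, 1)%(M-1) + 1 // never zero, so every slot is visited
            p := make([]uint64, M)
            for j := uint64(0); j < M; j++ {
                p[j] = (offset + j*skip) % M
            }
            return p
        }

        // buildTable fills every slot round-robin: each backend claims its
        // next most-preferred free slot until the table is full.
        func buildTable(backends []string) []int {
            perms := make([][]uint64, len(backends))
            next := make([]uint64, len(backends))
            for i, b := range backends {
                perms[i] = permutation(b)
            }
            table := make([]int, M)
            for i := range table {
                table[i] = -1 // -1 marks an unclaimed slot
            }
            for filled := 0; filled < M; {
                for i := range backends {
                    for table[perms[i][next[i]]] >= 0 {
                        next[i]++ // skip slots already claimed by others
                    }
                    table[perms[i][next[i]]] = i
                    next[i]++
                    if filled++; filled == M {
                        break
                    }
                }
            }
            return table
        }

        func main() {
            backends := []string{"b1", "b2", "b3"}
            table := buildTable(backends)
            flow := "198.51.100.4:443->203.0.113.9:80" // a 5-tuple stand-in
            fmt.Println("flow routed to", backends[table[hashStr(flow, 0)%M]])
        }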

  • ExoticPearTree 3 hours ago
    Shower thoughts: since we can do service discovery easily enough to know when a server is added to or removed from a pool, we could also discover a metrics endpoint exposing a limited set of stats: CPU load, memory usage, available threads, etc. A helper process/thread running alongside the load balancer's main processes could then populate and update, in near real time, the equivalent of HAProxy stick tables but with much richer information. When the next request hits the load balancer, you know "exactly" where to route it for best performance.
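
    A rough Go sketch of that helper, assuming each backend exposes a hypothetical /metrics-lite JSON endpoint (the endpoint name and its fields are made up for illustration):

        // Hypothetical metrics-aware picker. Assumes each backend exposes
        // a JSON endpoint like /metrics-lite returning {"cpu": 0.42}; the
        // endpoint name and fields are invented for this sketch.
        package main

        import (
            "encoding/json"
            "fmt"
            "net/http"
            "sync"
            "time"
        )

        type backendStats struct {
            CPU float64 `json:"cpu"` // 0.0-1.0, lower is better
        }

        type picker struct {
            mu    sync.RWMutex
            stats map[string]backendStats // keyed by backend address
        }

        // poll runs as a helper goroutine alongside the proxy, refreshing
        // the in-memory table in near real time.
        func (p *picker) poll(backends []string, every time.Duration) {
            for range time.Tick(every) {
                for _, b := range backends {
                    resp, err := http.Get("http://" + b + "/metrics-lite")
                    if err != nil {
                        continue // skip unreachable backends this round
                    }
                    var s backendStats
                    if json.NewDecoder(resp.Body).Decode(&s) == nil {
                        p.mu.Lock()
                        p.stats[b] = s
                        p.mu.Unlock()
                    }
                    resp.Body.Close()
                }
            }
        }

        // pick returns the backend with the lowest reported CPU load,
        // falling back to the first one if no stats have arrived yet.
        func (p *picker) pick(backends []string) string {
            p.mu.RLock()
            defer p.mu.RUnlock()
            best, bestCPU := backends[0], 2.0 // CPU <= 1.0, so any stat beats 2.0
            for _, b := range backends {
                if s, ok := p.stats[b]; ok && s.CPU < bestCPU {
                    best, bestCPU = b, s.CPU
                }
            }
            return best
        }

        func main() {
            backends := []string{"10.0.0.1:8080", "10.0.0.2:8080"}
            p := &picker{stats: make(map[string]backendStats)}
            go p.poll(backends, 2*time.Second)
            time.Sleep(5 * time.Second) // let a couple of polls happen
            fmt.Println("route next request to:", p.pick(backends))
        }
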
  • gerdesj 10 hours ago
    HAProxy has been doing this sort of thing for a very, very long time.

    You have stick tables and a very rich way of populating them, and you can then use these in-RAM tables to make routing decisions.
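
    For anyone who hasn't used them: a stick table is essentially an in-memory map keyed by something like the client IP, holding counters and a chosen server. A rough Go analogue of the idea (the names and fields are mine, not HAProxy's):

        // Rough in-memory analogue of a stick table: a map keyed by client
        // IP that remembers a chosen backend and a request counter, with
        // expiry for idle entries. Illustrative only, not HAProxy internals.
        package main

        import (
            "fmt"
            "sync"
            "time"
        )

        type entry struct {
            server   string    // backend this client sticks to
            requests int       // running count, usable for rate decisions
            lastSeen time.Time // lets idle entries expire
        }

        type stickTable struct {
            mu      sync.Mutex
            entries map[string]*entry
            expire  time.Duration
        }

        // route returns the sticky backend for clientIP, assigning one via
        // pick() on first sight or after the entry has expired.
        func (t *stickTable) route(clientIP string, pick func() string) string {
            t.mu.Lock()
            defer t.mu.Unlock()
            e, ok := t.entries[clientIP]
            if !ok || time.Since(e.lastSeen) > t.expire {
                e = &entry{server: pick()}
                t.entries[clientIP] = e
            }
            e.requests++
            e.lastSeen = time.Now()
            return e.server
        }

        func main() {
            t := &stickTable{entries: make(map[string]*entry), expire: 30 * time.Minute}
            pick := func() string { return "10.0.0.1:8080" } // stand-in for a real balancer
            fmt.Println(t.route("203.0.113.7", pick)) // assigns a backend
            fmt.Println(t.route("203.0.113.7", pick)) // sticks to the same one
        }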

    Sometimes you need another proxy too, e.g. Apache/nginx or whatever, perhaps for authn/authz.

    Yes, it is a tricky concept, and this series of articles merely scratches the surface. Good effort though.

    • miggy 8 hours ago
      Author here. Absolutely, HAProxy's stick tables are a powerful way to implement advanced routing logic, and they've been around for years. This series focuses on explaining the broader concepts and tradeoffs rather than diving deep into any single implementation; since it also covers other aspects of reverse proxies, the load-balancing sections mostly present the challenges and high-level ideas.

      Glad you found it a good effort, and I agree there’s room to go deeper in future posts.

  • nimbius 14 hours ago
    It's honestly not, but younger developers can be forgiven for assuming Traefik is all you need. The learn-to-code camps really did a number on kids these days :(

    Use DSR and 50% of your traffic is taken care of: with direct server return, responses go straight from the backend to the client and bypass the load balancer entirely. https://www.loadbalancer.org/blog/direct-server-return-is-si...

    Explore load balancing lower in the stack, based on ASN, to preroute traffic for divide and conquer (geolocation, etc.).

    Weighted load balancing only works for uniform traffic sources. You'll need to weight connections based on priority or location, backend-heavy transactions (checkout vs. just browsing the store), and other conditions that can change your user's affinity (sometimes dynamically). Keepalived isn't mentioned once, nor is 802.1Q trunk optimization, nor SRV records and the failover/HA that most modern browsers perform based on DNS information itself.
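
    For contrast, the static weighted pick that those conditions break is only about this much code (a minimal Go sketch; addresses and weights are illustrative):

        // Minimal static weighted pick. This is the part that "only works
        // for uniform traffic": the weights are fixed and know nothing
        // about per-request cost, priority, or location.
        package main

        import (
            "fmt"
            "math/rand"
        )

        type backend struct {
            addr   string
            weight int // relative share of traffic
        }

        func pickWeighted(pool []backend) string {
            total := 0
            for _, b := range pool {
                total += b.weight
            }
            n := rand.Intn(total) // uniform draw over the total weight
            for _, b := range pool {
                if n < b.weight {
                    return b.addr
                }
                n -= b.weight
            }
            return pool[len(pool)-1].addr // unreachable with positive weights
        }

        func main() {
            pool := []backend{{"10.0.0.1:8080", 3}, {"10.0.0.2:8080", 1}}
            for i := 0; i < 8; i++ {
                fmt.Println(pickWeighted(pool))
            }
        }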

    • miggy 8 hours ago
      Author here. Thanks for sharing these thoughts. You’re right that DSR, ASN-based routing, SRV records, and other lower-layer approaches are important in certain setups.

      This post focuses primarily on Layer 7 load balancing, i.e. connection and request routing based on application-level information, so it doesn't go into Layer 3/4 techniques like DSR or other network-level optimizations. Those are certainly worth covering in a broader series that spans the full stack.

    • SteveNuts 9 hours ago
      > most modern browsers based on DNS information itself.

      I went down this rabbit hole and was surprised by how all over the place the behavior was across various HTTP clients (not just browsers). There's very little consistency in whether, and how, the IPs in the DNS response are retried.
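
      For the curious, a client that actually retries across the addresses in a DNS answer ends up doing roughly this (a minimal Go sketch; real clients differ in ordering, timeouts, and whether they try more than the first IP at all):

          // Resolve once, then try each returned IP with its own timeout.
          // Many real clients stop after the first address; behavior varies.
          package main

          import (
              "fmt"
              "net"
              "time"
          )

          func dialAny(host, port string) (net.Conn, error) {
              ips, err := net.LookupIP(host)
              if err != nil {
                  return nil, err
              }
              var lastErr error
              for _, ip := range ips {
                  addr := net.JoinHostPort(ip.String(), port)
                  conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
                  if err == nil {
                      return conn, nil // first reachable address wins
                  }
                  lastErr = err // remember the failure, fall through to the next IP
              }
              return nil, fmt.Errorf("all %d addresses failed: %w", len(ips), lastErr)
          }

          func main() {
              conn, err := dialAny("example.com", "80")
              if err != nil {
                  fmt.Println("dial failed:", err)
                  return
              }
              defer conn.Close()
              fmt.Println("connected to", conn.RemoteAddr())
          }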