Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place.
Cloudflare + GCE Load Balancing
This describes our setup for Cloudflare and GCE's load balancing.
Problem
In the full SMC setup, there are N front-facing nodes running haproxy. They handle incoming TCP traffic on ports 80 and 443 (see the haproxy config). Cloudflare is a DNS provider and reverse proxy that forwards traffic to SMC. However, neither Cloudflare nor ordinary DNS records know whether the machine that incoming traffic is forwarded to is actually alive.
Solution
Use GCE's network load balancer (TCP, layer 4) to forward traffic only to the front-end haproxy instances that are alive and healthy. Those haproxies then do cookie-based layer 7 load balancing across the hubs.
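To illustrate the second tier, here is a minimal Python sketch of cookie-based sticky balancing, assuming a cookie whose value names the hub (the cookie name SERVERID and the hub names are hypothetical). It is not haproxy's implementation, which is configured through haproxy's own cookie directive; it only shows the idea: a client whose cookie names a live hub goes back to that hub, and everyone else gets the next live hub in round-robin order plus a cookie pinning them to it.

```python
import itertools
from typing import Dict, Iterator, List, Optional, Set, Tuple

# Hypothetical cookie name; the real one is whatever the haproxy config defines.
COOKIE_NAME = "SERVERID"


class StickyBalancer:
    """Layer 7 sketch: follow the hub cookie if it names a live hub, else round-robin."""

    def __init__(self, hubs: List[str]) -> None:
        self.hubs = list(hubs)
        self._next_hub: Iterator[str] = itertools.cycle(self.hubs)

    def pick(self, cookies: Dict[str, str], alive: Set[str]) -> Tuple[str, Optional[str]]:
        """Return (hub, cookie_value_to_set); the cookie is None if the client's is still valid."""
        wanted = cookies.get(COOKIE_NAME)
        if wanted is not None and wanted in alive:
            return wanted, None  # sticky: keep the client on its hub
        # No cookie, or it names a dead hub: take the next live hub in turn.
        for _ in range(len(self.hubs)):
            hub = next(self._next_hub)
            if hub in alive:
                return hub, hub  # tell the client to stick to this hub
        raise RuntimeError("no live hub")
```

For example, a first request without cookies is sent to hub0 and receives a cookie for hub0; a request carrying a cookie for a hub that has died is moved to another live hub and receives a new cookie.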
Setup
Target pools: a static list of the GCE instances that handle traffic. Currently, it is set to web0, web1, and web2.
(In the future, this could be dynamic, based on CPU usage, using instance templates or even containers, etc.)
Documentation: https://cloud.google.com/compute/docs/load-balancing/network/target-pools
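At layer 4 the balancer only has to keep each TCP connection on one healthy instance of the pool. A minimal sketch under stated assumptions: the SHA-256 hash of the connection five-tuple followed by a modulo is a stand-in chosen for illustration, not GCE's actual algorithm, and pick_backend is a hypothetical name.

```python
import hashlib
from typing import List, Sequence, Set, Tuple

# The static target pool from the text.
TARGET_POOL: List[str] = ["web0", "web1", "web2"]

# (source IP, source port, destination IP, destination port, protocol)
Conn = Tuple[str, int, str, int, str]


def pick_backend(conn: Conn, pool: Sequence[str], healthy: Set[str]) -> str:
    """Map a connection to one healthy instance of the pool.

    Hashing the whole tuple keeps every packet of a TCP connection on the same
    instance; instances failing their health check are left out entirely.
    """
    candidates = [name for name in pool if name in healthy]
    if not candidates:
        raise RuntimeError("no healthy instance in target pool")
    key = "|".join(str(part) for part in conn).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return candidates[digest % len(candidates)]
```

Hashing needs no per-connection state; the price of the plain modulo is that connections get reshuffled whenever the set of healthy instances changes.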
Health checks: each haproxy instance has a health endpoint /health-check on port 60000 (not forwarded from the outside through the firewall, but accessible internally), which is used to check whether it is working. A GCE health check polls this endpoint every 2 seconds and marks the instance as failed after 2 consecutive failures (see the haproxy config snippet).
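A minimal sketch of what a single probe does, using only Python's standard library: GET /health-check on port 60000 and treat anything other than HTTP 200 (including timeouts and refused connections) as a failure. The function name is_healthy and the 1 second timeout are assumptions for illustration.

```python
import http.client
import urllib.request

# Ignore any proxy configured in the environment: probes go straight to the host.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_healthy(host: str, port: int = 60000, path: str = "/health-check",
               timeout: float = 1.0) -> bool:
    """Return True only if GET http://host:port/path answers with HTTP 200."""
    url = f"http://{host}:{port}{path}"
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (e.g. a 503), refused connections and timeouts are
        # all OSError subclasses; malformed responses raise HTTPException.
        return False
```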
GCE health check:
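In my first setup, it checks every 2 seconds; an instance is considered dead after two failed checks and alive again after two successful ones, so its state toggles in less than 5 seconds. The site stays up as long as at least one haproxy is alive. Assuming daily restarts of the three nodes at independent times, all of them are down at once for a fraction (5 s / 24 h)^3 ≈ 1.9 × 10^-13 of the time; even with an extra factor of 100, the uptime ratio 1 - 100 × (5 s / 24 h)^3 ≈ 1 - 1.9 × 10^-11 is better than nine nines.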
Documentation: https://cloud.google.com/compute/docs/load-balancing/health-checks
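The threshold logic and the uptime estimate can be replayed in a few lines. This is a simplified model under assumptions, not GCE's implementation: the hypothetical helper health_states flips an instance's state only after the configured number of consecutive disagreeing probe results, and downtime_fraction assumes the nodes' outages are independent.

```python
from typing import Iterable, List


def health_states(results: Iterable[bool], healthy_threshold: int = 2,
                  unhealthy_threshold: int = 2, start_healthy: bool = True) -> List[bool]:
    """Replay probe results (True = probe passed) and return the state after each probe."""
    healthy = start_healthy
    streak = 0  # consecutive results that disagree with the current state
    states: List[bool] = []
    for ok in results:
        if ok == healthy:
            streak = 0
        else:
            streak += 1
            needed = healthy_threshold if ok else unhealthy_threshold
            if streak >= needed:
                healthy = ok
                streak = 0
        states.append(healthy)
    return states


def downtime_fraction(outage_s: float, period_s: float, nodes: int) -> float:
    """Fraction of time all nodes are out at once, assuming independent outages."""
    return (outage_s / period_s) ** nodes
```

With the defaults, health_states([True, False, True, False, False, True, True]) yields [True, True, True, True, False, False, True]: a single failed probe is ignored, two in a row mark the instance dead, and two successes bring it back. downtime_fraction(5, 86400, 3) is about 1.9 × 10^-13.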
PS: GCE network firewall rule:
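The firewall's job here is to keep port 60000 closed to the outside while letting the internal network and GCE's health checkers reach it. A minimal model of such a single ingress rule, assuming placeholder source ranges: both networks below are hypothetical and would have to be replaced with the real internal range and the health-checker ranges listed in the GCE documentation.

```python
import ipaddress
from typing import Iterable, Sequence

HEALTH_CHECK_PORT = 60000

# Hypothetical source ranges; replace with the real internal network and the
# health-checker ranges published in the GCE documentation.
ALLOWED_SOURCES: Sequence[ipaddress.IPv4Network] = (
    ipaddress.IPv4Network("10.240.0.0/16"),   # hypothetical internal network
    ipaddress.IPv4Network("203.0.113.0/24"),  # placeholder for health checkers
)


def rule_allows(src_ip: str, dst_port: int,
                sources: Iterable[ipaddress.IPv4Network] = ALLOWED_SOURCES) -> bool:
    """True if this one ingress rule (allow tcp:60000 from sources) matches.

    Traffic to other ports is not matched by this rule; whether it gets in
    depends on the other firewall rules (e.g. the one opening 80 and 443).
    """
    if dst_port != HEALTH_CHECK_PORT:
        return False
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in sources)
```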
Finally, to make this work, we need an external static IP address and forwarding rules. There are two rules, one for port 80 and one for port 443:
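For reference, a sketch that builds the two forwarding-rule commands for the gcloud CLI, one per port, both pointing at the same static address and target pool. The names web-pool and web-ip and the region are hypothetical; the address itself would be reserved beforehand with gcloud compute addresses create. The commands are only printed for review, not executed.

```python
import shlex
from typing import List, Sequence


def forwarding_rule_cmds(pool: str, address: str, region: str,
                         ports: Sequence[int] = (80, 443)) -> List[List[str]]:
    """Build one 'gcloud compute forwarding-rules create' command per port.

    All rules share the same static address and send TCP traffic to the same
    target pool. The names passed in are placeholders.
    """
    return [
        [
            "gcloud", "compute", "forwarding-rules", "create", f"{pool}-{port}",
            f"--region={region}",
            f"--ports={port}",
            f"--address={address}",
            f"--target-pool={pool}",
        ]
        for port in ports
    ]


# Print the commands for review instead of running them.
for cmd in forwarding_rule_cmds("web-pool", "web-ip", "us-central1"):
    print(shlex.join(cmd))
```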
Checking the actual health of the instances:
All this can be configured via
https://console.cloud.google.com/networking/loadbalancing/list
Health checks for the target pool:
Ideas for the future
Graceful shutdown of haproxy
To gracefully shut down or restart an instance, use iptables to temporarily block the health check. The load balancer then marks the instance as unhealthy: existing TCP connections stay connected, but no new ones are sent to it. After some time, cut off the remaining connections and proceed with the shutdown.
https://cloud.google.com/compute/docs/load-balancing/health-checks#handling_unhealthy_instances
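The drain procedure can be sketched as follows, assuming the health check reaches haproxy on TCP port 60000 as described above. Inserting an iptables DROP rule for that port makes the probes time out, so the instance is marked unhealthy and gets no new connections; after a drain period it can be stopped. The function drain_and_shutdown and its callbacks are hypothetical; commands go through an injected runner so the sequence can be dry-run.

```python
import time
from typing import Callable, List

# Rule that hides this instance from the health checker (port from the text).
BLOCK_RULE = ["INPUT", "-p", "tcp", "--dport", "60000", "-j", "DROP"]


def drain_and_shutdown(run: Callable[[List[str]], None],
                       stop: Callable[[], None],
                       drain_seconds: float,
                       sleep: Callable[[float], None] = time.sleep) -> None:
    """Fail the health check, let connections drain, then stop the service."""
    run(["iptables", "-I", *BLOCK_RULE])  # health checks now time out
    try:
        sleep(drain_seconds)  # existing connections keep working meanwhile
        stop()                # e.g. restart haproxy or shut the machine down
    finally:
        run(["iptables", "-D", *BLOCK_RULE])  # never leave the block behind
```

In real use, run could be lambda cmd: subprocess.run(cmd, check=True) executed as root, and drain_seconds should be well above the roughly 5 seconds it takes for the instance to be marked dead, plus however long existing connections are allowed to linger.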
Add haproxy health information
Pretty much what is described here:
https://cbonte.github.io/haproxy-dconv/configuration-1.6.html#4.2-monitor%20fail
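The linked section describes haproxy's monitor fail directive, usually combined with monitor-uri and an ACL such as nbsrv(backend) lt N, so that the health URI reports failure when too few backend servers are up. Below is the same idea as a standalone Python sketch: the endpoint answers 503 when fewer than min_alive hubs are alive, which would make the GCE health check take the node out of the pool. health_status, make_handler, and the count_alive callback are hypothetical.

```python
from http.server import BaseHTTPRequestHandler
from typing import Callable, Type


def health_status(alive_hubs: int, min_alive: int = 1) -> int:
    """Mimic 'monitor fail if <too few servers>': 503 when too few hubs are up."""
    return 200 if alive_hubs >= min_alive else 503


def make_handler(count_alive: Callable[[], int],
                 min_alive: int = 1) -> Type[BaseHTTPRequestHandler]:
    """Build a request handler that serves /health-check from count_alive()."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/health-check":
                self.send_error(404)
                return
            self.send_response(health_status(count_alive(), min_alive))
            self.end_headers()

    return HealthHandler
```

It could be served with HTTPServer(("", 60000), make_handler(count_alive)).serve_forever() from http.server, where count_alive is whatever function reports the number of live hubs.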