Implement load balancing and automatic scaling using fly.io

Introduction

fly.io is a modern application delivery platform that provides load balancing and automatic scaling capabilities to ensure the reliability and performance of applications.

According to the fly.io load balancing documentation, fly.io supports automatic scaling based on request volume or TCP connection count. My friend's API was attacked recently and the server crashed. His APIs are called from outside the browser, by applications and scripts, which makes defense strategies difficult to implement. I therefore recommended that he migrate to fly.io.

Configuration

First, package the program into a Docker image and deploy it on fly.io. I used the 2H512M configuration (2 vCPUs and 512 MB of memory) and created 11 instances.

Then set the maximum connection count for each instance in fly.toml. Because this app uses the [http_service] section, the limits belong in the [http_service.concurrency] table. Here is an example configuration:

# fly.toml app configuration file generated for cheat-show-backend on 2023-09-15T18:53:53-05:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#

app = "XXXXXXXXXXXX"
primary_region = "lax"
swap_size_mb = 1024

[http_service]
  internal_port = 8080
  force_https = false
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 0
  processes = ["app"]
  # Limits are counted per instance, in concurrent TCP connections.
  [http_service.concurrency]
    type = "connections"
    hard_limit = 10000
    soft_limit = 1200

[build]
dockerfile = "Dockerfile"

soft_limit is the soft limit. fly.io tracks the number of connections on each instance. Once a single instance exceeds this value, additional instances are started and new connections are routed to them, which is what enables automatic scaling. Testing with wrk showed that our deployed service can handle up to 2k concurrent connections per instance, so I set soft_limit to 1200 to leave enough headroom while new instances start. While the connection count stays below this value, fly.io runs only one machine. Above it, the number of running machines grows with the connection count.

hard_limit is the hard limit. Once a single instance reaches this value, fly.io stops sending it new connections, and connections that cannot be served are answered with a 503 error.
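The interaction between the two limits can be sketched with a small Python model. This is only an approximation under stated assumptions, not fly.io's actual placement algorithm: connections are assumed to spread evenly across machines, the cap of 11 machines comes from the instance count mentioned earlier, and the function names are made up for illustration.

```python
import math

SOFT_LIMIT = 1200    # soft_limit from fly.toml
HARD_LIMIT = 10000   # hard_limit from fly.toml
MAX_MACHINES = 11    # instances created for the app


def machines_needed(total_connections: int) -> int:
    """Estimate how many machines keep each one at or below SOFT_LIMIT.

    Simplified model: connections are assumed to spread evenly.
    """
    if total_connections <= 0:
        # Assumption: with min_machines_running = 0, an idle app may stop entirely.
        return 0
    return min(MAX_MACHINES, max(1, math.ceil(total_connections / SOFT_LIMIT)))


def is_rejected(connections_on_machine: int) -> bool:
    """True once a single machine is past HARD_LIMIT (the 503 case)."""
    return connections_on_machine > HARD_LIMIT


for load in (800, 1200, 1201, 5000, 20000):
    print(f"{load:>6} connections -> {machines_needed(load)} machine(s)")
```

In this model, 1200 connections still fit on one machine, 1201 connections call for a second one, and anything beyond 11 x 1200 connections can no longer be kept under the soft limit.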

Load Testing

Default State

Increased Connection Count (wrk 5000 threads)

Under normal circumstances, only one instance runs. Billing is per minute, and one instance costs $3.8 per month. When the connection count drops, the app automatically scales back down to one instance. Stopped instances do not incur charges.
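To get a feel for what a short burst of extra instances costs under per-minute billing, here is a rough Python calculation. It assumes a 30-day month and that every extra instance is priced like the $3.8 backend instance; the function name and the example spike are hypothetical.

```python
MONTHLY_PRICE_USD = 3.8            # price of one backend instance, from the text
MINUTES_PER_MONTH = 30 * 24 * 60   # assumption: a 30-day month


def burst_cost(extra_instances: int, minutes: float) -> float:
    """Cost of extra instances that run only for `minutes` and then stop."""
    per_minute = MONTHLY_PRICE_USD / MINUTES_PER_MONTH
    return extra_instances * minutes * per_minute


# Ten extra instances absorbing a two-hour traffic spike.
print(f"${burst_cost(10, 120):.4f}")
```

Under these assumptions, a two-hour spike handled by ten extra instances costs roughly 11 cents, which is why scaling back down quickly matters more than the number of instances started.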

About Some Pitfalls

fly.io's load balancing depends on where the request comes from: it prefers the instance closest to the client. Suppose I have one instance in New York and one in Los Angeles, and I want the Los Angeles instance to run by default with New York as a backup. With direct access, fly.io starts instances by proximity, so a single user in New York would start the New York instance even if the connection count is only 1. To avoid this, a server in Los Angeles has to reverse proxy the fly.io app so that every request reaches it from Los Angeles. I therefore deployed an nginx reverse proxy in fly.io's Los Angeles region, load balanced across 3 instances of 1h256m (1 vCPU and 256 MB of memory each). After optimization, each proxy instance can handle over 10k concurrent connections.
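The effect of the proxy can be illustrated with a toy Python model. It is a deliberate simplification, not fly.io's routing logic: the city-to-region map is hypothetical (only "lax" appears in the original fly.toml, and "ewr" stands in for a New York area region), and spillover to other regions once the soft limit is exceeded is not modeled.

```python
# Hypothetical map from client location to the nearest region.
NEAREST_REGION = {"Los Angeles": "lax", "New York": "ewr"}


def region_started(client_city: str, via_lax_proxy: bool) -> str:
    """Region whose backend instance receives (and is started by) the request."""
    if via_lax_proxy:
        # The backend only ever sees the nginx proxy in lax as its client.
        return "lax"
    return NEAREST_REGION[client_city]


print(region_started("New York", via_lax_proxy=False))
print(region_started("New York", via_lax_proxy=True))
```

With direct access, the New York client wakes the New York region. Through the proxy, the same client lands on the Los Angeles instance, so the New York instance stays stopped until it is actually needed.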

Update: The issue with automatic scaling within a single region appears to have been fixed. Previously, all machines in the same region were started by default for load balancing, and scaling only worked across regions. This pitfall no longer exists, and automatic scaling now works within a single region as well.

Optimized Dockerfile

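FROM nginx
# Raise the worker count and the per-worker connection cap in the stock config.
# [[:space:]]* tolerates any amount of whitespace between a directive and its value.
RUN sed -i 's/worker_processes[[:space:]]*auto;/worker_processes 8;/' /etc/nginx/nginx.conf
RUN sed -i 's/worker_connections[[:space:]]*[0-9]*;/worker_connections 9999999;/' /etc/nginx/nginx.conf
# Add the reverse proxy server block defined in nginx.conf.
COPY nginx.conf /etc/nginx/conf.d/nginx.conf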

nginx.conf

server {
  # Listen on port 8080 over IPv4 and IPv6.
  listen 8080 default_server;
  listen [::]:8080 default_server;
  server_name         api.test.com;
  keepalive_timeout   75s;
  keepalive_requests  100;

  location / {
      # "ip" is a placeholder for the backend address.
      proxy_pass            http://ip:80;
      proxy_set_header      Host $host;
      proxy_set_header      Upgrade $http_upgrade;
      proxy_http_version    1.1;
  }
}

This allows for low-cost automatic scaling of services.

Cost Issue

The 3 instances used to deploy nginx are free. Usually only one backend instance is running, which costs $3.8 per month. Instances that are not started do not incur charges.