Load Balancing and Proxy Configuration for CS

If you plan on using Riak CS in production, we highly recommend that you place Riak CS behind a load-balancing or proxy solution, whether hardware- or software-based. You should also avoid exposing Riak CS directly to public-facing network interfaces.

Riak CS users have reported success in using Riak CS with a variety of load-balancing and proxy solutions. Common solutions include proprietary hardware-based load balancers, cloud-based load-balancing options such as Amazon’s Elastic Load Balancer, and open-source software projects like HAProxy and Nginx.

This guide briefly explores the commonly used open-source solutions HAProxy and Nginx and provides some configuration and operational tips gathered from community users and operations-oriented engineers at Basho.

HAProxy

HAProxy is a fast and reliable open-source solution for load balancing and proxying of HTTP- and TCP-based application traffic.

Users have reported success in using HAProxy in combination with Riak CS in a number of configurations and scenarios. Much of the information and example configuration for this section is drawn from the experiences of users in the Riak CS community in addition to suggestions from Basho engineering.

Example Configuration

The following is an example starting-point configuration for HAProxy to act as a load balancer in front of a Riak CS installation.

Note on open files limits

The operating system’s open files limit must be greater than 256000, the maxconn value used in the example configuration that follows. Consult the Open Files Limit documentation for details on configuring the value for different operating systems.
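As a quick sanity check before starting HAProxy, the limit can be inspected from Python. This is a minimal sketch, assuming a Unix-like system (the `resource` module is not available elsewhere); the function name `open_files_headroom` is hypothetical, and the script reports the limit of the process that runs it, which is inherited from the invoking shell and may differ from the limit applied to the HAProxy service itself.

```python
import resource

REQUIRED_MAXCONN = 256000  # maxconn value from the example configuration


def open_files_headroom(required=REQUIRED_MAXCONN):
    """Return (soft_limit, hard_limit, ok) for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit", which always satisfies the requirement.
    ok = soft == resource.RLIM_INFINITY or soft > required
    return soft, hard, ok


if __name__ == "__main__":
    soft, hard, ok = open_files_headroom()
    print(f"soft={soft} hard={hard} sufficient={ok}")
```

If the result is not sufficient, raise the limit as described in the Open Files Limit documentation before applying the configuration.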

global
    log 127.0.0.1     local0
    log 127.0.0.1     local1 notice
    maxconn           256000
    spread-checks     5
    daemon

defaults
    log               global
    option            dontlognull
    option            redispatch
    option            allbackups
    no option         httpclose
    retries           3
    maxconn           256000
    timeout connect   5000
    timeout client    5000
    timeout server    5000

frontend riak_cs
    bind              10.0.24.100:8080
    # Example bind for SSL termination
    # bind            10.0.24.100:8443 ssl crt /opt/local/haproxy/etc/data.pem
    mode              http
    option            httplog
    capture           request header Host len 64
    acl good_ips      src -f /opt/local/haproxy/etc/gip.lst
    # "block" is deprecated in newer HAProxy releases; http-request deny is the equivalent
    http-request      deny if !good_ips
    use_backend       riak_cs_backend if good_ips

backend riak_cs_backend
    mode              http
    balance           roundrobin
    # Ping Riak CS to determine health
    option            httpchk GET /riak-cs/ping
    timeout           connect 60s
    timeout           http-request 60s
    server riak1 r1s01.example.com:8081 weight 1 maxconn 1024 check
    server riak2 r1s02.example.com:8081 weight 1 maxconn 1024 check
    server riak3 r1s03.example.com:8081 weight 1 maxconn 1024 check
    server riak4 r1s04.example.com:8081 weight 1 maxconn 1024 check
    server riak5 r1s05.example.com:8081 weight 1 maxconn 1024 check

Please note that the above example is a starting point rather than a finished configuration. Examine it carefully and adjust addresses, ports, limits, and timeouts to suit your specific environment.
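One part of the example that always needs adapting is the good_ips ACL, which reads its allowed source addresses from the gip.lst file. The sketch below is one way to sanity-check such a list before deploying it. It assumes the file holds one IPv4/IPv6 address or CIDR range per line, with blank lines and lines starting with `#` ignored; the function names are hypothetical, and the code only approximates HAProxy's own matching.

```python
import ipaddress


def load_allowed_networks(lines):
    """Parse one address or CIDR per line, skipping blanks and # comments."""
    networks = []
    for raw in lines:
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        # strict=False tolerates entries with host bits set, e.g. 10.0.24.7/24.
        networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_allowed(source_ip, networks):
    """Return True if source_ip falls inside any of the parsed networks."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    sample = ["# application servers", "10.0.24.0/24", "", "192.168.1.5"]
    nets = load_allowed_networks(sample)
    print(is_allowed("10.0.24.17", nets), is_allowed("172.16.0.1", nets))
```

A malformed entry raises `ValueError` during parsing, which surfaces typos in the list before they reach the load balancer.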

A specific configuration detail worth noting from the example is the commented option for SSL termination. HAProxy supports SSL directly as of version 1.5. Provided that your HAProxy instance was built with OpenSSL support, you can enable it by uncommenting the example line and modifying it to suit your environment. More information is available in the HAProxy documentation.

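Also note the option for checking Riak CS health via the /riak-cs/ping endpoint. With this option, HAProxy sends an HTTP request to each Riak CS node instead of only testing that its port accepts TCP connections, and nodes that fail the check are removed from the round-robin rotation until they recover.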
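When a node is unexpectedly marked down, it can help to reproduce the health check by hand. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the host and port passed in are whatever your backend `server` lines use. It mirrors HAProxy's default rule for HTTP checks, where a 2xx or 3xx response counts as healthy.

```python
import urllib.error
import urllib.request

PING_PATH = "/riak-cs/ping"  # endpoint used by the httpchk option


def status_is_healthy(status):
    """HAProxy's default HTTP check treats 2xx and 3xx responses as healthy."""
    return 200 <= status < 400


def ping_node(host, port, timeout=5.0):
    """Return True if the node answers the ping endpoint with a healthy status."""
    url = f"http://{host}:{port}{PING_PATH}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return status_is_healthy(resp.status)
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    for name in ("r1s01.example.com", "r1s02.example.com"):
        print(name, ping_node(name, 8081))
```

Running this from the HAProxy host itself gives the most useful result, since it exercises the same network path as the real check.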

Nginx

Some users have reported success in using the Nginx HTTP server to proxy requests for Riak CS. An example configuration is included here for reference.

Example Configuration

The following is an example starting-point configuration for Nginx to act as a front-end proxy to Riak CS.

upstream riak_cs_host {
  server  10.0.1.10:8080;
}

server {
  listen   80;
  server_name  _;
  access_log  /var/log/nginx/riak_cs.access.log;

  location / {
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_redirect off;

    proxy_connect_timeout      90;
    proxy_send_timeout         90;
    proxy_read_timeout         90;

    proxy_buffer_size          64k;  # If set to a smaller value,
                                     # nginx can complain with a
                                     # "headers too large" error

    proxy_buffers 8  64k;   # Increase from default of (8, 8k).
                            # If left to default with increased
                            # proxy_buffer_size, nginx complains
                            # that proxy_busy_buffers_size is too
                            # large.

    proxy_pass http://riak_cs_host;
  }
}

Note that the directive proxy_set_header Host $http_host is essential to ensure that the HTTP Host: header is passed to Riak CS as received. Without it, Nginx replaces the header with the name or address given in proxy_pass, so Riak CS no longer sees the hostname the client actually requested.
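One reason the original Host header matters for S3-style clients is virtual-hosted-style addressing, in which the bucket name is carried as a subdomain of the service's root host. The sketch below illustrates the idea only and is not Riak CS's actual parsing code; the function name is hypothetical, and the default root host of `s3.amazonaws.com` is an assumption that should be replaced with whatever root host your installation uses.

```python
def bucket_from_host(host_header, root_host="s3.amazonaws.com"):
    """Return the bucket name encoded in a virtual-hosted-style Host header.

    Returns None when the header carries no bucket, as with path-style
    requests or a Host that was rewritten to a backend address.
    """
    # Drop an optional :port suffix and normalize case and trailing dot.
    host = host_header.split(":", 1)[0].lower().rstrip(".")
    suffix = "." + root_host.lower()
    if host.endswith(suffix) and len(host) > len(suffix):
        return host[: -len(suffix)]
    return None


if __name__ == "__main__":
    # Header as sent by the client: the bucket can be recovered.
    print(bucket_from_host("photos.s3.amazonaws.com"))
    # Header rewritten to the backend address: the bucket is lost.
    print(bucket_from_host("10.0.1.10:8080"))
```

Once a proxy rewrites the header to the backend address, a server in this position has no way to tell which bucket the client meant.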

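It’s also important to note that the proxy_pass value should not end in a slash. A trailing slash gives proxy_pass a URI part, which makes Nginx replace the matched location prefix and forward a normalized form of the request URI instead of the URI exactly as the client sent it. For Riak CS this can alter object paths and lead to a variety of issues.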