High Availability and Load Balancing

Guide to deploying Radiator Server in high availability configurations with load balancing, failover strategies, and multi-node architectures

This document covers strategies for deploying Radiator Server in high availability (HA) and load-balanced configurations to ensure continuous service availability and scalability.

The obvious, and correct, way to achieve high availability and load balancing is to run multiple instances of the Radiator process.

  1. Incoming requests
  • The ability to do LB/HA is typically restricted by the capabilities of the AAA clients. These are usually IP/DNS-based lists of Radiator servers to connect to. Some protocols keep long-lived connections due to encryption requirements (RadSec, EAP).
  2. Radiator instances
  • Authentication setups are Active-Active, with all instances serving traffic. See Radiator Sizing for performance information.
  3. Backends
  • The backends that Radiator talks to can use various mechanisms for discovery, retries, load balancing, and failover. These are covered in Backend Load Balancing.
  • Radiator also supports Local Backends for simplified deployments.
  • Very often the speed of the backends is the limiting factor.
  4. Other common requirements
  • Geographical redundancy
  • Standalone operation
  • Encryption and data security

Overview

HA and LB solutions are usually designed together. The guiding metric is the peak performance requirement of the entire system, and the capabilities available to meet it.

  1. Maximum TPS needed
  • This is usually derived from the number of users and how quickly they need to be authenticated after a full system restart (people arriving at work in the morning gives a similar value).
  • Example: 10k users, all authenticated within 5 minutes (300 seconds). This results in a TPS need of about 33 requests/second. Radiator is tested to handle 40k requests/second on an 8-core machine, over 1000 times the need.
  • Satisfying encryption requirements slows this down to roughly 10 requests per second per CPU core, resulting in 3.33 CPU cores needed.
  • So one machine with 4 cores is enough to handle the 10k users.
  2. Backend capabilities
  • Usually there is an external system to request information from. These are always network-connected and have internal latency.
  • The ability to use multiple backends for load balancing.
  • Understand the total TPS of the backends. This might require throttling incoming requests on the Radiator side to ensure stable operation of the backends.
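The sizing arithmetic in item 1 can be sketched as a small calculation. The figures (10k users, a 300-second window, 10 requests per second per core with encryption) come from the text above; adjust them for your own deployment.

```python
# Back-of-the-envelope sizing for an authentication storm after a
# full system restart. All figures are the worked example from the text.

def required_cores(users: int, window_s: int, tps_per_core: float) -> tuple[float, float]:
    """Return (needed TPS, CPU cores needed)."""
    tps = users / window_s
    return tps, tps / tps_per_core

tps, cores = required_cores(users=10_000, window_s=300, tps_per_core=10)
print(f"need {tps:.1f} req/s -> {cores:.2f} cores")  # need 33.3 req/s -> 3.33 cores
```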

This document uses a naming convention for Radiator clusters (C01) and instances (R01), which is explained in HA Identifiers.

Architecture Patterns

Here we cover how incoming requests can be handled in an HA/LB setup. This is very often limited by the capabilities of the clients connecting to the system: some will not support proper round-robin across multiple Radiator servers, some will not support DNS lookups for services, and so on.

Multiple Radiator instances

This is the most common setup: additional Radiator instances are deployed in what is called an Active-Active setup. Active-Passive is not usually relevant for Radiator installations; it is mainly needed for databases, which must maintain a single state.

Each client can connect to any of the Radiator instances. The dotted lines indicate that clients typically have all servers configured and will fail over between them if one becomes unavailable. This could be done at the DNS level, but usually it is best to configure the clients to simply round-robin between the Radiator servers.

With just two nodes, load balancing adds limited value: if one node goes down, the other must handle all the traffic.

By adding more nodes we start to see the benefits of load balancing.

Not all clients support configuring many Radiator servers, which becomes problematic because different nodes must then be configured for different clients.
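The client-side failover behaviour described above can be sketched as follows: try each configured Radiator server in order and return the first reply. `send` stands in for a real RADIUS/TACACS+ exchange; all names here are illustrative, not a real client API.

```python
# Sketch of client-side failover over a static server list.

def try_servers(servers, send):
    """Return (server, reply) from the first server that responds."""
    last_error = None
    for server in servers:
        try:
            return server, send(server)
        except OSError as exc:  # e.g. timeout, connection refused
            last_error = exc
    raise ConnectionError(f"all servers failed: {last_error}")

# Example: first server is down, second answers. Hostnames are placeholders.
def fake_send(server):
    if server == "r01.example.com":
        raise OSError("timeout")
    return "Access-Accept"

server, reply = try_servers(["r01.example.com", "r02.example.com"], fake_send)
print(server, reply)  # r02.example.com Access-Accept
```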

DNS Round-Robin

Using DNS round-robin to select a Radiator server is an efficient way of balancing load and achieving high availability. It does require working DNS for the lookups, but DNS is usually built to be redundant.

Even with one Radiator failing, we still have 2 of 3 Radiators running independently. Clients will retry the next IP on failure.
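A minimal sketch of the client side of this scheme: resolve every address published for the service name, then rotate through them one per request. The hostname and addresses are illustrative placeholders.

```python
import itertools
import socket

def resolve_all(name: str, port: int = 1812) -> list[str]:
    """Return every IPv4 address the DNS round-robin record publishes."""
    infos = socket.getaddrinfo(name, port, socket.AF_INET, socket.SOCK_DGRAM)
    return sorted({info[4][0] for info in infos})

def round_robin(addresses):
    """Yield addresses in rotating order, one per request."""
    yield from itertools.cycle(addresses)

# With e.g. addresses = resolve_all("radius.example.com"):
picker = round_robin(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
print([next(picker) for _ in range(4)])
# ['192.0.2.1', '192.0.2.2', '192.0.2.3', '192.0.2.1']
```

If one address stops responding, the client combines this rotation with the retry-next-IP behaviour described above.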

Load Balancers

Instead of using DNS for load balancing, one can use a hardware or software load balancer. Radiator itself is quite powerful, and using a load balancer in front of it purely for performance is probably overkill. Load balancers in front usually solve the following problems:

  • Cloud providers load balancers for incoming traffic
  • Hardware load balancers for security requirements
  • Proxying of requests to simplify client configurations
  • Transmission encryption handling

This solution allows CPU-intensive encryption work to be distributed intelligently depending on performance or protocol-related needs.

Radiator itself can act as an IP access list, load balancer, proxy, and rate limiter. Some authentication protocols benefit from or require sessions with clients, and in those cases using a middle Radiator (or two) might be the right solution.

Session State Management

Some AAA protocols require session state to be kept across multiple requests. Examples are TACACS+ and the multi-round request-response exchanges of EAP-TLS. In these cases a load balancer in front of the actual Radiators is needed.

Some cases require an external database for keeping session state. This is needed for Change of Authorization (CoA) requests to be possible from external systems, for example a client needing a different profile for speed.
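The external session store idea can be sketched as a lookup table keyed by a session identifier (for example the RADIUS State attribute), so that any instance, or a CoA request arriving from an external system, can find the session. A real deployment would use a shared database; a dict with expiry stands in for it here, and all names are illustrative.

```python
import time

class SessionStore:
    """Session state keyed by session id, with a per-entry TTL."""

    def __init__(self, ttl_s: float = 300.0, now=time.monotonic):
        self.ttl_s, self.now = ttl_s, now
        self._data = {}

    def put(self, session_id: str, state: dict) -> None:
        self._data[session_id] = (self.now() + self.ttl_s, state)

    def get(self, session_id: str):
        entry = self._data.get(session_id)
        if entry is None or entry[0] < self.now():
            self._data.pop(session_id, None)  # drop expired entries lazily
            return None
        return entry[1]

store = SessionStore(ttl_s=300)
store.put("0xa1b2", {"user": "alice", "profile": "gold"})
print(store.get("0xa1b2"))  # {'user': 'alice', 'profile': 'gold'}
```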

See Backends for more information on how to configure them.

Monitoring Integration

Radiator provides comprehensive monitoring capabilities through Prometheus metrics and the Management API. These enable centralized logging aggregation and alerting on node failures across your HA deployment.

PROXY Protocol Integration

The PROXY protocol can be used to preserve client IPs through load balancers; see PROXY Protocol Support for detailed configuration.
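To show what the load balancer actually adds, here is a sketch that parses a PROXY protocol version 1 header: one text line carrying the original client addresses, prepended before the proxied payload. The parser is illustrative only; refer to PROXY Protocol Support for Radiator's configuration.

```python
# Parse a PROXY protocol v1 header line, e.g.
#   b"PROXY TCP4 <src-ip> <dst-ip> <src-port> <dst-port>\r\n"

def parse_proxy_v1(line: bytes) -> dict:
    parts = line.rstrip(b"\r\n").decode("ascii").split(" ")
    if parts[0] != "PROXY":
        raise ValueError("not a PROXY protocol v1 header")
    proto, src, dst, sport, dport = parts[1:6]
    return {"proto": proto, "src": src, "dst": dst,
            "src_port": int(sport), "dst_port": int(dport)}

hdr = parse_proxy_v1(b"PROXY TCP4 192.0.2.10 203.0.113.5 56324 1812\r\n")
print(hdr["src"], hdr["src_port"])  # 192.0.2.10 56324
```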