Rate Limiting

Strategies for implementing rate limiting and request throttling in Radiator Server using Lua scripts and cache-based counters

Overview

Rate limiting in Radiator Server is driven by Lua scripts, giving full programming flexibility over when and how limits are applied — without modifying server configuration or restarting the service. Any request attribute, context variable, or external data source can influence the decision. Limits can be combined, stacked, and conditioned arbitrarily, so complex policies are built from simple, readable script logic rather than specialized configuration.

This document walks through a realistic example that builds a multi-layered rate limiting setup from scratch. The same pattern can be adapted to a wide range of scenarios. Examples of what can be expressed as straightforward Lua logic:

  • Per-user limits — Independent per-identity limits regardless of origin device (e.g. 5 attempts per 15 minutes), with optional randomized backoff to defeat timed brute-force attacks.
  • Per-device limits — Throttle by IP address or MAC address (e.g. 5 attempts per 15 minutes per NAS), blocking compromised devices from credential stuffing.
  • Per-access-point limits — Cap traffic at the access point or NAS level with a configurable burst window to absorb the client reassociation storm after a reboot or power outage.
  • Topology-aware limits — Apply different limit profiles per IP prefix: stricter thresholds for internet sources, relaxed for own address space, with country-level or ISP-level segmentation as needed.
  • Fully composable — All dimensions apply simultaneously in a single pipeline. User, device, and access-point limits stack independently, covering complex scenarios such as roaming, multi-tenancy, and tiered SLAs without custom server logic.

Algorithms

Two algorithms are available depending on the traffic pattern:

  • GCRA (Generic Cell Rate Algorithm) — Spaces requests smoothly using a virtual scheduling approach: each request is assigned a Theoretical Arrival Time (TAT), and requests that arrive before their scheduled slot are rejected. Ideal for protecting backends from bursts and for slowing brute-force attempts. Supports a randomization variant (rate_limit_gcra_rnd) to prevent attackers from predicting retry windows.
  • Counter-based (increment) — Counts requests within a fixed time window. Useful for tiered quota enforcement where different actions should trigger at different thresholds (e.g., warn at 80%, block at 100%). Also useful for longer-term blocks triggered by repeated GCRA rejections: fail five times, then lock with a counter for 24 hours.
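
The virtual-scheduling idea behind GCRA can be illustrated in a few lines of standalone Lua. This is a minimal sketch, not the server's implementation: each conforming request advances the Theoretical Arrival Time (TAT) by one emission interval, and a burst tolerance lets up to `limit` requests arrive back to back before spacing is enforced.

```lua
-- Minimal standalone GCRA sketch (illustration only, not the server implementation).
-- `limit` requests per `period_ms`; the emission interval T = period_ms / limit.
local function new_gcra(limit, period_ms)
    local T   = period_ms / limit      -- ideal spacing between requests
    local tau = period_ms - T          -- burst tolerance: up to `limit` back-to-back requests
    local tat = 0                      -- Theoretical Arrival Time of the next conforming request
    return function(now_ms)
        if tat - now_ms <= tau then    -- conforming: within spacing plus burst allowance
            tat = math.max(tat, now_ms) + T
            return true
        end
        return false                   -- arrived before its scheduled slot: reject
    end
end

local allow = new_gcra(10, 1000)       -- 10 per second, 100 ms emission interval
for _ = 1, 10 do assert(allow(0)) end  -- an initial burst of 10 is conforming
assert(not allow(0))                   -- the 11th back-to-back request is rejected
assert(allow(100))                     -- 100 ms later the next slot opens
```

The production counters add persistence in the shared cache, and the rate_limit_gcra_rnd variant applies the configured jitter so retry windows cannot be timed; the conformance test itself is the same virtual-scheduling comparison.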

See Rate Limiting Algorithms for a detailed comparison with examples.
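
The tiered thresholds of the counter-based approach can be sketched in standalone Lua as well. This shows the tier logic only: in the server the count is held in a named cache and advanced with context.cache:increment, and the window expiry comes from the cache timeout rather than from script code.

```lua
-- Minimal fixed-window counter sketch (illustration only).
-- In the server the count lives in a named cache whose radconf timeout defines the window;
-- here it is a local variable so the tier logic can be shown standalone.
local function new_window_counter(quota)
    local count = 0
    return function()
        count = count + 1
        if count > quota then
            return "block"             -- 100% of quota exceeded
        elseif count > quota * 0.8 then
            return "warn"              -- above 80% of quota
        end
        return "ok"
    end
end

local check = new_window_counter(10)
for _ = 1, 8 do assert(check() == "ok") end
assert(check() == "warn")              -- request 9 crosses the 80% tier
assert(check() == "warn")              -- request 10 is still within quota
assert(check() == "block")             -- request 11 exceeds the quota
```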

Rate limit counters are stored in the local in-process cache on each node. This makes rate limit checks extremely fast — a counter increment adds no network round trips to the request path. In a clustered deployment each node enforces limits independently, so the effective thresholds across the cluster scale with the number of nodes: with a per-user limit of 5 on a three-node cluster, a user whose requests are spread across all nodes could make up to 15 attempts in the window. This is intentional: rate limiting is a first line of defense against abusive traffic patterns, not an exact quota system. It stops the majority of brute-force and flooding attempts locally, before they reach the authentication backend, which is where the real cost lies.

Because counters live in memory, they are reset on server restart and on configuration reload. Any in-progress lockout or rate limit window is cleared, and all counters start from zero.

Example: Protecting a VPN Backend at a Fixed TPS

The following example shows how to combine multiple GCRA layers to protect a VPN authentication backend. The backend can handle 600 requests per minute (10 TPS) and must be protected from brute-force attacks, credential stuffing, and unintended overload from misbehaving VPN gateways. The goal is that no single user, client device, or gateway can exhaust backend capacity.

Layer 1 — User level (5 per 15 minutes, GCRA with randomization)

Limits how fast a single user identity can attempt authentication. Five attempts per 15 minutes is sufficient for a human typing a password, while the randomized emission interval prevents an attacker from precisely timing retry windows.

All time values in the cache API are in milliseconds (900000 = 15 minutes, 30000 = 30 seconds).

local user_allowed = context.cache:rate_limit_gcra_rnd(
    "user_limit",
    "user:" .. context.aaa.identity,
    5,        -- 5 attempts
    900000,   -- per 15 minutes
    30000     -- +/-30 second jitter
)
if not user_allowed then
    context.aaa.reason = "user_rate_limited"
    context.aaa.message = "Too many login attempts, please try again later"
    return result.REJECT
end

context.aaa.reason is written to the server log and any configured audit trail. context.aaa.message is sent to the client as a RADIUS Reply-Message attribute, where supported by the NAS.

Layer 2 — Device/IP level (10 per 15 minutes GCRA, 1-hour block after 5 violations)

Keys the limit on Framed-IP-Address (the user's tunnel IP, sent by all major VPN vendors), falling back to Calling-Station-Id if Framed-IP-Address is absent. If neither attribute is present the device cannot be identified and this layer is skipped — the per-user and gateway layers still apply.

In VPN deployments each device typically has a unique tunnel IP, but some carriers use NAT, so a single address may represent several users; the GCRA limit is set to 10 per 15 minutes to absorb that. Each GCRA rejection increments a violation counter with a 1-hour window. At 5 violations the counter itself acts as the block signal: it is checked at the top of the script before GCRA runs, so no separate block key is needed.

The GCRA cache is managed automatically. The violation counter must be declared in radconf with the desired timeout:

caches {
    cache "device_violations" {
        timeout 3600s;  # 1 hour — violation window and block duration
    }
}
-- Framed-IP-Address is preferred (sent by all major VPN vendors); Calling-Station-Id is the fallback.
-- If neither is present the device cannot be identified and this layer is skipped.
-- Note: Lua attr() requires lowercase attribute names (dictionary names are lowercased).
local device = context.radius.request:attr("framed-ip-address") or context.radius.request:attr("calling-station-id")
if device then
    -- 5 or more GCRA violations in the last hour → hard block
    local violations = tonumber(context.cache:get("device_violations", "device:" .. device)) or 0
    if violations >= 5 then
        context.aaa.reason = "device_rate_limited"
        context.aaa.message = "Device rate limit exceeded"
        return result.REJECT
    end

    local device_allowed = context.cache:rate_limit_gcra_rnd(
        "device_limit",
        "device:" .. device,
        10,       -- 10 attempts per 15 minutes; higher than per-user to tolerate NAT
        900000,   -- per 15 minutes
        30000     -- +/-30 second jitter
    )
    if not device_allowed then
        -- Increment violation counter; window opens on first rejection and expires after 1 hour
        context.cache:increment("device_violations", "device:" .. device, 1)
        context.aaa.reason = "device_rate_limited"
        context.aaa.message = "Device rate limit exceeded"
        return result.REJECT
    end
end

Layer 3 — VPN gateway level (300 per 5 minutes, GCRA)

A VPN gateway may serve hundreds of concurrent users. Under normal operation it generates a steady stream of authentications; a burst occurs when a gateway restarts and all client sessions reconnect simultaneously. The limit is set to 300 per 5 minutes (10% of backend capacity over that window), giving enough headroom for a reconnect storm while preventing a single misbehaving or compromised gateway from saturating the backend. GCRA's natural burst behavior absorbs the initial rush before spacing enforcement kicks in.

local gw = context.radius.request:attr("nas-identifier") or context.radius.client.ip
local gw_allowed = context.cache:rate_limit_gcra(
    "gw_limit",
    "gw:" .. gw,
    300,      -- 300 requests
    300000    -- per 5 minutes
)
if not gw_allowed then
    context.aaa.reason = "gateway_rate_limited"
    context.aaa.message = "Gateway rate limit exceeded"
    return result.REJECT
end

Layer 4 — Global backend hard limit (10 TPS, GCRA)

A single global GCRA counter shared across all requests enforces the absolute backend capacity limit. Unlike the per-entity layers above, this uses a fixed key so every request counts against the same budget. At 10 TPS the emission interval is 100 ms — any request arriving faster than that is rejected immediately, protecting the backend from saturation regardless of how requests are distributed across users, devices, or gateways.

local allowed = context.cache:rate_limit_gcra(
    "backend_limit",
    "global",   -- single shared key across all requests
    10,         -- 10 requests
    1000        -- per 1 second (100 ms emission interval)
)
if not allowed then
    context.aaa.reason = "backend_rate_limited"
    context.aaa.message = "Service temporarily unavailable, please retry"
    return result.REJECT
end

Routing by IP Space

In VPN termination, the RADIUS client is always the VPN gateway — a fixed, known IP. The meaningful distinction is where the end user is connecting from: the office Wi-Fi (a trusted internal network) or the internet (strict limits apply).

For IP VPN clients, the NAS includes the user's tunnel IP in Framed-IP-Address (RADIUS type ipaddr). Handler conditions support native CIDR matching on ipaddr attributes, so traffic can be split at the handler level before any pipeline logic runs. Regardless of which handler is selected, both run backend_rate_limit after their other rate limit checks — a shared global 10 TPS hard cap that applies to the combined traffic from all origins:

scripts {
    lua "rate_limit" {
        filename "lua/rate_limit.lua";  -- all layers; branches on vars.trusted
    }
    lua "backend_rate_limit" {
        filename "lua/backend_rate_limit.lua";  -- layer 4: global 10 TPS hard limit
    }
}

aaa {
    policy "DEFAULT" {

        # All handlers in this policy process only authentication requests.
        conditions all {
            radius.request.code == radius.ACCESS_REQUEST;
        }

        handler "OFFICE" {
            conditions any {
                # User is connecting from an RFC-1918 address space
                radius.request.attr.Framed-IP-Address == 10.0.0.0/8;
                radius.request.attr.Framed-IP-Address == 172.16.0.0/12;
                radius.request.attr.Framed-IP-Address == 192.168.0.0/16;
            }

            @execute {
                # Mark as trusted so rate_limit.lua uses relaxed counters (int_*)
                # and skips the gateway-level check.
                # Office users: relaxed per-user/device limits, shared backend budget
                modify {
                    vars.trusted = "true";
                }
                script "rate_limit";
                script "backend_rate_limit";
                backend {
                    name "USERS";
                }
                pap;
            }
        }

        handler "INTERNET" {
            @execute {
                # Internet users: full four-layer rate limiting
                script "rate_limit";
                script "backend_rate_limit";
                backend {
                    name "USERS";
                }
                pap;
            }
        }
    }
}

Handlers are evaluated in order and the first one with matching conditions is selected. A handler without conditions acts as a catch-all fallback. Multiple prefixes within a single conditions any block match if any one of them contains the attribute value.

The rate_limit.lua script reads vars.trusted to select the appropriate counter namespace and thresholds:

local context, previous = ...

local trusted      = context.vars:get("trusted") == "true"
local prefix       = trusted and "int_" or "ext_"
local user_limit   = trusted and 20 or 5   -- relaxed for internal users
local device_limit = user_limit * 10       -- device limit is 10x the per-user limit

-- Per-user check
local user_allowed = context.cache:rate_limit_gcra_rnd(
    prefix .. "user_limit",
    "user:" .. context.aaa.identity,
    user_limit,   -- 20 attempts (internal) or 5 (external)
    900000,       -- per 15 minutes
    30000         -- +/-30 second jitter
)
if not user_allowed then
    context.aaa.reason = "user_rate_limited"
    context.aaa.message = "Too many login attempts, please try again later"
    return result.REJECT
end

-- Per-device check (Framed-IP-Address preferred; falls back to Calling-Station-Id;
-- skipped if neither attribute is present)
-- Note: Lua attr() requires lowercase attribute names (dictionary names are lowercased).
local device = context.radius.request:attr("framed-ip-address") or context.radius.request:attr("calling-station-id")
if device then
    local device_allowed = context.cache:rate_limit_gcra_rnd(
        prefix .. "device_limit",
        "device:" .. device,
        device_limit,   -- 200 attempts (internal) or 50 (external)
        900000,         -- per 15 minutes
        30000           -- +/-30 second jitter
    )
    if not device_allowed then
        context.aaa.reason = "device_rate_limited"
        context.aaa.message = "Device rate limit exceeded"
        return result.REJECT
    end
end

-- Per-gateway check only for internet traffic (office users bypass this layer)
if not trusted then
    local gw = context.radius.request:attr("nas-identifier") or context.radius.client.ip
    local gw_allowed = context.cache:rate_limit_gcra(
        "gw_limit",
        "gw:" .. gw,
        300,    -- 300 requests
        300000  -- per 5 minutes
    )
    if not gw_allowed then
        context.aaa.reason = "gateway_rate_limited"
        context.aaa.message = "Gateway rate limit exceeded"
        return result.REJECT
    end
end

Further Extensions

The pattern above can be extended by adding Lua logic to the existing scripts without modifying the authentication pipeline or handler structure:

  • Work-from-home trusted IPs — After each successful authentication, store the user's Framed-IP-Address in the cache keyed by identity (cache:set("known_ip:" .. identity, framed_ip, ttl)). On the next login attempt, look up the stored IP before applying any rate limits: if the current Framed-IP-Address matches, treat the request as trusted and apply the relaxed policy. Home IPs rarely change, so the stored value stays valid across sessions. During a volumetric attack from random source IPs, users connecting from their known home address continue to get through while the attacker's novel IPs are subject to the strict internet limits.
  • Progressive trust decay — Use a sliding TTL on the stored trusted IP entry. Each successful authentication from that IP refreshes the timer; if the user stops connecting, the home IP reverts to internet-class limits within a configurable window (e.g. 7 days).
  • Per-user emergency lockout — Write a lockout key to the cache when a threshold of failed attempts is crossed. Subsequent requests check for the key before any rate limit logic and reject immediately, independent of all other limits. The key can be cleared by an admin via the management API.
  • Country or ASN segmentation — Resolve Framed-IP-Address against a GeoIP or ASN database in Lua and assign a per-country or per-ASN counter namespace. Travelers from known business destinations can be given intermediate thresholds rather than the blanket internet policy.
  • Dynamic limit adjustment — Read per-user or per-tenant limit overrides from the backend at authentication time and pass them into the rate limit Lua script via vars. This allows tiered service levels (e.g. standard vs. premium accounts) without separate handler definitions.
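
The work-from-home extension above can be sketched against the cache API used in the earlier layers. This is a sketch under assumptions: the cache name known_ips, the key scheme, and the context.cache:set and context.vars:set signatures are illustrative and should be checked against the cache and vars API references; the entry's lifetime would come from the cache's radconf timeout, as with the violation counter.

```lua
-- Sketch only: the cache name, key scheme, and set() signatures are assumed for illustration.
-- After a successful authentication, remember the user's tunnel IP:
local identity  = context.aaa.identity
local framed_ip = context.radius.request:attr("framed-ip-address")
if framed_ip then
    context.cache:set("known_ips", "user:" .. identity, framed_ip)
end

-- On a later attempt, before any rate limit checks:
if framed_ip then
    local known = context.cache:get("known_ips", "user:" .. identity)
    if known == framed_ip then
        context.vars:set("trusted", "true")  -- downstream script applies the relaxed policy
    end
end
```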

Monitoring

Radiator Server maintains detailed per-handler counters covering request rates, accept/reject outcomes, and latency. These are visible in the built-in dashboards and exported via the Prometheus metrics endpoint for integration with external monitoring systems. When rate limiting is active, the reject counts on the relevant handlers will reflect it — no additional instrumentation is needed. Review the handler statistics dashboards to verify that limits are firing as expected and to tune thresholds based on observed traffic patterns.