Service Level Objective
How to configure service-level-objective blocks to enable automatic circuit breaking and degradation for backend servers.
A service-level-objective block configures automatic health monitoring for a backend server.
When a server violates the configured thresholds, Radiator marks it as degraded and routes
requests to healthier servers. Once the server recovers, traffic returns automatically.
Quick start example
When `server-selection` is configured, Radiator automatically applies a default `service-level-objective` to every server that does not have an explicit block:
```
backends {
  postgres "USERS" {
    server-selection fallback;

    server "primary" {
      host "pg-primary.example.com";
      database "radiator";
      username "radiator";
      connections 10;
    }

    server "replica" {
      host "pg-replica.example.com";
      database "radiator";
      username "radiator";
      connections 10;
    }

    query "FIND_USER" { ... }
  }
}
```
Both servers above automatically receive:
```
service-level-objective {
  failure-rate 3/5;
  initial-backoff-period 3s;
  max-backoff-period 30s;
  recovery-probe-count 2;
}
```
Add an explicit `service-level-objective` block to a server to override these defaults. An explicit block only needs to specify the fields that differ; omitted fields keep their default values.

Servers without `server-selection` have no automatic SLO. Add an explicit `service-level-objective` block if health monitoring is needed for a single-server backend.
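As a sketch (the server name and threshold values are illustrative), a server that should tolerate more failures before degrading can override only `failure-rate`; the backoff and recovery fields keep their defaults:

```
server "replica" {
  host "pg-replica.example.com";
  database "radiator";
  username "radiator";
  connections 10;

  service-level-objective {
    failure-rate 5/10;
  }
}
```

This server now degrades only when 5 of its last 10 requests fail, while `initial-backoff-period`, `max-backoff-period`, and `recovery-probe-count` remain at their default values.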
Parameters
failure-rate
```
failure-rate <failures>/<window_size>;
```

Defines a sliding-window failure rate. Radiator tracks the last `window_size` request outcomes (successes and errors) for this server. If the number of failures reaches `failures`, the server is marked degraded.
- `failures` must be between 1 and `window_size` (inclusive).
- Timeouts and connection errors both count as failures. Pool-exhaustion events (all connections busy) do not count as failures; they are tracked separately via the `PoolExhausted` counter and do not affect SLO health.
- The window fills incrementally. No violation triggers until the window has `window_size` total outcomes.
- `failure-rate 1/5` means any single failure in the last 5 requests degrades the server. `failure-rate 5/5` means all 5 of the last 5 requests must fail before degrading.
Default: `failure-rate 3/5`. A server is marked degraded when 3 or more of the last 5 requests fail.
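The sliding-window semantics can be illustrated with a short sketch. This models the documented behaviour (window fills before any violation can trigger), not Radiator's actual implementation:

```python
from collections import deque

def make_window(failures: int, window_size: int):
    """Track the last window_size outcomes; report threshold violations.

    Sketch of failure-rate <failures>/<window_size> as documented:
    no violation triggers until the window holds window_size outcomes.
    """
    assert 1 <= failures <= window_size
    window = deque(maxlen=window_size)  # oldest outcome drops off automatically

    def record(success: bool) -> bool:
        """Record one outcome; return True if the server should degrade."""
        window.append(success)
        if len(window) < window_size:
            return False  # window not yet full
        return window.count(False) >= failures

    return record

# failure-rate 3/5: the fifth request is the third failure -> degraded
record = make_window(3, 5)
print([record(ok) for ok in [True, False, True, False, False]])
# [False, False, False, False, True]
```

The `deque` with `maxlen` keeps exactly the last `window_size` outcomes, so older requests stop counting against the server as new ones arrive.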
max-backoff-period
```
max-backoff-period <duration>;
```

Controls time-based exponential backoff for degraded servers. When a server is degraded, Radiator waits an increasing duration before retrying:

- Wait `initial-backoff-period`, then retry
- Wait double that duration, then retry
- Continue doubling up to the configured `<duration>`

Accepts time units: `30s`, `1m`, `2h`.

Set to `0s` to disable backoff entirely; the degraded server is retried on every incoming request instead of waiting. See Disabling backoff.

Default: `30s`. Backoff doubles from `initial-backoff-period` up to 30 seconds.
initial-backoff-period
```
initial-backoff-period <duration>;
```

Sets the starting wait time before the first retry of a degraded server. Subsequent retries double this value up to `max-backoff-period`.

Accepts time units: `1s`, `500ms`, `5s`.

Default: `3s`. The first backoff waits 3 seconds, then doubles.
recovery-probe-count
```
recovery-probe-count <count>;
```

Sets the number of consecutive successful probe requests required before a degraded server is considered recovered. Must be at least 1.

Default: `2`. See How degradation works for the full recovery and backoff behaviour.
How degradation works
When a server's failure rate is reached, Radiator marks it as degraded and stops routing normal traffic to it. While degraded, the server is still retried periodically using real incoming requests that get routed to it; these are the probes. If other servers are available, most traffic continues to flow to them, and the degraded server only sees the occasional probe until it recovers.
Backoff and recovery
Once degraded, Radiator waits initial-backoff-period before routing the first probe
to the server. If the probe fails, the wait doubles for the next attempt, up to
max-backoff-period. If the probe succeeds, the next probe is sent after
initial-backoff-period again (not the doubled value). Once recovery-probe-count
consecutive probes succeed, the server returns to normal and all backoff state resets.
Backoff timeline example
With `initial-backoff-period 3s`, `max-backoff-period 30s`, and `recovery-probe-count 2`:
| Event | Wait for next probe |
|---|---|
| Server degrades | 3s |
| Probe fails | 6s |
| Probe fails | 12s |
| Probe fails | 24s |
| Probe fails | 30s (capped at max-backoff-period) |
| Probe succeeds (1 of 2) | 3s (held at initial-backoff-period) |
| Probe succeeds (2 of 2) | — recovered |
The stored backoff period is not reset until full recovery — if a probe succeeds and the next probe fails, the wait resumes from the last doubled value (30s in this example).
The wait only doubles when the failure count in the sliding window meets or exceeds `failures`. A probe failure that does not push the window into violation still resets the probe count but does not double the wait.
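The timeline above can be reproduced with a small simulation. This is a sketch of the documented rules, with one simplifying assumption labelled in the comments: every probe failure here is treated as a window violation, so each failure doubles the wait.

```python
def backoff_waits(probe_outcomes, initial=3, maximum=30, probes_needed=2):
    """Return the wait (in seconds) before each probe of a degraded server.

    Sketch of the documented backoff rules; assumes every probe failure
    re-violates the sliding window (and therefore doubles the wait).
    """
    waits = []
    stored = initial   # last doubled value; reset only on full recovery
    wait = initial     # first probe waits initial-backoff-period
    successes = 0
    for ok in probe_outcomes:
        waits.append(wait)
        if ok:
            successes += 1
            if successes == probes_needed:
                break                  # recovered: all backoff state resets
            wait = initial             # held at initial while recovering
        else:
            successes = 0              # a failure resets the probe count
            stored = min(stored * 2, maximum)  # double, capped at maximum
            wait = stored              # a failure after a success resumes here
    return waits

# Reproduces the timeline table: 3s, 6s, 12s, 24s, 30s, then 3s, recovered
print(backoff_waits([False, False, False, False, True, True]))
# [3, 6, 12, 24, 30, 3]
```

Keeping `stored` separate from `wait` is what makes a post-success failure resume from the last doubled value instead of restarting the doubling from `initial`.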
Monitoring
Each server emits counters when SLO violations occur. These counters appear in the management API under the server's namespace (e.g., `backend/Postgres/USERS/primary/`):
| Counter | Description |
|---|---|
| `SLOFailureThresholdViolations` | Incremented each time the server enters degraded state due to a failure threshold violation. |
| `SLORecovered` | Incremented each time the server recovers from degraded state after a successful retry. |
| `SLOStillFailing` | Incremented each time a degraded server is retried but still violates the SLO. |
`SLOFailureThresholdViolations` only increments on the transition to degraded state. `SLORecovered` increments when a retry succeeds. `SLOStillFailing` increments when a retry of a degraded server results in another SLO violation.
Interaction with server selection policies
The SLO system works with all server selection policies:
- `fallback`: Degraded primary is skipped; secondary becomes active. Primary recovers automatically when retries succeed.
- `round-robin`: Degraded servers are excluded from the rotation. Traffic redistributes across healthy servers.
- `least-connections`: Degraded servers are excluded from the pool sort. Healthy servers absorb the load.
When `server-selection` is configured, every server without an explicit `service-level-objective` block automatically receives the default SLO (`failure-rate 3/5`, `initial-backoff-period 3s`, `max-backoff-period 30s`, `recovery-probe-count 2`). For single-server backends (no `server-selection`), add an explicit `service-level-objective` block to enable health monitoring; without one, the server receives traffic unconditionally.
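For example, a hypothetical single-server backend (backend, server, and host names are illustrative) opts into health monitoring by spelling out the block explicitly:

```
backends {
  postgres "USERS" {
    server "primary" {
      host "pg.example.com";
      database "radiator";
      username "radiator";
      connections 10;

      service-level-objective {
        failure-rate 3/5;
        initial-backoff-period 3s;
        max-backoff-period 30s;
        recovery-probe-count 2;
      }
    }
  }
}
```

Without the explicit block, this server would receive traffic unconditionally, since no `server-selection` is configured.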
Disabling backoff
To retry a degraded server on every incoming request instead of waiting between retries, set `max-backoff-period` to `0s`:
```
service-level-objective {
  failure-rate 3/5;
  max-backoff-period 0s;
}
```
The server is still marked degraded when the failure-rate threshold is reached, but every request will attempt to use the degraded server instead of waiting. This can be useful when the backend has its own health checks or when immediate retry is preferred over exponential backoff.
Omitting `max-backoff-period` does not disable backoff; it uses the default value of `30s`. You must explicitly set `0s` to disable it.
Disabling SLO
For a single-server backend (no `server-selection`), omitting the `service-level-objective` block is enough: without an explicit block, no SLO is applied and the server receives traffic unconditionally.

For a server inside a `server-selection` backend, the default SLO is injected automatically and there is no per-server opt-out via configuration. To avoid SLO on a specific server, restructure the backend to not use `server-selection`.