Prometheus scraping

Exposing operational metrics and counters through authenticated HTTP endpoints for Prometheus monitoring

Radiator exposes operational counters, timing aggregations and info gauges over authenticated HTTP management endpoints in the Prometheus text exposition format, version 0.0.4. Metrics are held in memory, so reads are inexpensive and non‑blocking, and all values reset when the server restarts.

Access control & authentication

Scraping requires an HTTP management user with at least the monitor privilege. Define the user in the management.http.credentials block:

credentials {
    user "monitor" {
        password "monitorpassword";
        privilege monitor;
    }
}

Use HTTP Basic Auth when scraping:

curl -u monitor:monitorpassword --basic \
  http://localhost:8080/api/v1/metrics/prometheus
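
The same authenticated request can be issued from a script. The following is a minimal Python sketch using only the standard library; the helper names build_scrape_request and scrape are hypothetical, and the host, port and credentials are simply the example values used above.

```python
import base64
import urllib.request

METRICS_PATH = "/api/v1/metrics/prometheus"


def build_scrape_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for the metrics endpoint with an HTTP Basic Auth header."""
    # Basic Auth is "user:password", base64-encoded, in the Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base_url.rstrip("/") + METRICS_PATH)
    request.add_header("Authorization", f"Basic {token}")
    return request


def scrape(base_url: str, user: str, password: str, timeout: float = 5.0) -> str:
    """Fetch the metrics text; raises urllib.error.HTTPError on 401 and other HTTP errors."""
    request = build_scrape_request(base_url, user, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling scrape("http://localhost:8080", "monitor", "monitorpassword") against a running instance would return the metrics text as a string. In practice the password would normally be read from a secret store rather than hard-coded.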

Example Prometheus configuration

Example prometheus.yml snippet for the Prometheus server:

scrape_configs:
  - job_name: "radiator-server"
    metrics_path: "/api/v1/metrics/prometheus"
    basic_auth:
      username: "monitor"
      password: "monitorpassword"
    static_configs:
      - targets: ["127.0.0.1:8080"]

Example scrape output (excerpt)

From an integration test run showing log-based counters after authentication attempts:

# TYPE radiator_build_info gauge
radiator_build_info{app="radiator",version="10.31.0",kind="development",timestamp="2025-12-12T06:00:47Z",cpu_target="aarch64-apple-darwin",branch="main",commit="abc123"} 1
# TYPE radiator_uptime_seconds gauge
radiator_uptime_seconds{service_ok="true"} 3600
# TYPE radiator_instance_info gauge
radiator_instance_info{instance_id="R01",pid="12345"} 1
# TYPE radiator_log_total counter
radiator_log_total{namespace="server::radius-udp::AUTH_UDP",message="Radius UDP packet from unknown client"} 5
radiator_log_total{namespace="server::radius-udp::AUTH_UDP::policy::BASIC_AUTH::handler::PAP",message="AAA accept"} 999
radiator_log_total{namespace="backend::USERS_FILE",message="Backend query accepted"} 999
# TYPE radiator_total counter
radiator_total{namespace="server::radius-udp::AUTH_UDP::policy::BASIC_AUTH"} 1500
...
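
For ad hoc checks outside Prometheus, output like the excerpt above can be split into metric name, labels and value. The sketch below is a deliberately small parser, not a full implementation of the exposition format: the function name parse_samples is hypothetical, comment lines and unparsable lines are skipped, optional sample timestamps are not handled, and escape sequences inside label values are left as-is.

```python
import re

# metric_name, optional {labels}, then a single value token at end of line.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
# key="value" pairs; the value may contain backslash-escaped characters.
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<value>(?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (name, labels_dict, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # TYPE / # HELP comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything that is not a plain sample line
        labels = {
            m.group("key"): m.group("value")
            for m in LABEL_RE.finditer(match.group("labels") or "")
        }
        try:
            value = float(match.group("value"))
        except ValueError:
            continue  # skip lines whose value is not numeric
        samples.append((match.group("name"), labels, value))
    return samples
```

For production use, an established parser such as the one shipped with the official Prometheus Python client is a safer choice than a hand-written regular expression.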

Key metrics:

Build and Instance Information

radiator_build_info: build metadata gauge with labels for version, branch, commit, etc.
radiator_uptime_seconds: time since server started.
radiator_instance_info: instance identification with instance_id, optional cluster_id, and process ID.

Process Statistics

radiator_process_memory_rss_bytes: resident set size (RSS) memory usage in bytes.
radiator_process_cpu_milliseconds_total: total CPU time used by the process in milliseconds.

System Statistics

radiator_system_memory_total_bytes: total system memory in bytes.
radiator_system_memory_available_bytes: available system memory in bytes (Linux only).
radiator_system_swap_total_bytes: total swap space in bytes (Linux only).
radiator_system_swap_used_bytes: used swap space in bytes (Linux only).
radiator_system_swap_pages_in_total: total pages swapped in from disk (Linux only).
radiator_system_swap_pages_out_total: total pages swapped out to disk (Linux only).
radiator_system_cpu_count: number of CPUs available.
radiator_system_cpu_active_milliseconds_total: total active (non-idle) CPU time across all CPUs in milliseconds (Linux only).
radiator_system_cpu_total_milliseconds_total: total CPU time across all CPUs in milliseconds (Linux only).
radiator_system_load_1m_x100: 1-minute load average multiplied by 100.
radiator_system_load_5m_x100: 5-minute load average multiplied by 100.
radiator_system_load_15m_x100: 15-minute load average multiplied by 100.
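
Two of the system statistics need a small conversion before they are readable. The sketch below assumes that a load value is recovered by dividing the x100 metric by 100, and that system-wide CPU utilization between two scrapes can be estimated as the change in active milliseconds divided by the change in total milliseconds. The function names are hypothetical and the utilization formula is an interpretation of the metric descriptions above, not a documented Radiator calculation.

```python
def load_average(x100_value: float) -> float:
    """Convert a radiator_system_load_*_x100 value back to a conventional load average."""
    return x100_value / 100.0


def cpu_utilization(prev_active_ms, prev_total_ms, curr_active_ms, curr_total_ms):
    """Estimate the fraction of CPU time spent non-idle between two scrapes.

    Returns None when the total did not advance, for example when both
    samples are identical or the counters went backwards.
    """
    delta_total = curr_total_ms - prev_total_ms
    if delta_total <= 0:
        return None
    return (curr_active_ms - prev_active_ms) / delta_total
```

For example, active time growing by 500 ms while total time grows by 2000 ms gives an estimated utilization of 0.25 across all CPUs for that interval.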

Log Counters

radiator_log_total: log message counters with a hierarchical namespace label (components joined by ::) and a message label. The namespace follows the pattern server::<transport>::<name> for server-level events, server::<transport>::<name>::policy::<policy>::handler::<handler> for handler-level events, and backend::<name> for backend-level events (see namespaces for details).
radiator_total: aggregate counter for each namespace node that has children. The value is the sum of all counters under that prefix.
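
The relationship between radiator_log_total and radiator_total can be illustrated with a client-side roll-up. This sketch rests on two assumptions that the text above does not confirm: every ::-delimited prefix is treated as a node, and a node's total includes counters logged at the node itself as well as those below it. The function name aggregate_totals and the sample values are hypothetical.

```python
from collections import defaultdict


def aggregate_totals(log_samples):
    """Sum counter values per namespace prefix, keeping only prefixes that have children.

    log_samples is an iterable of (namespace, value) pairs, one per
    radiator_log_total series.
    """
    totals = defaultdict(float)
    has_children = set()
    for namespace, value in log_samples:
        parts = namespace.split("::")
        for depth in range(1, len(parts) + 1):
            prefix = "::".join(parts[:depth])
            totals[prefix] += value
            if depth < len(parts):
                # Something deeper exists under this prefix.
                has_children.add(prefix)
    return {prefix: total for prefix, total in totals.items() if prefix in has_children}
```

With hypothetical inputs of 5 at server::radius-udp::AUTH_UDP and 999 at a PAP handler beneath it, the roll-up for server::radius-udp::AUTH_UDP is 1004, while the leaf handler namespace gets no aggregate entry. If values computed this way differ from the scraped radiator_total series, the assumptions above are the first thing to revisit.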

Note: System statistics are OS dependent; metrics marked "Linux only" are reported only on Linux.

High availability labels

See the high availability identifiers article for the semantics of instance and cluster IDs. The radiator_instance_info gauge includes instance_id and, when set, cluster_id.

Usage guidelines

Most metrics are read directly from memory. Some OS-level statistics require system calls, which have a small performance cost. In general, scraping once per second should have no noticeable impact.
Do not expose metrics endpoints without authentication; they reveal operational structure.

Troubleshooting

| Symptom | Likely cause | Resolution |
| --- | --- | --- |
| 401 Unauthorized | Missing or wrong credentials | Create a user with `monitor` privilege and scrape with that user's credentials |
| Empty or truncated output | Network proxy or auth failure mid-stream | Retry with `curl -v` and inspect the server logs |
| Counters not incrementing | No traffic hitting the relevant handler or policy | Generate test load (e.g. with an integration script) |

Push gateway (optional)

For cases where Prometheus cannot scrape Radiator instances directly, prompush has been verified to work against Radiator.