Monitoring & Resilience

v1.x · Updated 1 month ago

Uptime Monitoring

What does it do?

Uptime Monitoring checks at regular intervals whether the backends of your routes are reachable. Without monitoring, you only see an error when a user reports it.

Without monitoring:

Client  →  Caddy  →  Backend (crashed)  →  502 Bad Gateway
                                           ↑ nobody knows

With monitoring:

Monitor probes every 60s  →  Backend does not respond  →  Status: DOWN (red)
                                                         → Email alert
                                                         → Webhook: route_down
                                                         → Circuit Breaker reacts

How it works internally

GateControl starts a poller that, at the configured interval (default: 60 seconds), checks all routes with monitoring_enabled = 1.

HTTP routes (Layer 7):

  • HTTP GET to http(s)://<peer IP>:<target port>/
  • User-Agent: GateControl-Monitor/1.0
  • Expectation: status code 200-399 = UP, anything else = DOWN
  • With backend_https: HTTPS with rejectUnauthorized: false (accepts self-signed)
  • Timeout: configured in config.timeouts.monitorHttp
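
The UP/DOWN rule for HTTP checks can be sketched as a small pure function. This is an illustration only, not GateControl's actual code: the name classifyHttpResult and the shape of its argument are assumptions.

```javascript
// Hypothetical helper: maps the outcome of one HTTP probe to a monitoring status.
// Rule from the list above: status code 200-399 = UP, anything else = DOWN.
function classifyHttpResult(result) {
  // A timeout or connection error yields no status code at all, so it counts as DOWN.
  if (!result || typeof result.statusCode !== 'number') return 'down';
  return result.statusCode >= 200 && result.statusCode <= 399 ? 'up' : 'down';
}
```

Redirects (3xx) count as UP under this rule, so a backend that answers / with a redirect to its login page is still reported as reachable.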

L4 routes (TCP/UDP):

  • TCP Connect to the backend port
  • Successful connection establishment = UP, timeout/error = DOWN
  • Timeout: configured in config.timeouts.monitorTcp
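
A TCP connect check of this kind could look roughly like the following sketch, built on Node's net module. The function name tcpProbe and its return shape are assumptions; in GateControl the timeout would come from config.timeouts.monitorTcp.

```javascript
const net = require('node:net');

// Hypothetical helper: resolves to { status, responseTimeMs } for one TCP connect check.
// A successful connection means UP; a timeout or socket error means DOWN.
function tcpProbe(host, port, timeoutMs) {
  return new Promise((resolve) => {
    const started = Date.now();
    const socket = net.connect({ host, port });
    let settled = false;

    const finish = (status) => {
      if (settled) return; // only the first outcome counts
      settled = true;
      socket.destroy(); // release the connection in every case
      resolve({ status, responseTimeMs: Date.now() - started });
    };

    socket.setTimeout(timeoutMs);
    socket.once('connect', () => finish('up'));
    socket.once('timeout', () => finish('down'));
    socket.once('error', () => finish('down'));
  });
}
```

The promise never rejects, so a caller can treat every outcome as a plain status value.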

Parallelization: Max 10 concurrent checks per cycle.
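
One way to cap a check cycle at 10 concurrent probes is a small worker pool. The sketch below is an assumption about how such a limit could be implemented, not the poller's actual code; runChecks and the shape of a check function are hypothetical.

```javascript
// Hypothetical helper: runs async check functions with at most `limit` in flight.
// Each entry in `checks` is a function that returns a promise for a check result.
async function runChecks(checks, limit = 10) {
  const results = new Array(checks.length);
  let next = 0; // index of the next check that has not been started yet

  async function worker() {
    while (next < checks.length) {
      const index = next++;
      try {
        results[index] = await checks[index]();
      } catch (err) {
        // A check that throws is recorded as DOWN instead of aborting the cycle.
        results[index] = { status: 'down', error: String(err) };
      }
    }
  }

  // Start up to `limit` workers; each one pulls checks until none are left.
  const workers = Array.from({ length: Math.min(limit, checks.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

Results keep the order of the input array, so they can be matched back to their routes by index.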

Auto-WoL for gateway routes: If a gateway route with wol_enabled = 1 switches from UP to DOWN, the monitor calls handleRouteDownDetected, which sends a magic packet via the gateway (gateways.notifyWol, timeout 60s). This automatically wakes the LAN host when it has gone down — see concepts/home-gateway.md.

Fields stored per route:

Field                     Description
monitoring_status         up, down or unknown
monitoring_last_check     Timestamp of the last check (ISO 8601)
monitoring_response_time  Response time in milliseconds
monitoring_last_change    Timestamp of the last status change
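
To illustrate how these fields relate, the following sketch applies one check result to a route record. The field names come from the table; the function applyCheckResult, the result shape, and the choice to store null as the response time of a failed check are assumptions.

```javascript
// Hypothetical helper: returns a copy of the route with updated monitoring fields.
function applyCheckResult(route, result, now = new Date()) {
  const timestamp = now.toISOString(); // ISO 8601
  const statusChanged = route.monitoring_status !== result.status;
  return {
    ...route,
    monitoring_status: result.status,
    monitoring_last_check: timestamp,
    // Assumption: a check without a measured time stores null.
    monitoring_response_time: result.responseTimeMs ?? null,
    // The change timestamp only moves when the status actually flips.
    monitoring_last_change: statusChanged ? timestamp : route.monitoring_last_change,
  };
}
```

The statusChanged condition is also the natural place to hook in alerts, since emails and webhooks are sent on status changes rather than on every check.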

Use cases

Monitoring a Synology NAS

Route nas.example.com → Port 5001 (DSM). Monitoring detects when the NAS restarts after an update. You get an email when it goes down, and a second one when it's reachable again.

Multiple services on one server

Three routes point to the same peer, but different ports (3000, 8080, 5432). One service crashes — monitoring shows exactly which one. The others stay green.

Enabling the Circuit Breaker

Monitoring is a prerequisite for the Circuit Breaker. Only once monitoring detects an outage can the Circuit Breaker block the route and return 503 instead of sending requests into the void.

Combination with other features

Combination                   Effect
Monitoring + Circuit Breaker  Monitoring checks drive the Circuit Breaker's state machine
Monitoring + Webhooks         Events route_down / route_up to external systems (Slack, Discord, etc.)
Monitoring + Email alerts     Immediate notification on status change
Monitoring + L4 routes        TCP check instead of HTTP check, detects port reachability

Important notes

  • The first check runs 10 seconds after GateControl starts — so all services have time to come up.
  • Monitoring checks the direct connection to the backend (peer IP + port), not the public domain access via Caddy.
  • For backend_https routes HTTPS is used, but the certificate is not validated — self-signed works.
  • Email alerts require a working SMTP configuration under Settings → Email.
  • Webhook events are called route_down and route_up (not route_monitor_down/route_monitor_up).
  • The monitoring interval applies globally to all routes — individual intervals per route are not possible.
  • If monitoring is disabled, the last status remains (it is not reset to unknown).

Circuit Breaker

What does it do?

The Circuit Breaker detects when a backend is repeatedly unreachable and switches the route into a blocked state. Instead of sending requests into the void (and keeping clients waiting), Caddy responds immediately with 503.

Without Circuit Breaker:

Client 1  →  Caddy  →  Backend (dead)  →  30s timeout  →  502
Client 2  →  Caddy  →  Backend (dead)  →  30s timeout  →  502
Client 3  →  Caddy  →  Backend (dead)  →  30s timeout  →  502
... 100 clients wait 30 seconds simultaneously ...

With Circuit Breaker:

Monitoring: Backend dead (5x in a row)  →  Circuit Breaker: OPEN
Client 1  →  Caddy  →  503 "Service temporarily unavailable" (immediately, <1ms)
Client 2  →  Caddy  →  503 (immediately)
... after 30s timeout: Circuit Breaker: HALF-OPEN ...
Monitoring: Backend back  →  Circuit Breaker: CLOSED
Client 3  →  Caddy  →  Backend  →  200 OK  ✓

How it works internally

The Circuit Breaker implements a state machine with three states:

         Threshold failures reached
 CLOSED ──────────────────────────────→ OPEN
   ↑                                      │
   │  Check successful                    │ Timeout elapsed
   │                                      ↓
   └──────────────────────────────── HALF-OPEN
           Check failed ──→ OPEN

States:

State      Caddy behavior                                                               Badge color
Closed     Normal operation, requests are forwarded                                     Green
Open       Caddy immediately returns 503 with Retry-After header                        Red
Half-Open  Monitoring check is allowed through; on success → Closed, on failure → Open  Amber

Configurable values:

Parameter  Default  Description
Threshold  5        Consecutive failures before the circuit opens
Timeout    30s      Seconds in the open state before a half-open test takes place

Detailed flow:

  1. Monitoring probes the backend periodically
  2. On failure: failure counter is incremented (cb_failure_count in the routes table — persisted across restarts)
  3. On success: counter is reset to 0
  4. Counter reaches threshold → status switches to open, cb_opened_at is set
  5. Caddy config is rebuilt: the route serves a static 503 response
  6. After timeout seconds → status switches to half-open
  7. Next monitoring check decides:
    • Success → closed, Caddy config is restored
    • Failure → open, timer restarts
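
The flow above can be sketched as a pure transition function. Only cb_failure_count and cb_opened_at are named in this document; the cb_status field, the config shape, the millisecond timestamps and the function itself are assumptions. The sketch also evaluates the open → half-open transition lazily, when the first check after the timeout arrives.

```javascript
// Hypothetical sketch of the circuit breaker state machine, driven by one check result.
function nextBreakerState(route, checkOk, config, nowMs) {
  const state = { ...route };
  const timeoutMs = config.timeoutSeconds * 1000;

  // Step 6: once the timeout has elapsed, an open circuit becomes half-open.
  if (state.cb_status === 'open' && nowMs - state.cb_opened_at >= timeoutMs) {
    state.cb_status = 'half-open';
  }

  if (checkOk) {
    // Step 3: success resets the counter. Step 7: half-open closes again.
    state.cb_failure_count = 0;
    if (state.cb_status === 'half-open') {
      state.cb_status = 'closed';
      state.cb_opened_at = null;
    }
    return state;
  }

  if (state.cb_status === 'half-open') {
    // Step 7: a failed half-open test reopens the circuit and restarts the timer.
    state.cb_status = 'open';
    state.cb_opened_at = nowMs;
  } else if (state.cb_status === 'closed') {
    // Steps 2 and 4: count the failure and open once the threshold is reached.
    state.cb_failure_count += 1;
    if (state.cb_failure_count >= config.threshold) {
      state.cb_status = 'open';
      state.cb_opened_at = nowMs;
    }
  }
  return state;
}
```

Whenever the returned cb_status differs from the previous one, the Caddy config would need to be rebuilt (steps 5 and 7).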

Caddy configuration in the open state (from src/services/caddyConfig.js):

{
  "handle": [{
    "handler": "static_response",
    "status_code": "503",
    "body": "Service temporarily unavailable",
    "headers": { "Retry-After": ["30"] }
  }]
}

The Retry-After value matches the configured timeout (default 30).
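
As a sketch, a builder for this handler might derive Retry-After from the configured timeout like this. The function name buildOpenCircuitHandler is hypothetical; the JSON shape mirrors the snippet above.

```javascript
// Hypothetical builder for the static 503 route served while the circuit is open.
function buildOpenCircuitHandler(timeoutSeconds = 30) {
  return {
    handle: [{
      handler: 'static_response',
      status_code: '503',
      body: 'Service temporarily unavailable',
      // Caddy expects header values as arrays of strings.
      headers: { 'Retry-After': [String(timeoutSeconds)] },
    }],
  };
}
```

Calling buildOpenCircuitHandler(60) yields the same structure with "Retry-After": ["60"].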

Use cases

Preventing request pile-ups when the backend is dead

Without Circuit Breaker, all incoming requests wait for Caddy's timeout (30s). With 100 concurrent clients that means 100 blocked connections. With Circuit Breaker, all of them are answered with 503 immediately.

Preventing thundering herd on recovery

The backend was down for 5 minutes and 1000 clients are waiting to retry. Without the Circuit Breaker, all 1000 requests hit the just-started backend simultaneously. In the half-open state the Circuit Breaker lets only a single monitoring check through; client traffic is forwarded again only after that check succeeds and the circuit closes.

Fast feedback for better UX

Instead of waiting 30 seconds for a timeout, the user immediately sees a "Service temporarily unavailable" response. The response carries a Retry-After header that tells clients when a retry is worthwhile.

Combination with other features

Combination                       Effect
Circuit Breaker + Monitoring      Mandatory: monitoring checks drive the state machine
Circuit Breaker + Retry           Retries happen only while the circuit is closed; with an open circuit Caddy returns 503 immediately
Circuit Breaker + Load Balancing  Circuit Breaker kicks in when all backends are down
Circuit Breaker + Webhooks        Events circuit_breaker_open / circuit_breaker_closed

Important notes

  • Monitoring is mandatory. Without Uptime Monitoring enabled, the Circuit Breaker has no data source and always stays in the closed state.
  • Failure counter and open timestamp are persisted in the database (cb_failure_count, cb_opened_at). Open circuits survive restarts; if the timestamp is missing after a restart, it is re-set on the first check run.
  • The Circuit Breaker operates per route, not per backend. For load balancing with multiple backends, the circuit opens when the monitoring target is unreachable.
  • In the open state no requests are forwarded to the backend: Caddy responds with 503 Service Unavailable and a Retry-After header. There is no bypass via the API or for individual requests.
  • Manual reset (since v1.50.4): POST /api/v1/routes/:id/circuit-breaker/reset or the Reset circuit breaker button in the route edit modal (only visible when status ≠ closed). Sets cb_failure_count = 0, cb_opened_at = NULL, status to closed, and re-renders the Caddy config immediately. Without this reset, an open breaker waits for the next monitoring cycle and runs through the normal open → half-open → closed path.
  • Circuit Breaker is only available for HTTP routes, not for L4 (TCP/UDP).
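
The manual reset endpoint from the notes above could be called from a script along these lines. Only the path POST /api/v1/routes/:id/circuit-breaker/reset is taken from this document; the helper names and the base URL are hypothetical, and authentication depends on your setup, so any required headers are passed in by the caller. The sketch uses the global fetch available in Node 18 and later.

```javascript
// Hypothetical helper: builds the URL and fetch options for the reset call.
function buildResetRequest(baseUrl, routeId, headers = {}) {
  const path = `/api/v1/routes/${encodeURIComponent(routeId)}/circuit-breaker/reset`;
  return {
    url: new URL(path, baseUrl).toString(),
    options: { method: 'POST', headers },
  };
}

// Hypothetical helper: sends the reset request and fails loudly on a non-2xx answer.
async function resetCircuitBreaker(baseUrl, routeId, headers = {}) {
  const { url, options } = buildResetRequest(baseUrl, routeId, headers);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Circuit breaker reset failed with HTTP ${response.status}`);
  }
  return response;
}
```

For example, resetCircuitBreaker('https://gatecontrol.example.com', 42, authHeaders) would target route 42 on a placeholder host.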

Rate Limiting

What does it do?

Rate Limiting counts each client IP's requests and blocks further requests once the limit is reached. The client then receives HTTP 429 (Too Many Requests) instead of a normal response.

Without Rate Limiting:

Bot sends 10,000 requests/minute  →  Backend processes all  →  Server overloaded

With Rate Limiting (100 requests/minute):

Bot sends 100 requests     →  Backend processes all  ✓
Bot sends request #101     →  Caddy: 429 Too Many Requests  ✕
Bot sends request #102     →  Caddy: 429 Too Many Requests  ✕
... after 1 minute ...
Bot sends request #1       →  Backend processes  ✓  (new time window)

How it works internally

GateControl uses the caddy-ratelimit plugin for route traffic. The rate-limit handler is inserted into Caddy's handler chain before the reverse proxy (src/services/caddyConfig.js, rate_limit_enabled block).

Not to be confused with the admin API limiters (src/middleware/rateLimit.js): these protect the GateControl Admin UI (/login, /api/v1/*) and are configured separately in Express. The rate limiting described here concerns exclusively the client traffic of a configured route.

Caddy JSON configuration:

{
  "handler": "rate_limit",
  "rate_limits": {
    "static": {
      "key": "{http.request.remote.host}",
      "window": "1m",
      "max_events": 100
    }
  }
}

Key: {http.request.remote.host} — each client IP gets its own quota.

Configurable values:

Parameter  Range           Default  Description
Requests   1 – 100,000     100      Maximum requests per time window
Window     1s, 1m, 5m, 1h  1m       Duration of the time window
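
A builder for the handler shown earlier might validate these two values as follows. This is a sketch under assumptions: the name buildRateLimitHandler is hypothetical, and rejecting an out-of-range request count with an error is an illustrative choice, not documented behavior. The fallback to 1m for unsupported windows follows the notes in this section.

```javascript
const ALLOWED_WINDOWS = ['1s', '1m', '5m', '1h'];

// Hypothetical builder for the caddy-ratelimit handler JSON.
function buildRateLimitHandler(requests = 100, window = '1m') {
  // Assumption: out-of-range values are rejected rather than silently clamped.
  if (!Number.isInteger(requests) || requests < 1 || requests > 100000) {
    throw new RangeError('requests must be an integer between 1 and 100000');
  }
  return {
    handler: 'rate_limit',
    rate_limits: {
      static: {
        key: '{http.request.remote.host}', // one quota per client IP
        window: ALLOWED_WINDOWS.includes(window) ? window : '1m',
        max_events: requests,
      },
    },
  };
}
```

With this sketch, buildRateLimitHandler(10, '10m') produces a handler with window "1m" and max_events 10.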

Handler order in Caddy:

  1. ACL / Forward Auth (if active)
  2. Custom Request Headers (if present)
  3. Rate Limit ← here
  4. Request Mirroring (if active)
  5. Compression (if active)
  6. Reverse Proxy

Use cases

Protecting login pages against brute force

Route app.example.com → Web app with login. Rate Limit: 10 requests / 1 minute. An attacker can only make 10 password attempts per minute — this significantly slows down brute-force attacks.

Protecting an API against abuse

Route api.example.com → REST API. Rate Limit: 1000 requests / 5 minutes. Normal usage remains unaffected, but a single client cannot overload the API.

Preventing scraping

Route shop.example.com → Webshop. Rate Limit: 60 requests / 1 minute. Bots scraping prices are throttled after 60 page views per minute.

Recommended values:

Use case           Requests   Window
Login page         10–20      1m
REST API           500–1000   5m
Webshop / website  60–120     1m
Static assets      1000–5000  1m
Webhook endpoint   50–100     1m

Combination with other features

Combination               Effect
Rate Limit + Route Auth   Rate limit after the auth check: protects the backend, not the login page
Rate Limit + Basic Auth   Rate limit before auth: also protects against brute force on Basic Auth
Rate Limit + ACL          Only VPN peers get through, then get rate-limited
Rate Limit + IP filter    IP filter blocks known IPs, rate limit throttles the rest
Rate Limit + Compression  No conflict: Rate Limit counts requests, Compression compresses responses

Important notes

  • Rate Limiting is per IP address, not global. 100 requests/minute means: each individual IP may make 100 requests.
  • Behind a NAT router all clients share the same IP — the limit then applies to all of them together.
  • Allowed window values: 1s, 1m, 5m, 1h. Other values are normalized to 1m.
  • HTTP 429 contains no Retry-After header — the client must wait on its own until the window expires.
  • Rate Limiting is only available for HTTP routes, not for L4 (TCP/UDP).
  • For routes with Forward Auth (Route Auth or IP filter), rate limiting is applied after the auth check.
  • For WebSocket connections, only the initial HTTP upgrade counts as a request.

Retry

What does it do?

If the backend returns an error or is unreachable, Caddy retries the request automatically instead of immediately sending an error to the client.

Without Retry:

Client  →  Caddy  →  Backend (just restarted)  →  502 Bad Gateway  →  Client sees error

With Retry (3 attempts):

Client  →  Caddy  →  Backend (attempt 1: 502)
                  →  Backend (attempt 2: 502)
                  →  Backend (attempt 3: 200 OK)  →  Client sees a normal response

With Retry + multiple backends:

Client  →  Caddy  →  Backend A (502)
                  →  Backend B (200 OK)  →  Client sees a normal response

How it works internally

GateControl configures Caddy's load_balancing.retries mechanism in the reverse-proxy handler (src/services/caddyConfig.js, retry_enabled block):

Caddy JSON configuration:

{
  "handler": "reverse_proxy",
  "upstreams": [
    { "dial": "10.8.0.3:8080" }
  ],
  "load_balancing": {
    "retries": 3
  }
}

Behavior:

  • Caddy retries the request up to retries times on connection errors
  • With one backend: all retries go to the same backend
  • With multiple backends: retries rotate to the next backend (round robin or weighted)
  • The retry logic is part of Caddy's load balancer — not a separate handler
  • Retry is triggered on connect errors and on the status codes from the UI field Retry Status Codes. Since v1.50.4 this list is actually forwarded to Caddy (reverse_proxy.load_balancing.retry_match), along with try_duration: 5s (without it, Caddy would ignore retries). Invalid tokens (non-numeric, or outside 100–599) are silently discarded.
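
The filtering of the Retry Status Codes field can be sketched as follows. The function name parseRetryStatusCodes is hypothetical; the rule it implements (drop non-numeric tokens and values outside 100–599) is the one stated above. How the resulting list is translated into Caddy's retry_match is not shown here.

```javascript
// Hypothetical helper: turns the CSV from the UI field into a list of valid status codes.
function parseRetryStatusCodes(csv) {
  return String(csv)
    .split(',')
    .map((token) => token.trim())
    .filter((token) => /^\d+$/.test(token)) // discard non-numeric tokens
    .map(Number)
    .filter((code) => code >= 100 && code <= 599); // discard out-of-range values
}
```

For the input "502, 503,abc,99,600,504" this returns [502, 503, 504].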

Configurable values:

Parameter           Range   Default      Description
Retry Count         1 – 10  3            Number of retry attempts
Retry Status Codes  CSV     502,503,504  Which response codes trigger a retry

Use cases

Catching a backend restart

Route app.example.com → Node.js app on port 3000. On deployment the app is briefly restarted (2-3 seconds of downtime). With 3 retries and a single backend, Caddy bridges this gap — at best the client notices a slightly longer load time.

Load balancing with failover

Route api.example.com → 3 API servers (Backend A, B, C). Server B fails. Caddy tries B, gets an error, and automatically forwards the request to C. The client notices nothing.

Temporary 503 errors under high load

Route service.example.com → Microservice that returns 503 under overload. With retries, a later attempt can succeed once the service has recovered, so the client does not see the transient error.

Combination with other features

Combination              Effect
Retry + Load Balancing   Retries rotate between backends, which is more effective than with a single backend
Retry + Circuit Breaker  Circuit Breaker prevents retries when the backend is permanently down
Retry + Monitoring       Monitoring detects if the backend is permanently down; Retry helps with short outages
Retry + Rate Limiting    Each retry attempt counts as one request to the backend, not against the client's rate limit

Important notes

  • POST/PUT/DELETE are retried as well. GateControl performs no automatic idempotency check — the admin must know whether the backend supports retryable write operations. Example: a retry on POST /api/orders could trigger a duplicate order. Only enable retry if the backend supports idempotent operations or only handles GET requests.
  • Retry is only available for HTTP routes, not for L4 (TCP/UDP).
  • Retries happen back-to-back — there is no exponential backoff.
  • With a single backend, retries can additionally load the server if it is already overloaded.
  • Retry Count of 1 means: 1 initial attempt + 1 retry = maximum 2 requests to the backend.
  • Retries are invisible to the client — they either receive the successful response or the last error.
  • In combination with Circuit Breaker: when the Circuit Breaker is open, no retries are attempted (Caddy immediately serves 503).
