# Monitoring & Resilience

## Uptime Monitoring
Periodic availability checks for all routes via HTTP GET or TCP connect — with status display, response time tracking, and alerting on outages.
### How does it work?

```text
Without Monitoring:

Client → Caddy → Backend (crashed) → 502 Bad Gateway
                        ↑ nobody knows

With Monitoring:

Monitor checks every 60s → Backend not responding → Status: DOWN (red)
                                                  → Email alert
                                                  → Webhook: route_down
                                                  → Circuit Breaker reacts
```
### Technical Details

**HTTP Routes (Layer 7):**

- HTTP GET to `http(s)://<Peer-IP>:<Target-Port>/`
- User-Agent: `GateControl-Monitor/1.0`
- Expected: status code 200-399 = UP, anything else = DOWN
- With `backend_https`: HTTPS with `rejectUnauthorized: false`

**L4 Routes (TCP/UDP):**

- TCP connect to the backend port
- Connection successful = UP, timeout/error = DOWN

**Parallelization:** Maximum 10 concurrent checks per cycle.
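The check logic described above can be sketched in Python. This is a simplified model of the documented behavior, not GateControl's actual implementation; the function names are illustrative:

```python
# Sketch of the documented monitoring checks (illustrative, not GateControl's code)
import socket
import ssl
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def check_http(peer_ip, port, use_https=False, timeout=5.0):
    """HTTP GET against the backend; status 200-399 counts as UP."""
    scheme = "https" if use_https else "http"
    req = urllib.request.Request(
        f"{scheme}://{peer_ip}:{port}/",
        headers={"User-Agent": "GateControl-Monitor/1.0"},
    )
    # Skipping certificate verification mirrors rejectUnauthorized: false
    ctx = ssl._create_unverified_context() if use_https else None
    try:
        with urllib.request.urlopen(req, timeout=timeout, context=ctx) as resp:
            return 200 <= resp.status < 400
    except Exception:
        return False  # 4xx/5xx, timeout, or refused connection → DOWN

def check_tcp(peer_ip, port, timeout=5.0):
    """L4 routes: a successful TCP connect counts as UP."""
    try:
        with socket.create_connection((peer_ip, port), timeout=timeout):
            return True
    except OSError:
        return False

def run_cycle(routes):
    """One monitoring cycle with at most 10 concurrent checks."""
    def check(route):
        kind, ip, port = route
        return check_http(ip, port) if kind == "http" else check_tcp(ip, port)
    with ThreadPoolExecutor(max_workers=10) as pool:
        return list(pool.map(check, routes))
```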
### Stored Fields per Route

| Field | Description |
|---|---|
| `monitoring_status` | `up`, `down`, or `unknown` |
| `monitoring_last_check` | Timestamp of last check (ISO 8601) |
| `monitoring_response_time` | Response time in milliseconds |
| `monitoring_last_change` | Timestamp of last status change |
### Setup

1. Create or edit the route
2. Enable the Uptime Monitoring toggle
3. Configure the interval under Settings → Monitoring (default: 60 seconds)
4. Optional: enable Email Alerts (SMTP must be configured)
5. Save
### Via API

```bash
# Enable monitoring
curl -X PUT https://gatecontrol.example.com/api/v1/routes/1 \
  -H "Authorization: Bearer gc_..." \
  -d '{"monitoring_enabled":true}'

# Trigger manual check
curl -X POST https://gatecontrol.example.com/api/v1/routes/1/check \
  -H "Authorization: Bearer gc_..."

# Get monitoring summary
curl https://gatecontrol.example.com/api/v1/monitoring/summary \
  -H "Authorization: Bearer gc_..."
```
### Important Notes on Monitoring
- First check runs 10 seconds after GateControl starts.
- Monitoring checks the direct connection to the backend (peer IP + port), not the public domain access.
- Email alerts require a working SMTP configuration.
- Webhook events: `route_down` and `route_up`.
- The monitoring interval is global for all routes.
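The `route_down` / `route_up` events can be consumed by any HTTP endpoint. Below is a minimal Python receiver sketch; the payload field names `event` and `route_id` are assumptions, so verify them against the webhook payload your GateControl version actually sends:

```python
# Minimal webhook receiver sketch. The payload fields "event" and
# "route_id" are ASSUMED here; check your GateControl version's docs.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # React to the two documented events
        if payload.get("event") == "route_down":
            print(f"ALERT: route {payload.get('route_id')} is DOWN")
        elif payload.get("event") == "route_up":
            print(f"OK: route {payload.get('route_id')} is back UP")
        self.send_response(204)  # acknowledge with no content
        self.end_headers()

    def log_message(self, *args):  # keep the console quiet
        pass

# To run: HTTPServer(("0.0.0.0", 9000), WebhookHandler).serve_forever()
```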
## Circuit Breaker
Blocks requests to failed backends and immediately returns HTTP 503 — prevents request queuing and protects backends from overload during recovery.
### How does it work?

```text
Without Circuit Breaker:

Client 1 → Caddy → Backend (dead) → 30s Timeout → 502
Client 2 → Caddy → Backend (dead) → 30s Timeout → 502
... 100 clients waiting 30 seconds simultaneously ...

With Circuit Breaker:

Monitoring: Backend dead (5x in a row) → Circuit Breaker: OPEN
Client 1 → Caddy → 503 "Service temporarily unavailable" (instant, <1ms)
... after 30s timeout ...
Monitoring: Backend back → Circuit Breaker: CLOSED
Client 3 → Caddy → Backend → 200 OK ✓
```
### State Machine

```text
            Threshold failures reached
CLOSED ──────────────────────────────→ OPEN
   ↑                                     │
   │ Check successful                    │ Timeout elapsed
   │                                     ↓
   └──────────────────────────────── HALF-OPEN
                                     Check failed ──→ OPEN
```
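The transitions can be sketched as a small in-memory state machine. This is a simplified model under the documented rules (threshold, timeout, half-open test), not GateControl's actual implementation:

```python
# Simplified circuit-breaker state machine (illustrative sketch)
class CircuitBreaker:
    def __init__(self, threshold=5, timeout=30):
        self.threshold = threshold  # consecutive failures before opening
        self.timeout = timeout      # seconds in open state before half-open
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def on_check(self, success, now):
        """Feed one monitoring result; returns the resulting state."""
        if self.state == "open" and now - self.opened_at >= self.timeout:
            self.state = "half-open"    # timeout elapsed: allow a test check
        if success:
            self.failures = 0
            self.state = "closed"       # a successful check always closes
        else:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.threshold:
                self.state = "open"     # trip: Caddy now answers 503 instantly
                self.opened_at = now
        return self.state
```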
### States
| Status | Caddy Behavior | Badge Color |
|---|---|---|
| Closed | Normal operation, requests forwarded | Green |
| Open | Caddy immediately returns 503 with Retry-After header | Red |
| Half-Open | Monitoring check passes through; success → Closed, failure → Open | Amber |
### Configurable Values
| Parameter | Default | Description |
|---|---|---|
| Threshold | 5 | Consecutive failures before circuit opens |
| Timeout | 30s | Seconds in open state before half-open test |
### Caddy Configuration in Open State

```json
{
  "handle": [{
    "handler": "static_response",
    "status_code": "503",
    "body": "Service temporarily unavailable",
    "headers": { "Retry-After": ["30"] }
  }]
}
```
### Setup

1. Enable Uptime Monitoring (prerequisite!)
2. Enable the Circuit Breaker toggle
3. Set Threshold (e.g. 5)
4. Set Timeout (e.g. 30 seconds)
5. Save
```bash
# Enable circuit breaker
curl -X PUT https://gatecontrol.example.com/api/v1/routes/1 \
  -H "Authorization: Bearer gc_..." \
  -d '{"monitoring_enabled":true,"circuit_breaker_enabled":true,"circuit_breaker_threshold":5,"circuit_breaker_timeout":30}'

# Manually reset circuit breaker
curl -X PATCH https://gatecontrol.example.com/api/v1/routes/1/circuit-breaker \
  -H "Authorization: Bearer gc_..." \
  -d '{"status":"closed"}'
```
### Important Notes on Circuit Breaker
- Monitoring is required. Without Uptime Monitoring, the circuit breaker has no data source.
- Failure counters are kept in memory. On restart, all counters start at 0.
- In open state, no requests are forwarded to the backend.
- Manual reset sets the status to `closed` and clears the failure counter.
- Only available for HTTP routes, not L4.
## Retry on Error
Automatically retries failed requests to the backend — ideal for brief outages, backend restarts, or load balancing with multiple backends.
### How does it work?

```text
Without Retry:

Client → Caddy → Backend (just restarted) → 502 Bad Gateway → Client sees error

With Retry (3 attempts):

Client → Caddy → Backend (Attempt 1: 502)
               → Backend (Attempt 2: 502)
               → Backend (Attempt 3: 200 OK) → Client sees normal response

With Retry + multiple backends:

Client → Caddy → Backend A (502)
               → Backend B (200 OK) → Client sees normal response
```
### Technical Details

```json
{
  "handler": "reverse_proxy",
  "upstreams": [{ "dial": "10.8.0.3:8080" }],
  "load_balancing": { "retries": 3 }
}
```
| Parameter | Range | Default | Description |
|---|---|---|---|
| Retry Count | 1 – 10 | 3 | Number of retry attempts |
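The retry-count semantics (N retries means up to N + 1 total requests) can be illustrated with a short sketch; `attempt_fn` is a hypothetical stand-in for one proxied request to the backend:

```python
# Illustrates the retry-count semantics: 1 initial attempt + N retries.
def send_with_retries(attempt_fn, retry_count=3):
    """Return the first successful status code, or the last error status."""
    last = None
    for attempt in range(1 + retry_count):  # initial try + retry_count retries
        last = attempt_fn(attempt)
        if 200 <= last < 400:               # treat 2xx/3xx as success
            return last
    return last
```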
### Use Cases

#### Catch Backend Restarts

Route app.example.com → Node.js app on port 3000. During deployment the app briefly restarts. With 3 retries, Caddy bridges this gap.

#### Load Balancing with Failover

Route api.example.com → 3 API servers. Server B fails. Caddy tries B, gets an error, and automatically routes to C.
### Setup

1. Create or edit the route
2. Enable the Retry on Error toggle
3. Set Retry Count (1-10, default: 3)
4. Save
```bash
# Enable retry with 5 attempts
curl -X PUT https://gatecontrol.example.com/api/v1/routes/1 \
  -H "Authorization: Bearer gc_..." \
  -d '{"retry_enabled":true,"retry_count":5}'
```
### Important Notes on Retry
- POST/PUT/DELETE requests are also retried, which is problematic for non-idempotent operations. Enable retry only when the backend handles repeated requests safely (idempotent operations).
- Retries happen immediately — no exponential backoff.
- Retry count of 1 means: 1 initial attempt + 1 retry = maximum 2 requests.
- Combined with circuit breaker: open circuit returns 503 immediately, no retries.
- Only available for HTTP routes, not L4.
## Request Mirroring
Duplicates incoming requests asynchronously to secondary backends — for shadow deployments, debugging, and load testing, without affecting the primary response.
### How does it work?

```text
Without Mirroring:

Client → Caddy → Backend (v1) → Response to client

With Mirroring:

Client → Caddy → Backend (v1) → Response to client ✓
               → Backend (v2) → Response discarded (Mirror)
               → Log Service  → Response discarded (Mirror)
```
### Technical Details

```json
{
  "handler": "mirror",
  "targets": [
    { "dial": "10.8.0.5:8080" },
    { "dial": "10.8.0.6:9090" }
  ]
}
```
| Parameter | Value |
|---|---|
| Max mirror targets per route | 5 |
| Body buffer | Up to 10 MB |
| Timeout per mirror target | 10 seconds |
| Max concurrent goroutines | 100 |
| WebSocket upgrades | Skipped |
| Requests > 10 MB body | Mirrored without body |
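The fire-and-forget behavior behind these limits can be sketched as follows. This is a simplified model using a bounded Python thread pool in place of Caddy's goroutine cap; all names are illustrative:

```python
# Sketch of asynchronous request mirroring: the primary response is
# returned immediately, copies go to mirror targets on a bounded pool,
# and mirror responses (or errors) are silently discarded.
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 100  # stands in for the documented goroutine cap
mirror_pool = ThreadPoolExecutor(max_workers=MAX_WORKERS)

def send_ignore_errors(target, request, timeout):
    try:
        target(request, timeout)
    except Exception:
        pass  # mirror failures never affect the client

def handle_request(request, primary, mirrors, timeout=10.0):
    for target in mirrors:
        # Fire and forget: the mirror's result is never surfaced
        mirror_pool.submit(send_ignore_errors, target, request, timeout)
    return primary(request)  # only the primary backend's response is returned
```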
### Use Cases

#### Test Shadow Deployments

You're developing v2 of your API. Mirror production traffic to the v2 instance. Compare logs and metrics without risk.

#### Debugging with Logging Backend

Mirror target: a logging service that records all incoming requests. Analyze production traffic without instrumenting the app.

#### Load Testing with Real Traffic

A new server should replace the current one. Mirror traffic to it and observe CPU, RAM, and response times under real load.
### Setup

1. Create or edit the route
2. Enable the Request Mirroring toggle
3. Add mirror targets: select a peer from the dropdown, enter the port
4. Add up to 5 targets
5. Save
```bash
# Enable mirroring with 2 targets
curl -X PUT https://gatecontrol.example.com/api/v1/routes/1 \
  -H "Authorization: Bearer gc_..." \
  -d '{"mirror_enabled":true,"mirror_targets":[{"peer_id":2,"port":8080},{"peer_id":3,"port":9090}]}'
```
### Important Notes on Mirroring
- Write operations are mirrored. POST, PUT, DELETE — everything is sent to mirror targets. Ensure mirror targets can handle this traffic.
- Mirror targets must be active, enabled peers.
- Mirror target responses are completely discarded.
- Requests > 10 MB are mirrored without body.
- WebSocket upgrade requests are not mirrored.
- Only available for HTTP routes, not L4.