
AI Bot Blocker

v1.x · Updated 1 month ago

Overview

The AI Bot Blocker protects services exposed via GateControl from unwanted AI crawlers. It detects and blocks requests that originate from the IP address ranges of known AI providers and cloud platforms (OpenAI, Google Cloud, AWS, DeepSeek, GitHub Copilot, Microsoft Azure). Blocking happens directly at the reverse-proxy level, before the request reaches the backend.

License feature key: bot_blocking

How it works

The Bot Blocker is based on the caddy-defender plugin for Caddy. It is inserted as the first handler in the Caddy route chain — before Request Tracing, Rate Limiting, Authentication, and Compression. This means bots are rejected immediately without burdening other handlers.

Detected IP ranges

The plugin automatically maintains up-to-date IP lists for the following providers:

Provider | Description
OpenAI | GPTBot, ChatGPT-User, and other OpenAI services
AWS | Amazon Web Services (frequently used by AI crawlers)
Google Cloud | Google-Extended, Gemini, and other Google AI services
GitHub Copilot | GitHub Copilot requests
DeepSeek | DeepSeek AI crawler
Azure | Microsoft Azure Public Cloud

The IP lists are regularly updated by the plugin maintainer.

Bot counter

How it works

A background task runs every 60 seconds and updates the count of blocked requests for each route. The counter is based on HTTP 403 responses in the Caddy access log, filtered by the route's domain.

Display

An orange badge is shown in the route list:

  • Bot icon + number (e.g. 🤖 42): Number of requests blocked so far
  • Bot icon only (no number): Bot Blocker is active, but no bots have been blocked yet

The badge is only shown for HTTP routes (not for L4/TCP routes).
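
The display rules above can be summarized in a few lines. This is a minimal sketch, not GateControl's actual UI code; the function name `badge_label` and the `route_type` values are assumptions made for illustration.

```python
from typing import Optional


def badge_label(route_type: str, enabled: bool, count: int) -> Optional[str]:
    """Return the badge text for a route, or None if no badge is shown.

    route_type is a hypothetical field: "http" for HTTP routes,
    anything else (e.g. "l4") for L4/TCP routes.
    """
    # The badge only exists for HTTP routes with the Bot Blocker enabled.
    if route_type != "http" or not enabled:
        return None
    # Icon plus number once something was blocked, icon only otherwise.
    return f"🤖 {count}" if count > 0 else "🤖"
```

For example, `badge_label("http", True, 42)` yields the icon followed by 42, while an L4 route yields no badge at all.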

Known limitation

The counter counts every HTTP 403 response on the route, not only those produced by the Bot Blocker. If IP access control or an ACL is also active on the same route, the 403 responses they produce are counted as well. The displayed number is therefore an approximation that can overstate the number of blocked bots; it is sufficient for most use cases.

Testing

Verify bot blocking

# Normal request — should go through
curl -s -o /dev/null -w "%{http_code}" https://your-route.com/
# Expected result: 200 (or 302 on auth)

# Simulating a request from an OpenAI IP is only possible on the local network.
# Instead, look for "defender" entries in the GateControl log:
docker logs gatecontrol 2>&1 | grep "defender"

Check the counter

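# Fetch route data; bot_blocker_count contains the current counter.
# Replace <gatecontrol-url> with the base URL of your GateControl instance and <id> with the route ID.
curl -s "<gatecontrol-url>/api/v1/routes/<id>" | jq '.route.bot_blocker_count'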

Limitations

  • HTTP routes only: L4/TCP routes do not support bot blocking (caddy-defender is an HTTP handler)
  • IP-based: Blocking is based on IP addresses, not User-Agent strings. Bots coming from non-listed IP ranges are not detected.
  • No custom IP ranges: The default ranges maintained by the plugin are used
  • No whitelist: Individual IPs cannot be exempted from blocking
  • Counter accuracy: Counts all 403s, not just bot blocks (see above)
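
To make the IP-based limitation concrete, the following sketch shows how matching a client address against CIDR ranges works in principle. It is not the caddy-defender implementation. The provider names and ranges are placeholders taken from the reserved documentation blocks (RFC 5737 and RFC 3849), not real provider ranges, and `match_provider` is a hypothetical helper.

```python
import ipaddress
from typing import Optional

# Placeholder ranges from reserved documentation blocks, NOT real provider data.
EXAMPLE_RANGES = {
    "provider-a": ["192.0.2.0/24"],
    "provider-b": ["198.51.100.0/25", "2001:db8::/32"],
}


def match_provider(client_ip: str, ranges: dict = EXAMPLE_RANGES) -> Optional[str]:
    """Return the first provider whose ranges contain client_ip, else None."""
    addr = ipaddress.ip_address(client_ip)
    for provider, cidrs in ranges.items():
        for cidr in cidrs:
            net = ipaddress.ip_network(cidr)
            # Only compare addresses and networks of the same IP version.
            if addr.version == net.version and addr in net:
                return provider
    return None
```

An address outside every listed range returns `None` regardless of the User-Agent it sends, which is why a bot coming from a non-listed range is not detected.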

Database

Fields in the routes table

Column | Type | Default | Description
bot_blocker_enabled | INTEGER | 0 | Feature enabled (0/1)
bot_blocker_mode | TEXT | 'block' | Active mode
bot_blocker_count | INTEGER | 0 | Cumulative block counter
bot_blocker_config | TEXT | null | JSON with mode-specific options

Migration

Version 28 (add_bot_blocker) — created on 2026-03-28.
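
As an illustration of what a migration like add_bot_blocker might do, the sketch below adds the four columns to a stand-in routes table. It assumes an SQLite database (suggested by the INTEGER/TEXT column types, but not confirmed by this page); the DDL statements and the minimal `routes` table are assumptions, not the actual migration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal stand-in for the real routes table (assumed layout).
conn.execute("CREATE TABLE routes (id INTEGER PRIMARY KEY, domain TEXT)")

# One ALTER TABLE per column, using the defaults documented above.
for ddl in (
    "ALTER TABLE routes ADD COLUMN bot_blocker_enabled INTEGER DEFAULT 0",
    "ALTER TABLE routes ADD COLUMN bot_blocker_mode TEXT DEFAULT 'block'",
    "ALTER TABLE routes ADD COLUMN bot_blocker_count INTEGER DEFAULT 0",
    "ALTER TABLE routes ADD COLUMN bot_blocker_config TEXT DEFAULT NULL",
):
    conn.execute(ddl)

# A new route picks up the defaults automatically.
conn.execute("INSERT INTO routes (domain) VALUES ('your-route.com')")
row = conn.execute(
    "SELECT bot_blocker_enabled, bot_blocker_mode, "
    "bot_blocker_count, bot_blocker_config FROM routes"
).fetchone()
```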

Backup/Restore

The Bot Blocker configuration (bot_blocker_enabled, bot_blocker_mode, bot_blocker_config) is included in backup/restore. The bot_blocker_count is not exported — the counter starts at 0 after a restore.
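
The export and restore behavior described above could look roughly like this. The function names and the dictionary representation of a route are assumptions for illustration; only the field names come from this page.

```python
# Fields that are part of a backup; bot_blocker_count is deliberately absent.
BACKUP_FIELDS = ("bot_blocker_enabled", "bot_blocker_mode", "bot_blocker_config")


def export_route(route: dict) -> dict:
    """Copy only the Bot Blocker configuration fields into the backup."""
    return {field: route[field] for field in BACKUP_FIELDS}


def restore_route(backup: dict) -> dict:
    """Rebuild the Bot Blocker fields from a backup; the counter restarts at 0."""
    restored = {field: backup[field] for field in BACKUP_FIELDS}
    restored["bot_blocker_count"] = 0
    return restored
```

A route that had blocked 1,337 requests before the backup therefore shows the icon-only badge again after a restore until new blocks are counted.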

Technical details

Caddy handler config

{
  "handler": "defender",
  "raw_responder": "block",
  "ranges": ["openai", "aws", "gcloud", "githubcopilot", "deepseek", "azurepubliccloud"]
}

Handler position in the route chain

1. defender (Bot Blocker)     ← blocks bots immediately
2. trace (Request Tracing)
3. headers (Custom Headers)
4. rate_limit
5. mirror (Request Mirroring)
6. encode (compression)
7. reverse_proxy (backend)
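
The ordering above amounts to placing the defender handler at index 0 of a route's handler list. The following sketch shows one way to do that when assembling the Caddy JSON config. The handler object is the one documented on this page; the function `with_bot_blocker` and the shape of the other handlers are assumptions.

```python
import copy

# Handler config as documented in "Caddy handler config".
DEFENDER_HANDLER = {
    "handler": "defender",
    "raw_responder": "block",
    "ranges": ["openai", "aws", "gcloud", "githubcopilot", "deepseek", "azurepubliccloud"],
}


def with_bot_blocker(handlers: list, enabled: bool) -> list:
    """Return a new handler list with the defender handler first (if enabled)."""
    # Drop any existing defender entry so the function is safe to call twice.
    rest = [h for h in handlers if h.get("handler") != "defender"]
    if not enabled:
        return rest
    return [copy.deepcopy(DEFENDER_HANDLER)] + rest
```

Because every other handler comes after it in the list, a blocked request never reaches rate limiting, compression, or the backend.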

Go module

pkg.jsn.cam/caddy-defender (originally github.com/JasonLovesDoggo/caddy-defender)

Background task

  • Interval: 60 seconds
  • Source: /data/caddy/access.log
  • Logic: Parses JSON lines, filters by status === 403, matches request.host against routes with bot_blocker_enabled, increments bot_blocker_count
  • Log rotation: Timestamp-based tracking (no offset), compatible with Caddy's log rotation (10 MB, 3 files)
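
The counting logic described above can be sketched as follows. This is not the GateControl implementation. It assumes Caddy's default JSON access log fields (`ts` as a Unix timestamp, `status`, and `request.host`); the function name `count_blocked` and its parameters are hypothetical.

```python
import json


def count_blocked(lines, enabled_hosts, last_ts):
    """Count new 403 responses per host.

    lines: iterable of JSON log lines
    enabled_hosts: set of hosts whose route has bot_blocker_enabled
    last_ts: timestamp of the newest entry seen in the previous run
    Returns (counts per host, newest timestamp seen).
    """
    counts = {}
    newest = last_ts
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        if not isinstance(entry, dict):
            continue
        ts = entry.get("ts", 0)
        # Timestamp-based tracking: ignore everything already processed.
        if ts <= last_ts:
            continue
        newest = max(newest, ts)
        if entry.get("status") != 403:
            continue
        host = (entry.get("request") or {}).get("host")
        if host in enabled_hosts:
            counts[host] = counts.get(host, 0) + 1
    return counts, newest
```

Tracking the last seen timestamp instead of a byte offset means a rotated log file does not invalidate the position: older entries are skipped by timestamp, and entries in the fresh file are newer by definition.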
