AI Bot Visibility Checker
Check if a URL is configured to allow or block AI crawlers like GPTBot, ClaudeBot, and others
This is an experimental Worker. Use it as a starting point for your own projects.
The AI Bot Visibility Checker analyzes a webpage's robots.txt file and page-level directives to determine whether known AI crawlers (like GPTBot, ClaudeBot, PerplexityBot) are allowed or blocked. It provides per-crawler status and a summary by platform.
Disclaimer: This tool reports configuration only. It does not verify whether a URL is actually indexed or used by any AI product, as there are no public APIs for that information.
Features
- Check 14+ known AI crawler bots in one request
- Parse robots.txt with user-agent specific rules
- Detect page-level blocks (meta robots, X-Robots-Tag headers)
- Support for generic directives (noindex, noai, noimageai)
- Stateless edge execution (no bindings required)
- Sub-second response times
API Reference
GET /check
Check AI crawler visibility configuration for a URL.
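| Prop | Type | Description |
|---|---|---|
| url | string | The URL to check (required query parameter). |

Response
| Prop | Type | Description |
|---|---|---|
| url | string | The URL that was analyzed. |
| disclaimer | string | Reminder that this reports configuration only, not actual indexing status. |
| crawlers | array | Array of crawler status objects. |
| summary | object | Aggregated results by status. |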
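
The response fields above could be typed on the client side roughly as follows. This is a sketch inferred from the documented fields and the example response; the interface names and the two helper functions are illustrative assumptions, not part of the Worker's source.

```typescript
// Shapes inferred from the documented response fields (names are assumptions).
type CrawlerStatus = "allowed" | "blocked" | "not_specified";

interface CrawlerResult {
  id: string;
  platform: string;
  status: CrawlerStatus;
}

interface VisibilityResponse {
  url: string;
  disclaimer: string;
  crawlers: CrawlerResult[];
  summary: { allowed: string[]; blocked: string[]; notSpecified: string[] };
}

// Shallow runtime guard so a client does not trust an arbitrary JSON payload.
function isVisibilityResponse(value: unknown): value is VisibilityResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.url === "string" &&
    typeof v.disclaimer === "string" &&
    Array.isArray(v.crawlers) &&
    typeof v.summary === "object" &&
    v.summary !== null
  );
}

// Returns the IDs of all crawlers reported as blocked.
function blockedCrawlerIds(res: VisibilityResponse): string[] {
  return res.crawlers.filter((c) => c.status === "blocked").map((c) => c.id);
}
```

A caller would typically run the guard on the parsed JSON body before reading crawlers or summary.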
Example Request
curl "https://your-worker.workers.dev/check?url=https://www.cloudflare.com"
Example Response
{
  "url": "https://www.cloudflare.com",
  "disclaimer": "Configuration only; we cannot verify actual index inclusion in any AI product.",
  "crawlers": [
    {
      "id": "GPTBot",
      "platform": "ChatGPT",
      "status": "allowed"
    },
    {
      "id": "ChatGPT-User",
      "platform": "ChatGPT",
      "status": "allowed"
    },
    {
      "id": "ClaudeBot",
      "platform": "Claude",
      "status": "not_specified"
    },
    {
      "id": "PerplexityBot",
      "platform": "Perplexity",
      "status": "blocked"
    },
    {
      "id": "Google-Extended",
      "platform": "Google (Gemini/Bard)",
      "status": "not_specified"
    }
    // ... 9 more crawlers
  ],
  "summary": {
    "allowed": ["ChatGPT"],
    "blocked": ["Perplexity"],
    "notSpecified": [
      "Claude",
      "ChatGPT Search",
      "Google (Gemini/Bard)",
      "Google Vertex AI",
      "Common Crawl",
      "ByteDance",
      "Meta AI",
      "Meta",
      "Apple",
      "Amazon",
      "DuckDuckGo"
    ]
  }
}
Error Responses
| Prop | Type | Description |
|---|---|---|
| error | string | Human-readable error message. |
| code | string | Machine-readable error code. |
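
A client could branch on the code field rather than parsing the message text. The sketch below assumes only the two error codes documented in this section (INVALID_URL and FETCH_ERROR); the function and type names are hypothetical.

```typescript
// Error body shape as documented: a human-readable message plus a machine-readable code.
interface ErrorBody {
  error: string;
  code: string;
}

function isErrorBody(value: unknown): value is ErrorBody {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.error === "string" && typeof v.code === "string";
}

// Maps a failed response to a message a caller could log or display.
// Unknown codes fall through to a generic format instead of throwing.
function describeFailure(status: number, body: unknown): string {
  if (!isErrorBody(body)) return `Unexpected response (HTTP ${status})`;
  switch (body.code) {
    case "INVALID_URL":
      return `Bad request: ${body.error}`;
    case "FETCH_ERROR":
      return `Upstream fetch failed: ${body.error}`;
    default:
      return `${body.code}: ${body.error}`;
  }
}
```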
400 Bad Request - Missing or invalid URL:
{
  "error": "Missing or invalid query parameter: url",
  "code": "INVALID_URL"
}
502 Bad Gateway - Failed to fetch the URL:
{
  "error": "Failed to fetch URL",
  "code": "FETCH_ERROR"
}
Monitored AI Crawlers
The worker checks the following AI crawlers:
| Crawler ID | Platform | Description |
|---|---|---|
| GPTBot | ChatGPT | OpenAI's web crawler |
| ChatGPT-User | ChatGPT | ChatGPT user-facing bot |
| OAI-SearchBot | ChatGPT Search | OpenAI's search crawler |
| ClaudeBot | Claude | Anthropic's web crawler |
| PerplexityBot | Perplexity | Perplexity AI's crawler |
| Google-Extended | Google (Gemini/Bard) | Google's robots.txt control token for Gemini (not a separate crawler) |
| Google-CloudVertexBot | Google Vertex AI | Google Vertex AI crawler |
| CCBot | Common Crawl | Common Crawl corpus builder |
| Bytespider | ByteDance | ByteDance's crawler |
| Meta-ExternalAgent | Meta AI | Meta's AI crawler |
| FacebookBot | Meta | Facebook's web crawler |
| Applebot | Apple | Apple's web crawler |
| Amazonbot | Amazon | Amazon's web crawler |
| DuckAssistBot | DuckDuckGo | DuckDuckGo AI assistant |
This list is defined in src/constants/crawlers.ts and can be customized for your needs.
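
Several crawler IDs in the table share a platform (GPTBot and ChatGPT-User both map to ChatGPT), so the per-platform summary has to collapse multiple statuses into one. This document does not state the rule the Worker uses. The sketch below assumes one plausible rule: a platform is blocked if any of its crawlers is blocked, otherwise allowed if any is allowed, otherwise not specified. The function name and the rule are assumptions.

```typescript
type CrawlerStatus = "allowed" | "blocked" | "not_specified";

interface CrawlerResult {
  id: string;
  platform: string;
  status: CrawlerStatus;
}

interface Summary {
  allowed: string[];
  blocked: string[];
  notSpecified: string[];
}

// ASSUMPTION: blocked outranks allowed, which outranks not_specified.
const SEVERITY: Record<CrawlerStatus, number> = { not_specified: 0, allowed: 1, blocked: 2 };

function summarizeByPlatform(crawlers: CrawlerResult[]): Summary {
  // Keep the highest-severity status seen for each platform.
  const worst = new Map<string, CrawlerStatus>();
  for (const { platform, status } of crawlers) {
    const current = worst.get(platform);
    if (current === undefined || SEVERITY[status] > SEVERITY[current]) {
      worst.set(platform, status);
    }
  }

  // Map iteration follows insertion order, so platforms appear in first-seen order.
  const summary: Summary = { allowed: [], blocked: [], notSpecified: [] };
  for (const [platform, status] of worst) {
    if (status === "allowed") summary.allowed.push(platform);
    else if (status === "blocked") summary.blocked.push(platform);
    else summary.notSpecified.push(platform);
  }
  return summary;
}
```

If you would rather report mixed platforms separately, a fourth bucket would be a reasonable alternative to this ranking.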
Implementation Details
Detection Logic
The worker uses a multi-layered approach to determine crawler visibility:
Fetch robots.txt and page content
Both requests are made in parallel:
const [pageResult, robotsBody] = await Promise.all([
  fetchPage(url),
  fetchRobotsTxt(origin),
]);
Parse robots.txt rules
Extract per-user-agent Allow/Disallow directives:
const rules = robotsBody ? parseRobotsTxt(robotsBody) : new Map();
The parser handles:
- User-agent specific rules
- Wildcard user-agents (*)
- Path-specific allow/disallow patterns
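
A parser covering the three points above might look like the following. This is a minimal sketch, not the Worker's actual parseRobotsTxt: the RobotsRule shape and the Map keyed by lowercased user-agent are assumptions, and it ignores directives other than Allow and Disallow.

```typescript
// Hypothetical rule shape; the Worker's real RobotsRules type is not shown in this document.
interface RobotsRule {
  type: "allow" | "disallow";
  path: string;
}
type RobotsRules = Map<string, RobotsRule[]>;

function parseRobotsTxt(body: string): RobotsRules {
  const rules: RobotsRules = new Map();
  let agents: string[] = []; // user-agents of the group currently being read
  let inRules = false; // true once the current group has seen a rule line

  for (const rawLine of body.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue; // blank or malformed line
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      // A User-agent line that follows rule lines starts a new group.
      if (inRules) {
        agents = [];
        inRules = false;
      }
      const agent = value.toLowerCase();
      agents.push(agent);
      if (!rules.has(agent)) rules.set(agent, []);
    } else if (field === "allow" || field === "disallow") {
      inRules = true;
      if (value === "") continue; // an empty value matches nothing
      for (const agent of agents) {
        rules.get(agent)!.push({ type: field, path: value });
      }
    }
  }
  return rules;
}
```

Consecutive User-agent lines share the rules that follow them, which is why the sketch tracks a list of agents rather than a single one.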
Extract page-level signals
Check HTML meta tags and HTTP headers:
// Detects:
// <meta name="robots" content="noindex, noai">
// <meta name="GPTBot" content="noindex">
// X-Robots-Tag: noindex, nofollow
const signals = getPageRobotSignals(html, headers);
Compute per-crawler status
Apply precedence rules for each crawler:
// 1. Page-level block -> blocked
// 2. robots.txt disallow -> blocked
// 3. robots.txt allow -> allowed
// 4. No rule -> not_specified
const status = computeStatus(rules, signals, crawlerId, path);
Core Implementation
Here's the main route handler from src/routes/check.ts:
app.get("/check", async (c) => {
  const url = validateUrl(c.req.query("url"));
  if (!url) return jsonError(c, "Missing or invalid query parameter: url", "INVALID_URL");
  try {
    const parsed = new URL(url);
    const origin = parsed.origin;
    const path = parsed.pathname || "/";
    const [pageResult, robotsBody] = await Promise.all([fetchPage(url), fetchRobotsTxt(origin)]);
    const rules = robotsBody ? parseRobotsTxt(robotsBody) : new Map();
    const response = buildVisibilityResponse(url, path, rules, pageResult.html, pageResult.headers);
    return jsonSuccess(c, response);
  } catch (e) {
    const message = e instanceof Error ? e.message : "Failed to fetch URL";
    return jsonError(c, message, "FETCH_ERROR", 502);
  }
});
Status Precedence
The visibility logic follows this precedence (from src/lib/visibility.ts):
function computeStatus(
  rules: RobotsRules,
  signals: PageRobotSignals,
  crawlerId: string,
  path: string
): CrawlerStatus {
  // Page-level block takes highest precedence
  if (isCrawlerBlockedByPage(signals, crawlerId)) return "blocked";
  // Then check robots.txt
  const robotsAllow = isAllowedByRobots(rules, crawlerId, path);
  if (robotsAllow === false) return "blocked";
  if (robotsAllow === true) return "allowed";
  // No rule specified
  return "not_specified";
}
Page-Level Directives
The worker detects these meta tags and headers:
<!-- Generic blocks (apply to all crawlers) -->
<meta name="robots" content="noindex" />
<meta name="robots" content="noai" />
<meta name="robots" content="noimageai" />
<!-- Crawler-specific blocks -->
<meta name="GPTBot" content="noindex" />
<meta name="ClaudeBot" content="nofollow, noindex" />
<!-- HTTP header equivalent -->
X-Robots-Tag: noindex, noai
robots.txt Parsing
Example robots.txt rules:
# Block specific crawlers
User-agent: GPTBot
Disallow: /
User-agent: PerplexityBot
Disallow: /private/
# Allow all others
User-agent: *
Allow: /
The parser extracts per-user-agent rules and applies path matching with longest-prefix-wins logic.
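
Longest-prefix-wins matching can be sketched as below. The rule shape, the null return for "no matching rule", and the fallback from a crawler's own group to the wildcard group are assumptions about how isAllowedByRobots might work, not the Worker's confirmed behavior. The sketch uses plain prefix matching and does not interpret * or $ inside paths.

```typescript
// Hypothetical rule shape, keyed by lowercased user-agent.
interface RobotsRule {
  type: "allow" | "disallow";
  path: string;
}
type RobotsRules = Map<string, RobotsRule[]>;

// Returns true or false when a rule matches the path, or null when nothing applies.
function isAllowedByRobots(rules: RobotsRules, crawlerId: string, path: string): boolean | null {
  // Prefer the crawler's own group; fall back to the wildcard group.
  const group = rules.get(crawlerId.toLowerCase()) ?? rules.get("*");
  if (!group) return null;

  let best: RobotsRule | null = null;
  for (const rule of group) {
    if (!path.startsWith(rule.path)) continue;
    const longer = best === null || rule.path.length > best.path.length;
    // On an equal-length tie, prefer allow (the least restrictive rule, as RFC 9309 suggests).
    const tie = best !== null && rule.path.length === best.path.length && rule.type === "allow";
    if (longer || tie) best = rule;
  }
  return best === null ? null : best.type === "allow";
}
```

With the example rules above, GPTBot is disallowed everywhere, PerplexityBot is disallowed only under /private/, and any other crawler falls through to the wildcard Allow.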
Advanced Usage
Adding Custom Crawlers
Extend the crawler list in src/constants/crawlers.ts:
export const AI_CRAWLERS: Array<{ id: string; platform: string }> = [
  { id: "GPTBot", platform: "ChatGPT" },
  { id: "ClaudeBot", platform: "Claude" },
  // Add your custom crawlers
  { id: "MyCustomBot", platform: "My AI Platform" },
  // ...
];
Batch Checking Multiple URLs
Create a new endpoint to check multiple URLs:
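app.post("/batch-check", async (c) => {
  const { urls } = await c.req.json();
  // Reject malformed bodies before fanning out requests
  if (!Array.isArray(urls)) return jsonError(c, "Request body must contain a urls array", "INVALID_URL");
  // checkVisibility is a helper you would write around the /check logic
  const results = await Promise.all(urls.map((url) => checkVisibility(url)));
  return c.json({ results });
});
Adding Site-Wide Analysis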
Fetch and parse the sitemap to check all pages:
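import { parseSitemap } from "./lib/sitemap";

app.get("/site-check", async (c) => {
  const sitemapUrl = c.req.query("sitemap");
  // Guard against a missing query parameter before fetching
  if (!sitemapUrl) return jsonError(c, "Missing query parameter: sitemap", "INVALID_URL");
  const urls = await parseSitemap(sitemapUrl);
  // Check each URL...
});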
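
The parseSitemap helper imported above is not part of this Worker, so you would need to write it. One possible sketch, assuming a plain urlset sitemap: it extracts loc entries with a regular expression instead of a full XML parser, does not follow sitemap index files or handle CDATA, and caps the number of URLs returned. The extractSitemapUrls helper and the limit parameter are hypothetical.

```typescript
// Hypothetical helper: pulls <loc> URLs out of sitemap XML text.
function extractSitemapUrls(xml: string, limit = 50): string[] {
  const urls: string[] = [];
  for (const match of xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/gi)) {
    const raw = match[1].replace(/&amp;/g, "&"); // undo the most common XML escape
    try {
      urls.push(new URL(raw).href); // keeps only absolute, well-formed URLs
    } catch {
      // Skip entries that are not valid URLs.
    }
    if (urls.length >= limit) break;
  }
  return urls;
}

// Fetches a sitemap and returns up to `limit` page URLs.
async function parseSitemap(sitemapUrl: string, limit = 50): Promise<string[]> {
  const res = await fetch(sitemapUrl);
  if (!res.ok) throw new Error(`Sitemap fetch failed with status ${res.status}`);
  return extractSitemapUrls(await res.text(), limit);
}
```

Capping the list matters because each URL checked costs additional outbound requests from the Worker.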
Use Cases
- SEO Tools - Add AI crawler visibility checks to SEO audit dashboards
- CMS Plugins - Integrate into WordPress/Drupal to show AI bot status
- Privacy Compliance - Monitor which AI platforms can access your content
- Analytics Dashboards - Track AI crawler access policies across sites
- Browser Extensions - Show AI visibility status for the current page
- Web Scraping Tools - Check if your scraper is allowed before crawling
Limitations
- Configuration only: Does not verify actual indexing by AI platforms (no public APIs exist)
- Static analysis: Does not execute JavaScript; only analyzes HTML and headers
- No authentication: Cannot check auth-protected pages
- robots.txt compliance: Assumes crawlers respect robots.txt (not legally enforced)
- Limited crawlers: Only checks 14 known AI crawlers (list can be extended)
- No sitemap parsing: Only checks individual URLs, not entire sitemaps
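
Because the analysis is static, page-level signals have to come from the raw HTML and the response headers alone. The sketch below shows one way that extraction could work. The PageRobotSignals shape and the function bodies are assumptions rather than the Worker's actual code, and the regular expression only recognizes meta tags written with name before content and double-quoted attributes.

```typescript
// Hypothetical shape; the Worker's real PageRobotSignals type is not shown in this document.
interface PageRobotSignals {
  generic: Set<string>; // from <meta name="robots"> and the X-Robots-Tag header
  perCrawler: Map<string, Set<string>>; // from <meta name="GPTBot"> etc., lowercased keys
}

// Directives treated as blocking, per the generic directives this document lists.
const BLOCKING = ["noindex", "noai", "noimageai"];

function splitDirectives(content: string): string[] {
  return content.split(",").map((d) => d.trim().toLowerCase()).filter(Boolean);
}

function getPageRobotSignals(html: string, headers: Headers): PageRobotSignals {
  const signals: PageRobotSignals = { generic: new Set(), perCrawler: new Map() };

  // Unrelated meta tags (for example viewport) are stored too, but never looked up.
  const metaTag = /<meta\s+name="([^"]+)"\s+content="([^"]*)"\s*\/?>/gi;
  for (const match of html.matchAll(metaTag)) {
    const name = match[1].toLowerCase();
    const directives = splitDirectives(match[2]);
    if (name === "robots") {
      directives.forEach((d) => signals.generic.add(d));
    } else {
      const set = signals.perCrawler.get(name) ?? new Set<string>();
      directives.forEach((d) => set.add(d));
      signals.perCrawler.set(name, set);
    }
  }

  const header = headers.get("x-robots-tag");
  if (header) splitDirectives(header).forEach((d) => signals.generic.add(d));
  return signals;
}

function isCrawlerBlockedByPage(signals: PageRobotSignals, crawlerId: string): boolean {
  const own = signals.perCrawler.get(crawlerId.toLowerCase());
  return BLOCKING.some((d) => signals.generic.has(d) || (own?.has(d) ?? false));
}
```

Tags injected by client-side JavaScript never appear in the fetched HTML, so an approach like this cannot see them.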
Deployment
Deploy
Follow the deployment wizard to deploy the Worker to your Cloudflare account. No additional configuration or bindings required.
Test your deployment
curl "https://your-worker.workers.dev/check?url=https://www.cloudflare.com"
Local Development
cd apps/experiments/ai-bot-visibility
npm install
npm run dev
Test locally:
curl "http://localhost:8787/check?url=https://www.cloudflare.com"
Configuration
No bindings or environment variables are required. The wrangler.json is minimal:
{
  "name": "ai-bot-visibility",
  "main": "src/index.ts",
  "compatibility_date": "2024-01-01"
}
Dependencies
{
  "dependencies": {
    "hono": "^4.6.12"
  },
  "devDependencies": {
    "@cloudflare/workers-types": "^4.20241127.0",
    "typescript": "^5.7.2",
    "wrangler": "^4"
  }
}
Cloudflare Features Used
- Workers - Serverless execution environment
- Fetch API - HTTP client for fetching pages and robots.txt
- Edge network - Low-latency requests from global edge locations