This site is not affiliated with or endorsed by Cloudflare, Inc. It simply showcases experiments built using Cloudflare services.
Cloudflare Experiments

AI Bot Visibility Checker

Check if a URL is configured to allow or block AI crawlers like GPTBot, ClaudeBot, and others

This is an experimental Worker. Use it as a starting point for your own projects.

The AI Bot Visibility Checker analyzes a site's robots.txt file and the page's own directives (meta tags and HTTP headers) to determine whether known AI crawlers (like GPTBot, ClaudeBot, PerplexityBot) are allowed or blocked. It returns a per-crawler status and a summary by platform.

Disclaimer: This tool reports configuration only. It does not verify whether a URL is actually indexed or used by any AI product, as there are no public APIs for that information.

Features

  • Check 14 known AI crawlers in one request
  • Parse robots.txt with user-agent specific rules
  • Detect page-level blocks (meta robots, X-Robots-Tag headers)
  • Support for generic directives (noindex, noai, noimageai)
  • Stateless edge execution (no bindings required)
  • Typically sub-second responses (the page and robots.txt are fetched in parallel; total time depends on the target site)

API Reference

GET /check

Check AI crawler visibility configuration for a URL.

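Query Parameters

  • url (string, required) - The absolute URL to check.

Response

  • url (string) - The URL that was analyzed.
  • disclaimer (string) - Reminder that this reports configuration only, not actual indexing status.
  • crawlers (array) - Array of crawler status objects, each with id, platform, and status ("allowed", "blocked", or "not_specified").
  • summary (object) - Platform names grouped by status: allowed, blocked, and notSpecified.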

Example Request

curl "https://your-worker.workers.dev/check?url=https://www.cloudflare.com"
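The curl example passes the target URL unencoded. That works for simple URLs, but if the target has its own query string, an unescaped & is read as a separate parameter of /check and the url value is truncated. A minimal client sketch that percent-encodes the parameter; WORKER_BASE, buildCheckUrl, and checkUrl are hypothetical names, not part of the Worker:

```typescript
// Hypothetical client helpers; point WORKER_BASE at your own deployment.
const WORKER_BASE = "https://your-worker.workers.dev";

/** Builds the /check request URL, percent-encoding the target URL. */
function buildCheckUrl(base: string, target: string): string {
  const u = new URL("/check", base);
  // searchParams.set encodes ':', '/', '?', '&', '=' so the target survives intact
  u.searchParams.set("url", target);
  return u.toString();
}

/** Calls the Worker and returns the parsed JSON body (success or error shape). */
async function checkUrl(target: string, base: string = WORKER_BASE): Promise<unknown> {
  const res = await fetch(buildCheckUrl(base, target));
  return res.json();
}
```

For example, buildCheckUrl(WORKER_BASE, "https://example.com/a?b=1&c=2") yields a URL whose url parameter decodes back to the full target, query string included.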

Example Response

{
  "url": "https://www.cloudflare.com",
  "disclaimer": "Configuration only; we cannot verify actual index inclusion in any AI product.",
  "crawlers": [
    {
      "id": "GPTBot",
      "platform": "ChatGPT",
      "status": "allowed"
    },
    {
      "id": "ChatGPT-User",
      "platform": "ChatGPT",
      "status": "allowed"
    },
    {
      "id": "ClaudeBot",
      "platform": "Claude",
      "status": "not_specified"
    },
    {
      "id": "PerplexityBot",
      "platform": "Perplexity",
      "status": "blocked"
    },
    {
      "id": "Google-Extended",
      "platform": "Google (Gemini/Bard)",
      "status": "not_specified"
    }
    // ... 9 more crawlers
  ],
  "summary": {
    "allowed": ["ChatGPT"],
    "blocked": ["Perplexity"],
    "notSpecified": [
      "Claude",
      "ChatGPT Search",
      "Google (Gemini/Bard)",
      "Google Vertex AI",
      "Common Crawl",
      "ByteDance",
      "Meta AI",
      "Meta",
      "Apple",
      "Amazon",
      "DuckDuckGo"
    ]
  }
}

Error Responses

Errors return a JSON body with two fields:

  • error (string) - Human-readable error message.
  • code (string) - Machine-readable error code.

400 Bad Request - Missing or invalid URL:

{
  "error": "Missing or invalid query parameter: url",
  "code": "INVALID_URL"
}

502 Bad Gateway - Failed to fetch the URL (the error message may contain the underlying exception message instead of this default text):

{
  "error": "Failed to fetch URL",
  "code": "FETCH_ERROR"
}
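Because both errors share the { error, code } shape, a client can branch on code. A small type-guard sketch; ApiError, isApiError, and isRetryable are illustrative names, and treating FETCH_ERROR as retryable is an example policy, not something the Worker specifies:

```typescript
// Illustrative types mirroring the documented error body.
type ErrorCode = "INVALID_URL" | "FETCH_ERROR";

interface ApiError {
  error: string; // human-readable message
  code: ErrorCode; // machine-readable code
}

/** Narrows a parsed JSON body to the documented error shape. */
function isApiError(body: unknown): body is ApiError {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return (
    typeof b.error === "string" &&
    (b.code === "INVALID_URL" || b.code === "FETCH_ERROR")
  );
}

/** Example policy: bad input is the caller's fault; a failed fetch may be transient. */
function isRetryable(body: unknown): boolean {
  return isApiError(body) && body.code === "FETCH_ERROR";
}
```

Matching on code rather than error is the safer choice, since on a 502 the handler passes the underlying exception message through and the error text can vary.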

Monitored AI Crawlers

The worker checks the following AI crawlers:

Crawler ID | Platform | Description
GPTBot | ChatGPT | OpenAI's web crawler
ChatGPT-User | ChatGPT | Fetches pages on behalf of ChatGPT users
OAI-SearchBot | ChatGPT Search | OpenAI's search crawler
ClaudeBot | Claude | Anthropic's web crawler
PerplexityBot | Perplexity | Perplexity AI's crawler
Google-Extended | Google (Gemini/Bard) | robots.txt token controlling Gemini use of Google-crawled content (not a separate crawler)
Google-CloudVertexBot | Google Vertex AI | Google Vertex AI crawler
CCBot | Common Crawl | Common Crawl corpus builder
Bytespider | ByteDance | ByteDance's crawler
Meta-ExternalAgent | Meta AI | Meta's AI crawler
FacebookBot | Meta | Facebook's web crawler
Applebot | Apple | Apple's web crawler
Amazonbot | Amazon | Amazon's web crawler
DuckAssistBot | DuckDuckGo | DuckDuckGo AI assistant

This list is defined in src/constants/crawlers.ts and can be customized for your needs.

Implementation Details

Detection Logic

The worker determines each crawler's visibility in four steps:

Fetch robots.txt and page content

Both requests are made in parallel:

const [pageResult, robotsBody] = await Promise.all([
  fetchPage(url),
  fetchRobotsTxt(origin),
]);
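The source of fetchRobotsTxt is not shown. One plausible sketch, assuming it returns the body on a 2xx response and null otherwise (which fits the robotsBody ? ... : new Map() check in the next step). The 5-second timeout and the injectable fetchImpl parameter are assumptions added for illustration and testing:

```typescript
type Fetcher = (input: string, init?: RequestInit) => Promise<Response>;

/**
 * Sketch: fetch <origin>/robots.txt and return its text, or null.
 * A missing or failing robots.txt is treated as "no rules" rather than an error,
 * so it never turns the whole check into a 502.
 */
async function fetchRobotsTxt(
  origin: string,
  fetchImpl: Fetcher = fetch,
  timeoutMs = 5000, // assumed timeout
): Promise<string | null> {
  try {
    const res = await fetchImpl(new URL("/robots.txt", origin).toString(), {
      signal: AbortSignal.timeout(timeoutMs),
    });
    return res.ok ? await res.text() : null;
  } catch {
    // Network error or timeout: behave as if there is no robots.txt
    return null;
  }
}
```

Collapsing every failure to null is a simplification: RFC 9309 tells crawlers to treat a 4xx robots.txt as "no restrictions" but a 5xx as complete disallow, so a stricter implementation might report those cases separately.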

Parse robots.txt rules

Extract per-user-agent Allow/Disallow directives:

const rules = robotsBody ? parseRobotsTxt(robotsBody) : new Map();

The parser handles:

  • User-agent specific rules
  • Wildcard user-agents (*)
  • Path-specific allow/disallow patterns
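A simplified parser along these lines, following the grouping rules of RFC 9309: consecutive User-agent lines share one group, user-agent names compare case-insensitively, and repeated groups for the same agent are merged. The RobotsRule and RobotsRules shapes are assumptions, since the worker's own types are not shown. Path wildcards (* and $) are not handled:

```typescript
// Assumed rule shapes; the worker's real RobotsRules type is not shown.
interface RobotsRule {
  allow: boolean; // true for Allow, false for Disallow
  path: string; // path prefix the rule applies to
}
type RobotsRules = Map<string, RobotsRule[]>; // keyed by lower-cased user agent

function parseRobotsTxt(body: string): RobotsRules {
  const rules: RobotsRules = new Map();
  let agents: string[] = []; // user agents of the group being built
  let inRules = false; // true once the current group has a rule line

  for (const raw of body.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (key === "user-agent") {
      // A User-agent line after rule lines starts a new group
      if (inRules) {
        agents = [];
        inRules = false;
      }
      const agent = value.toLowerCase();
      agents.push(agent);
      if (!rules.has(agent)) rules.set(agent, []); // merge repeated groups
    } else if (key === "allow" || key === "disallow") {
      inRules = true;
      // Empty "Allow:" matches nothing; empty "Disallow:" means allow everything
      if (value === "" && key === "allow") continue;
      const rule: RobotsRule =
        value === "" ? { allow: true, path: "/" } : { allow: key === "allow", path: value };
      for (const agent of agents) rules.get(agent)!.push(rule);
    }
  }
  return rules;
}
```

Recording an empty Disallow as an explicit allow-all is a design choice for this tool's status model: it makes such a site report "allowed" rather than "not_specified".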

Extract page-level signals

Check HTML meta tags and HTTP headers:

// Detects:
// <meta name="robots" content="noindex, noai">
// <meta name="GPTBot" content="noindex">
// X-Robots-Tag: noindex, nofollow
const signals = getPageRobotSignals(html, headers);
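A regex-based sketch of the signal extraction. It assumes a PageRobotSignals shape that maps each meta name ("robots" or a crawler ID, lower-cased) to its directives, plus the X-Robots-Tag directives; the real type is not shown. Regexes keep the example short, but they miss unusual markup (unquoted attributes, for example), and on Workers the built-in HTMLRewriter is a more robust option:

```typescript
// Assumed shape: directives keyed by lower-cased meta name ("robots", "gptbot", ...)
interface PageRobotSignals {
  meta: Map<string, string[]>;
  header: string[]; // directives from X-Robots-Tag
}

/** Splits "noindex, NoAI" into ["noindex", "noai"]. */
function splitDirectives(value: string): string[] {
  return value.split(",").map((d) => d.trim().toLowerCase()).filter(Boolean);
}

function getPageRobotSignals(html: string, headers: Headers): PageRobotSignals {
  const meta = new Map<string, string[]>();
  for (const [tag] of html.matchAll(/<meta\b[^>]*>/gi)) {
    // Read name and content independently so attribute order does not matter
    const name = /\bname\s*=\s*["']([^"']+)["']/i.exec(tag)?.[1];
    const content = /\bcontent\s*=\s*["']([^"']*)["']/i.exec(tag)?.[1];
    if (!name || content === undefined) continue;
    const key = name.trim().toLowerCase();
    meta.set(key, [...(meta.get(key) ?? []), ...splitDirectives(content)]);
  }
  // Headers.get joins repeated X-Robots-Tag headers with ", "
  const header = splitDirectives(headers.get("x-robots-tag") ?? "");
  return { meta, header };
}
```

This sketch does not handle agent-prefixed header values such as "X-Robots-Tag: googlebot: noindex"; they come through as a single unrecognized directive.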

Compute per-crawler status

Apply precedence rules for each crawler:

// 1. Page-level block -> blocked
// 2. robots.txt disallow -> blocked
// 3. robots.txt allow -> allowed
// 4. No rule -> not_specified
const status = computeStatus(rules, signals, crawlerId, path);

Core Implementation

Here's the main route handler from src/routes/check.ts:

app.get("/check", async (c) => {
  const url = validateUrl(c.req.query("url"));
  if (!url) return jsonError(c, "Missing or invalid query parameter: url", "INVALID_URL");

  try {
    const parsed = new URL(url);
    const origin = parsed.origin;
    const path = parsed.pathname || "/";

    const [pageResult, robotsBody] = await Promise.all([fetchPage(url), fetchRobotsTxt(origin)]);

    const rules = robotsBody ? parseRobotsTxt(robotsBody) : new Map();
    const response = buildVisibilityResponse(url, path, rules, pageResult.html, pageResult.headers);
    return jsonSuccess(c, response);
  } catch (e) {
    const message = e instanceof Error ? e.message : "Failed to fetch URL";
    return jsonError(c, message, "FETCH_ERROR", 502);
  }
});
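validateUrl is not shown either. A minimal sketch, assuming it accepts only absolute http(s) URLs and returns null for anything else, so that the handler's if (!url) check produces the 400 INVALID_URL response. It returns the input unchanged rather than normalized, which matches the example response echoing "https://www.cloudflare.com" without a trailing slash:

```typescript
/** Returns the trimmed input if it is an absolute http(s) URL, otherwise null. */
function validateUrl(input: string | undefined): string | null {
  if (!input) return null;
  const trimmed = input.trim();
  let parsed: URL;
  try {
    parsed = new URL(trimmed);
  } catch {
    return null; // not an absolute URL
  }
  // Reject other schemes (ftp:, file:, data:, javascript:, ...)
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return null;
  return trimmed;
}
```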

Status Precedence

The visibility logic follows this precedence (from src/lib/visibility.ts):

function computeStatus(
  rules: RobotsRules,
  signals: PageRobotSignals,
  crawlerId: string,
  path: string
): CrawlerStatus {
  // Page-level block takes highest precedence
  if (isCrawlerBlockedByPage(signals, crawlerId)) return "blocked";

  // Then check robots.txt
  const robotsAllow = isAllowedByRobots(rules, crawlerId, path);
  if (robotsAllow === false) return "blocked";
  if (robotsAllow === true) return "allowed";

  // No rule specified
  return "not_specified";
}

Page-Level Directives

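The worker treats the following meta tags and headers as blocks. Note that noai and noimageai are informal, non-standard directives that many crawlers ignore, so they express intent rather than guarantee exclusion: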

<!-- Generic blocks (apply to all crawlers) -->
<meta name="robots" content="noindex" />
<meta name="robots" content="noai" />
<meta name="robots" content="noimageai" />

<!-- Crawler-specific blocks -->
<meta name="GPTBot" content="noindex" />
<meta name="ClaudeBot" content="nofollow, noindex" />

<!-- HTTP header equivalent -->
X-Robots-Tag: noindex, noai
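Once the directives are extracted, the page-level check reduces to a set lookup. A sketch assuming lower-cased directives in a hypothetical PageRobotSignals shape (the real type is not shown). It blocks on noindex, noai, and noimageai as listed above, and additionally on none, which Google documents as shorthand for "noindex, nofollow"; including none is this sketch's choice:

```typescript
// Assumed signal shape (directives already lower-cased)
interface PageRobotSignals {
  meta: Map<string, string[]>; // keyed by lower-cased meta name
  header: string[]; // X-Robots-Tag directives
}

// Directives treated as a block
const BLOCKING = new Set(["noindex", "noai", "noimageai", "none"]);

const blocks = (directives: string[] | undefined): boolean =>
  (directives ?? []).some((d) => BLOCKING.has(d));

function isCrawlerBlockedByPage(signals: PageRobotSignals, crawlerId: string): boolean {
  return (
    blocks(signals.header) || // X-Robots-Tag applies to every crawler
    blocks(signals.meta.get("robots")) || // generic meta robots tag
    blocks(signals.meta.get(crawlerId.toLowerCase())) // crawler-specific meta tag
  );
}
```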

robots.txt Parsing

Example robots.txt rules:

# Block specific crawlers
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /private/

# Allow all others
User-agent: *
Allow: /

The parser extracts per-user-agent rules and applies path matching with longest-prefix-wins logic.
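A sketch of the lookup with longest-match precedence, assuming rules keyed by lower-cased user agent. Following RFC 9309, a crawler with its own group ignores the * group, the match with the longest path wins, and when an Allow and a Disallow of equal length both match, the Allow wins. Paths are compared as plain prefixes. It returns undefined when no rule matches, which computeStatus maps to not_specified:

```typescript
interface RobotsRule {
  allow: boolean;
  path: string;
}
type RobotsRules = Map<string, RobotsRule[]>; // keyed by lower-cased user agent

/** true = allowed, false = disallowed, undefined = no matching rule. */
function isAllowedByRobots(
  rules: RobotsRules,
  crawlerId: string,
  path: string,
): boolean | undefined {
  // Use the crawler's own group if present, otherwise fall back to "*"
  const group = rules.get(crawlerId.toLowerCase()) ?? rules.get("*");
  if (!group) return undefined;

  let best: RobotsRule | undefined;
  for (const rule of group) {
    if (!rule.path || !path.startsWith(rule.path)) continue;
    const longer = !best || rule.path.length > best.path.length;
    const tieAllow = !!best && rule.path.length === best.path.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best?.allow;
}
```

With the example robots.txt, PerplexityBot on /blog is a useful edge case: its own group has no matching rule and the * group is not consulted, so the result is undefined (not_specified), even though robots.txt convention treats unmatched paths as allowed.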

Advanced Usage

Adding Custom Crawlers

Extend the crawler list in src/constants/crawlers.ts:

export const AI_CRAWLERS: Array<{ id: string; platform: string }> = [
  { id: "GPTBot", platform: "ChatGPT" },
  { id: "ClaudeBot", platform: "Claude" },
  // Add your custom crawlers
  { id: "MyCustomBot", platform: "My AI Platform" },
  // ...
];

Batch Checking Multiple URLs

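Create a new endpoint to check multiple URLs. Here checkVisibility is assumed to be the /check logic factored out into a reusable function:

app.post("/batch-check", async (c) => {
  const { urls } = await c.req.json<{ urls?: unknown }>();
  // Each check makes two subrequests (page + robots.txt) and Workers limit
  // subrequests per invocation, so validate and cap the batch (20 is an example).
  if (!Array.isArray(urls) || urls.length > 20) {
    return c.json({ error: "Expected a 'urls' array of at most 20 URLs", code: "INVALID_URL" }, 400);
  }
  const results = await Promise.all(urls.map((url) => checkVisibility(String(url))));
  return c.json({ results });
});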
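Promise.all starts every check at once and rejects the whole batch if any single URL fails. A generic helper sketch that bounds how many checks run concurrently and records failures per item instead; mapWithConcurrency and Settled are hypothetical names, and the limit is yours to tune:

```typescript
type Settled<R> = { ok: true; value: R } | { ok: false; error: string };

/**
 * Runs fn over items with at most `limit` calls in flight, preserving input order.
 * Failures are captured per item instead of rejecting the whole batch.
 */
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<Settled<R>[]> {
  const results = new Array<Settled<R>>(items.length);
  let next = 0; // index of the next item to start

  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // safe: no other code runs between the check and the increment
      try {
        results[i] = { ok: true, value: await fn(items[i]) };
      } catch (e) {
        results[i] = { ok: false, error: e instanceof Error ? e.message : String(e) };
      }
    }
  }

  const count = Math.min(Math.max(1, limit), items.length);
  await Promise.all(Array.from({ length: count }, () => worker()));
  return results;
}
```

In the batch handler, mapWithConcurrency(urls, 4, checkVisibility) could replace the Promise.all call so that one unreachable site yields an error entry rather than failing the request.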

Adding Site-Wide Analysis

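Fetch and parse the sitemap to check all pages. parseSitemap is a helper you would need to write, since the Worker does not parse sitemaps:

// Hypothetical module: the Worker does not include ./lib/sitemap
import { parseSitemap } from "./lib/sitemap";

app.get("/site-check", async (c) => {
  const sitemapUrl = c.req.query("sitemap");
  if (!sitemapUrl) {
    return c.json({ error: "Missing query parameter: sitemap", code: "INVALID_URL" }, 400);
  }
  const urls = await parseSitemap(sitemapUrl);
  // Check each URL (mind the per-invocation subrequest limit)...
  return c.json({ urls });
});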
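A minimal sketch of such a parseSitemap, assuming a plain <urlset> sitemap (sitemap index files and gzip are not handled). It extracts <loc> values with a regex because Workers have no built-in DOMParser, decodes the five XML entities the sitemap protocol requires for escaping, and caps the result to stay within subrequest limits; the cap of 20 and the extractSitemapUrls name are assumptions:

```typescript
// The five predefined XML entities; sitemap URLs must be entity-escaped
const ENTITIES: Record<string, string> = { amp: "&", apos: "'", quot: '"', lt: "<", gt: ">" };

/** Extracts up to `max` <loc> URLs from sitemap XML text. */
function extractSitemapUrls(xml: string, max = 20): string[] {
  const urls: string[] = [];
  for (const m of xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/gi)) {
    // Decode in a single pass so "&amp;lt;" correctly becomes "&lt;"
    urls.push(m[1].replace(/&(amp|apos|quot|lt|gt);/g, (_m, name: string) => ENTITIES[name]));
    if (urls.length >= max) break;
  }
  return urls;
}

/** Hypothetical parseSitemap: fetch the sitemap and return its page URLs. */
async function parseSitemap(sitemapUrl: string, max = 20): Promise<string[]> {
  const res = await fetch(sitemapUrl);
  if (!res.ok) throw new Error(`Sitemap fetch failed: ${res.status}`);
  return extractSitemapUrls(await res.text(), max);
}
```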

FAQ

Use Cases

  • SEO Tools - Add AI crawler visibility checks to SEO audit dashboards
  • CMS Plugins - Integrate into WordPress/Drupal to show AI bot status
  • Privacy Compliance - Monitor which AI platforms your robots.txt and meta tags permit to crawl your content
  • Analytics Dashboards - Track AI crawler access policies across sites
  • Browser Extensions - Show AI visibility status for the current page
  • Web Scraping Tools - Check whether a site's rules permit a given bot before crawling (add your bot's user agent to the crawler list first)

Limitations

  • Configuration only: Does not verify actual indexing by AI platforms (no public APIs exist)
  • Static analysis: Does not execute JavaScript; only analyzes HTML and headers
  • No authentication: Cannot check auth-protected pages
  • robots.txt compliance: Assumes crawlers honor robots.txt, which is a voluntary convention rather than an access-control mechanism (RFC 9309)
  • Limited crawlers: Only checks 14 known AI crawlers (list can be extended)
  • No sitemap parsing: Only checks individual URLs, not entire sitemaps

Deployment

Deploy

Follow the deployment wizard to deploy the Worker to your Cloudflare account. No additional configuration or bindings required.

Test your deployment

curl "https://your-worker.workers.dev/check?url=https://www.cloudflare.com"

Local Development

cd apps/experiments/ai-bot-visibility
npm install
npm run dev

Test locally:

curl "http://localhost:8787/check?url=https://www.cloudflare.com"

Configuration

No bindings or environment variables are required. The wrangler.json is minimal:

{
  "name": "ai-bot-visibility",
  "main": "src/index.ts",
  "compatibility_date": "2024-01-01"
}

Dependencies

{
  "dependencies": {
    "hono": "^4.6.12"
  },
  "devDependencies": {
    "@cloudflare/workers-types": "^4.20241127.0",
    "typescript": "^5.7.2",
    "wrangler": "^4"
  }
}

Cloudflare Features Used

  • Workers - Serverless execution environment
  • Fetch API - HTTP client for fetching pages and robots.txt
  • Edge network - Low-latency requests from global edge locations
