AI-crawler policy
Also known as AI bot policy · GPTBot / ClaudeBot rules
Explicit robots.txt rules for AI user agents (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) that deliberately allow or block them.
What it is
An AI-crawler policy is a set of named User-agent groups in robots.txt that address the bots used to train models or fetch live answers, such as GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), and Google-Extended (Google's token for Gemini training and grounding). Each group explicitly allows or disallows crawling, rather than relying on the default catch-all rule. Note that Google-Extended is a control token honored by Google's existing crawlers, not a separate crawler, and it does not affect inclusion in Google Search.
Why it matters
AI search and answer engines check these tokens to decide whether they may read, cite, or train on your content. Without a named group, each bot falls back to the catch-all User-agent: * rules or, if there are none, treats the site as crawlable. An explicit policy lets you opt into being a citable source for AI answers (allow) or protect proprietary content (disallow) deliberately, instead of by accident.
How to verify
Run curl -s https://example.com/robots.txt and look for User-agent lines naming GPTBot, ClaudeBot, PerplexityBot, and Google-Extended, each followed by its Allow/Disallow directives. Confirm that no overly broad Disallow: / unintentionally blocks every AI bot, and that each agent name matches the vendor's documented string.
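The manual check above can also be scripted. The following is a minimal sketch, not a definitive audit tool: it uses Python's standard urllib.robotparser, and the function name audit_ai_policy and the SAMPLE text are hypothetical, chosen for illustration. It reports, for each agent, whether the file names it explicitly and whether the site root is fetchable for it.

```python
from urllib.robotparser import RobotFileParser

# Agent tokens named in this entry; extend the list to match your own policy.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

# Hypothetical robots.txt text, e.g. the output of the curl command above.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /private/
"""


def audit_ai_policy(robots_txt, agents=AI_AGENTS):
    """Return {agent: {"explicit": bool, "root_allowed": bool}}."""
    lines = robots_txt.splitlines()

    # Collect every token that appears on a User-agent line.
    named = set()
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        key, _, value = line.partition(":")
        if key.strip().lower() == "user-agent":
            named.add(value.strip().lower())

    # Let the standard-library parser decide which group applies.
    parser = RobotFileParser()
    parser.parse(lines)

    return {
        agent: {
            "explicit": agent.lower() in named,
            "root_allowed": parser.can_fetch(agent, "/"),
        }
        for agent in agents
    }


if __name__ == "__main__":
    for agent, status in audit_ai_policy(SAMPLE).items():
        print(agent, status)
```

An agent with "explicit" set to False is governed only by the catch-all group, which is the accidental state an explicit policy is meant to avoid. The standard-library parser applies rules in file order, so its verdict on files that mix Allow and Disallow may differ from a vendor's own matcher.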
How to fix
Add explicit User-agent blocks in robots.txt for each major AI agent and set Allow: / or Disallow: / according to your strategy, keeping the agent names exactly as documented. If you want AI visibility, allow the answer-engine bots while still gating sensitive paths, and remember robots.txt is advisory, so use auth or WAF rules for content that must be hard-blocked.
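As a rough illustration of the fix, the sketch below renders one User-agent group per agent from a small policy mapping. The function name render_ai_policy, the example decisions, and the /internal/ path are all hypothetical; substitute your own strategy and check each agent name against the vendor's documentation.

```python
def render_ai_policy(policy, gated_paths=()):
    """Render robots.txt groups from {agent_name: "allow" | "disallow"}.

    gated_paths are disallowed for allowed agents; blocked agents get "Disallow: /".
    """
    groups = []
    for agent, decision in policy.items():
        lines = [f"User-agent: {agent}"]
        if decision == "disallow":
            lines.append("Disallow: /")
        elif decision == "allow":
            # List the specific Disallow rules before the broad Allow so that
            # both longest-match and first-match parsers gate these paths.
            lines.extend(f"Disallow: {path}" for path in gated_paths)
            lines.append("Allow: /")
        else:
            raise ValueError(f"unknown decision for {agent}: {decision!r}")
        groups.append("\n".join(lines))
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Hypothetical strategy: block one training bot, allow one answer-engine bot.
    example = {"GPTBot": "disallow", "ClaudeBot": "allow"}
    print(render_ai_policy(example, gated_paths=["/internal/"]), end="")
```

The output is only a request to well-behaved crawlers. Content that must stay private still needs authentication or WAF rules, as noted above.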
Related terms
- robots.txt: A plain-text file at the site root that tells crawlers which paths they may or may not request.
- llms.txt: A Markdown file at /llms.txt that gives LLMs a curated, plain-language map of your site and links to its most important content.
- Server-side rendering (SSR): Delivering the page's core text and markup in the initial HTML response, so it is readable without executing client-side JavaScript.
- Markdown content negotiation: Serving a clean Markdown representation of a page when a client requests it via the Accept: text/markdown header.