robots.txt
Also known as Robots Exclusion Protocol · REP
A plain-text file at the site root that tells crawlers which paths they may or may not request.
What it is
robots.txt is a file served at the root of a host (for example, example.com/robots.txt) that uses User-agent and Disallow/Allow directives to control crawler access at the path level. Its rules apply only to the host that serves it, so each subdomain needs its own file. It can also declare the location of your XML sitemap with a Sitemap: line.
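As a rough illustration of how these directives fit together, the sketch below parses a small sample file with Python's standard-library urllib.robotparser. The file contents, the example.com URLs, and the ExampleBot name are all made up for the example. Note that this parser applies the first matching rule in file order, which can differ from the most-specific-match behavior of major search engines, so treat the results as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one group for all crawlers plus a sitemap line.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

# "ExampleBot" is a placeholder crawler name; it falls under the "*" group.
allowed_blog = parser.can_fetch("ExampleBot", "https://example.com/blog/post")
allowed_admin = parser.can_fetch("ExampleBot", "https://example.com/admin/users")
sitemaps = parser.site_maps()  # available in Python 3.8+

print(allowed_blog)   # True: only the Allow: / rule matches
print(allowed_admin)  # False: Disallow: /admin/ matches first
print(sitemaps)       # ['https://example.com/sitemap.xml']
```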
Why it matters
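A stray Disallow: / under User-agent: * tells every compliant crawler to stay away from your entire site. That includes Google, Bing, and AI user agents such as GPTBot, PerplexityBot, and Google-Extended, so your content becomes largely unavailable for both search results and AI-generated answers. Disallow blocks crawling rather than indexing, so a blocked URL can still be listed without its content if other sites link to it. Conversely, a correct robots.txt guides crawlers efficiently and points them to your sitemap.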
How to verify
Open yourdomain.com/robots.txt in a browser and confirm that the file loads (an HTTP 200 status) and contains no site-wide Disallow: / under User-agent: *. Then use Search Console's robots.txt report or a fetch tool to confirm that key URLs are allowed.
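If you prefer to script this check, a minimal sketch using only the Python standard library might look like the following. The helper names fetch_robots and blocked_urls are hypothetical, as are the example.com URLs. The standard-library parser evaluates rules in file order, so its verdicts can differ from a search engine's own matching on files that mix Allow and Disallow.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


def fetch_robots(origin):
    """Fetch <origin>/robots.txt and return (status, body).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    so a normal return here means the request succeeded.
    """
    url = origin.rstrip("/") + "/robots.txt"
    with urlopen(url, timeout=10) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


def blocked_urls(robots_text, urls, agent="*"):
    """Return the subset of urls that the given agent may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]


# Offline demonstration with a hypothetical leftover staging file.
staging_file = "User-agent: *\nDisallow: /\n"
key_urls = ["https://example.com/", "https://example.com/pricing"]
print(blocked_urls(staging_file, key_urls))  # both URLs are reported as blocked
```

Against a live site, you would call fetch_robots("https://yourdomain.com"), check that the status is 200, and pass the returned body to blocked_urls together with the URLs you care about.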
How to fix
Publish a robots.txt that allows crawling of production paths, contains no blanket Disallow: / left over from staging, and includes a Sitemap: line with the absolute sitemap URL. Keep separate, explicit rules for any AI crawlers you wish to allow or block rather than blocking everything by default.
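One way to keep those rules explicit is to generate the file from a short list of decisions. The sketch below is only an illustration: build_robots_txt is a hypothetical helper, the /admin/ path and sitemap URL are placeholders, and which AI agents are allowed or blocked here is arbitrary. Because a crawler that matches a named group follows only that group, the private paths are repeated for each allowed agent.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(sitemap_url, private_paths, allowed_ai, blocked_ai):
    """Build robots.txt text with one explicit group per named AI agent."""
    def group(agent, rules):
        # A group is a User-agent line, its rules, then a blank separator line.
        return [f"User-agent: {agent}", *rules, ""]

    # Rules shared by the default group and every explicitly allowed agent.
    shared = [f"Disallow: {path}" for path in private_paths] + ["Allow: /"]

    lines = []
    for agent in allowed_ai:
        lines += group(agent, shared)
    for agent in blocked_ai:
        lines += group(agent, ["Disallow: /"])
    lines += group("*", shared)
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


# Placeholder inputs: adjust paths and agent choices to your own policy.
robots_text = build_robots_txt(
    sitemap_url="https://example.com/sitemap.xml",
    private_paths=["/admin/"],
    allowed_ai=["GPTBot"],
    blocked_ai=["PerplexityBot"],
)
print(robots_text)

# Round-trip check with the standard-library parser.
check = RobotFileParser()
check.parse(robots_text.splitlines())
```

Parsing the generated text back, as in the last two lines, is a cheap way to catch a blanket block before the file is published.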
Related terms
- XML Sitemap: An XML file listing a site's canonical URLs to help search engines discover and prioritize pages for crawling.
- noindex Tag: A directive, set via meta tag or HTTP header, that tells search engines to keep a page out of their index.
- AI-crawler policy: Explicit robots.txt rules for AI user agents (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) that deliberately allow or block them.
- llms.txt: A Markdown file at /llms.txt that gives LLMs a curated, plain-language map of your site and links to its most important content.