
robots.txt

Also known as Robots Exclusion Protocol · REP

A plain-text file at the site root that tells crawlers which paths they may or may not request.

What it is

robots.txt is a file served at the domain root (for example, https://example.com/robots.txt) that uses User-agent and Disallow/Allow directives to control crawler access at the path level. It can also declare the location of your XML sitemap with a Sitemap: line.
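To make the directives concrete, here is a minimal sketch that parses a small robots.txt with Python's standard urllib.robotparser module. The file contents and the /admin/ paths are hypothetical, chosen only for illustration. The Allow line is listed before the Disallow line so that Python's parser, which applies the first matching rule, reaches the same answer as crawlers that apply the most specific rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for all crawlers plus a sitemap line.
SAMPLE = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

# Path-level access checks for a crawler that falls under the "*" group.
help_ok = parser.can_fetch("*", "https://example.com/admin/help/faq")
settings_ok = parser.can_fetch("*", "https://example.com/admin/settings")
blog_ok = parser.can_fetch("*", "https://example.com/blog/")

# Sitemap URLs declared in the file (site_maps() needs Python 3.8+).
sitemaps = parser.site_maps()

print(help_ok, settings_ok, blog_ok, sitemaps)
```

Paths that match no rule, such as /blog/ here, are allowed by default.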

Why it matters

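A stray Disallow: / under User-agent: * tells every compliant crawler to stay away from the whole site: Googlebot, Bingbot, and AI agents such as GPTBot, PerplexityBot, and Google-Extended (a control token rather than a separate crawler). Pages that cannot be crawled cannot have their content shown in search results or used in AI-generated answers. Conversely, a correct robots.txt keeps crawlers out of paths that do not need crawling and points them to your sitemap.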
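The effect of a blanket rule can be sketched with the standard-library parser. The two-line file below stands in for a staging leftover, and the /pricing URL is a hypothetical page; the agent names are simply used as user-agent tokens for the lookup.

```python
from urllib.robotparser import RobotFileParser

# A two-line file of the kind often left over from a staging environment.
STAGING_LEFTOVER = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(STAGING_LEFTOVER.splitlines())

agents = ["Googlebot", "Bingbot", "GPTBot", "PerplexityBot", "Google-Extended"]

# None of these tokens has its own group, so each falls back to the "*" group.
blocked = {
    agent: not parser.can_fetch(agent, "https://example.com/pricing")
    for agent in agents
}

for agent, is_blocked in blocked.items():
    print(f"{agent}: {'blocked' if is_blocked else 'allowed'}")
```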

How to verify

Request https://yourdomain.com/robots.txt and confirm it returns HTTP 200 with a plain-text body (a browser's network panel or curl -I shows the status code), and that there is no site-wide Disallow: / under User-agent: *. Then use Search Console's robots.txt report or a fetch tool to confirm key URLs are allowed.
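The same check can be scripted. The sketch below is one possible approach, not an official tool: fetch_robots and audit_robots are hypothetical helper names, and the domain and URLs you pass in are your own. Python's parser applies the first matching rule, while Google applies the most specific one, so results can differ for files that mix overlapping Allow and Disallow lines; treat Search Console as the authority for Googlebot.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


def fetch_robots(domain):
    """Download https://<domain>/robots.txt and return (status, body).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    so a normal return here means the request succeeded.
    """
    with urlopen(f"https://{domain}/robots.txt", timeout=10) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


def audit_robots(robots_text, key_urls, agent="*"):
    """Map each URL to True (allowed) or False (blocked) for the given agent."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {url: parser.can_fetch(agent, url) for url in key_urls}
```

A typical run would call fetch_robots("yourdomain.com"), confirm the status is 200, then pass the body and a list of important URLs to audit_robots. A False for the homepage under the default "*" agent usually points to a site-wide Disallow: /.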

How to fix

Publish a robots.txt that allows crawling of production paths, contains no blanket Disallow: / left over from staging, and includes a Sitemap: line with the absolute sitemap URL. Keep separate, explicit rules for any AI crawlers you wish to allow or block rather than blocking everything by default.
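One way to produce such a file is to generate it from a small policy and check the result before publishing. This is a sketch under stated assumptions: build_robots_txt is a hypothetical helper, and the blocked paths, the sitemap URL, and the choice to allow GPTBot while blocking PerplexityBot are placeholders for your own policy.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(sitemap_url, blocked_paths, ai_crawlers):
    """Return robots.txt text.

    sitemap_url   -- absolute URL of the XML sitemap
    blocked_paths -- path prefixes no crawler should request
    ai_crawlers   -- mapping of user-agent token -> True (allow) / False (block)
    """
    # An empty "Disallow:" value means nothing is disallowed.
    shared_rules = [f"Disallow: {path}" for path in blocked_paths] or ["Disallow:"]

    lines = ["User-agent: *", *shared_rules]
    for token, allowed in ai_crawlers.items():
        lines.append("")
        lines.append(f"User-agent: {token}")
        # A crawler that matches its own group ignores the "*" group,
        # so allowed crawlers get the shared rules repeated explicitly.
        lines.extend(shared_rules if allowed else ["Disallow: /"])
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


# Placeholder policy: allow GPTBot, block PerplexityBot.
robots_txt = build_robots_txt(
    "https://example.com/sitemap.xml",
    ["/admin/", "/cart/"],
    {"GPTBot": True, "PerplexityBot": False},
)

# Sanity-check the generated file before publishing it.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(robots_txt)
```

Repeating the shared Disallow lines inside each allowed crawler's group is a deliberate choice: without them, a crawler with its own group would be free to request /admin/ and /cart/.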

In the checklist

This concept maps to a check in the GEO Score checklist.

