robots.txt is a plain-text file placed at the root of a website (yourdomain.com/robots.txt) that tells web crawlers — including search engine bots and AI crawlers — which pages or sections of your site they are and aren’t allowed to access.
Format
User-agent: [crawler name or * for all]
Disallow: [path to block]
Allow: [path to allow]
Example allowing all crawlers:
User-agent: *
Allow: /
Example blocking one AI crawler:
User-agent: GPTBot
Disallow: /
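To see how a parser reads these directives, here is a minimal sketch using Python's standard-library urllib.robotparser. The example.com URL and the combined rule set are illustrative assumptions, not taken from any real site, and individual crawlers may resolve edge cases (such as overlapping Allow and Disallow rules) differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt combining the two examples above:
# block GPTBot everywhere, allow every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# GPTBot matches its own group, so the Disallow: / rule applies.
print(parser.can_fetch("GPTBot", "https://example.com/pricing"))         # False
# PerplexityBot has no group of its own and falls back to "*".
print(parser.can_fetch("PerplexityBot", "https://example.com/pricing"))  # True
```

A crawler with its own User-agent group uses only that group; the * group is a fallback for crawlers that are not named anywhere else in the file.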
robots.txt and AI visibility
Your robots.txt is a direct gate on AI visibility for retrieval-based engines. A crawler that honors robots.txt will not fetch any path disallowed for its user agent, so that content is invisible to the engine’s citation system. Common AI crawler user agents:
| Crawler | AI Engine |
|---|---|
| GPTBot | ChatGPT / OpenAI |
| PerplexityBot | Perplexity |
| anthropic-ai / ClaudeBot | Claude / Anthropic |
| Googlebot | Google Search + AI Overviews |
| bingbot | Bing + Microsoft Copilot |
| OAI-SearchBot | OpenAI search features |
Common robots.txt mistakes that hurt AI visibility
Accidentally blocking AI crawlers: Some CDN security configurations add Disallow: / rules for unrecognized user agents, which can catch AI crawlers. Check that your major content paths are accessible to the crawlers you want to reach.
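Overly broad rules: Rules match by path prefix, so Disallow: /blog blocks your entire content library under /blog, along with any other path that starts with /blog (such as /blog-archive). Be specific: use Disallow: /blog/drafts if you only want to block unpublished content.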
Blocking CSS/JS: If robots.txt blocks style or script files, pages may render incorrectly for crawlers that execute JavaScript — affecting how your content is read.
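The effect of an overly broad rule can be checked with a short sketch. It again assumes Python's standard-library urllib.robotparser; the blocked_paths helper and the sample paths are hypothetical names chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(rule: str, paths: list[str], agent: str = "GPTBot") -> list[str]:
    """Return the paths that a single Disallow rule blocks for `agent`."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule}"])
    return [p for p in paths if not parser.can_fetch(agent, p)]


# Hypothetical site paths used only to compare the two rules.
PATHS = ["/blog", "/blog/launch-post", "/blog/drafts/wip", "/blogroll", "/about"]

# Broad rule: everything whose path starts with /blog is blocked.
print(blocked_paths("/blog", PATHS))
# Narrow rule: only the drafts area is blocked.
print(blocked_paths("/blog/drafts", PATHS))
```

Under this parser the broad rule blocks four of the five sample paths, including /blogroll, while the narrow rule blocks only /blog/drafts/wip.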
How to audit
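Visit yourdomain.com/robots.txt and review every rule. Google Search Console’s robots.txt report (which replaced the standalone robots.txt Tester) shows how Google fetched and parsed the file; to test specific user agents and URLs, run them through a robots.txt parser or validator. Correct any unintended blocking of AI crawlers promptly.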
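Part of this audit can be scripted. The sketch below is one possible approach, assuming Python's standard-library urllib.robotparser; the audit function, the SAMPLE rules, and the test paths are hypothetical, and the crawler names come from the table above. It reports which crawler and path combinations a given robots.txt disallows.

```python
from urllib.robotparser import RobotFileParser

# User agents taken from the crawler table in this article.
AI_CRAWLERS = ("GPTBot", "OAI-SearchBot", "PerplexityBot",
               "ClaudeBot", "Googlebot", "bingbot")


def audit(robots_txt: str, paths: list[str],
          agents: tuple[str, ...] = AI_CRAWLERS) -> list[tuple[str, str]]:
    """Return (agent, path) pairs that the given robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [(agent, path)
            for agent in agents
            for path in paths
            if not parser.can_fetch(agent, path)]


# Hypothetical rules: ClaudeBot is kept out of /blog,
# every other crawler is kept out of /admin.
SAMPLE = """\
User-agent: ClaudeBot
Disallow: /blog

User-agent: *
Disallow: /admin
"""

for agent, path in audit(SAMPLE, ["/", "/blog/post", "/admin"]):
    print(f"{agent} is blocked from {path}")
```

To run the same check against a live file, RobotFileParser also provides set_url() and read(), which fetch and parse the robots.txt at a given URL. Treat the result as a first pass: this parser reflects one interpretation of the rules, and each crawler's own documentation is the authority on how it behaves.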