robots.txt is a plain-text file placed at the root of a website (yourdomain.com/robots.txt) that tells web crawlers — including search engine bots and AI crawlers — which pages or sections of your site they are and aren’t allowed to access.

Format

User-agent: [crawler name or * for all]
Disallow: [path to block]
Allow: [path to allow]

Example allowing all crawlers:

User-agent: *
Allow: /

Example blocking one AI crawler:

User-agent: GPTBot
Disallow: /
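To see how a parser reads the two examples above, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The helper name `is_allowed` and the `example.com` URLs are illustrative assumptions, and real crawlers may interpret edge cases differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text.

    `is_allowed` is a hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network request
    return parser.can_fetch(user_agent, url)


# The "allow all crawlers" example.
ALLOW_ALL = """\
User-agent: *
Allow: /
"""

# The "block one AI crawler" example.
BLOCK_GPTBOT = """\
User-agent: GPTBot
Disallow: /
"""

if __name__ == "__main__":
    page = "https://example.com/page"  # placeholder URL
    print(is_allowed(ALLOW_ALL, "GPTBot", page))
    print(is_allowed(BLOCK_GPTBOT, "GPTBot", page))
    print(is_allowed(BLOCK_GPTBOT, "PerplexityBot", page))
```

Crawlers with no matching User-agent group and no `*` group are treated as allowed, which is why blocking GPTBot alone leaves other crawlers unaffected.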

robots.txt and AI visibility

Your robots.txt is a direct gate on AI visibility for retrieval-based engines. Any path marked Disallow for a specific AI crawler user agent cannot be fetched by that crawler (assuming it honors robots.txt), so the content is effectively invisible to that engine’s citation system. Common AI crawler user agents:

| Crawler | AI Engine |
| --- | --- |
| `GPTBot` | ChatGPT / OpenAI |
| `PerplexityBot` | Perplexity |
| `anthropic-ai` / `ClaudeBot` | Claude / Anthropic |
| `Googlebot` | Google Search + AI Overviews |
| `bingbot` | Bing + Microsoft Copilot |
| `OAI-SearchBot` | OpenAI search features |
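One way to check which of the crawlers listed above are shut out of a given URL is sketched below with Python's `urllib.robotparser`. The `AI_CRAWLERS` list simply copies the user agents from the table; `blocked_agents`, `SAMPLE`, and the `example.com` URLs are hypothetical names chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# User agents taken from the table above.
AI_CRAWLERS = [
    "GPTBot",
    "OAI-SearchBot",
    "PerplexityBot",
    "anthropic-ai",
    "ClaudeBot",
    "Googlebot",
    "bingbot",
]


def blocked_agents(robots_txt: str, url: str) -> list[str]:
    """Return the listed AI crawlers that may NOT fetch `url` (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


# Made-up robots.txt used only to exercise the helper.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /private/

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    print(blocked_agents(SAMPLE, "https://example.com/blog/post"))
    print(blocked_agents(SAMPLE, "https://example.com/private/report"))
```

An empty list means none of the listed crawlers is blocked for that URL by robots.txt; it says nothing about blocks applied elsewhere, such as at a firewall.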

Common robots.txt mistakes that hurt AI visibility

Accidentally blocking AI crawlers: Some CDN security configurations add Disallow: / rules for unrecognized user agents, which can catch AI crawlers. Check that your major content paths are accessible to the crawlers you want to reach.

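Overly broad rules: Rules match by path prefix, so Disallow: /blog blocks /blog, everything beneath it, and any other path that begins with /blog (such as /blog-archive). That can be your entire content library. Be specific: Disallow: /blog/drafts if you only want to block unpublished content.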
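The difference between the broad and the specific rule can be checked with a short sketch. It assumes Python's `urllib.robotparser`, which matches Disallow paths as plain prefixes; `blocked_paths` and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(disallow: str, paths: list[str]) -> list[str]:
    """Return the entries of `paths` blocked by a single `Disallow: <disallow>` rule.

    Hypothetical helper: builds a one-rule robots.txt that applies to all crawlers.
    """
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow}"])
    return [p for p in paths if not parser.can_fetch("*", p)]


# Illustrative paths, not taken from any real site.
PATHS = ["/blog", "/blog/post-1", "/blog/drafts/wip", "/blog-news", "/about"]

if __name__ == "__main__":
    print(blocked_paths("/blog", PATHS))         # broad rule
    print(blocked_paths("/blog/drafts", PATHS))  # specific rule
```

With the broad rule every path starting with `/blog` is blocked, including `/blog-news`; with the specific rule only the drafts path is.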

Blocking CSS/JS: If robots.txt blocks style or script files, pages may render incorrectly for crawlers that execute JavaScript — affecting how your content is read.

How to audit

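Visit yourdomain.com/robots.txt and review every rule. In Google Search Console, the robots.txt report (which replaced the retired robots.txt Tester) shows how Google fetched and parsed the file; to test specific user agents and URLs, run them through a robots.txt parser. Any unintended blocking of AI crawlers should be corrected promptly.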
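The audit can also be scripted. The sketch below assumes Python's `urllib.robotparser`; `load_robots`, `audit`, and the agent and path lists are hypothetical names for illustration. Python's parser applies the first matching rule in file order and ignores wildcards, so treat its output as a first pass rather than a statement of how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser


def load_robots(site: str) -> RobotFileParser:
    """Fetch and parse https://<site>/robots.txt (this makes a network request)."""
    parser = RobotFileParser()
    parser.set_url(f"https://{site}/robots.txt")
    parser.read()
    return parser


def audit(
    parser: RobotFileParser, site: str, agents: list[str], paths: list[str]
) -> dict[str, list[str]]:
    """Map each user agent to the paths it may NOT fetch on `site`."""
    return {
        agent: [p for p in paths if not parser.can_fetch(agent, f"https://{site}{p}")]
        for agent in agents
    }


# Example usage (not run here because it needs network access):
#   parser = load_robots("yourdomain.com")
#   report = audit(parser, "yourdomain.com",
#                  ["GPTBot", "PerplexityBot", "ClaudeBot"],
#                  ["/", "/blog/", "/pricing"])
#   for agent, blocked in report.items():
#       print(agent, "blocked from:", blocked or "nothing")
```

Any non-empty list in the report is a candidate for review against the rule that causes it.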
