Definition

robots.txt

A plain-text file at your site root that tells web crawlers which pages they may or may not access. It is a direct gate on AI visibility: blocking GPTBot, PerplexityBot, or Googlebot in robots.txt stops those crawlers from fetching your pages, which in practice keeps that content out of the corresponding engines' citation systems. Audit it regularly for unintended AI crawler blocks.

robots.txt is a plain-text file placed at the root of a website (yourdomain.com/robots.txt) that tells web crawlers, including search engine bots and AI crawlers, which pages or sections of your site they are and aren’t allowed to access. Compliance is voluntary: well-behaved crawlers honor the rules, but the file itself does not enforce them.

Format

User-agent: [crawler name or * for all]
Disallow: [path to block]
Allow: [path to allow]

Example allowing all crawlers:

User-agent: *
Allow: /

Example blocking one AI crawler:

User-agent: GPTBot
Disallow: /
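
To see how these rules are evaluated, here is a minimal sketch using Python's standard-library urllib.robotparser. The sample file combines the two examples above, and the example.com URL and the is_allowed helper are illustrative assumptions. Note that Python's parser applies the first matching rule in file order and does not expand wildcards inside paths, so its verdict may differ from what a particular crawler's own parser decides.

```python
from urllib.robotparser import RobotFileParser

# Sample rules: block GPTBot everywhere, allow every other crawler.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URL, used only for illustration.
PAGE = "https://example.com/pricing"
gptbot_ok = is_allowed(SAMPLE_ROBOTS_TXT, "GPTBot", PAGE)
perplexity_ok = is_allowed(SAMPLE_ROBOTS_TXT, "PerplexityBot", PAGE)
print(f"GPTBot allowed: {gptbot_ok}, PerplexityBot allowed: {perplexity_ok}")
```

A crawler with its own User-agent group follows that group; every other crawler falls back to the * group.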

robots.txt and AI visibility

Your robots.txt is a direct gate on AI visibility for retrieval-based engines. Any path disallowed for a specific AI crawler’s user agent cannot be fetched by that crawler, so the engine has no page content from that path to retrieve or cite. Common AI crawler user agents:

| Crawler | AI Engine |
| --- | --- |
| GPTBot | ChatGPT / OpenAI |
| PerplexityBot | Perplexity |
| anthropic-ai / ClaudeBot | Claude / Anthropic |
| Googlebot | Google Search + AI Overviews |
| bingbot | Bing + Microsoft Copilot |
| OAI-SearchBot | OpenAI search features |

Common robots.txt mistakes that hurt AI visibility

Accidentally blocking AI crawlers: Some CDN security configurations add Disallow: / rules for unrecognized user agents, which also catches AI crawlers. Check that your major content paths are accessible to each crawler you care about.

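Overly broad rules: A rule like Disallow: /blog blocks your entire content library, because rules match by path prefix: /blog, /blog/any-post, and even /blogging-tips are all covered. Be specific: use Disallow: /blog/drafts/ if you only want to block unpublished content.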
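
The difference between the broad and the specific rule can be checked with a short sketch built on Python's standard-library urllib.robotparser. The paths, the ExampleBot user agent, and the blocked_paths helper are hypothetical and exist only for illustration.

```python
from urllib.robotparser import RobotFileParser

BROAD_RULE = "User-agent: *\nDisallow: /blog\n"
NARROW_RULE = "User-agent: *\nDisallow: /blog/drafts/\n"

# Hypothetical site paths used to compare the two rules.
SAMPLE_PATHS = [
    "/blog",
    "/blog/launch-post",
    "/blogging-tips",
    "/blog/drafts/wip",
    "/about",
]


def blocked_paths(robots_txt: str, paths: list[str],
                  user_agent: str = "ExampleBot") -> list[str]:
    """Return the paths that user_agent may not fetch, in input order."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


broad_blocked = blocked_paths(BROAD_RULE, SAMPLE_PATHS)
narrow_blocked = blocked_paths(NARROW_RULE, SAMPLE_PATHS)
print("Disallow: /blog blocks:", broad_blocked)
print("Disallow: /blog/drafts/ blocks:", narrow_blocked)
```

With the broad rule, everything starting with /blog is blocked, including the unrelated /blogging-tips page. The narrow rule blocks only the drafts path.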

Blocking CSS/JS: If robots.txt blocks style or script files, crawlers that execute JavaScript may render pages incorrectly, which affects how your content is read.
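
One way to spot this problem is to test the stylesheet and script URLs a page loads against your rules. The sketch below assumes you already have that list of resource URLs. The sample rules, the resource paths, and the blocked_render_assets helper are hypothetical, and Python's urllib.robotparser may not match a given crawler's parser exactly.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical rules that block a whole assets directory.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/
"""

# Hypothetical resources referenced by a page.
SAMPLE_RESOURCES = [
    "/assets/site.css",
    "/assets/app.js",
    "/images/logo.png",
    "/static/theme.css",
]


def blocked_render_assets(robots_txt: str, resource_urls: list[str],
                          user_agent: str) -> list[str]:
    """Return the .css and .js URLs that user_agent is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = []
    for url in resource_urls:
        suffix = PurePosixPath(urlparse(url).path).suffix.lower()
        if suffix in {".css", ".js"} and not parser.can_fetch(user_agent, url):
            blocked.append(url)
    return blocked


blocked_assets = blocked_render_assets(SAMPLE_ROBOTS_TXT, SAMPLE_RESOURCES, "Googlebot")
print("Blocked rendering assets:", blocked_assets)
```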

How to audit

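Visit yourdomain.com/robots.txt and review every rule. Google Search Console’s robots.txt report, which replaced the retired robots.txt Tester, shows the robots.txt files Google found and any fetch or parsing problems. To test specific user agents and URLs, run them through a robots.txt parser or validator. Correct any unintended blocking of AI crawlers promptly.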
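
A repeatable version of this audit can be sketched with Python's standard library. The crawler names come from the table above. The sample rules, the key paths, and the helper names fetch_robots_txt and audit are assumptions made for illustration. Treat the result as a first pass, because urllib.robotparser applies rules in file order and ignores path wildcards, which may differ from how a given crawler reads the file.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

# User agents taken from the crawler table in this entry.
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "PerplexityBot",
    "ClaudeBot", "anthropic-ai", "Googlebot", "bingbot",
]


def fetch_robots_txt(site: str, timeout: float = 10.0) -> str:
    """Download robots.txt from a site root such as 'https://yourdomain.com'."""
    with urlopen(f"{site.rstrip('/')}/robots.txt", timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def audit(robots_txt: str, paths: list[str],
          agents: list[str] = AI_CRAWLERS) -> dict[str, list[str]]:
    """Map each user agent to the paths it is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        agent: [p for p in paths if not parser.can_fetch(agent, p)]
        for agent in agents
    }


# Hypothetical rules and key content paths, so the sketch runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /docs/

User-agent: *
Disallow: /admin/
"""
KEY_PATHS = ["/", "/blog/", "/docs/intro"]

report = audit(SAMPLE_ROBOTS_TXT, KEY_PATHS)
for agent, blocked in report.items():
    print(f"{agent}: {'blocked on ' + ', '.join(blocked) if blocked else 'ok'}")
```

To audit a live site, pass the text returned by fetch_robots_txt("https://yourdomain.com") to audit along with the paths that matter most to you.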
