Crawl Budget

The number of pages a web crawler will fetch from your site within a given period. AI crawlers (GPTBot, PerplexityBot) have independent crawl budgets — pages not crawled are not eligible for AI citation. Wasted budget on duplicate or thin pages means important content may go unindexed.

Crawl budget is the number of pages a web crawler (search engine bot or AI crawler) will fetch and process from your site within a given time period. It’s determined by two things: how much crawling the crawler’s operator allocates to your domain, and how many requests your server can handle without slowing down or returning errors.

Why crawl budget matters for AI visibility

For RAG-powered AI engines, only crawled and indexed content is eligible for retrieval and citation. If your most important pages are deprioritized or not crawled, they won’t appear in AI responses regardless of their content quality.

Crawl budget is especially relevant for:

  • Large sites (10,000+ pages): Not all pages will be crawled equally frequently
  • Sites with many low-value pages: Thin pages, parameter-generated URLs, and duplicate content consume crawl budget that could be spent on high-value content
  • New content: Fresh pages may not be crawled for days or weeks on small-budget sites
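
To get a rough sense of how many parameter-generated URLs collapse onto the same page, the check can be sketched in Python. This is a minimal illustration, not a full canonicalization scheme: the `IGNORED_PARAMS` set and the function names are assumptions made for the example, and the parameters that are safe to ignore differ from site to site.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters never change page content on this site.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "ref"}


def canonicalize(url: str) -> str:
    """Drop ignored query parameters and the fragment, and sort what remains."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(kept), "")
    )


def group_duplicates(urls):
    """Map each canonical URL to its variants, keeping only groups of 2 or more."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonicalize(url)].append(url)
    return {canon: members for canon, members in groups.items() if len(members) > 1}
```

Running `group_duplicates` over the URLs found in a crawl export or server log shows which pages are fetched under several addresses; those groups are candidates for canonical tags or consolidation.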

AI crawlers and crawl budget

AI engines run their own crawlers with separate crawl budgets from Google:

  • GPTBot (OpenAI): Crawls content for OpenAI’s models; OpenAI documents a separate crawler, OAI-SearchBot, for ChatGPT search results
  • PerplexityBot: Crawls for Perplexity’s retrieval index
  • Anthropic-AI / ClaudeBot (Anthropic): Crawls content for Claude
  • Googlebot: Crawls for Google Search and AI Overviews

Each of these bots has an independent crawl budget allocation for your domain. Blocking one in robots.txt stops that crawler from fetching your pages, which generally removes them from that engine’s citation pool.
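
Because robots.txt rules apply per user agent, it can help to check each bot separately. The sketch below uses Python’s standard `urllib.robotparser` on an in-memory robots.txt. The file contents, the `AI_AGENTS` list, and the `access_report` helper are hypothetical examples, and the parser’s matching is simpler than what each real crawler implements, so treat the result as a first check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot is blocked everywhere,
# every other bot is only kept out of /admin/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

# User-agent names taken from the list above.
AI_AGENTS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Googlebot"]


def access_report(robots_txt: str, url: str, agents=AI_AGENTS) -> dict:
    """Return {agent: True/False} for whether each agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    for agent, allowed in access_report(ROBOTS_TXT, "https://example.com/guide").items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

To test a live site instead of a string, `RobotFileParser` can also load a file with `set_url(...)` followed by `read()`.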

Optimizing for AI crawl budget

  • Ensure all high-value content is listed in your XML sitemap
  • Use internal linking to direct crawl priority toward important pages
  • Reduce or consolidate thin and duplicate content
  • Verify crawl access for AI-specific user agents in robots.txt
  • Improve page load speed (slow pages cost more crawl budget per page)
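
As an illustration of the sitemap step, here is a minimal sketch that builds an XML sitemap from a list of high-value pages using only the Python standard library. The `build_sitemap` function and the example URLs are assumptions made for this sketch; a production sitemap would usually be generated by your CMS or build system.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages) -> str:
    """Build sitemap XML from (url, lastmod) pairs.

    lastmod is a YYYY-MM-DD string, or None to omit the element.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = loc
        if lastmod:
            ET.SubElement(url_el, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical high-value pages.
    print(build_sitemap([
        ("https://example.com/guide", "2024-01-15"),
        ("https://example.com/pricing", None),
    ]))
```

Listing only the pages you actually want crawled keeps the sitemap useful as a priority signal; thin or duplicate URLs are better left out.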
