Crawl budget is the number of pages a web crawler (a search engine bot or an AI crawler) will fetch and process from your site within a given time period. It is determined by two things: how much crawling the crawler’s operator allocates to your domain, and how many crawl requests your server can handle without slowing down or returning errors.
Why crawl budget matters for AI visibility
For RAG-powered AI engines, only crawled and indexed content is eligible for retrieval and citation. If your most important pages are deprioritized or not crawled, they won’t appear in AI responses regardless of their content quality.
Crawl budget is especially relevant for:
- Large sites (10,000+ pages): Not all pages will be crawled equally frequently
- Sites with many low-value pages: Thin pages, parameter-generated URLs, and duplicate content consume crawl budget that could be spent on high-value content
- New content: Fresh pages may not be crawled for days or weeks on small-budget sites
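One way to spot parameter-generated duplicates is to normalize every known URL and group the ones that collapse to the same address. The sketch below is illustrative only: the `IGNORED_PARAMS` set and the function names are assumptions, and which parameters are safe to ignore depends on your site.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query parameters assumed not to change page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "sort"}


def canonicalize(url: str) -> str:
    """Drop ignored parameters, sort the rest, and trim a trailing slash."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(kept), "")
    )


def find_duplicate_groups(urls):
    """Return {canonical_url: [variants]} for URLs that share a canonical form."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonicalize(url)].append(url)
    return {canon: members for canon, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    crawled = [
        "https://example.com/shoes?utm_source=newsletter",
        "https://example.com/shoes/",
        "https://example.com/shoes?sort=price",
        "https://example.com/hats?color=red",
    ]
    for canon, variants in find_duplicate_groups(crawled).items():
        print(canon, "<-", variants)
```

Each group with more than one member is a candidate for a canonical tag, a redirect, or a robots.txt rule, so that crawlers spend their fetches on a single version of the page.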
AI crawlers and crawl budget
Major AI engines fetch content with their own crawlers, and each crawler has its own crawl budget:
- GPTBot (OpenAI): Crawls content for OpenAI’s models; OpenAI documents a separate agent, OAI-SearchBot, for ChatGPT search results
- PerplexityBot: Crawls for Perplexity’s retrieval index
- ClaudeBot (Anthropic; older robots.txt rules may still use the anthropic-ai token): Crawls web content for Anthropic’s Claude models
- Googlebot: Crawls for Google Search and AI Overviews
Each of these bots has its own crawl budget allocation for your domain. Blocking one in robots.txt stops that bot from fetching your pages, which generally removes them from that engine’s retrieval and citation pool while leaving the other bots unaffected.
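To see which of these bots a given robots.txt allows, Python’s standard `urllib.robotparser` can evaluate the rules per user agent. This is a minimal sketch: the sample robots.txt, the example URL, and the `crawl_access` helper are made up for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the list above.
AI_USER_AGENTS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Googlebot"]

# Hypothetical robots.txt that blocks GPTBot everywhere and /admin/ for everyone.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def crawl_access(robots_txt: str, url: str, agents=AI_USER_AGENTS) -> dict:
    """Return {user_agent: allowed} for one URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    report = crawl_access(SAMPLE_ROBOTS, "https://example.com/guides/crawl-budget")
    for agent, allowed in report.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With the sample rules, only GPTBot is blocked for the example URL. To check a live site, `RobotFileParser.set_url()` followed by `read()` fetches the real file instead of parsing a string.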
Optimizing for AI crawl budget
- Ensure all high-value content is listed in your XML sitemap
- Use internal linking to direct crawl priority toward important pages
- Reduce or consolidate thin and duplicate content
- Verify crawl access for AI-specific user agents in robots.txt
- Improve page load speed (slow responses use up more of the crawl budget per page, so fewer pages get fetched)
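The sitemap step in the list above can be scripted. The sketch below builds a minimal XML sitemap with the standard library, following the sitemaps.org 0.9 schema; the `build_sitemap` name and the sample pages are assumptions, and a real site would pull its URLs and modification dates from a CMS or database.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages) -> str:
    """Build sitemap XML from (url, last_modified) pairs, dates as YYYY-MM-DD."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical high-value pages.
    high_value_pages = [
        ("https://example.com/guides/crawl-budget", "2024-05-01"),
        ("https://example.com/guides/robots-txt", "2024-04-18"),
    ]
    print(build_sitemap(high_value_pages))
```

Listing only high-value, canonical URLs keeps the sitemap a clear signal of what you want crawled, and an accurate `lastmod` date gives crawlers a hint about which pages have changed.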