Chunking is the process of splitting a long document into smaller segments before indexing it in a retrieval-augmented generation (RAG) system. Because LLMs have finite context windows and retrieval works best on focused, semantically coherent units, documents are split into chunks of a few hundred to a few thousand tokens before being embedded and stored. Chunking strategy strongly influences which parts of your content get retrieved and cited.
Why chunking matters for AI visibility
Retrieval systems don’t retrieve whole pages — they retrieve chunks. When a user asks a question, the system finds the most relevant chunks across all indexed documents and injects those chunks into context. Your full 3,000-word article may be split into six chunks; only one or two of those chunks may be retrieved for any given query.
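The retrieval step described above can be sketched in a few lines. This is a toy illustration only: word-count vectors stand in for the learned embeddings a real system would use, and the function names (`bag_of_words`, `top_k_chunks`) and sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:k]


# Three chunks from one hypothetical page; only one answers the question.
chunks = [
    "Our company was founded in 2015 and has offices in three countries.",
    "Pricing starts at ten dollars per month for the basic plan.",
    "The basic plan includes email support and five user seats.",
]
best = top_k_chunks("How much does the basic plan cost per month?", chunks, k=1)
print(best[0])
```

With `k=1`, only the pricing chunk is returned; the other two chunks from the same page never reach the model's context for this query.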
This means:
- Where you put your key claims matters — a claim buried in paragraph 12 may end up in a chunk that’s never retrieved for relevant queries
- Each section should stand alone — if a chunk is retrieved without surrounding context, it needs to be intelligible and credible on its own
- Your brand name should appear in every major section — not just the introduction, since chunks can be retrieved independently
Common chunking strategies
| Strategy | How it works | Effect on your content |
|---|---|---|
| Fixed-size chunking | Split every N tokens regardless of structure | May cut mid-sentence or mid-idea |
| Semantic chunking | Split at meaning boundaries (paragraphs, sections) | More coherent chunks; respects content structure |
| Recursive chunking | Split at headings, then paragraphs, then sentences | Preserves hierarchy; most structure-aware |
| Sliding window | Consecutive chunks overlap by a few tokens or sentences | Reduces information loss at chunk boundaries |
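
To make the first and last rows of the table concrete, here is a minimal sketch of fixed-size chunking with an optional overlap. It assumes whitespace-separated words as a stand-in for real tokenizer output, and `fixed_size_chunks` is a hypothetical helper, not any particular library's API.

```python
def fixed_size_chunks(tokens: list[str], size: int, overlap: int = 0) -> list[list[str]]:
    """Split tokens into windows of `size`; each window repeats the last
    `overlap` tokens of the previous one (overlap=0 means plain fixed-size)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


text = "Chunking splits a document into segments. Each segment is embedded and stored."
tokens = text.split()  # assumption: words approximate tokens

plain = fixed_size_chunks(tokens, size=5)
windowed = fixed_size_chunks(tokens, size=5, overlap=2)

for chunk in plain:
    print(" ".join(chunk))
```

In the plain split, the first chunk ends at "into", which cuts the opening sentence in half. With `overlap=2`, each chunk starts with the last two tokens of the previous one, so text near a boundary appears in both neighbors.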
Many RAG systems use recursive or semantic chunking, so HTML heading structure often influences where chunks are cut. Well-marked `<h2>` and `<h3>` sections create natural, clean chunk boundaries.
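As a rough illustration of how heading tags can become chunk boundaries, the sketch below uses Python's standard `html.parser` to start a new chunk at every `<h2>` or `<h3>`. It is a simplified assumption of how a structure-aware splitter might behave (real systems typically also split long sections further by paragraph and sentence); `HeadingChunker` and `chunk_by_headings` are hypothetical names.

```python
from html.parser import HTMLParser


class HeadingChunker(HTMLParser):
    """Collect page text into chunks, starting a new chunk at each <h2> or <h3>."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._parts: list[str] = []

    def _flush(self) -> None:
        # Join collected text and normalize whitespace; skip empty chunks.
        text = " ".join(" ".join(self._parts).split())
        if text:
            self.chunks.append(text)
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self._flush()  # a new heading closes the previous chunk

    def handle_data(self, data):
        self._parts.append(data)

    def close(self):
        super().close()
        self._flush()  # emit the text that follows the last heading


def chunk_by_headings(html: str) -> list[str]:
    parser = HeadingChunker()
    parser.feed(html)
    parser.close()
    return parser.chunks


page = """
<p>Intro paragraph.</p>
<h2>Pricing</h2><p>Plans start at ten dollars.</p>
<h3>Discounts</h3><p>Annual billing saves twenty percent.</p>
"""
sections = chunk_by_headings(page)
for section in sections:
    print(section)
```

Each chunk begins with its own heading text, which is one reason descriptive headings help a retrieved chunk make sense on its own.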
Optimizing your content for chunking
- Use clear heading structure: `h2` and `h3` tags create natural chunk boundaries in semantic chunking systems
- Open each section with a key claim: the first sentence of each section is most likely to survive chunking as the section intro
- Include your brand name per major section — a retrieved chunk that doesn’t mention your brand is a missed citation opportunity
- Avoid long preambles — if the first 300 words of your page are introduction and context before any substance, that chunk may be retrieved for generic queries but not specific ones
- Keep definitions self-contained — if you define a term, keep the definition in the same paragraph as the term, not split across heading boundaries
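One of these checks is easy to automate. The sketch below flags sections that never mention the brand, assuming you have already split the page into section-level chunks; `chunks_missing_brand` and the brand "ExampleCo" are hypothetical and used only for illustration.

```python
def chunks_missing_brand(chunks: list[str], brand: str) -> list[int]:
    """Return the indexes of chunks that never mention the brand (case-insensitive)."""
    needle = brand.lower()
    return [i for i, chunk in enumerate(chunks) if needle not in chunk.lower()]


page_sections = [
    "ExampleCo offers hosted search for online stores.",
    "Setup takes about ten minutes and needs no code changes.",
    "Pricing for ExampleCo starts with a free tier.",
]
missing = chunks_missing_brand(page_sections, "ExampleCo")
print(missing)  # [1]
```

Here the second section would be retrieved without any brand mention, so it is the one to revise.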
“Why is the AI quoting a weird excerpt from my page?”
If you see an AI response citing a specific sentence from your page that seems oddly out of context, chunking is the likely explanation. That sentence sat in a chunk the retriever scored as highly relevant to the query, but the surrounding context that would make it read naturally was not included. The fix is to make each paragraph more self-contained so any chunk reads clearly without its neighbors.