
LLM Metrix
Definition

Chunking

The process of splitting long documents into smaller segments for RAG indexing — chunks are what retrieval systems actually retrieve, not whole pages. Heading structure and section clarity determine where chunk boundaries fall and which content gets cited.

Chunking is the process of splitting a long document into smaller segments before indexing it in a RAG retrieval system. Because LLMs have finite context windows and retrieval systems work most effectively with focused, semantically coherent units, documents are split into chunks of a few hundred to a few thousand tokens before being embedded and stored. Chunking strategy directly determines which parts of your content get retrieved and cited.

Why chunking matters for AI visibility

Retrieval systems don’t retrieve whole pages — they retrieve chunks. When a user asks a question, the system finds the most relevant chunks across all indexed documents and injects those chunks into context. Your full 3,000-word article may be split into six chunks; only one or two of those chunks may be retrieved for any given query.

This means:

  • Where you put your key claims matters — a claim buried in paragraph 12 may end up in a chunk that’s never retrieved for relevant queries
  • Each section should stand alone — if a chunk is retrieved without surrounding context, it needs to be intelligible and credible on its own
  • Your brand name should appear in every major section — not just the introduction, since chunks can be retrieved independently
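To make "retrieve chunks, not pages" concrete, here is a minimal toy retriever. It is a sketch under stated assumptions, not how any particular RAG product works: bag-of-words cosine similarity stands in for a real embedding model, and `tokenize`, `cosine`, `retrieve`, and the brand "Acme Analytics" are hypothetical names used only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy retriever. Bag-of-words cosine similarity stands in for
# the embedding model a real RAG system would use.

def tokenize(text: str) -> list[str]:
    """Lowercase word tokens (assumption: a real system uses its model's tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(chunks: list[str], query: str, k: int = 2) -> list[tuple[int, float]]:
    """Score every chunk against the query and return the top-k (index, score) pairs."""
    q = Counter(tokenize(query))
    scored = [(i, cosine(Counter(tokenize(c)), q)) for i, c in enumerate(chunks)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# One hypothetical article, already split into three chunks.
article_chunks = [
    "Acme Analytics is a dashboard tool for marketing teams.",
    "Pricing starts at 49 dollars per month for small teams.",
    "The Acme Analytics API exports reports as CSV or JSON.",
]
best_index, best_score = retrieve(article_chunks, "how much does it cost per month", k=1)[0]
print(best_index, article_chunks[best_index])
# → 1 Pricing starts at 49 dollars per month for small teams.
```

Only the pricing chunk is returned for the pricing question, and it never names Acme Analytics. That is exactly the missed-citation case the last bullet warns about.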

Common chunking strategies

| Strategy | How it works | Effect on your content |
|---|---|---|
| Fixed-size chunking | Splits every N tokens regardless of structure | May cut mid-sentence or mid-idea |
| Semantic chunking | Splits where the topic shifts, typically detected by comparing the embeddings of adjacent sentences | More coherent chunks; boundaries tend to follow paragraphs and sections |
| Recursive chunking | Splits at headings, then paragraphs, then sentences, until each piece fits the size limit | Preserves hierarchy; most structure-aware |
| Sliding window | Consecutive chunks share a set number of overlapping tokens | Reduces information loss at boundaries |
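Fixed-size chunking and the sliding window differ by a single parameter, the overlap. The sketch below assumes whitespace-separated words stand in for model tokens, and `fixed_size_chunks` is a hypothetical helper rather than any library's API.

```python
def fixed_size_chunks(tokens: list[str], size: int, overlap: int = 0) -> list[list[str]]:
    """Split a token list into windows of `size` tokens.

    overlap=0 is plain fixed-size chunking; overlap > 0 makes consecutive
    windows share `overlap` tokens (a sliding window).
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end; stop before emitting a redundant tail
    return chunks

# Assumption: whitespace-separated words stand in for model tokens.
words = "Acme Analytics cuts reporting time in half for most teams".split()
print([" ".join(c) for c in fixed_size_chunks(words, size=4)])
# → ['Acme Analytics cuts reporting', 'time in half for', 'most teams']
print([" ".join(c) for c in fixed_size_chunks(words, size=4, overlap=2)])
# → ['Acme Analytics cuts reporting', 'cuts reporting time in', 'time in half for', 'half for most teams']
```

Without overlap, "cuts reporting" and "time in half" land in different chunks, so neither chunk carries the whole claim. Overlap gives each boundary a second chance to fall inside a single window, although a claim longer than the window still cannot fit in any one chunk.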

Many modern RAG pipelines use recursive or other structure-aware chunking, so HTML heading structure can directly influence where chunks are cut. Well-marked <h2> and <h3> sections create natural, clean chunk boundaries.
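Here is a minimal sketch of the heading-aware first pass of a recursive splitter, built on Python's standard `html.parser`. `HeadingChunker` and `chunk_by_headings` are hypothetical names. The sketch only splits at <h2>/<h3>; a real recursive splitter would go on to split oversized sections at paragraphs and sentences.

```python
from html.parser import HTMLParser
from typing import Optional

class HeadingChunker(HTMLParser):
    """Hypothetical splitter: every <h2> or <h3> opens a new chunk.

    Text before the first heading becomes a preamble chunk with heading None.
    A production pipeline would also skip <script>/<style> text and enforce a size limit.
    """

    BOUNDARY_TAGS = {"h2", "h3"}

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[dict] = [{"heading": None, "text": []}]
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in self.BOUNDARY_TAGS:
            self.chunks.append({"heading": [], "text": []})
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in self.BOUNDARY_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        piece = data.strip()
        if not piece:
            return
        current = self.chunks[-1]
        if self._in_heading:
            current["heading"].append(piece)
        else:
            current["text"].append(piece)

def chunk_by_headings(html: str) -> list[tuple[Optional[str], str]]:
    """Return (heading, body_text) pairs, one per section of the page."""
    parser = HeadingChunker()
    parser.feed(html)
    parser.close()
    return [
        (None if c["heading"] is None else " ".join(c["heading"]), " ".join(c["text"]))
        for c in parser.chunks
        if c["heading"] is not None or c["text"]  # drop an empty preamble
    ]

page = (
    "<h1>Acme Analytics</h1><p>Intro.</p>"
    "<h2>Pricing</h2><p>Plans start at $49.</p>"
    "<h3>Discounts</h3><p>Annual billing saves 20%.</p>"
)
for heading, body in chunk_by_headings(page):
    print(heading, "|", body)
```

Each heading travels with its own body text, so a retrieved section chunk keeps the label that explains what it is about.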

Optimizing your content for chunking

  1. Use clear heading structure: <h2> and <h3> tags create natural chunk boundaries in structure-aware chunking systems
  2. Open each section with a key claim: the first sentence of a section lands in the same chunk as its heading, so it is the sentence most likely to be retrieved along with the section’s context
  3. Include your brand name in each major section: a retrieved chunk that doesn’t mention your brand is a missed citation opportunity
  4. Avoid long preambles: if the first 300 words of your page are introduction and context before any substance, that chunk may be retrieved for generic queries but not for specific ones
  5. Keep definitions self-contained: if you define a term, keep the definition in the same paragraph as the term, not split across a heading boundary
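Several checklist items can be checked mechanically. This sketch assumes the page is already split into `(heading, text)` sections, for example by a heading-aware splitter, with heading `None` for text before the first heading. `audit_sections` is a hypothetical helper, and the 300-word default simply mirrors item 4. Items 2 and 5 still need a human read.

```python
from typing import Optional

def audit_sections(
    sections: list[tuple[Optional[str], str]],
    brand: str,
    max_preamble_words: int = 300,
) -> list[str]:
    """Check (heading, text) sections against checklist items 1, 3 and 4; return warnings.

    Hypothetical helper: `sections` is assumed to come from a heading-aware splitter,
    with heading None for any text before the first heading.
    """
    warnings = []
    # Item 1: without headings, chunk boundaries fall wherever the splitter decides.
    if not any(heading for heading, _ in sections):
        warnings.append("no <h2>/<h3> headings: chunk boundaries are left to the splitter")
    for heading, text in sections:
        label = heading if heading is not None else "(preamble)"
        n_words = len(text.split())
        # Item 4: a long run of text before the first heading.
        if heading is None and n_words > max_preamble_words:
            warnings.append(f"{label}: {n_words} words before the first heading")
        # Item 3: the brand should appear in the heading or body of every section.
        if brand.lower() not in f"{heading or ''} {text}".lower():
            warnings.append(f"{label}: never mentions {brand!r}")
    return warnings

sections = [
    (None, "word " * 10),
    ("Pricing", "Plans start at $49."),
    ("Why Acme Analytics", "It saves time."),
]
for warning in audit_sections(sections, "Acme Analytics"):
    print(warning)
```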

“Why is the AI quoting a weird excerpt from my page?”

If an AI response cites a sentence from your page that seems oddly out of context, chunking is the likely explanation. That sentence sat in a chunk judged highly relevant to the query, but the surrounding context that would make it read naturally was in a neighboring chunk that wasn’t retrieved. The fix is to make each paragraph more self-contained so that any chunk reads clearly without its neighbors.
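A quick way to find paragraphs that won't survive on their own is to flag those that open with a back-reference. This is a rough heuristic sketch: the opener list in `BACK_REFERENCE_OPENERS` and the `dependent_paragraphs` helper are hypothetical, and they will miss some dependent paragraphs and flag some harmless ones.

```python
import re

# Hypothetical heuristic: openers that usually lean on an earlier paragraph.
# Tune the list for your own content; it will miss some cases and flag others.
BACK_REFERENCE_OPENERS = re.compile(
    r"^(this|that|these|those|it|they|as (mentioned|noted|discussed) (above|earlier))\b",
    re.IGNORECASE,
)

def dependent_paragraphs(text: str) -> list[str]:
    """Return blank-line-separated paragraphs whose opening words refer back to earlier text."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [p for p in paragraphs if BACK_REFERENCE_OPENERS.match(p)]

page_text = (
    "Acme Analytics exports reports as CSV.\n\n"
    "It also supports JSON.\n\n"
    "Thistle Labs uses a different format."
)
print(dependent_paragraphs(page_text))  # → ['It also supports JSON.']
```

Rewriting the flagged paragraph to name its subject ("Acme Analytics also supports JSON.") lets it stand alone if it is retrieved without the paragraph before it.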
