New: Real-time hallucination alerts are live. Learn more →

LLM Metrix logoLLM Metrix
Back to Glossary
Definition

Tokenization

The process of breaking text into smaller units (tokens) that a language model processes — roughly words or word-pieces. Tokenization underlies how models read content and how limits like context windows are measured, since models think in tokens rather than characters or words.

Tokenization is the process of breaking text into smaller units — tokens — that a language model can process. A token is roughly a word or a word-piece; “visibility” might be one token, while a longer or rarer word splits into several.

Why it matters

Models don’t read characters or words the way people do — they read tokens. This has practical consequences:

  • Limits are measured in tokens. A model’s context window — how much it can consider at once — is counted in tokens, not pages.
  • Cost and speed scale with tokens. Longer inputs and outputs mean more tokens to process.
  • It shapes comprehension. How text is tokenized affects how the model interprets it, which is one reason clear, well-structured content is easier for models to use.

For AEO, you rarely interact with tokenization directly, but it underpins why concise, clearly written content is processed more reliably than dense or messy text. See how LLMs learn about brands.

Ready to improve your AI visibility?

Put your knowledge into practice with step-by-step tutorials.