

Token

The basic unit of text LLMs process — roughly 0.75 words each. Context windows, retrieval budgets, and API costs are all measured in tokens; understanding tokens explains why only portions of long pages get read by AI engines.

A token is the basic unit of text that LLMs process. Rather than reading word by word or character by character, LLMs split text into tokens: chunks that may be whole words, word fragments, or single characters, depending on the tokenization algorithm. Understanding tokens is foundational to understanding context windows, retrieval limits, and why AI engines can only process a finite amount of text at once.

How tokenization works

A tokenizer converts raw text into a sequence of numeric IDs before the model processes it (the splits and IDs below are illustrative and vary by tokenizer):

"AI visibility" → ["AI", " visibility"] → [15836, 12274]
"ChatGPT"       → ["Chat", "G", "PT"]  → [14126, 38, 2898]

The exact breakdown depends on the model’s vocabulary. Common English words are typically one token; uncommon words, proper nouns, and technical terms often split into two or more tokens. Emojis, code, and non-English text are usually less token-efficient (more tokens per character).
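
As a rough illustration of the text-to-pieces-to-IDs flow above, here is a minimal sketch of a greedy longest-match tokenizer. It is a simplification, not the algorithm any particular model uses (production tokenizers typically rely on learned schemes such as byte-pair encoding). The vocabulary `TOY_VOCAB`, its IDs, and `UNKNOWN_ID` are made up for this example.

```python
# Toy vocabulary: the pieces and IDs are invented for illustration only.
TOY_VOCAB = {
    "AI": 0,
    " visibility": 1,
    "Chat": 2,
    "G": 3,
    "PT": 4,
    " ": 5,
}
UNKNOWN_ID = -1  # hypothetical ID for characters the toy vocabulary lacks


def tokenize(text: str, vocab: dict[str, int] = TOY_VOCAB) -> list[tuple[str, int]]:
    """Greedy longest-match: at each position, take the longest known piece."""
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append((piece, vocab[piece]))
                i += length
                break
        else:
            # No piece matched: emit the single character as unknown.
            tokens.append((text[i], UNKNOWN_ID))
            i += 1
    return tokens


print(tokenize("AI visibility"))  # [('AI', 0), (' visibility', 1)]
print(tokenize("ChatGPT"))        # [('Chat', 2), ('G', 3), ('PT', 4)]
```

Note how the common phrase maps to two pieces while the invented name splits into three, which mirrors the pattern described above.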

Rough token-to-word conversion

Tokens            Approximate text
1 token           ~0.75 words
100 tokens        ~75 words
1,000 tokens      ~750 words
4,000 tokens      ~3,000 words (a long web article)
128,000 tokens    ~96,000 words (~a full novel)

A typical web page is 500–2,000 tokens. A context window of 128,000 tokens can theoretically fit dozens of full pages — but RAG systems rarely inject that much, prioritizing the most relevant content chunks.
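
The arithmetic behind these estimates can be sketched in a few lines. This assumes the ~0.75 words-per-token rule of thumb, which is only an approximation for English prose; the function names are hypothetical.

```python
import math

# Rule of thumb from the conversion table; varies by tokenizer and language.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Estimate the token count for a given number of English words."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def pages_that_fit(context_window: int, tokens_per_page: int) -> int:
    """How many whole pages of a given token size fit in a context window."""
    return context_window // tokens_per_page


print(estimate_tokens(750))             # 1000
print(pages_that_fit(128_000, 2_000))   # 64
print(pages_that_fit(128_000, 500))     # 256
```

For an exact count, run the text through the tokenizer of the specific model you care about rather than relying on a word-based estimate.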

Why tokens matter for AI visibility

Context budgets: Context windows are measured in tokens. When a RAG system retrieves your page and injects it into context, it consumes a portion of the model’s token budget. Systems with hard limits on how many tokens each retrieved source can consume may truncate long pages.
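
A minimal sketch of how a per-source limit and a total budget might interact. This is an assumed design for illustration, not how any specific AI engine works, and it approximates tokens with whitespace-separated words to stay self-contained. The names `truncate_to_budget` and `fill_context` are hypothetical.

```python
def truncate_to_budget(text: str, max_tokens: int) -> str:
    """Keep only the first max_tokens pieces (words stand in for tokens here)."""
    pieces = text.split()
    return " ".join(pieces[:max_tokens])


def fill_context(sources: list[str], per_source_limit: int, total_budget: int) -> list[str]:
    """Add truncated sources in order until the total budget is used up."""
    selected = []
    remaining = total_budget
    for source in sources:
        if remaining <= 0:
            break  # budget exhausted: later sources are dropped entirely
        clipped = truncate_to_budget(source, min(per_source_limit, remaining))
        if clipped:
            selected.append(clipped)
            remaining -= len(clipped.split())
    return selected


pages = ["a b c d e", "f g h", "i j"]
print(fill_context(pages, per_source_limit=3, total_budget=5))
# ['a b c', 'f g']
```

In this sketch the first page loses its last two words to the per-source limit, the second is cut short by the remaining budget, and the third never makes it into context at all.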

Chunking is token-based: Many RAG systems split documents into chunks of a roughly fixed token size (e.g., 512 or 1,024 tokens) for indexing. Where a chunk boundary falls in your content affects which information gets retrieved and cited together.
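
To see why chunk boundaries matter, here is a minimal sketch of fixed-size chunking. It again uses whitespace-separated words as a stand-in for tokens, and the sample sentence and brand name are invented; real pipelines often add overlap or split on sentence boundaries.

```python
def chunk_tokens(tokens: list[str], chunk_size: int) -> list[list[str]]:
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


# Hypothetical page text; words approximate tokens for this example.
text = "Acme Widgets ships worldwide. Pricing starts at ten dollars."
for chunk in chunk_tokens(text.split(), chunk_size=4):
    print(chunk)
# ['Acme', 'Widgets', 'ships', 'worldwide.']
# ['Pricing', 'starts', 'at', 'ten']
# ['dollars.']
```

Here the price is split across two chunks, and neither of those chunks mentions the brand, so a retriever that returns only one of them would surface an incomplete fact.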

Brand name tokenization: If your brand name splits into multiple tokens (common for invented words and abbreviations), write it consistently and exactly as you want it cited. Tokenization quirks can occasionally cause models to abbreviate or alter unusual brand names in unexpected ways.

Tokens and cost

For teams building custom AI tools on top of LLM APIs (e.g., an internal search or support bot), token count directly affects API cost — both input tokens (the context you send) and output tokens (the response generated). This is why retrieval systems are selective about how many tokens of source content they inject.
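
A back-of-the-envelope cost estimate might look like the sketch below. The per-million-token prices are placeholders chosen for the example, not any provider's actual rates, and `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of one request given per-million-token prices."""
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


# Placeholder prices (assumptions): 2.00 per million input tokens,
# 8.00 per million output tokens, in whatever currency the provider bills.
cost = estimate_cost(4_000, 500, input_price_per_million=2.0, output_price_per_million=8.0)
print(f"{cost:.4f}")  # 0.0120
```

Because the input side grows with every retrieved chunk, trimming injected context is usually the most direct lever on per-request cost.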
