What is a Token?
AI Writing Terms
A token is the basic unit of text that large language models process. Tokens aren't exactly words - they're pieces of words, whole words, or punctuation marks. On average, one token equals about 3/4 of a word in English.
Understanding tokens matters because LLMs have token limits that determine how much text they can handle at once, and many AI tools charge based on token usage.
How tokenization works
LLMs break text into tokens before processing. Common words might be single tokens ("the," "is," "cat"). Less common words split into multiple tokens ("unbelievable" might become "un" + "believ" + "able").
Numbers, punctuation, and special characters also become tokens. The sentence "ChatGPT-4 costs $20/month" contains more tokens than it has words because each element gets tokenized separately.
You don't need to understand the exact rules. Just know that roughly 100 tokens equal about 75 words, or about 400 tokens per page of text.
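That rule of thumb is easy to turn into code. Here's a minimal sketch of a token estimator based purely on the 100-tokens-per-75-words ratio above - it's an approximation, not a real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4/3 tokens per English word."""
    words = len(text.split())
    return round(words * 4 / 3)

def estimate_words(tokens: int) -> int:
    """The inverse: ~3/4 of a word per token."""
    return round(tokens * 3 / 4)
```

A 75-word paragraph comes out to roughly 100 tokens with this estimate, matching the rule of thumb.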
Why tokens matter
Context windows are measured in tokens. If an LLM has a 4,000-token context window, that's about 3,000 words combined for your prompt and its response. Exceed this and the model can't process your request.
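The key point is that prompt and response share one budget. A quick sketch of that check, assuming a 4,000-token window like the example above:

```python
def fits_in_context(prompt_tokens: int, max_response_tokens: int,
                    context_window: int = 4000) -> bool:
    """Prompt and response share the same context window."""
    return prompt_tokens + max_response_tokens <= context_window
```

A 3,000-token prompt leaves room for a 1,000-token response; a 3,500-token prompt with an 800-token response requested would be rejected.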
Costs are calculated per token. When AI tools charge by usage, they're typically charging per 1,000 tokens processed. A 2,000-word blog post might cost a few cents in API tokens.
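The arithmetic is simple. This sketch uses a hypothetical rate of $0.01 per 1,000 tokens purely for illustration - actual prices vary by provider and model:

```python
def api_cost(tokens: int, price_per_1k: float) -> float:
    """Cost of a request billed per 1,000 tokens."""
    return tokens / 1000 * price_per_1k

# A 2,000-word post is roughly 2,667 tokens (~4/3 tokens per word).
post_tokens = round(2000 * 4 / 3)
cost = api_cost(post_tokens, price_per_1k=0.01)  # hypothetical rate
```

At that illustrative rate, the post costs under three cents - consistent with "a few cents" for typical blog-length output.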
Speed relates to tokens. Generating 1,000 tokens takes longer than 100 tokens. When you see "tokens per second," that measures how fast the model generates output.
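You can use the same logic to estimate wait times. Assuming a model that generates 50 tokens per second (an illustrative figure, not a benchmark):

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Estimated time to generate a given number of tokens."""
    return tokens / tokens_per_second

short = generation_seconds(100, 50)   # 2 seconds
long = generation_seconds(1000, 50)   # 20 seconds
```

Ten times the output takes roughly ten times as long, which is why long generations feel slow.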
Token limits in practice
If you're writing a long blog post and providing a detailed content brief, you might hit token limits. The model can only "see" a certain amount of text at once.
This is why generating very long content often works better in sections. Write an outline first, then generate each section separately rather than trying to create a 3,000-word post in one go.
Some tools handle this automatically, breaking your request into smaller chunks that fit within token limits. Others require you to manage it manually.
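The chunking idea can be sketched in a few lines: greedily pack paragraphs into groups that stay under a token budget, using the rough words-to-tokens estimate from earlier. This is a simplified illustration of what such tools do, not any specific tool's implementation:

```python
def chunk_paragraphs(paragraphs: list[str], max_tokens: int) -> list[str]:
    """Greedily group paragraphs so each chunk stays under max_tokens,
    estimating ~4/3 tokens per word."""
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for para in paragraphs:
        tokens = round(len(para.split()) * 4 / 3)
        # Start a new chunk if adding this paragraph would exceed the budget.
        if current and current_tokens + tokens > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk can then be sent as its own request, with the outline providing continuity between sections.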
Checking token count
Many AI tools show token usage as you type prompts. This helps you stay within limits and estimate costs.
OpenAI provides a tokenizer tool that shows exactly how text breaks into tokens. This can help you optimize prompts to fit more information into limited context windows.
Optimizing token usage
Be concise in prompts. Every unnecessary word in your instructions uses tokens that could be part of the response instead.
For cost optimization, generate shorter initial drafts then expand them rather than asking for very long output that might need substantial editing anyway.
When context windows are constrained, provide only essential information. A lean content brief that fits comfortably within limits often works better than an exhaustive one that leaves little room for the response.
Put this knowledge into practice
PostGenius helps you write SEO-optimized blog posts with AI — applying concepts like this automatically.