Every AI language model has a maximum amount of text it can process at once, measured in tokens (roughly words and word-parts). This limit is the context window. Everything the model uses to generate a response, including conversation history, retrieved documents, system instructions, and the current message, must fit within it.
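To make the budget concrete, here is a minimal sketch of a pre-flight check that adds up the pieces of a request and compares them to a limit. The function names, the 4-characters-per-token estimate, and the `reserved_for_output` parameter are all assumptions for illustration; a real system would count tokens with the tokenizer of the model it calls and use that model's documented limit.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate, assuming about 4 characters per token for English text."""
    return len(text) // 4


def fits_in_context(
    parts: dict[str, str],
    context_window: int,
    reserved_for_output: int = 0,
) -> tuple[bool, int]:
    """Return (fits, total_input_tokens) for the given named pieces of a request.

    `reserved_for_output` is a hypothetical safety margin for the reply,
    useful when the response counts against the same window.
    """
    total = sum(estimate_tokens(text) for text in parts.values())
    return total + reserved_for_output <= context_window, total


# Hypothetical request: every piece competes for the same budget.
request = {
    "system_instructions": "You are a careful contract reviewer. " * 10,
    "conversation_history": "User asked about clause 4. " * 200,
    "retrieved_documents": "Section 12.3: Termination for convenience... " * 300,
    "current_message": "What is the notice period for termination?",
}

fits, used = fits_in_context(request, context_window=8000, reserved_for_output=1000)
print(f"estimated input tokens: {used}, fits: {fits}")
```

The useful habit is the bookkeeping rather than the exact numbers: every component of the prompt is counted against one shared limit before the request is sent.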
A legal review agent analyzing a 200-page contract often cannot hold the entire document in context at once, especially alongside instructions and conversation history. Instead, the system splits the contract into chunks and uses retrieval to surface only the sections relevant to each question. Understanding context window limits is essential for designing systems that handle large documents or long conversations reliably, and for knowing when a retrieval approach is needed rather than trying to fit everything in at once.
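The chunk-and-retrieve idea can be sketched in a few lines. This is an illustrative toy, not a production design: the function names and chunk sizes are hypothetical, and simple keyword overlap stands in for the embedding-based similarity search that real retrieval systems typically use.

```python
import re


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def keywords(text: str) -> set[str]:
    """Lowercased alphanumeric words, used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks sharing the most words with the question.

    Keyword overlap is a stand-in (an assumption of this sketch) for a real
    similarity measure such as embedding distance.
    """
    q = keywords(question)
    ranked = sorted(chunks, key=lambda c: len(q & keywords(c)), reverse=True)
    return ranked[:top_k]


# Hypothetical contract sections standing in for chunks of a long document.
sections = [
    "The tenant shall pay rent monthly.",
    "Either party may terminate with 30 days notice.",
    "The landlord maintains the roof.",
]
print(retrieve("When can a party terminate?", sections, top_k=1))
```

Only the retrieved chunks are placed in the prompt, so the cost of each question depends on `top_k` and the chunk size rather than on the length of the full contract.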