Author: Kita Yohei
Published: June 9, 2026
Understanding GEO strategy starts with knowing how AI works. This page organizes 17 concepts covering how AI processes, retrieves, generates, and cites information into a single structured technical map. The goal is not to memorize definitions but to build a structural understanding of why AI cites specific content.
1. Understanding the Basic Units
AI doesn't process text as raw characters. It first splits text into minimum units called "tokens" and converts them into numerical vectors through "Embedding." Retrieval systems additionally handle content in blocks called "chunks." All of this processing is bounded by a fixed token limit called the "context window." This is why content structure, length, and information density matter in GEO strategy.
- Token
- The minimum unit AI uses to process text. Japanese is less token-efficient than English, affecting how quickly the context window is consumed.
- Chunk
- The unit into which RAG systems split content for retrieval and processing. How content is chunked affects whether a passage can be retrieved and understood on its own.
- Embedding
- The process of converting text into numerical vectors. The foundational technology that enables AI to judge semantic similarity.
- Context Window
- The maximum number of tokens an LLM can process in one inference. Even with a large window, the model does not use everything in it equally; the "Lost in the Middle" problem applies.
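The relationship between tokens, chunks, and a token limit can be sketched as follows. This is a minimal illustration, not how any particular RAG system chunks content: `approx_tokens` and `chunk_text` are hypothetical helpers, and whitespace splitting is only a rough stand-in for a real tokenizer (it does not work for Japanese, which has no spaces between words).

```python
def approx_tokens(text: str) -> list[str]:
    """Crude stand-in for a tokenizer: split on whitespace (assumption)."""
    return text.split()


def chunk_text(text: str, max_tokens: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping chunks of at most max_tokens approximate tokens."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    tokens = approx_tokens(text)
    step = max_tokens - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + max_tokens]
        chunks.append(" ".join(window))
        if start + max_tokens >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, which is one common way to keep each chunk understandable on its own.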
2. Understanding the Retrieval Process
When AI references external information, it uses an architecture called RAG (Retrieval-Augmented Generation). Content related to the user's query is retrieved, semantically ranked by cosine similarity, and then precisely filtered by reranking — this flow explains the "why" of GEO strategy.
- RAG (Retrieval-Augmented Generation)
- The collective term for the mechanism by which AI searches for and retrieves external information before generating a response. The technical backbone of GEO strategy.
- Retrieval
- The first phase of RAG. Retrieves content related to the query using vector search and similar methods. Content not retrieved cannot be cited.
- Vector Search
- Technology that searches for related documents based on semantic similarity of text. Searches by semantic proximity, not keyword matching.
- Cosine Similarity
- A metric that quantifies the semantic similarity between a query and content as the cosine of the angle between their embedding vectors. A common scoring criterion for retrieval decisions in RAG.
- Reranking
- The process of re-evaluating and reordering candidate documents using a high-precision model after initial retrieval. Determines whether content is "selected" after being retrieved.
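The retrieve-then-rerank flow described in this section can be sketched in a few lines. This is a simplified illustration under stated assumptions: the vectors are hand-written toy embeddings rather than output from a real embedding model, and `score_fn` is a hypothetical stand-in for a higher-precision reranking model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, doc_vecs, top_k=3):
    """First phase: rank every document by cosine similarity and keep the top_k."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def rerank(candidates, score_fn, top_n=1):
    """Second phase: rescore only the retrieved candidates with score_fn.

    score_fn is a hypothetical placeholder for a more precise relevance model.
    """
    rescored = [(doc_id, score_fn(doc_id)) for doc_id, _ in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_n]


# Toy data: 2-dimensional "embeddings" chosen by hand for illustration.
doc_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
candidates = retrieve([1.0, 0.0], doc_vecs, top_k=2)
best = rerank(candidates, {"a": 0.2, "c": 0.9}.get, top_n=1)
```

In this toy run, document "b" never reaches the reranker because it was not retrieved, and "c" overtakes "a" only at the reranking step. That mirrors the two separate hurdles: being retrieved, then being selected.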
3. Understanding the Generation Process
Retrieved content is passed to the context window, and the LLM performs inference to generate a response. In this phase, "Grounding" occurs — AI generates responses based on the retrieved information. But "hallucination" (generating incorrect information) and the "Lost in the Middle" problem (middle content being less referenced) also occur in this phase.
- Inference
- The process by which a trained model receives input and generates a response. The place where the results of GEO strategy actually appear.
- Grounding
- The mechanism by which AI generates responses based on specific information sources. Information within the context window becomes eligible for Grounding.
- Hallucination
- The phenomenon where AI generates factually incorrect information during inference. Providing accurate, well-organized information reduces this risk.
- Lost in the Middle
- The phenomenon where information in the middle of the context window tends to be referenced less readily than information near the beginning or end. Placing important information at the beginning is effective.
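One way to picture grounding is as prompt assembly: retrieved passages are placed into the context window, most relevant first, and the model is asked to answer from them. The sketch below is illustrative only. `build_grounded_prompt` is a hypothetical helper, the prompt wording is an assumption, and token cost is approximated by counting whitespace-separated words.

```python
def build_grounded_prompt(question: str,
                          passages: list[tuple[str, float]],
                          token_budget: int = 200) -> str:
    """Assemble a prompt from (text, relevance_score) passages.

    Passages are ordered most relevant first so the strongest evidence sits
    at the beginning of the context; passages that do not fit the remaining
    budget are skipped.
    """
    ranked = sorted(passages, key=lambda p: p[1], reverse=True)
    selected = []
    used = 0
    for text, _score in ranked:
        cost = len(text.split())  # rough token estimate (assumption)
        if used + cost > token_budget:
            continue  # does not fit; try the next, possibly shorter, passage
        selected.append(text)
        used += cost
    sources = "\n".join(f"[{i}] {text}" for i, text in enumerate(selected, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"Sources:\n{sources}\n"
        f"Question: {question}"
    )
```

A passage that never makes it into `selected` cannot be grounded on, which is why concise, self-contained passages have an advantage under a fixed budget.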
4. Understanding Why AI Cites
AI doesn't cite specific content by coincidence. Content that satisfies both "information density (how much meaning it carries)" and "Information Gain (how much new meaning it carries)" is what gets selected. AI also recognizes brands through the "Knowledge Graph" — a database of entity relationships — and "co-occurrence" is how that context forms. This section is the final piece connecting the technical map to GEO strategy.
- Information Density
- The concentration of meaning per unit of text. A design metric: eliminating redundancy lets more meaning reach the AI per token.
- Information Gain
- The new information value beyond AI's existing knowledge. If information density determines "how well it's communicated," Information Gain determines "why it gets cited."
- Knowledge Graph
- A database in which AI structures and manages entities and their relationships. The foundation for how brands are recognized.
- Co-occurrence
- The frequency and pattern with which a brand appears alongside specific themes and concepts. It is how meaning accumulates around a brand, bridging schema declarations and external mentions.
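Co-occurrence can be made concrete by counting how many documents mention a brand together with each theme. This is a rough sketch, not a description of how any AI system actually builds brand context: `cooccurrence_counts` is a hypothetical helper, "Acme" is a made-up brand, and case-insensitive substring matching is a deliberate simplification.

```python
from collections import Counter


def cooccurrence_counts(documents: list[str], brand: str, themes: list[str]) -> Counter:
    """Count, per theme, the documents that mention both the brand and the theme."""
    counts = Counter({theme: 0 for theme in themes})
    brand_lower = brand.lower()
    for doc in documents:
        text = doc.lower()
        if brand_lower not in text:
            continue  # the brand is absent, so nothing can co-occur with it here
        for theme in themes:
            if theme.lower() in text:
                counts[theme] += 1
    return counts


# Hypothetical example data.
docs = [
    "Acme publishes GEO strategy guides.",
    "Acme sells anvils.",
    "GEO strategy basics.",
]
result = cooccurrence_counts(docs, "Acme", ["GEO strategy", "anvils"])
```

The third document mentions the theme but not the brand, so it adds nothing to the brand's association with "GEO strategy."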
Explore Other Categories
How AI works is one of five categories for understanding GEO strategy. Reading across categories connects the full picture.