How AI Search Works

Mechanisms 2026-06-04

Author: Kita Yohei

Published: May 25, 2026

How AI Search Works | The Process of LLM Information Collection and Interpretation

AI search works through a process in which LLMs collect and learn from structured information on the web, then synthesize that information into a single response to a user's question.

Whether AI systems recognize your brand is determined not by your search ranking, but by whether AI can accurately read and interpret your site's content.

Understanding how AI search works is the foundation for designing an effective GEO strategy.

What You'll Learn in This Article

The two pathways through which AI search collects information
What signals LLMs use to determine trustworthiness
Why approximately 27% of websites are invisible to AI
What AI search mechanics mean for GEO strategy

For a broader overview, see What Is AI Search?

1. The Two Pathways AI Search Uses to Collect Information

AI search collects information through two distinct pathways.

AI Search Information Collection Pathways
Pathway	What It Does	Primary Users
Training crawlers	Bots that collect data for LLM training by crawling websites and retrieving content used in model learning	All LLMs (during training phase)
Real-time search	When answering a user's question, the system retrieves information from the web in real time	Select versions of ChatGPT, Gemini, and others

In both pathways, structuring content so AI can accurately read and interpret it forms the foundation of GEO strategy.

2. The Reality of AI-Bot Traffic

As I've been reading international research on AI search, the breakdown of AI-bot traffic was one of the most striking data points I encountered.

According to research published by LightSite AI (February 2026), analysis of millions of AI-bot requests across dozens of domains found that approximately 90% of observed AI-bot traffic originates from training crawlers.

Figure: AI-Bot Traffic Breakdown
Note: Created in-house based on publicly available information (Source: LightSite AI, February 2026)

This ratio suggests that most AI visibility is shaped not by real-time search but by how information is ingested during the training phase.
Structuring content in a way that training crawlers can accurately read is considered a critical foundation for AI visibility.
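To see how such a breakdown might be measured on your own site, here is a minimal Python sketch that groups requests by User-Agent string. The token lists and the `classify_user_agent` and `ai_bot_breakdown` helpers are illustrative assumptions, not the methodology of the LightSite AI study, and the grouping of each bot as "training" or "real-time" should be checked against each vendor's current documentation.

```python
from collections import Counter

# Assumed grouping of user-agent tokens; not taken from the LightSite AI study.
TRAINING_CRAWLERS = ("GPTBot", "ClaudeBot", "CCBot")
REALTIME_FETCHERS = ("ChatGPT-User", "Perplexity-User")


def classify_user_agent(user_agent: str) -> str:
    """Return 'training', 'realtime', or 'other' for a User-Agent string."""
    ua = user_agent.lower()
    if any(token.lower() in ua for token in TRAINING_CRAWLERS):
        return "training"
    if any(token.lower() in ua for token in REALTIME_FETCHERS):
        return "realtime"
    return "other"


def ai_bot_breakdown(user_agents: list[str]) -> dict[str, float]:
    """Share of AI-bot requests per category, ignoring non-AI traffic."""
    counts = Counter(classify_user_agent(ua) for ua in user_agents)
    counts.pop("other", None)  # browsers and unrelated bots are excluded
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: count / total for kind, count in counts.items()}
```

Feeding the function the User-Agent column of an access log gives a per-site ratio that can be compared with the roughly 90% figure reported above.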

3. What Signals LLMs Trust

The LightSite AI research also identified the signals that help LLMs interpret and retain brand information reliably.

According to the research, AI systems respond more consistently and accurately when entity relationships, product context, and brand positioning are clearly structured.

Stas Levitan, Founder of LightSite AI, stated:

"Brand visibility in AI answers begins with whether systems can consistently interpret structured information. If interpretation fails at the infrastructure level, recommendations become inconsistent or disappear entirely."

Source: Stas Levitan / LightSite AI

The view that AI brand visibility starts at the infrastructure level, before any optimization tactic, is gaining traction internationally.

The following three elements are considered effective for building the signals LLMs trust:

Entity clarity: Clearly stating relationships between company name, product names, and categories
Structured data: Organizing content in machine-readable formats that LLMs can parse reliably
Question-oriented URL structure: Pages with URLs formatted as questions reportedly tend to show higher AI-bot engagement rates
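As one possible illustration of the first two points, the sketch below serializes a company, product, and category relationship as JSON-LD using the schema.org vocabulary. The source does not name a specific format, so schema.org is an assumption here, and the `build_entity_jsonld` helper and all example values are hypothetical.

```python
import json


def build_entity_jsonld(company: str, product: str, category: str, url: str) -> str:
    """Serialize a company-product-category relationship as schema.org JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product,
        "category": category,
        # The brand property ties the product back to the company entity.
        "brand": {"@type": "Organization", "name": company, "url": url},
    }
    return json.dumps(data, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_entity_jsonld(
        company="Example Corp",
        product="Example Analytics",
        category="Web analytics software",
        url="https://example.com",
    ))
```

The resulting string is typically embedded in a page inside a `<script type="application/ld+json">` element.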

4. Why 27% of Websites Are Invisible to AI

The LightSite AI research also surfaced an often-overlooked issue that significantly impacts AI visibility.

Approximately 27% of websites unintentionally block at least one major LLM bot, often due to security configurations.
A blocked bot cannot retrieve the site's content, so the brand is likely to be underrepresented in, or absent from, that AI's training data.

Figure: Share of Websites Unintentionally Blocking LLM Bots
Note: Created in-house based on publicly available information (Source: LightSite AI, February 2026)

Confirming that your site is not blocking LLM bots is considered a prerequisite for any GEO strategy.
A recommended first step is to review your robots.txt settings and security configurations so that major LLM crawlers (such as GPTBot and Google-Extended) keep their access.
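The robots.txt part of this check can be sketched with Python's standard `urllib.robotparser` module. The `blocked_llm_bots` helper and the default token list are illustrative assumptions. The check only covers robots.txt rules, so blocks applied by a firewall or CDN would not show up here and need to be verified separately.

```python
from urllib.robotparser import RobotFileParser

# Tokens named in this article; extend the tuple with any others you care about.
LLM_BOT_TOKENS = ("GPTBot", "Google-Extended")


def blocked_llm_bots(robots_txt: str, page_url: str, bots=LLM_BOT_TOKENS) -> list[str]:
    """Return the bot tokens that robots.txt disallows from fetching page_url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, page_url)]


if __name__ == "__main__":
    example_rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
    print(blocked_llm_bots(example_rules, "https://example.com/"))
```

To test a live site, download its robots.txt first and pass the text to the function, or use `RobotFileParser.set_url()` followed by `read()`.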

5. What AI Search Mechanics Mean for GEO

Understanding how AI search works makes it possible to prioritize GEO efforts more effectively. Three key implications stand out.

Content structure is the top priority: LLMs are said to interpret well-structured content more reliably, and adding definitions, FAQs, and comparison tables is said to improve AI search visibility directly.
Training-phase visibility matters most: With approximately 90% of AI-bot traffic coming from training crawlers, optimizing for the learning phase, and not just for real-time search, is considered essential.
Check your crawler access settings: Given that approximately 27% of sites unintentionally block LLM bots, auditing your site's crawler accessibility is a recommended first step.
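For the FAQ element of the first point, one common machine-readable form is schema.org `FAQPage` markup. The sketch below is a minimal illustration under that assumption. The `build_faq_jsonld` helper is hypothetical, and the source does not state that this particular markup is what the research evaluated.

```python
import json


def build_faq_jsonld(faqs: list[tuple[str, str]]) -> str:
    """Serialize (question, answer) pairs as schema.org FAQPage JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in faqs
        ],
    }
    return json.dumps(data, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    print(build_faq_jsonld([
        ("Is AI searching the web in real time to generate answers?",
         "It depends on the AI platform."),
    ]))
```

Keeping the visible FAQ text and the markup generated from the same list of pairs helps the two stay in sync.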

For a practical guide to GEO implementation, see How to Get Started with GEO.
With the GEO tool Genview, you can monitor how often your content is being cited by AI systems.

Frequently Asked Questions

Q: Is AI searching the web in real time to generate answers?
A: It depends on the AI platform. Some versions of ChatGPT and Gemini have real-time search capabilities, but LightSite AI research found that approximately 90% of observed AI-bot traffic comes from training crawlers. Optimizing for the training phase is considered just as important as optimizing for real-time search.

Q: How can I check whether LLMs are able to read my site?
A: Start by reviewing your robots.txt file to confirm that major LLM crawlers (such as GPTBot and Google-Extended) are not blocked. Given that approximately 27% of websites unintentionally block at least one LLM bot, regular checks are recommended.

Q: Does adding structured data improve visibility in AI search?
A: It is reported to. LightSite AI research found that AI systems interpret and retain information more reliably when entity relationships, product context, and brand positioning are clearly structured. Adding definitions, FAQs, and schema markup is considered effective.

References and Sources

Digital Journal / LightSite AI (Stas Levitan): LightSite AI Research Examines How Large Language Models Determine Brand Trust
