AI Crawlers Come in 3 Types Plus a Foundation: The Right GEO Strategy Is "All-Bot Coverage," Not "Optimizing for a Specific Bot"

Column 2026-06-09

Published: May 25, 2026

Author: Kiyoto Yoshida (CMO, FID Inc. / PM, Genview)

Last updated: May 25, 2026

AI Crawlers Fall Into 3 Types Plus a Foundation | The Answer to GEO Strategy Is "All-Bot Coverage"

Bots that retrieve information from websites for AI can be classified into three types based on their purpose and timing: ① index-type, ② learning-type, and ③ proxy-access type. Underlying all of these is the traditional search engine bot as the foundational layer.

GEO strategy is not about "optimizing for a specific bot," but rather about "creating a state in which any bot can easily cite, understand, and learn from your content." In other words, the correct answer for GEO is "all-bot coverage."

This article publicly shares the "map of understanding" that I developed while building Genview. At this point, it is an organized overview based on official documentation from each company and overseas verification data. Our own verification through Genview (bot visit frequency, changes in citation rates, etc.) will be added to this article incrementally going forward.

What You Will Learn From This Article

The "3+1" classification of AI crawler bots and how each type behaves
The direction of strategy for each of index-type, learning-type, and proxy-access type
Visit frequency data for each bot and findings from overseas research
The foundational GEO strategy design that works for all bots

Why Bot Classification Is Necessary

Understanding the "why" of GEO strategy requires knowing how AI retrieves information from the web.

When you research GEO strategy, you run into advice such as "FAQ structure is important," "include definition statements," and "implement schema markup." All of it is correct, but it often comes without an explanation of why the measure works. Understanding that "why" requires knowing how AI retrieves information from the web, and that is where AI crawler bots come in.

As of 2026, there are more than 10 major AI crawlers, each with its own behavior, purpose, and evaluation criteria. If you apply measures without sorting these out, the reason each measure works stays vague.

Overview of the "3+1" Classification

AI crawler bots can broadly be organized into the following four categories. Each classification is clearly distinguished by its "timing of activity" and "purpose."

The "3+1" Classification of AI Crawler Bots
Classification	Representative Bots	Timing of Activity
① Index-type	OAI-SearchBot / PerplexityBot / Claude-SearchBot	Constantly crawling → referenced at query time
② Learning-type	GPTBot / Google-Extended / ClaudeBot	Long-term collection → becomes AI knowledge
③ Proxy-access type	ChatGPT-User / Claude-User	On-demand retrieval at the moment of the query
[Separate layer] Traditional	Googlebot / Bingbot	Traditional search engines (the foundation of all strategy)

The behaviors, evaluation axes, and strategic directions differ for each classification. Each is explained in detail below.
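The table above can be turned into a small lookup, which is handy when reading server logs. The following is a minimal sketch, not a definitive implementation: the function name `classify_user_agent` and the category labels are illustrative assumptions, and the matching is a plain substring test on the product tokens listed in the table. Google-Extended is left out on purpose, because it is a robots.txt control token rather than a crawler that sends its own requests.

```python
from typing import Optional

# Product tokens from the "3+1" table; category labels are this article's own terms.
# Google-Extended is omitted: it is a control token, not a requesting crawler.
BOT_CATEGORIES = {
    "OAI-SearchBot": "index",
    "PerplexityBot": "index",
    "Claude-SearchBot": "index",
    "GPTBot": "learning",
    "ClaudeBot": "learning",
    "ChatGPT-User": "proxy-access",
    "Claude-User": "proxy-access",
    "Googlebot": "traditional",
    "bingbot": "traditional",
}


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return the article's category for a User-Agent string, or None if unknown."""
    ua = user_agent.lower()
    for token, category in BOT_CATEGORIES.items():
        # Case-insensitive substring match on the product token.
        if token.lower() in ua:
            return category
    return None


if __name__ == "__main__":
    # Simplified sample strings for illustration, not exact production User-Agents.
    for sample in (
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (compatible; Claude-SearchBot/1.0)",
        "Mozilla/5.0 Firefox/120.0",
    ):
        print(sample, "->", classify_user_agent(sample))
```

Because User-Agent strings can be spoofed, a lookup like this is only a first-pass label and says nothing about whether the request really came from the named operator.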

① Index-type: Pre-collected and Referenced at Query Time

OAI-SearchBot (OpenAI), PerplexityBot, and Claude-SearchBot (Anthropic) fall into this category. They constantly crawl the web before users search to build their own databases, then rapidly retrieve information from those databases to generate responses when a user poses a question.

OAI-SearchBot

OAI-SearchBot is the bot used to build the proprietary index for ChatGPT Search. As explicitly stated in OpenAI's official documentation, it is a separate bot from GPTBot, which is used for training. ChatGPT Search also employs a hybrid information retrieval approach, combining OAI-SearchBot's proprietary collection with Bing's index and real-time retrieval.

PerplexityBot (as of May 2026)

PerplexityBot is the information source for the search and answer engine operated by Perplexity AI. As of May 2026, its official description states that it is used for search indexing rather than foundation model training. (Since AI companies may change their specifications, please also verify the latest official information.)

Claude-SearchBot (as of May 2026)

As of May 2026, Anthropic's official documentation clearly classifies Claude-SearchBot as "a bot for search indexing," distinguishing its role from Claude-User, which is a proxy-access type bot.

Strategic Direction for Index-type Bots

These bots prioritize "structures that are easy to cite." Specifically, the following four points matter:

BLUF (Bottom Line Up Front) structure with conclusions placed immediately below headings (H2/H3)
FAQ format pairing questions and answers
Primary information with clear figures and sources
Domain credibility (citation track record and authority)

Clean HTML text structure is not the whole strategy: the credibility of the domain itself also has a significant impact. Even with clean HTML, content from low-credibility domains tends to be cited less often.

② Learning-type: Becomes AI Knowledge Over the Long Term

GPTBot (OpenAI), ClaudeBot (Anthropic), and Google-Extended (Google) fall into this category. They collect web data over months to years to train AI models as foundational knowledge (parameters).

Strictly speaking, GPTBot and ClaudeBot are independent crawlers, but Google-Extended is not a crawler itself. It is a User-agent token, used in robots.txt, that controls whether content fetched by Google's existing crawlers may be used for AI training. Although this article groups all three under "learning-type" for convenience, please note that their technical positioning differs.

Evidence From Official Documentation

OpenAI official: Explicitly states that GPTBot is "used to make OpenAI's generative AI foundation models more helpful and safe."
Google official: States that Google-Extended is "used for training Gemini models and Vertex AI, and for grounding AI responses."

Grok-type Bots (as of May 2026)

As of May 2026, no official documentation whatsoever has been published for Grok-type bots (xAI). It is speculated that real-time mentions and trends on X (formerly Twitter) have an influence, but details are unknown, and all information in this article about them is inference-based.

Strategic Direction for Learning-type Bots

Site-wide expertise (hub-and-spoke structure)
Comprehensive definition of concepts (entities)
Author and organizational E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness)
Logical consistency and unified terminology

Strategy for learning-type bots requires a long-term perspective focused not on "being cited right now," but on "how to become embedded in AI's knowledge."

③ Proxy-access type: Retrieves Pages at the Moment of the Query

ChatGPT-User (OpenAI) and Claude-User (Anthropic) fall into this category. They do not maintain a database in advance. Instead, at the moment a user tells the assistant to "load this URL" in a prompt, they fetch that specific page directly on the user's behalf.

In February 2026, Anthropic updated its official documentation to explicitly state the three-bot structure of ClaudeBot (for learning), Claude-SearchBot (for search indexing), and Claude-User, along with the role of each bot. Of these, the proxy-access type is Claude-User, defined as "the bot that assists in accessing websites when Claude's users ask questions." Claude-SearchBot is a separate bot classified as ① index-type.

Strategic Direction for Proxy-access Type Bots

Structure in which "the answer to that heading" appears immediately below each H2/H3 heading
Semantic HTML that holds up even in long-form content (correct heading hierarchy)
Body text written without JavaScript dependency
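The three points above can be checked mechanically against the raw HTML a bot receives. The sketch below is one possible approach under stated assumptions: the class `HeadingAudit` and the function `audit` are hypothetical names, "the answer directly below" is approximated as "the first `<p>` after an H2/H3 contains text," and a heading that only introduces subsections will be flagged as well. Because it parses the HTML as delivered and never runs scripts, body text that only appears after JavaScript executes shows up as missing.

```python
from html.parser import HTMLParser

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class HeadingAudit(HTMLParser):
    """Collects each heading with the text of the first paragraph that follows it."""

    def __init__(self):
        super().__init__()
        self.sections = []  # dicts with keys: level, title, lead
        self._mode = None   # "heading", "lead", or None

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.sections.append({"level": int(tag[1]), "title": "", "lead": ""})
            self._mode = "heading"
        elif tag == "p" and self.sections and not self.sections[-1]["lead"].strip():
            # Only the first non-empty paragraph under a heading counts as the lead.
            self._mode = "lead"

    def handle_endtag(self, tag):
        if tag in HEADINGS or tag == "p":
            self._mode = None

    def handle_data(self, data):
        if self._mode == "heading":
            self.sections[-1]["title"] += data
        elif self._mode == "lead":
            self.sections[-1]["lead"] += data


def audit(html: str) -> list:
    """Return a list of human-readable problems found in the heading structure."""
    parser = HeadingAudit()
    parser.feed(html)
    parser.close()
    problems = []
    previous_level = 0
    for section in parser.sections:
        title = section["title"].strip()
        level = section["level"]
        # A jump such as h2 -> h4 breaks the heading hierarchy.
        if previous_level and level > previous_level + 1:
            problems.append(f"skipped level before h{level}: {title}")
        # H2/H3 should have text right below them in the delivered HTML.
        if level in (2, 3) and not section["lead"].strip():
            problems.append(f"no text directly under h{level}: {title}")
        previous_level = level
    return problems


if __name__ == "__main__":
    page = "<h1>T</h1><h2>A</h2><p>Answer.</p><h4>Deep</h4><h2>B</h2><div id='app'></div>"
    for problem in audit(page):
        print(problem)
```

In the sample page, the `<div id='app'>` stands in for content that a script would fill in later, which is why the second H2 is reported as having no text under it.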

[Separate Layer] Traditional Search Engines: The Foundation of Everything

Googlebot and Bingbot are bots for traditional search engines, not AI search. However, ChatGPT Search is known to draw on Bing's index alongside OAI-SearchBot's own crawl, and Perplexity and others are also understood to use Google and Bing indexes as a supplement (neither has been fully disclosed officially).

In other words, sites that have not done the basics of traditional SEO (being crawled and indexed by Googlebot and Bingbot) may not even enter the selection pool for AI search. This is why Googlebot and Bingbot are treated as a separate layer.

Bot Visit Frequency: Insights From Overseas Data

How frequently does each bot visit a site? For now, we refer to overseas observational data.

Cloudflare AI Crawler Traffic Study (October–November 2025)

The following table shows data from Cloudflare's analysis of HTML requests from October to November 2025. Comparing unique page reach rates, Googlebot covers far more pages than any AI crawler: 11.6%, roughly three times GPTBot's 3.6%.

Cloudflare Study: Unique Page Reach Rate by Bot (October–November 2025)
Bot	Unique Page Reach Rate
Googlebot	11.6%
GPTBot	3.6%
ClaudeBot	2.4%
PerplexityBot	0.06%

This coverage gap is one of the reasons this article positions Googlebot as the foundation of all AI strategy. Source: Cloudflare AI Crawler Traffic Study (Search Engine Journal) / Original Cloudflare Blog

12-Site, 30-Day Server Log Study (March–April 2026)

The table below shows the average daily hit count per site from an overseas study analyzing server logs from 12 sites over 30 days. GPTBot leads in access frequency, followed by ClaudeBot.

12-Site, 30-Day Server Log Study: Average Daily Hit Count per Site (March–April 2026)
Bot	Average Daily Hit Count
GPTBot	4,200
ClaudeBot	1,800
PerplexityBot	980
Google-Extended	540

GPTBot revisits high-traffic pages every 2.4 days on average, while ClaudeBot does so every 6.8 days on average, a gap of nearly three times. This suggests that frequently updated sites tend to be picked up by GPTBot first.
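A per-bot daily average like the one in the table can be reproduced from your own access logs. The following is a rough sketch under several assumptions: the log is in Combined Log Format with the User-Agent as the last quoted field, the names `count_bot_hits` and `average_daily_hits` are hypothetical, and bots are identified by User-Agent substring only (which can be spoofed). It is not the method used by the study cited above.

```python
import re
from collections import Counter

# Product tokens to look for; extend as needed.
BOT_TOKENS = (
    "GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot",
    "Claude-SearchBot", "ChatGPT-User", "Claude-User",
)

# Combined Log Format, e.g.
# 1.2.3.4 - - [10/Mar/2026:13:55:36 +0000] "GET /a HTTP/1.1" 200 512 "-" "UA string"
# Captures the date part of the timestamp and the last quoted field (the User-Agent).
LINE_RE = re.compile(r'\[(?P<day>[^:\]]+):[^\]]*\].*"(?P<ua>[^"]*)"\s*$')


def count_bot_hits(lines):
    """Count hits per (day, bot token) from an iterable of log lines."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # Skip lines that are not in the assumed format.
        ua = match.group("ua").lower()
        for token in BOT_TOKENS:
            if token.lower() in ua:
                hits[(match.group("day"), token)] += 1
                break
    return hits


def average_daily_hits(hits):
    """Average hits per day for each bot, over all days on which any bot was seen."""
    days = {day for day, _ in hits}
    totals = Counter()
    for (_, token), count in hits.items():
        totals[token] += count
    return {token: total / len(days) for token, total in totals.items()}


if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [10/Mar/2026:13:55:36 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
        '5.6.7.8 - - [11/Mar/2026:09:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    ]
    print(average_daily_hits(count_bot_hits(sample)))
```

Grouping by `(day, bot)` first keeps the raw counts available, so the same data can also be used to estimate how often a given page is revisited.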

Rapid Growth of AI Search Traffic

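Data also shows that visits via AI search grew from 15.6 billion to 27.4 billion between Q1 2025 and Q1 2026. The growth is reported as 42.8%, although the two totals as quoted work out to an increase of about 76%, so the percentage may refer to a different baseline. Either way, AI search is already an active channel, not a topic for the future.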

* All of the above data are from studies targeting overseas sites. Genview is conducting its own verification regarding the actual situation for Japanese-language sites, and will add data to this article as it becomes available.

Measures That Work for All Three Classifications

While the behaviors of each bot differ, there are measures that are commonly effective for all bots. This is the foundational design of GEO strategy. The table below organizes common measures and the reasons they are effective for all bots.

GEO Measures That Work for All Bots
Measure	Why It Works for All Bots
FAQ and definition statements	Both index-type and proxy-access type bots prefer "structures that are easy to answer from"
Heading (H2/H3) + conclusion placement	All bots retrieve information on a section-by-section basis
Primary information with clear citations	Both learning-type and index-type bots have credibility as an evaluation axis
Semantic HTML	Structures that convey meaning even after JS removal are effective for all bots
E-E-A-T and author information	Especially important for learning-type bots. Index-type also uses it for credibility assessment

The essence of GEO strategy is not to individually optimize "what to do for which bot," but to "create a state that can respond to any bot that arrives." Bot classification knowledge is best used as a map for understanding "why that measure is necessary."
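As one concrete instance of the "FAQ and definition statements" row, question-and-answer pairs can also be expressed as schema.org `FAQPage` structured data. The sketch below is illustrative only: the helper names `faq_jsonld` and `as_script_tag` are assumptions, and whether this markup actually changes citation rates is exactly what the verification described in this article has yet to measure.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data):
    """Serialize the object as a JSON-LD script tag for the page's HTML."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so answer text cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    faq = faq_jsonld([
        (
            "Does GEO strategy need to be done separately from SEO strategy?",
            "Not separately. SEO is the foundation.",
        ),
    ])
    print(as_script_tag(faq))
```

The structured data should mirror questions and answers that are also visible in the page body, so that every bot type sees the same content regardless of whether it reads the markup.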

What Genview Will Verify Going Forward

The content written in this article is an organized overview based on official documentation and overseas observational data at this point. Through the development and operation of Genview, we will conduct the following verifications in-house.

Visit frequency of each AI crawler bot (Japanese-language sites, by scale)
Time lag from bot visit to AI citation
Changes in citation rates based on the presence or absence of FAQ content structure (including cases with FAQPage structured data)
Differences in bot behavior before and after semantic HTML implementation
Structural differences in content cited by ClaudeBot, GPTBot, and PerplexityBot

As verification data is compiled, it will be added to this article or published as a separate article. We will continue to publish with the attitude that "the people building a GEO strategy tool understand GEO most deeply."

FAQ

Q: Does GEO strategy need to be done separately from SEO strategy?
A: Not separately. SEO is the foundation. Pages that are not indexed by Googlebot are also less likely to be picked up by AI search. The correct order is to establish an SEO foundation first, then layer GEO strategy on top of it.

Q: Should I block bots with robots.txt?
A: Blocking learning-type bots (GPTBot, ClaudeBot) does not affect citation in AI responses or inclusion in search results. However, blocking index-type bots (OAI-SearchBot, PerplexityBot) is likely to exclude your pages from ChatGPT Search and Perplexity search results. It is recommended to configure each bot individually based on your objectives.

Q: How is PerplexityBot different from Google and Bing?
A: Googlebot and Bingbot are bots for traditional search engines that return search result links to users. PerplexityBot collects information for AI response generation, and the pages it collects are displayed as citation URLs within the response. Because the page is explicitly presented as a citation source, it tends to lead directly to click-through traffic.
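The per-bot robots.txt setup mentioned in the FAQ can be tested before deployment. The following is a minimal sketch, assuming a policy that blocks the learning-type bots while leaving everything else open: the `ROBOTS_TXT` contents, the function name `allowed_bots`, and the example.com URL are all illustrative. It uses Python's standard `urllib.robotparser`, whose matching may differ in detail from how each operator's crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Example policy (an assumption for illustration): opt out of training crawls,
# stay open to search-index and on-demand bots.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed_bots(robots_txt, bots, url):
    """Return {bot: True/False} for whether each bot may fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


if __name__ == "__main__":
    result = allowed_bots(
        ROBOTS_TXT,
        ["GPTBot", "ClaudeBot", "OAI-SearchBot", "PerplexityBot"],
        "https://example.com/blog/post",
    )
    for bot, allowed in result.items():
        print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

Checking the file this way catches accidental overlaps, such as a rule meant for one bot that also matches another, before the change reaches production.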
