What Is the TDM Exception? Meaning, Definition, and Its Place in GEO Strategy

AI Crawler Handling 2026-06-09

Author: Kita Yohei (喜多陽平)　Published: June 2, 2026

The TDM Exception (Text and Data Mining Exception) is the exception in the EU Copyright Directive (CDSMD, Directive (EU) 2019/790) that permits text and data mining without infringing copyright. Under Article 4 of the same directive, content owners can opt out by reserving their rights, which ties the exception closely to GEO strategy when you decide how to handle AI training crawlers.

What You'll Learn on This Page

The meaning and legal background of the TDM Exception
Why content owners can refuse AI training
The difference between robots.txt and TDMRep
The trade-off with GEO strategy
Common misconceptions

What Is the TDM Exception?

TDM (Text and Data Mining) refers to the automated collection and analysis of large volumes of text and data. Collecting AI training data is generally treated as a form of TDM.

The EU Copyright Directive (CDSMD 2019/790) establishes two TDM exceptions. Article 3 covers TDM for scientific research by research organisations and cultural heritage institutions, and rightsholders cannot opt out of it. Article 4 covers TDM for any purpose, including commercial use, but applies only where rightsholders have not reserved their rights; for content published online, that reservation must be expressed by machine-readable means.

Why Content Owners Can Refuse AI Training

The legal basis for content owners to refuse AI training crawlers is the opt-out clause in Article 4 of the EU Copyright Directive. Once a content owner reserves rights by machine-readable means, the Article 4 exception no longer covers mining of that content, so a commercial miner needs the owner's permission.

Additionally, the EU AI Act (Regulation (EU) 2024/1689), which entered into force in August 2024, requires providers of general-purpose AI (GPAI) models under Article 53 to put in place a policy to identify and comply with rights reservations expressed under Article 4. These GPAI obligations apply from August 2025. AI companies placing models on the EU market can no longer ignore opt-out declarations from content owners.

However, this is EU law. Outside the EU, including in Japan, equivalent legal force does not currently exist, and robots.txt blocking remains a technical convention without legal enforcement.

How to Refuse AI Training

There are two main approaches to declaring a TDM opt-out. They differ significantly in nature.

① robots.txt (most widely used, but no legal force of its own)

robots.txt is the most commonly used opt-out method. Specific crawlers such as GPTBot, ClaudeBot, and PerplexityBot are blocked by name. However, robots.txt is a technical convention dating to 1994, and compliance is voluntary: the protocol itself has no enforcement mechanism. A 2025 study, reported in "Beyond Robots.txt: Purpose-Based Scraping Control," found that AI crawlers sent an average of 156 requests per site that violated robots.txt over a three-week period.
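To see how a given robots.txt treats named AI crawlers, the check can be sketched with Python's standard urllib.robotparser module. The robots.txt content and the example.com URL below are hypothetical. The result only shows what the file asks for, not whether a crawler actually complies.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def audit(robots_txt, agents, url="https://example.com/"):
    """Return {agent: True if the file allows fetching url, else False}."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # Googlebot is included as a non-AI comparison that falls under "*".
    for agent, allowed in audit(ROBOTS_TXT, AI_CRAWLERS + ["Googlebot"]).items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running the same function against your own robots.txt text is a quick way to confirm that the named groups match the crawlers you intended to block.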

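② TDMRep (W3C Community Group specification for expressing an EU rights reservation)

The Text and Data Mining Reservation Protocol (TDMRep), published by a W3C Community Group, lets content owners declare TDM rights reservations in machine-readable form through HTTP headers, HTML meta tags, or a tdmrep.json file. Neither the EU Copyright Directive nor the EU AI Act names TDMRep. Its legal weight comes from Article 4 of the directive, which requires online reservations to be expressed by machine-readable means, and TDMRep is one way to meet that requirement. In that sense it is tied to EU law more directly than robots.txt, which was designed for crawl control rather than rights reservation.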
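As a rough illustration of the three TDMRep channels, the following Python sketch generates each form of the declaration. The field names (tdm-reservation, tdm-policy, location) and the /.well-known/ location for tdmrep.json are taken from the TDMRep report as understood here and should be verified against the current text. The policy URL is a placeholder.

```python
import json
from html import escape
from typing import Optional

# Placeholder policy URL; replace with a real policy document if you publish one.
POLICY_URL = "https://example.com/tdm-policy.json"


def tdmrep_json(location: str = "/", reserved: bool = True,
                policy_url: Optional[str] = None) -> str:
    """Body of a tdmrep.json file (assumed to be served from /.well-known/)."""
    rule = {"location": location, "tdm-reservation": 1 if reserved else 0}
    if policy_url:
        rule["tdm-policy"] = policy_url
    return json.dumps([rule], indent=2)


def tdm_headers(reserved: bool = True, policy_url: Optional[str] = None) -> dict:
    """HTTP response headers carrying the same declaration."""
    headers = {"tdm-reservation": "1" if reserved else "0"}
    if policy_url:
        headers["tdm-policy"] = policy_url
    return headers


def tdm_meta_tags(reserved: bool = True, policy_url: Optional[str] = None) -> str:
    """HTML meta tags, for pages where response headers cannot be set."""
    tags = [f'<meta name="tdm-reservation" content="{1 if reserved else 0}">']
    if policy_url:
        safe_url = escape(policy_url, quote=True)
        tags.append(f'<meta name="tdm-policy" content="{safe_url}">')
    return "\n".join(tags)


if __name__ == "__main__":
    print(tdmrep_json(policy_url=POLICY_URL))
    print(tdm_headers(policy_url=POLICY_URL))
    print(tdm_meta_tags(policy_url=POLICY_URL))
```

One of the three channels is normally enough for a given resource. The file-based form is convenient when a single rule should cover a whole site.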

The Trade-Off with GEO Strategy

Blocking AI crawlers can limit your content's inclusion in AI training data. But it simultaneously limits your opportunities to be cited and recommended by AI. This is the fundamental trade-off between TDM opt-out and GEO strategy.

Genview recommends approaching this not as "block everything or allow everything," but as a strategic decision: "what do you want AI to learn from and reference?"

For example, it's possible to block training crawlers (GPTBot) while allowing search index and citation crawlers (OAI-SearchBot). If your goal is to keep your content out of training data while still appearing in AI search results and citations, differentiating by crawler type is an effective approach.
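The crawler-by-crawler split described above can be sketched as a small generator that maps each crawler to a purpose and emits robots.txt groups accordingly. The purpose labels, the blocked-purpose set, and the example.com URL are assumptions for illustration. Only the two crawler names come from the text.

```python
from urllib.robotparser import RobotFileParser

# Purpose labels follow the example in the text; extend the mapping as needed.
CRAWLER_PURPOSE = {
    "GPTBot": "training",
    "OAI-SearchBot": "search",
}

# Hypothetical policy: block training crawlers, allow everything else.
BLOCKED_PURPOSES = {"training"}


def build_robots_txt(crawler_purpose: dict, blocked_purposes: set) -> str:
    """Emit one robots.txt group per crawler, chosen by its purpose."""
    groups = []
    for agent, purpose in crawler_purpose.items():
        rule = "Disallow: /" if purpose in blocked_purposes else "Allow: /"
        groups.append(f"User-agent: {agent}\n{rule}\n")
    return "\n".join(groups)


def is_allowed(robots_txt: str, agent: str,
               url: str = "https://example.com/") -> bool:
    """Check the generated rules with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    robots_txt = build_robots_txt(CRAWLER_PURPOSE, BLOCKED_PURPOSES)
    print(robots_txt)
    for agent in CRAWLER_PURPOSE:
        print(agent, "allowed" if is_allowed(robots_txt, agent) else "blocked")
```

Keeping the purpose mapping separate from the blocking policy means a change of strategy, such as also blocking search crawlers, touches one set rather than every rule.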

See "What Is AI Bot Crawling?" for more detail.

Genview's Definition

In the context of GEO strategy, Genview defines the TDM Exception as "the text and data mining exception defined by EU copyright law, which establishes the legal basis for content owners to opt out of AI training crawlers." Making decisions that fit your GEO goals requires understanding both robots.txt, a technical opt-out, and TDMRep, a machine-readable rights reservation aimed at Article 4 of the EU Copyright Directive.

In GEO strategy, crawler blocking is a "defensive" measure, but it trades off against the "offensive" goal of being cited and recommended by AI. Decide what you are blocking for before changing any configuration.

This definition reflects Genview's perspective and is not an industry consensus.

Related Terms

AI Bot Crawling: The process by which AI crawler bots retrieve web content. The primary subject of TDM exceptions and opt-outs.
llms.txt: A supplementary file that tells AI what information it should understand. Unlike TDMRep, it has an opt-in rather than opt-out nature.
RAG (Retrieval-Augmented Generation): The mechanism by which AI retrieves real-time web information to generate responses. Training crawlers and citation crawlers operate on different layers from RAG.
Grounding: The mechanism by which AI anchors responses to specific sources. Allowing crawls makes it more likely your content becomes a grounding source.

Common Misconceptions

Misconception 1: "Blocking with robots.txt provides legal protection"

robots.txt is a technical convention with no enforcement mechanism of its own. If you want a reservation with a legal basis, declaring it through TDMRep is the more direct route, though that legal basis (Article 4 of the EU Copyright Directive) applies only within the EU.

Misconception 2: "All AI crawlers should be blocked"

AI crawlers come in several types: training, index-building, and real-time retrieval. Blocking all of them also eliminates your opportunities to be cited and recommended by AI. Differentiating by purpose is recommended.

Misconception 3: "The TDM Exception is irrelevant outside the EU"

While the TDM Exception is EU law, it affects companies publishing content for EU audiences or working with EU-based AI companies. In Japan and elsewhere, TDM-related copyright discussions are ongoing, and the legal landscape may change with future regulatory developments.

FAQ

Q: Where do I start with TDM opt-out?
A: Start by deciding what you want AI to learn from and reference. Then review your robots.txt to check how major AI crawlers are being handled. If you publish content for EU audiences, implementing TDMRep is also worth considering.

Q: Should AI crawlers be blocked for GEO strategy?
A: It depends. Blocking is effective if you want to limit inclusion in training data. But if your goal is to appear in AI search and citations, blocking can be counterproductive. Organizing your goals by crawler type and deciding accordingly is the recommended approach.

References

European Parliament and Council of the European Union, "Directive (EU) 2019/790 on Copyright and Related Rights in the Digital Single Market (CDSMD)," 2019 (TDM exceptions, Articles 3 and 4)
European Parliament and Council of the European Union, "Regulation (EU) 2024/1689 (EU AI Act), Article 53," 2024 (copyright-policy obligation for GPAI model providers, including compliance with TDM rights reservations)
W3C Community Group, "Text and Data Mining Reservation Protocol (TDMRep)" (protocol for expressing TDM rights reservations in machine-readable form)
"Beyond Robots.txt: Purpose-Based Scraping Control (2026)," April 2026 (2025 study: AI crawlers sent an average of 156 requests per site that violated robots.txt over three weeks)
