Your Site Has a Say in the AI Era—Are You Using It?


If your traffic from Google has dipped—or your high-performing content isn’t pulling in clicks the way it used to—you’re not imagining things.

The rise of AI-generated answers in search results has changed the flow of attention. Chatbots and AI-powered search features (like Google’s AI Overviews) are pulling more and more of their responses directly from web content—without always sending users back to the source.

And while it’s tempting to blame the algorithm (or assume there’s nothing you can do), the truth is, most websites haven’t taken even the most basic step to manage how their content is being accessed.

Let’s talk about what that step is, why it matters now, and how to approach it without tanking your discoverability in the process.

First, what is robots.txt—and why should you care?

The robots.txt file is one of the oldest mechanisms on the web for telling search engines and bots how to behave on your site. Think of it as a posted house rule: it doesn’t physically stop anyone, but most legitimate crawlers—including Googlebot and GPTBot—will respect what you put there.

If your site doesn’t have one, you’re effectively saying: “Come on in. Take what you want.”

If your site does have one, you get to say: “Here’s what’s okay—and here’s what’s not.”

Historically, this file has been used to control SEO indexing: for example, preventing Google from crawling duplicate pages or private content. But in the new AI era, it’s taken on another role: signaling which bots you’ll allow to crawl your site for training AI models, and which you won’t.
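A classic SEO-focused robots.txt is just a few lines. The paths below are placeholders for illustration; yours will depend on your site's structure:

```txt
# Keep crawlers out of duplicate and private areas,
# while leaving the rest of the site open.
# (Example paths only — substitute your own.)
User-agent: *
Disallow: /admin/
Disallow: /search/
Disallow: /drafts/
```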

Why this matters now

Recent data from Cloudflare shows that fewer than 40% of the top 10,000 websites even have a robots.txt file. And of those that do, only a tiny fraction have specific rules addressing AI bots like GPTBot, ClaudeBot, or Meta-ExternalAgent.

That means the vast majority of websites are leaving their content completely open to large-scale AI scraping. This has two consequences:

  1. Your content becomes free fuel for AI responses.
    The insights, examples, and expertise you publish—especially if it’s detailed or practical—can show up in chatbot responses with no credit, no link, and no traffic back to your site.
  2. You get none of the upside.
    Unlike traditional SEO, where good content earned you visibility, search traffic, and engagement, these new systems operate more like a one-way pipe. They learn from your content, then answer without routing users back to you.

Some crawlers are particularly aggressive. According to Cloudflare’s data, GPTBot now accounts for nearly 30% of AI bot activity, and some bots crawl tens of thousands of pages for every single human they refer back to the source.

That’s a lopsided exchange, especially for brands that invest in original content to build awareness or generate leads.

What kinds of bots are we talking about?

Let’s break it down. These are some of the most active AI crawlers today:

  • GPTBot (OpenAI) – Used to train ChatGPT. Highly active.
  • ClaudeBot (Anthropic) – Used to train Anthropic’s Claude models.
  • Meta-ExternalAgent – Crawls for Meta’s AI features.
  • PerplexityBot – Crawls content for Perplexity’s AI search.
  • Google-Extended – Signals content usage for Google’s AI training.

There are also long-established crawlers like Googlebot (for search indexing), Bingbot, and ByteDance’s Bytespider—some of which serve multiple purposes, such as both search indexing and AI model training.

This matters because disallowing one bot (like GPTBot) won’t stop another (like ClaudeBot) from crawling your content. And some, like Google, use multiple user agents for different purposes—so being precise matters.

What brands can (and should) do

Let’s be clear: this isn’t about locking down your site entirely. It’s about setting boundaries and understanding the tradeoffs.

Here are practical steps to start protecting your content—while keeping it discoverable to humans and search engines.

1. Check if your site has a robots.txt file.
Go to yourdomain.com/robots.txt. If you get a 404, you don’t have one.
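If you'd rather check from a script than a browser, a short standard-library Python sketch can do the same lookup. This is a minimal example, not production code—it doesn't handle timeouts gracefully or follow unusual redirect chains, and the domain is a placeholder:

```python
# Check whether a site publishes a robots.txt file by fetching it
# and interpreting the HTTP status code. (Sketch only.)
import urllib.request
import urllib.error


def interpret_status(code: int) -> str:
    """Map the HTTP status of /robots.txt to a human-readable verdict."""
    if code == 200:
        return "present"
    if code == 404:
        return "missing"
    return f"unclear (HTTP {code})"


def check_robots_txt(domain: str) -> str:
    """Fetch https://<domain>/robots.txt and report whether it exists."""
    req = urllib.request.Request(
        f"https://{domain}/robots.txt",
        headers={"User-Agent": "robots-txt-check/0.1"},  # identify yourself politely
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return interpret_status(resp.status)
    except urllib.error.HTTPError as err:
        # 404 and other error statuses arrive as exceptions
        return interpret_status(err.code)


# Example usage (requires network access):
# print(check_robots_txt("yourdomain.com"))
```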

2. If it exists, review what it includes.
Is it only blocking duplicate or private content? Does it mention bots like GPTBot or ClaudeBot? If not, it may be time to update it.

3. Decide what bots you’re okay with.

  • Want to stay indexed on Google Search? Keep Googlebot allowed.
  • Want to opt out of Google’s AI training? Disallow Google-Extended.
  • Want to block OpenAI or Meta from training on your content? Disallow GPTBot and Meta-ExternalAgent.
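Put together, those three choices might look like this in robots.txt. The user-agent tokens are the ones the vendors publish; the allow/block policy itself is just an example to adapt:

```txt
# Allow normal search indexing, opt out of AI training crawlers.
# (Example policy — adjust to your own decisions.)
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /
```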

4. Use tools to manage and monitor it.
If you’re on Cloudflare, they now offer managed robots.txt—which can automatically add Disallow directives for top AI bots while keeping your site SEO-friendly.

You can also opt to block AI bots only on monetized pages (e.g. blog posts with ads), which is useful if your site has a mix of open and restricted content.
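Because robots.txt rules are path-based, that kind of selective policy can be expressed directly in the file. The /blog/ path below is a hypothetical stand-in for wherever your monetized content lives:

```txt
# Block an AI crawler only from monetized content,
# leaving the rest of the site open to it.
# (/blog/ is an example path — use your own.)
User-agent: GPTBot
Disallow: /blog/
```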

But won’t this hurt SEO?

Not if you do it carefully. Blocking AI training bots doesn’t block search indexing—unless you mistakenly disallow the wrong user agents. This is why it’s important to understand the difference between:

  • Googlebot (for search)
  • Google-Extended (for AI training)

Disallowing the second won’t affect your organic rankings, but it will keep your content out of AI-generated answers—at least from Google.
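You can verify this distinction yourself before publishing a new robots.txt. Python’s standard-library urllib.robotparser evaluates rules the same way a well-behaved crawler would; the snippet below (with an example policy and an example URL) shows Googlebot still allowed while Google-Extended and GPTBot are blocked:

```python
# Sanity-check a draft robots.txt policy with the standard library.
import urllib.robotparser

rules = """\
User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Search indexing is unaffected; only the AI-training agents are blocked.
print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))        # True
print(rp.can_fetch("Google-Extended", "https://example.com/blog/post"))  # False
print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))           # False
```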

The bigger picture: A more balanced web

We’re not anti-AI. But we are pro-authority and pro-consent. If you’re investing in content, you should have a say in how that content is used—and whether you’re getting value in return.

The AI era doesn’t mean giving up control. It means thinking differently about what visibility really looks like—and how to protect the signals that matter most to your brand.

You don’t need to outsmart the system. You just need to engage with it.

If you’re publishing original content and want to stay visible—to humans and to AI—we can help you protect and position it. From setting your site up to signal the right things, to building content AI wants to cite (and people want to share), Top Fox works with brands who play the long game. Let’s make sure your best thinking actually gets seen. Reach out to Top Fox Marketing.

PS: Want the practical takeaway?

If you’re looking for clear guidance on what to allow, what to block, and how to set smart guardrails, read Part 2: AI Crawlers Are Reading Your Site. Should You Let Them In? It breaks this into a straightforward, B2B-friendly playbook.
