Cloudflare vs. the Crawlers: Why LLM Access Just Got a Lot More Complicated

Cloudflare just rewrote the rules for AI access, putting control back in the hands of brands and publishers and forcing them to choose between visibility and protection.

Over the last six weeks, generative AI has been dragged into the courtroom on both sides of the Atlantic: from Bartz v. Anthropic and the subsequent class action to Kadrey v. Meta in the U.S., to the evolving UK case between Getty Images and Stability AI. Each of these cases has chipped away at the industry’s long-held assumptions that public data is free for the taking, that copyright is a gray area, and that training data is no one’s problem.

Now, Cloudflare is finishing what the courts started.

On July 1, Cloudflare rolled out a major policy shift: AI bots are now blocked by default across its network, which serves roughly 20% of all websites globally. Crawlers from companies like OpenAI, Google, Anthropic, Perplexity, and others must now explicitly identify themselves, declare their purpose (training, inference, search), and obtain permission before accessing content. Publishers no longer need to opt out—by default, they’re opted out.

This changes everything.

The End of “Public Means Free”

Since the early days of LLM development, AI companies have justified their data practices by leaning hard on fair use and the “publicly available” defense: if it’s online and not behind a paywall, it’s fair game.

Cloudflare’s update dismantles that logic at scale. It elevates robots.txt and llms.txt from symbolic gestures to enforceable access controls, and not just at the site level but at the infrastructure level. It also makes clear that infrastructure providers can, and will, draw lines where regulators haven’t.
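For publishers who want to see what opting out looks like in practice, a robots.txt that refuses the major AI crawlers while leaving the rest of the web untouched is only a few lines. The user-agent tokens below are the ones these vendors have published as of this writing; verify each against the vendor’s current documentation, and remember that robots.txt alone is advisory, while Cloudflare’s change adds network-level enforcement on top of it.

```txt
# Block common AI training crawlers site-wide; allow all other bots.
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: CCBot
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
```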

This is an inflection point in the generative web. And it’s one that puts the power back in the hands of publishers.

Origins of the Change: Perplexity Got Caught—Now Everyone Pays the Price

Cloudflare’s new AI bot policy didn’t come out of nowhere. It followed investigative reporting from Wired and Cloudflare’s own research revealing that Perplexity had been deploying stealth crawlers that bypassed site-level protections like robots.txt, masked their identity as ordinary browser traffic, and rotated IP addresses to evade detection. When confronted, Perplexity defended its practices, and some in the industry backed it, but Cloudflare didn’t budge. Matthew Prince, Cloudflare’s co-founder and CEO, compared the company’s tactics to those of “North Korean hackers” and took decisive action.

Rather than wait for regulators or litigation to catch up, Cloudflare removed Perplexity from its verified bot registry and made a broader policy decision: AI crawlers must now be transparent, declarative, and explicitly permissioned.

In short: one company overstepped, and the entire system was forced to adjust.

This isn’t the end of generative web access, but it is a turning point. Cloudflare’s move reflects a broader sentiment we’re seeing in courts and policy circles: if AI companies want access, they need to ask first, and they must document how that access is used.

If the past year was defined by AI’s euphoric scaling phase, this summer marks the beginning of its containment era, in which companies, platforms, and, yes, even CDN providers start building defenses against unauthorized LLM scraping.

From Scrape to License: A New Business Model Emerges

In tandem with its crawler policy change, Cloudflare also launched a Pay‑Per‑Crawl licensing marketplace, allowing publishers to monetize access to their data. Early partners include The Associated Press, Condé Nast, Stack Overflow, and Reddit, all of whom have signaled that they’re done giving their content away for free. Keen observers will note that most of these publishers already have strategic licensing agreements with OpenAI, so what changes for ChatGPT? Cloudflare’s move doesn’t duplicate those deals; it enforces them.
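Mechanically, Pay‑Per‑Crawl runs on the long-dormant HTTP 402 Payment Required status: a crawler without a deal in place gets a 402 carrying a quoted price, and can retry with payment if it accepts. The sketch below shows the decision logic a crawler operator might implement; the header names are illustrative, based on Cloudflare’s announcement, and should be checked against Cloudflare’s current documentation before use.

```python
def decide_crawl(status: int, headers: dict, max_price_usd: float) -> str:
    """Decide what a paying crawler should do with a Pay-Per-Crawl style
    response. The "crawler-price" header name is an assumption drawn from
    Cloudflare's announcement, not a guaranteed stable API.
    """
    if status == 200:
        # Access granted (either free, or payment already negotiated).
        return "fetched"
    if status == 402:
        # Publisher is asking to be paid; compare quote to our budget.
        quoted = float(headers.get("crawler-price", "inf"))
        if quoted <= max_price_usd:
            return "retry-with-payment"
        return "skip"
    if status == 403:
        # Blocked outright: the publisher opted for protection, not revenue.
        return "blocked"
    return "error"
```

The useful property for publishers is that the refusal and the price quote live in the same place: the infrastructure answers before the content is ever served.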

Where those partnerships once relied on contract terms and mutual trust, Cloudflare now adds infrastructure-level enforcement. If an AI crawler oversteps, publishers can block access automatically and at scale. This dramatically reduces the need for lawsuits or prolonged negotiations. It also gives publishers something they haven’t had in the AI boom until now: operational leverage. 

That leverage supports a deeper shift already underway. With the launch of this marketplace, the training economy is now moving from permissionless scraping to structured licensing. Instead of asking what content is “freely available,” we’ll soon be asking: What’s authorized, who owns it, and who paid for it?

Strategic Implications for Brands and Publishers

If you’re still approaching LLM access the way you approached SEO five years ago, it’s time to recalibrate. We’ve entered a new visibility economy that’s shaped by infrastructure-level enforcement and litigation-level liability.

If you’re a publisher or brand, take three actions immediately:

  • Double-check your robots.txt and llms.txt files. Are they aligned with your current AI strategy?
  • Decide whether you want AI visibility or protection. You can’t have both.
  • Explore licensing relationships proactively. Don’t wait for scraping to happen before asserting rights.
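The first of those checks is easy to automate. Here is a minimal audit sketch using Python’s standard-library robots.txt parser to report which well-known AI crawlers your current file actually admits; the user-agent tokens listed are the commonly published ones (verify them against each vendor’s docs), and the URL is a placeholder for your own.

```python
from urllib.robotparser import RobotFileParser

# Commonly published AI crawler user-agent tokens (verify against vendor docs).
AI_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot", "CCBot"]

def audit_robots(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Return {agent: allowed} for each known AI crawler user-agent,
    according to the supplied robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}
```

Run this against your live robots.txt and compare the answers to your actual AI strategy; any crawler that comes back allowed when you meant to block it is a gap Cloudflare’s default won’t necessarily cover for non-Cloudflare properties.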

Final Thought

In courtrooms, copyright owners are winning. On infrastructure, access is closing. And across the generative ecosystem, we’re seeing a transition from freeform experimentation to regulated, permissioned intelligence.

Cloudflare didn’t just issue a network-level policy update. It created a new precedent for how content should and should not be used in training. It reframed the default from “yes unless told no” to “no unless granted yes”. That’s not just technical enforcement. That’s governance.

And in a web where models increasingly shape what we see, read, and believe, that governance couldn’t come soon enough.

Talk to Our Experts

The rules of AI access are changing fast, and the brands that act early will set the standard. Whether you need to audit your current permissions, build a licensing strategy, or align your content for AI visibility, our team can help. Connect with our experts today.
