In Graphic Detail: AI licensing deals, protection measures aren’t slowing web scraping

New data is reinforcing a structural shift in how AI systems access publisher content: AI models are scraping it at growing rates, regardless of bot-blocking measures or content licensing deals meant to control usage, improve attribution or drive referral traffic.

New research from analytics firms and bot-tracking companies shows AI tools are increasingly crawling publisher sites as inputs for AI-generated summaries and training, while sending back only limited referral traffic.

The result is a traffic imbalance, but also a redefinition of how publishers’ content is used: less as a destination for readers, and more as infrastructure for AI products they can’t control.

“Despite the investment in cybersecurity, publishers are structurally on the back foot. Most sites can only use one cybersecurity provider due to cost and latency. However, AI companies have dozens of scraping tools they can use interchangeably that all evolve quickly to get around those defenses,” said Toshit Panigrahi, co-founder and CEO of TollBit. Panigrahi also described these trends as an opportunity, since they show growing demand for web content. The open question is how publishers can turn that demand into revenue.

Here are four charts revealing the latest in AI web scraping, and what it means for publishers:

Web traffic is changing with more bots, less human traffic

The composition of web traffic is changing as a result of all this AI bot scraping: AI bot traffic is rising while human visits to websites are falling.

According to TollBit’s latest “State of the Bots” report, AI bot scraping grew 29 percent from Q2 to Q3 2025, and a further 20 percent from Q3 to Q4.

The growth is likely driven in part by more people using AI tools directly: every time someone searches for information through an AI search engine, bots are sent to scrape sites for information to return to the user. TollBit data reflects this. Training crawls fell by around 15 percent between Q2 and Q4 2025, while RAG bots rose around 33 percent and AI search indexers rose around 59 percent in that period. (Training bots scour the web and download content to build the large datasets used to train LLMs, while retrieval-augmented generation, or RAG, bots work like AI search bots, fetching information published online in real time.)

The problem is compounded by the fact that many web scrapers are indistinguishable from human visitors. They can disguise themselves as people browsing a site to bypass its bot detection measures.

The ratio of AI visitors to human visitors is increasing, too. In Q2, there was 1 AI bot visit for every 50 human visits — by Q4, this shifted to 1 AI bot visit for every 31 human visits. That represents an increase in the bot-to-human ratio of about 60 percent across this period, according to TollBit’s report.

Over the same period, human traffic to sites increased by 1 percent from Q2 to Q3 but decreased by 5 percent from Q3 to Q4.
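The quarterly growth figures and the visit ratios above can be cross-checked against each other. The short Python sketch below compounds the reported percentages and compares the result with the shift from 1-in-50 to 1-in-31. It assumes the scraping and human-traffic figures describe the same set of sites, which is not confirmed here, and the variable and function names are illustrative.

```python
# Quarter-over-quarter changes as reported by TollBit (Q2 -> Q3, Q3 -> Q4 2025).
bot_growth = [0.29, 0.20]
human_growth = [0.01, -0.05]

def compound(changes):
    """Multiply successive percentage changes into one overall factor."""
    factor = 1.0
    for change in changes:
        factor *= 1.0 + change
    return factor

bot_factor = compound(bot_growth)      # about 1.55x over two quarters
human_factor = compound(human_growth)  # about 0.96x over two quarters

# Assumption: both series cover the same sites, so the bot-to-human
# ratio scales by bot_factor / human_factor.
implied_ratio_change = bot_factor / human_factor

# Ratio shift stated directly: 1 bot per 50 humans -> 1 bot per 31 humans.
reported_ratio_change = (1 / 31) / (1 / 50)

print(f"implied:  {implied_ratio_change - 1:.0%}")
print(f"reported: {reported_ratio_change - 1:.0%}")
```

Under that assumption, both calculations land at roughly 61 percent, in line with the "about 60 percent" increase cited in the report.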

AI bots continue to ignore robots.txt

Despite efforts to get AI bots to comply with publishers’ robots.txt permissions — from proposed legislation to updated web mechanisms — 30 percent of total AI bot scrapes in Q4 2025 bypassed explicit robots.txt permissions, according to TollBit.

ChatGPT was the worst offender this past quarter. TollBit’s report found that 42 percent of scrapes by OpenAI’s RAG agent “ChatGPT-User” accessed content from sites that had explicitly blocked it on robots.txt.
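For context, robots.txt is a voluntary convention: a site publishes rules, and a well-behaved crawler checks them before fetching a page. A minimal sketch using Python's standard-library `urllib.robotparser` shows the check a compliant agent would run. The robots.txt contents, the `example.com` URL and the `SomeOtherBot` name are made up for illustration; only the `ChatGPT-User` agent name comes from the report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative publisher site.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/news/story.html"

# A compliant crawler asks this before each fetch and skips the URL on False.
blocked_agent_allowed = parser.can_fetch("ChatGPT-User", url)
other_agent_allowed = parser.can_fetch("SomeOtherBot", url)

print(blocked_agent_allowed)  # False
print(other_agent_allowed)    # True
```

Nothing in the file itself enforces the result. The rule only has an effect if the crawler chooses to run a check like this and honor the answer.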

Click-through rate declines, even from those with AI licensing deals

The rate of people clicking on a link from an AI summary to go to a publisher’s site declined last year, according to TollBit’s findings.

Click-through rates (CTRs) from AI tools fell to roughly a third of their earlier level, from 0.8 percent in Q2 to 0.27 percent in Q4 2025.

Many publishers have signed content licensing deals with AI companies, seeing them as a partial answer to the rampant, uncompensated AI bot traffic they are experiencing. The deals offer publishers a new revenue stream while also helping their brands surface in AI tools with proper citations, ideally driving more human traffic to their sites from AI surfaces.

But TollBit found websites with AI content licensing deals saw CTRs fall more than 6.5-fold in 2025, from 8.8 percent in Q1 to 1.33 percent in Q4. This suggests AI licensing deals are not insulating publishers from low AI referrals.
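The "times" figures in this section are ratios of the starting rate to the ending rate, which is a different measure from a percentage decline. A short Python sketch makes the conversion explicit using the rates quoted above; the function names are illustrative, and the two series cover different starting quarters, so they are not strictly like-for-like.

```python
def fold_decline(start, end):
    """How many times smaller the ending rate is than the starting rate."""
    return start / end

def percent_decline(start, end):
    """Relative drop from start to end, as a percentage."""
    return (start - end) / start * 100

# Click-through rates in percent, as quoted from TollBit's report.
all_sites = (0.8, 0.27)       # Q2 -> Q4 2025
licensed_sites = (8.8, 1.33)  # Q1 -> Q4 2025

for label, (start, end) in [("all sites", all_sites), ("licensed", licensed_sites)]:
    print(f"{label}: {fold_decline(start, end):.1f}x lower, "
          f"{percent_decline(start, end):.0f}% decline")
```

Expressed as percentages, the drop works out to about 66 percent across all sites and about 85 percent for sites with licensing deals.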

Taken together, the data shows AI bot traffic rising and human traffic declining, as bots ignore scraping protections like robots.txt and more people turn to AI search engines. Even with content licensing deals that are supposed to generate more referral traffic, publishers are seeing CTRs from AI tools continue to drop.

On top of that, AI referrals remain a tiny fraction of publishers’ referral traffic.

AI tools are sending an average of about 0.12 percent of publishers’ overall referral traffic, according to TollBit — a far cry from the roughly 80 percent of human referral traffic sent to sites from Google. On average, Google delivers over 678 human visitors to a site for each visitor from an AI application, the report noted.

AI referral traffic accounts for 1 percent of all website traffic across 10 industries, according to Conductor’s 2026 AEO/GEO benchmarks report published in January. AI referral traffic is growing about 1 percent month-over-month, on average. ChatGPT is the dominant referrer, responsible for about 87 percent of all AI referral traffic across those industries.

And compared to Google, the amount of traffic coming from ChatGPT is almost laughable. According to aggregate data, sourced from the analytics company Chartbeat for a recent Reuters Institute for the Study of Journalism report, Google delivers 500 times as many referrals as ChatGPT from search.

“Across answer engines, what consistently surfaces is high-quality content created at scale, grounded in real expertise, and structured so AI systems can clearly understand, synthesize, and cite it. Without this type of high-value, technically sound content, the risk isn’t just losing traffic, it’s losing relevance if your content can’t be interpreted or trusted by AI models,” said Lindsay Boyajian Hagan, vp of marketing and co-head of revenue at Conductor.

More ads in AI push publishers’ sites down in search results

Last month, Digiday reported that publishers were growing concerned about the rise of ads in AI surfaces like ChatGPT and Google’s AI Overviews and AI Mode, worried that this could further incentivize Google to keep users on its own platforms and leave less space for publishers’ sites to surface in both AI experiences and traditional organic search results.

A recent analysis by Semrush shows that Google is expanding both AI Overviews and ads in search results.

By October 2025, Google ads appeared on 25.56 percent of search results that included AI Overviews, up from 5.17 percent in March, representing a 394 percent increase in eight months, according to Semrush, which tracked over 10 million keywords in 2025.
