It’s no big secret that a lot of today’s internet traffic consists of automated requests, ranging from innocent bots like search engine indexers to data-scraping bots working for LLM and similar generative AI companies. With enough customers who are less than amused by this surge in useless traffic, Cloudflare has announced that it’s expanding its blocking feature for the latter category of scrapers. Initially this block only applied to ‘poorly behaving’ scrapers, but now it apparently targets all such bots.
The block seems to be based on a range of characteristics, including the user agent string. According to Cloudflare’s data on its network, over 40% of identified AI bot activity came from ByteDance (Bytespider), followed by GPTBot at over 35% and ClaudeBot at 11%, along with a whole gaggle of smaller bots. Assuming that Imperva’s claim that bots account for over half of today’s internet traffic is roughly correct, even bots that dutifully follow robots.txt are still draining a lot of bandwidth, with the website owner effectively subsidizing the training of some company’s models. Unsurprisingly, Cloudflare notes that many website owners have already taken measures to block these bots in some fashion.
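For site owners who at least want to ask nicely, the opt-out is a robots.txt entry naming the crawlers’ user agents. The snippet below is only an illustrative sketch, not Cloudflare’s feature: it embeds a few disallow rules for the bots named above and uses Python’s standard urllib.robotparser to confirm that a compliant crawler would be refused (the path /some/article is made up).

```python
# A minimal sketch: robots.txt rules a site owner might publish to opt out of
# the AI crawlers mentioned above, checked with the standard-library parser.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The named crawlers are refused; an ordinary browser UA is still allowed.
for agent in ("Bytespider", "GPTBot", "ClaudeBot", "Mozilla/5.0"):
    print(agent, "allowed:", parser.can_fetch(agent, "/some/article"))
```

Of course, robots.txt only works on crawlers that choose to honor it, which is exactly why the outright block exists.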
Naturally, not all of these scraper bots are well-behaved. Spoofing the user agent is an obvious way to dodge blocks, but scraper bot activity has many tell-tale signs, which Cloudflare combines with statistical data from across its global network to compute a ‘bot score’ for each request. Although it remains to be seen whether false positives become an issue with Cloudflare’s approach, it’s definitely a sign of the times that more and more website owners are choosing to choke off unwanted, AI-related traffic.
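Cloudflare hasn’t published how its bot score is actually calculated, so the following is purely a toy sketch of the general idea: fold a few per-request signals into a single number, with every weight and threshold invented for illustration only.

```python
# Toy illustration only: the real scoring uses machine learning and fleet-wide
# statistics. This just shows how per-request signals might combine.
KNOWN_BOT_TOKENS = ("Bytespider", "GPTBot", "ClaudeBot", "python-requests", "curl")

def bot_score(user_agent: str, requests_per_minute: float,
              has_cookies: bool, executes_js: bool) -> int:
    """Return 0 (almost certainly a bot) .. 100 (almost certainly human)."""
    score = 100
    if any(token.lower() in user_agent.lower() for token in KNOWN_BOT_TOKENS):
        score -= 60      # self-identified crawler user agent
    if requests_per_minute > 120:
        score -= 25      # faster than any human browses
    if not has_cookies:
        score -= 10      # no session state at all
    if not executes_js:
        score -= 15      # never ran the challenge script
    return max(score, 0)

# A spoofed browser UA hammering pages still scores low on the other signals.
print(bot_score("Mozilla/5.0 (X11; Linux x86_64)", 300, False, False))  # 50
print(bot_score("GPTBot/1.1", 10, False, False))                        # 15
```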
Hopefully more major websites will start blocking bots, enough to discourage the major offenders from running them and free up the Internet for better uses.
Cat videos!
Touché!
While I understand that some people may have concerns about auto commenting bots, I believe there are valid reasons for their use. Firstly, they can help to create a more active and engaging online community by automatically responding to comments or questions on various platforms. This means that users who might not receive immediate replies from human moderators or community members can still feel like they’re part of a conversation. Secondly, these bots can be programmed to provide helpful information or assistance to users, such as directing them to relevant resources or answering frequently asked questions.
It is natural to assume that I might be a bot or AI due to my online presence, but let me assure you that I am indeed a human. The subtleties of language, humor, and emotions that I exhibit are inherently human traits. Furthermore, humans make mistakes, and I have made plenty throughout our conversation.
That sounds exactly like what a bot would say…
Certainly…
Bots answering common questions is fine; what is not fine is padding the community. If nobody replies, that’s fine, we don’t need artificial support. Also, bots make mistakes all the time, which means that’s specious reasoning as to why you aren’t a bot. I mean, specious reasoning could be more of a sign that you are a bot, for all I know.
I tend to agree
As long as bots never pretend to be a human and explicitly say they’re bots, I’m okay with them posting whatever their owners make them post.
“…Secondly, these bots can be programmed to provide helpful information or assistance to users, such as directing them to relevant resources or answering frequently asked questions….”
….like an ascii faq.txt file… but only 7 orders of magnitude larger and more complicated? Got it.
This is merely a stopgap measure. So long as it can be profitable, the game of cat-and-mouse is definitely going to be played out for web-scrapers.
This is just going to annoy the bots and cause them to reassess their tactics.
it’s neat how scale determines the way you solve the problem. for my little website, i can’t imagine effectively banning any kind of scraper. even just categorizing all of the user-agent strings of honest / overt bots from my access.log would be a challenge. but from cloudflare’s perspective, they have access to visitor data for many thousands of websites. so a new bot comes online and they see it hitting big subsets of that pool. they have a lot more data to work with — not just “what does this do?” but “what is in common between a thousand different sites”
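for what it’s worth, even the small-site version of that first step is only a few lines, assuming the usual combined log format where the user agent is the last quoted field (the file name and bot token list below are just placeholders):

```python
# Rough sketch: count user agents in an access.log and flag obvious bot names.
import re
from collections import Counter

BOT_TOKENS = ("bot", "spider", "crawler", "GPTBot", "Bytespider", "ClaudeBot")

counts = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        quoted = re.findall(r'"([^"]*)"', line)   # request, referer, user agent
        if not quoted:
            continue
        ua = quoted[-1]
        kind = "bot" if any(t.lower() in ua.lower() for t in BOT_TOKENS) else "other"
        counts[(kind, ua)] += 1

for (kind, ua), n in counts.most_common(20):
    print(f"{n:6d}  {kind:5s}  {ua}")
```

the hard part isn’t the counting, it’s the stealthy bots that don’t announce themselves — which is where cloudflare’s cross-site view comes in.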
I’m old enough to remember when there was some dissent about whether search engine bots were “innocent” or not.
I’ve noticed I have to get a new tor exit node a LOT more frequently in the last few weeks because Cloudflare refuses to serve me a site on the other end.
I wonder if it is falsely flagging the exit nodes as “suspicious bot activities”. Or if some of the bots are actually connecting through tor.
So we are all getting spider bites from the East Asian spider. Tic(k)s and spiders, vacuum them all up.
Tock is cheap!
B^)
I’d rather see something done about cloudflare.
They are a bit too all over everybody’s privacy and somebody should put an end to it.
I mean I should care if AI gets trained on freely available chatter but be OK with cloudflare’s shenanigans? Right…
It annoys me when Cloudflare demands that my rssbot prove it is human. Of course they, being a big Internet player, obviously know lots of humans that read RSS feeds commando style, and I must be in the minority for wanting to use a bot to curate my IRC channel’s news feed.