Anthropic, TikTok gray scraper bots a growing online threat


There’s black, there’s white, and there’s gray. That’s what a new report calls generative AI scraper bots like Anthropic’s ClaudeBot and TikTok’s Bytespider, which are blurring the boundaries of legitimate activity.

Some bots – automated software programs designed for large-scale activity online – are good. These include search engine crawlers, SEO tools, and customer service bots.

Others are bad and designed for malicious or harmful online activities such as breaching accounts to steal personal data or commit fraud. But there’s space in between, and this is where you will find the so-called gray bots.


In a report, US cybersecurity company Barracuda says that gray bots have now become “a persistent and growing threat” online – and it’s all because of generative AI.

Significant impact on businesses

That’s because gray bots are actually AI scraper bots designed to extract or scrape large volumes of data from websites, often to train generative AI models.

“Gray bots are blurring the boundaries of legitimate activity. They are not overtly malicious, but their approach can be questionable. Some are highly aggressive,” said Rahul Gupta, Barracuda’s senior principal software engineer.

According to the report, GenAI scraper bots like Anthropic’s ClaudeBot and TikTok’s Bytespider now send millions of requests to some web applications in a short period.

What’s more, while bot activity is often thought to come in waves, Barracuda’s data shows that some web applications actually experience a steady stream of scraper bot requests, with an average of 17,000 requests per hour.

Barracuda points out that the impact on businesses is significant. One glaring issue is that the scraping and subsequent use of copyright-protected data by AI training models may be in violation of the owners’ legal rights.

Frequent scraping by bots also increases server load, which can degrade the performance of web applications and affect the user experience. Furthermore, they can also increase application hosting costs due to the increase in cloud CPU use and bandwidth consumption.


In addition, the presence of AI scraper bots can distort website analytics, making it challenging for organizations to track genuine user behavior and make informed business decisions.

“Many web apps rely on tracking user behavior and popular workflows to make data-driven decisions. Generative AI bots can distort these metrics, leading to misleading insights and poor decision-making,” said Gupta.

There are also data privacy risks. Some industries, such as healthcare and finance, may face compliance issues if their proprietary or customer data is scraped.

Finally, users and customers may simply lose trust in a platform if AI-generated content floods it or if their data is used without consent.

TikTok’s scraper is particularly aggressive

The most active GenAI gray bot is ClaudeBot – and by a considerable margin. The bot collects data to train Claude, a GenAI tool intended for widespread everyday use.

To its credit, Anthropic, the firm behind Claude, at least publishes information on its website explaining how ClaudeBot behaves and how to block its scraping activity.


By contrast, TikTok’s Bytespider, another AI scraper bot used to train GenAI models, is known to be particularly aggressive and unscrupulous.


Be that as it may, gray bots are here to stay, so organizations should adapt and factor them into their security strategies, Barracuda said in the report.

For example, websites can deploy robots.txt, a plain-text file placed at the root of a website that signals to scrapers which content they should not take. The problem, though, is that robots.txt is not legally binding – compliance is entirely voluntary.

What’s more, for the file to be effective, the specific user-agent name of the scraper bot needs to be listed. Less scrupulous gray bots can then engage in evasive maneuvers to stay off such lists, for example, by changing their names regularly.
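In practice, the mitigation described above looks something like the following. This is a minimal robots.txt sketch that disallows the two scrapers named in the report; the user-agent strings shown are the tokens these bots are commonly reported to identify with, so site owners should verify the current names against each vendor's own documentation:

```
# robots.txt - place at the root of the website (e.g. example.com/robots.txt)

# Block Anthropic's scraper
User-agent: ClaudeBot
Disallow: /

# Block TikTok/ByteDance's scraper
User-agent: Bytespider
Disallow: /
```

Because compliance is voluntary, bots that ignore robots.txt have to be stopped at the network level instead, for instance with user-agent filtering or rate limiting at the web server or CDN.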