How do scraper bots impact hosting needs and the cost of the service?


The future is artificial intelligence (AI) agents crawling the web for answers to questions – so expect hosting costs to skyrocket.

Listen to the various AI companies promising us a new, bright future enabled by agentic AI where everything will be hunky dory, with answers to the world’s most pressing questions at our fingertips. Need to order a pizza or your weekly food shop? AI can do that. Want personalised product recommendations for a relative’s upcoming birthday? That’s doable, too.

But often unsaid among the bold predictions of an AI-enabled future is the impact this new way of interacting will have on the underbelly of the web. Its infrastructure, and the organisations that keep it running day in, day out, could soon be stretched to breaking point.


Scraper bots have already shown a voracious appetite for hoovering up content from across the web and ingesting it into AI models as training data. And with the rise of agentic AI, it’s no longer just model training whose impact web hosts need to consider: it’s also the traffic generated every time those models prepare an answer.


A voracious appetite

The problem is a fast-growing one, compounded by the speed of the AI revolution. Half of all internet traffic is already non-human, with “bad bots” the fastest-growing slice, according to Imperva’s 2024 Bad Bot Report. And OpenAI’s GPT bots have already overtaken Google’s Googlebot crawlers as a proportion of all web traffic.

AI-related bots linked to OpenAI reportedly account for around one in every eight visits to websites, compared to 8% for Google’s bots. The result is that web traffic is spiking, even though none of those visits come from human users.


This is an issue for anyone hosting websites, because sites popular with AI models can see huge spikes in crawler traffic. OpenAI’s GPTBot is consistently the most active AI crawler on the web, with huge daily upticks in traffic reported by Cloudflare. Given that companies and individuals pay more for their web hosting once it exceeds an agreed level of acceptable traffic, AI scraper bots can present a real conundrum.


Pushing the limits

The arrival of AI crawlers on your website can rapidly push up costs and run roughshod over the fair-use levels agreed with web providers, causing not only reliability issues but also higher bills for the hosts left footing them. The model is an extractive one: it takes knowledge for its own benefit – often without redirecting traffic back to the website – while pushing the cost of serving that content onto hosts themselves.

One website, iFixit.com, faced a $5,000 hosting charge for a single day of web traffic when it was unlucky enough to find itself in the crosshairs of an AI web crawler. Other users have reported bandwidth usage 30 terabytes above their normal levels – something they eventually have to pay for.


That means web hosting is getting more expensive, and providers are likely to increase their base levels of charges to accommodate any potential spikes caused by crawlers.

The result is that websites are fighting back, trying to head off these extra charges before they’re incurred. More than 3,750 AI user agents are listed as disallowed in the robots.txt files of the top 10,000 domains, according to Cloudflare – an instruction telling those crawlers to stay away before they can keep hammering a website until its hosting bill becomes too expensive.
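A robots.txt block of the kind described above takes only a few lines. A minimal sketch, using the publicly documented user-agent strings of two well-known AI crawlers (a real site would tailor its own list – and the file is advisory, so only well-behaved bots honour it):

```txt
# robots.txt – ask AI crawlers not to fetch any pages
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# All other crawlers may continue as normal
User-agent: *
Disallow:
```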

But some suggest that the decision isn’t one that web hosts should have to make. The AI companies should be more judicious in where they send their tools and products to scrape from, so as not to put the burden on those who might least expect it.
