Website owners report surge in malicious bots impersonating Googlebot, sparking call to check IPs


Administrators are noticing an influx of malicious bot requests impersonating Googlebot and other legitimate crawlers in an attempt to slip past website defenses. Google publishes IP verification resources so that site owners can confirm which requests really come from its crawlers.

Chris Siebenmann, a longtime technical blogger at Wandering Thoughts and Unix system administrator at the University of Toronto, reports a surge in malicious bot traffic impersonating Googlebot.

“This June, the floodgates opened,” Siebenmann said in a blog post.

“For weeks, I’ve been seeing hundreds of requests a day claiming to be Googlebot (on a few days, thousands of requests). The requests come from a variety of IP addresses at a variety of providers, which I think are mostly or entirely cloud and hosting providers.”

Google’s web crawlers have long enjoyed preferential treatment from websites, since appearing in search results drives traffic. Most websites, even those that aggressively block crawlers, are configured never to block Googlebot.

Siebenmann noticed the onslaught by accident: his websites were already configured to block requests that claim to be Googlebot but come from outside Google’s published IP address ranges. That made it easy to distinguish fake Googlebot requests from legitimate ones.
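A range check of this kind can be sketched in a few lines of Python. This is only an illustration, not Siebenmann’s actual configuration: the function names are hypothetical, the sample ranges are documentation-reserved placeholders rather than real Google addresses, and the `prefixes`, `ipv4Prefix` and `ipv6Prefix` keys are an assumption about how a published range list is laid out.

```python
import ipaddress
import json

# Placeholder data: documentation-reserved ranges standing in for a
# published crawler range list. The key names are an assumption.
SAMPLE_RANGES_JSON = """
{
  "prefixes": [
    {"ipv4Prefix": "192.0.2.0/24"},
    {"ipv6Prefix": "2001:db8::/32"}
  ]
}
"""


def load_networks(raw_json):
    """Parse a range list into ipaddress network objects."""
    doc = json.loads(raw_json)
    networks = []
    for entry in doc.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def is_fake_googlebot(user_agent, remote_ip, networks):
    """True if the request claims to be Googlebot but its source IP
    falls outside every listed range."""
    if "googlebot" not in user_agent.lower():
        return False  # makes no Googlebot claim, so nothing to check
    addr = ipaddress.ip_address(remote_ip)
    return not any(addr in net for net in networks)
```

In practice the range list would be refreshed periodically from the publisher rather than hard-coded, since the ranges can change over time.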

Attempts to impersonate Googlebot and other large legitimate crawlers have been observed before, but the author notes that they used to be rare. Prior to June, “only a few attempts once in a while” landed on Siebenmann’s websites.

Siebenmann suspects a single but large-scale campaign. Many IPs are used simultaneously, each making only a few requests as Googlebot, and when they fail, some will retry with another User-agent string.

IPs span a variety of providers – most requests come from HostRoyale, M247, Latitude.sh, Web2Objects, and AWS.

Googlebot is Google's most common bot. It crawls and indexes web pages before they appear in search results.

Google has published resources helping developers verify requests from its crawlers and fetchers. A one-time DNS lookup can be performed using a command-line tool, and for automated matching solutions, IP ranges are available.

“This is useful if you’re concerned that spammers or other troublemakers are accessing your site while claiming to be from Google,” the documentation reads.
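The DNS-based check can also be automated. The sketch below assumes the usual two-step approach: a reverse lookup on the requesting IP, then a forward lookup on the returned hostname to confirm it maps back to the same address. The function names are hypothetical, and the list of accepted domain suffixes is an assumption that should be checked against Google’s current documentation.

```python
import ipaddress
import socket

# Assumed set of domains used by Google crawler hostnames; verify
# against the current documentation before relying on it.
ALLOWED_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def reverse_lookup(ip):
    """Return the PTR hostname for an IP address."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname):
    """Return the set of addresses a hostname resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def verify_crawler_ip(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Reverse-then-forward DNS check; resolvers are injectable for testing."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record or lookup failure
    if not hostname.endswith(ALLOWED_SUFFIXES):
        return False
    try:
        # Compare parsed addresses so differing textual forms still match.
        resolved = {ipaddress.ip_address(a) for a in forward(hostname)}
        return ipaddress.ip_address(ip) in resolved
    except (OSError, ValueError):
        return False
```

The forward lookup matters because whoever controls an IP block can set its reverse DNS record to any name they like; only the forward record is controlled by the domain owner. DNS lookups are slow compared with a range check, so results would normally be cached per IP.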

The amount of bot HTML traffic has already surpassed human traffic and is quickly becoming a real cost for web server owners rather than a nuisance. Bots consume bandwidth, slow website performance for legitimate users, scrape and steal content for LLM training, and bring no real benefit to website owners.

Cloudflare CEO Matthew Prince warned that bots are taking over the internet faster than expected.

At the same time, some website owners, including Siebenmann, are reconsidering whether Googlebot is worth the exceptions, as Google Search shifts from the traditional 10 blue links that drive traffic toward AI-generated answers that send significantly less traffic and often misquote their sources.

“Google Search rests on a social contract: their bots can crawl our sites, they can index our sites, and they can show excerpts of our sites because, and ‘only because’ they send people to our sites. Our sites, our words, with our design, with our links, with our context and our aesthetics, shared the way we want to share them,” Paul Cantrell, a computer scientist, previously posted on a Mastodon server.

Some website owners now feel that Google is breaking its part of this social bargain.

