Reddit sues Perplexity and others for allegedly stealing data via Google

Social media platform Reddit has officially accused prominent AI startup Perplexity and three other companies of stealing its data by scraping Google search results in which Reddit content appeared.
In the lawsuit filed in the US District Court for the Southern District of New York, Reddit claims that the data-scraping companies circumvented its data protection measures in order to steal data.
According to the complaint, Perplexity “desperately needs” this data to power its AI-based “answer engine” system.
The other three companies – SerpApi, Lithuanian startup Oxylabs, and Russian company AWMProxy – allegedly scraped Google search results in which Reddit content appeared and sold the data to AI companies like OpenAI and Meta.
“Reddit brings this action to stop the industrial-scale, unlawful circumvention of data protections by a group of bad actors who will stop at nothing to get their hands on valuable copyrighted content on Reddit,” the complaint reads.
Reddit has built up control barriers
Data scraping, a technique in which a computer program extracts data from the output generated by another program, has indeed become a big problem for large social media sites such as Reddit – even though it’s legal.
AI companies are hungry for quality data to train their large language models on authentic human discourse. Reddit, used by more than 416 million people every week, is a “top-cited source” for most of them.
But the platform wants to be compensated properly and has struck deals with AI companies including OpenAI and Google, permitting them to access Reddit data legally.
Generally, though, Reddit doesn’t allow “unauthorized commercialization of its content absent an express agreement with guardrails in place,” the company explained in the complaint.
As the content owner, Reddit has also locked down the platform with technological-control barriers to prevent unscrupulous scrapers from accessing and stealing data directly from its website.
But the defendants, according to the complaint, circumvented Reddit’s data protection measures by scraping the data from billions of Google Search results without permission.
“Recognizing that Reddit denies scrapers like them access to its site, Defendants SerpApi, Oxylabs, and AWMProxy scrape the data from Google’s search results instead,” says the complaint.
“They do so by masking their identities, hiding their locations, and disguising their web scrapers as regular people (among other techniques) to circumvent or bypass the security restrictions meant to stop them.”
The companies, says Reddit, packaged that “stolen” data and resold it to others, which used it to train their AI systems. Perplexity, which does not have a license to use Reddit content, was allegedly one of the buyers.
Defendants say they did nothing wrong
In the complaint, Reddit calls the defendants “data-scraping service providers” who specialize in producing tools designed to circumvent digital defenses and scrape others’ content.
According to the plaintiff, these tools are aimed at bypassing two levels of security: first, evading Reddit’s own anti-scraping measures, and second, circumventing Google’s controls and scraping Reddit content directly from Google’s search engine results.
“In a very real sense, these Defendants are similar to would-be bank robbers, who, knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead,” said Reddit.
But the companies say they haven’t even received the lawsuit so far. In a statement, Perplexity said: “Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.”
As per Reuters, Oxylabs also stated that it was “shocked and disappointed by this news, as Reddit has made no attempt to speak with us directly,” and that it would also defend itself against the allegations.
This is what SerpApi also intends to do. The data-scraping firm told Cybernews in a statement that the company had not received any communication from Reddit but will fight any allegations in court because “the crawling and parsing of public data is protected by the First Amendment of the United States Constitution.”
“In the eight years we've been in business, SerpApi has always operated on the right side of the law. We value freedom of speech tremendously. We will continue to defend our rights to the fullest extent,” said the company.