
It's been another year where AI has predictably filled our newsfeeds. But if you dare to look beyond the smoke and mirrors, the reality is that without data, there is no AI. Online content is the fossil fuel of AI, and the race continues to capture as much of it as possible by any means necessary.
OpenAI and Google have repeatedly urged the US government to allow AI training on copyrighted material. OpenAI CEO Sam Altman went a step further, stating that all publicly accessible online content is "fair game" for training large language models (LLMs).
The reasoning behind the controversial statement was that restricting access would "impede innovation" and harm "national security" by giving China (PRC) a lead in AI.
Altman translated - if you don't give Open AI free access to steal all copyrighted material by writers, musicians and filmmakers without legal repercussions then we will lose the AI race with China - a communist nation which nonetheless protects the copyright of individuals. pic.twitter.com/xlVjeIOykR
- Ewan Morrison (@MrEwanMorrison), March 16, 2025
Elsewhere, Apple and Nvidia were accused of improperly using YouTube subtitles for AI training. At a high level, copyright was being thrown under the bus to compete with China on AI, with big tech leading a virtual land grab to hoover up as much of the Internet as possible to build the best LLM. But what happens when others try to follow in their footsteps?
Reddit's data deal and the lawsuit that followed
Early this year, Reddit struck a lucrative licensing deal with OpenAI, reportedly worth around $70 million per year. Analysts reached that figure by subtracting Google's $60 million data deal from Reddit's estimated total of $130 million in AI licensing revenue. The partnership gave OpenAI structured, real-time access to Reddit's content for training and integration into ChatGPT.
Reddit gained access to OpenAI's models to build new user and moderator features. The deal turned OpenAI into a formal advertising partner and paved the way for a new business model by monetizing what had previously been considered part of the public commons.
The problems began when smaller AI companies, such as Perplexity, attempted to access similar data without paying for it. Reddit sued them. The recent lawsuit, filed in New York federal court, accuses Perplexity and three intermediaries, Oxylabs, SerpApi, and AWM Proxy, of scraping Reddit content at scale, including by using Google search results to bypass Reddit's protections.
Reddit's legal team even set a trap. They created a hidden test post accessible only to Google's crawler. Within hours, that post appeared in Perplexity's results, suggesting that some proxy access was at play.
Regarding Perplexity -> “Perplexity’s business model is effectively to take Reddit’s content from Google search results,” then feed it into an A.I. model and “call it a new product,” the lawsuit said.
Reddit said it had set a trap for Perplexity by creating a “test post” on… pic.twitter.com/qwikYu2vGn
- Glenn Gabe (@glenngabe), October 22, 2025
Perplexity's defense and the meaning of "Fair Play"
Perplexity's response came swiftly and defiantly. In a public statement, the company said it does not train models on Reddit content and has never done so. Instead, it describes its product as an application-layer tool that summarizes information with citations.
"We summarize Reddit discussions, and we cite Reddit threads in answers, just like people share links to posts here all the time," the company wrote.
Perplexity has described the litigation as a form of corporate bullying and argues that Reddit is using the lawsuit to force it and other AI companies into licensing agreements.
The team behind the AI-powered answer engine also states that its summaries are driving traffic to Reddit, not away from it. Perplexity's leadership sees itself as having an obligation to preserve the Internet's open nature to enable transparent sharing, verification, and citation of information.
On the flip side, Cloudflare has accused Perplexity of using undeclared crawlers that ignored no-crawl directives and disguised themselves to evade detection. Wired and Forbes have made similar claims. If those reports hold, then Perplexity's actions are not simply about open access. They may represent a deliberate attempt to cross ethical lines while claiming the moral high ground.
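For readers unfamiliar with how no-crawl directives work, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleAIBot` name, and the URL are hypothetical and do not reflect Reddit's or any other site's actual configuration. The sketch shows why a disguised crawler matters: the rules are matched against whatever user agent the crawler declares, and compliance is voluntary.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site (not any real site's file):
# one named AI crawler is blocked, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/r/some-thread"  # placeholder URL

# A crawler that honestly declares itself is told to stay out.
print(may_fetch("ExampleAIBot", url))

# The same crawler presenting a generic browser string matches only the
# wildcard rule, so the directive no longer applies to it.
print(may_fetch("Mozilla/5.0", url))
```

Nothing in the file itself enforces the rule. A crawler that changes its declared identity, or never checks the file at all, is stopped only by separate server-side detection.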
The vanishing middle ground
This debate is much bigger than Reddit or Perplexity. It raises the question of what happens when the traditional web economy stops working.
Search engines have traditionally acted as intermediaries between users and content providers by indexing, ranking, and generating traffic to content based on their rankings. This traffic was then monetized through advertising, but AI summarization tools disrupt this model.
How we search and find information is changing, and we now expect immediate answers to everything. The cost of this expectation is that AI companies are meeting our demands by extracting value from user-generated data while keeping users on their own platforms. We are no longer visiting the page to access the original material.
How much damage does AI Overview do to a website's traffic?
• Traffic drop is around −89%
• Desktop: Click-through rate fell from 25% → 2.8%
• Mobile: Click-through rate fell from 24% → 3%
Plus Google Search is facing a -9.9% drop in users every year
source: UK Gov pic.twitter.com/q3kCaLEHrz
- Ashkan Farhangi (@AshFarhangi), October 23, 2025
Google's new AI Overview tool has already impacted referral traffic to news organizations, and the same trend is occurring with Perplexity and other tools that aggregate and provide summaries rather than directing the end-user to the originating site.
The link next to an AI summary rarely gets clicked. This change is proving devastating to publishers and independent content creators who need exposure to survive.
There is an argument, often made by AI advocates, that the game has changed. If the web economy no longer supports free access, creators need to adapt. But the stakes are even higher.
Reddit’s stock recently plunged 15% after Google quietly changed a single indexing parameter, cutting ChatGPT’s Reddit citations from 29% to 5% almost overnight. The event exposed how the new internet economy is no longer built on SEO but on AI visibility, and how one algorithmic shift can wipe billions in value without a single change to user behaviour or content.
Reddit just lost 82% of its AI citations overnight.
(and Wall Street noticed)
ChatGPT's Reddit citations dropped from 29.2% to 5.3% in days. And their stock fell 15% over 8 days.
The cause? Google changed a single indexing parameter that limited what LLMs could access in… pic.twitter.com/MkylSNUqNw
- Jake Ward (@jakezward), October 7, 2025
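The headline percentages in these posts are relative declines, not differences in percentage points. A quick sketch of the arithmetic, using only the figures quoted above (the function name is ours, chosen for illustration):

```python
def relative_drop_pct(before: float, after: float) -> float:
    """Return the relative decline from `before` to `after`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# ChatGPT's Reddit citation share: 29.2% -> 5.3%
print(round(relative_drop_pct(29.2, 5.3)))  # 82

# Desktop click-through rate with AI Overview: 25% -> 2.8%
print(round(relative_drop_pct(25.0, 2.8)))  # 89
```

So a fall from 29.2% to 5.3% is about 24 percentage points, but roughly an 82% loss of the original share. The desktop click-through figures likewise work out to about 89%, which matches the traffic drop quoted in the earlier post.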
As users turn to concise, ad-free summaries, publishers are being forced to innovate their monetization models rather than cling to outdated ad revenue. Defenders of Perplexity compare it to Wikipedia or academic citations that acknowledge sources while making information accessible. But there is an imbalance of power emerging.
Reddit, Perplexity, and Google are not underdogs. They are technology giants with the resources to reshape the digital landscape. When they say "open," what they often mean is "open for us." While users may celebrate ad-free answers, the long-term cost is the erosion of the independent web.
The real question
If history is any guide, the pattern will mirror what happened to music and film during the Napster era. Free access leads to collapse. Collapse leads to consolidation. Consolidation leads to control. Once small websites disappear and everything is paywalled or AI-summarized, the internet risks turning into a handful of corporate- and government-owned curated portals.
The original social contract of the web is breaking down. The balance between access and attribution, openness and ownership, is collapsing under the weight of an AI gold rush.
As users, we are tired of wading through irrelevant sponsored results on Google's first page. We want quick, ad-free answers, and AI companies are stepping up to meet our new expectations. But could all sides of this argument be unwittingly destroying the open Internet we take for granted?