The internet is forgetting itself and AI can't fix what's already gone


As thousands of people headed into the weekend with flip-flops and beach towels, ready to enjoy the sun during Spring Break, techies and geeks from all over the world could be found online celebrating 404 Day on April 4th.

Whether or not you believe the Dead Internet Theory is anything more than a conspiracy theory, and whether or not bot traffic will ever surpass human traffic, the one thing we can all agree on is that the web today is littered with missing articles, vanishing blog posts, lost datasets, broken citations, and links that quietly lead nowhere.

The "404 Not Found" page is but a sign of something far greater than most would think. Our lives online seem to be turning into scenes from Back to the Future, with Marty McFly helplessly viewing his siblings being erased from a family photo. But unlike the movie, Doc Brown won't be arriving in a DeLorean to rewind and restore what has already slipped away.


A web that is quietly disappearing

The idea that the Internet is permanent has always been a convenient myth. In reality, it behaves more like a living system, constantly changing, rewriting itself, and in many cases, forgetting.

Pew Research Center has put a number on something we've all heard anecdotally: 25% of web pages that existed at some point between 2013 and 2023 can no longer be accessed. The figure rises to 38% for pages that existed in 2013.

Why does this matter? We're not looking at edge-case content hidden away in obscure forums. News articles, government resources, academic references, and everyday web pages that once informed decisions and shaped how people understand the world are gone from the Internet.


When you look deeper into the data, it gets worse. More than 23% of news pages have at least one broken link. It's not limited to news; 21% of government web pages also contain at least one broken link. Even on Wikipedia, more than 50% of pages have at least one dead reference link.

When a link breaks, so does the evidence that supported the claim it was attached to. A journalist may be referencing a now-defunct source, a research study may cite supporting data that has been lost, or a legal case may point to evidence that is no longer there. In each case, the chain of reliance becomes untrustworthy.


Many legal scholars have warned that unreliable links undermine the long-term usefulness of citations. In fact, studies have shown that over half of all links cited by courts are dead; as a result, anyone who later interprets a ruling may be unable to tell whether it rests on supporting material that is now missing or altered.

[Image: Court gavels resembling broken links to online legal documents. Aitor Diago/Getty Images]

Sometimes, even though the link appears active, it may point to something other than what was referenced at the time. This type of reliability loss is not related to record removal. Rather, this issue occurs when someone edits or updates the reference materials.
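One way to picture this kind of silent change is to record a fingerprint of a page at the moment it is cited and compare it later. The sketch below is only an illustration under stated assumptions: the function names and the sample page text are hypothetical, and a real checker would have to fetch the page and strip out ads, timestamps, and other parts that change on every load.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return a SHA-256 digest of the text after collapsing whitespace."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_drifted(cited_fingerprint: str, current_text: str) -> bool:
    """True when the page no longer matches what was recorded at citation time."""
    return fingerprint(current_text) != cited_fingerprint


# Hypothetical page contents, for illustration only.
cited = "The ruling was issued in 2014."
edited = "The ruling was issued in 2015."

recorded = fingerprint(cited)
print(has_drifted(recorded, cited))   # same text as when it was cited
print(has_drifted(recorded, edited))  # the link still works, the content changed
```

A matching fingerprint only says the text is unchanged; it says nothing about whether the page was accurate in the first place.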

In both scenarios, the reliability of the resource is diminished, and anything built on top of it inherits the problem, including AI models trained on these resources.

Why the problem keeps growing

There is a structural reason why link rot continues to accelerate. The web was not designed with permanence in mind. Pages are moved, domains expire, companies shut down, or content strategies change. Entire platforms rise and fall within a few years. The problem is that content is created far faster than anyone maintains it.

Even well-intentioned updates can break things. A site redesign might restructure URLs. A CMS migration might fail to preserve older paths, and even a simple change in file storage can render years of linked PDFs inaccessible.
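A common way to soften this is to keep a map from old paths to new ones, so that requests for retired URLs are redirected instead of failing. The following is a minimal sketch of that idea, not a description of how any particular CMS works; the paths, the `REDIRECTS` table, and the `resolve` helper are all hypothetical.

```python
from typing import Optional, Set, Tuple

# Hypothetical old-to-new path mapping recorded during a site redesign.
REDIRECTS = {
    "/news/2013/report.html": "/articles/report",
    "/files/whitepaper.pdf": "/downloads/whitepaper.pdf",
}


def resolve(path: str, live_paths: Set[str]) -> Tuple[int, Optional[str]]:
    """Return an HTTP-style status code and the path that should be served."""
    if path in live_paths:
        return 200, path        # page still lives at this address
    target = REDIRECTS.get(path)
    if target in live_paths:
        return 301, target      # old address, permanently moved
    return 404, None            # nobody recorded where it went


# Hypothetical set of paths that exist after the redesign.
LIVE = {"/articles/report", "/downloads/whitepaper.pdf"}

print(resolve("/news/2013/report.html", LIVE))
print(resolve("/old/unknown.html", LIVE))
```

Every old path missing from the map becomes a 404 for anyone who linked to it, which is how a single migration can break years of inbound links at once.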

Elsewhere, social posts disappear, accounts are deleted, and content is taken down or hidden behind paywalls. Research shows that nearly one in five social media posts can become unavailable within months.

The organizations trying to preserve the Internet


Luckily, some are trying to document things before they fade into the ether. The Wayback Machine and Archive Today are the Internet's unsung heroes, taking snapshots so we'll have some record of pages at least until the archived page gets taken down.
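For readers who want to check whether a dead page has a saved copy, the Internet Archive offers a public availability lookup. The sketch below assumes that endpoint and the JSON shape it has documented (an `archived_snapshots` object with a `closest` entry); the helper names and the sample response are hypothetical, and the code deliberately makes no network request.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint of the Internet Archive's availability lookup.
API = "https://archive.org/wayback/available"


def build_query(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build the lookup URL; timestamp is an optional YYYYMMDD-style string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urlencode(params)


def closest_snapshot(response_body: str) -> Optional[str]:
    """Return the archived copy's URL from a JSON response, or None."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


# Hypothetical response body, shaped like the assumed API output.
sample = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20130101000000/http://example.com/",
            "timestamp": "20130101000000",
            "status": "200",
        }
    }
})

print(build_query("example.com", "20130101"))
print(closest_snapshot(sample))
print(closest_snapshot('{"archived_snapshots": {}}'))  # page was never captured
```

An empty `archived_snapshots` object is the case that matters here: if nobody captured the page before it vanished, there is nothing to restore.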

There are also academic-minded tools like Perma.cc that let academics and lawyers create archived copies of the links they cite, protecting the integrity of scholarly and legal work, where link rot can exact a heavy price. Even those tools are limited in scope, though. The web is vast, and preserving it costs money.

This is why 404 Day is more than just a gimmick. It's an excuse to stop and think about something easy to miss. The Internet feels immediate, like it's right at our fingertips, so we assume it always will be there, which is possibly a bigger hallucination than what AI sometimes throws at you.

Do 404 errors still matter in the age of AI?

A bigger question hung over 404 Day in 2026. Do broken links matter as much in an age of AI, when more people are asking digital assistants for answers instead of clicking through websites and search results?


They arguably matter more than ever. AI systems are built on top of the open web. They learn from it, summarize it, and increasingly act as a layer between the user and the source. If those sources begin to disappear, the foundation of the web we all take for granted becomes weaker.

The problems begin when a user no longer sees the missing page, yet its absence still shapes what the model can and cannot reference. That has implications for accuracy and trust.

As users move away from search engines and websites, they will no longer encounter 404 errors directly, but they will still receive synthesized responses. This means the problem has moved out of sight. The broken links still exist, but they sit behind the interface, influencing what is included or excluded from the answer, and this is where things become more complicated.

AI platforms keep growing, and publishers are already seeing changes in how traffic flows. Some are gaining new forms of discovery, while others are losing direct visits. The web is becoming a source layer rather than the primary destination, which makes the durability of content even more important. AI cannot quote, cite, or verify what no longer exists.

[Image: A folder stands over a digital screen filled with binary code. Aitor Diago/Getty]

What the future of the web will look like

The Internet constantly evolves, and the loss of our digital history could have serious consequences beyond the occasional frustration of encountering the dreaded "404" error.

Scrolling down our newsfeed today involves reading, sharing, and depending on facts to make informed decisions. But once that information has served its purpose, it could become something that can be rewritten, reframed, and potentially used to influence or mislead without us even realizing it.

As we reflect on yet another 404 Day, the message is getting harder to ignore. The Internet is not just losing pages; it's losing pieces of its memory, and what remains is then being used to train ML models.

If we allow it to continue unchecked, we risk unwittingly reshaping the record of what people can see, verify, and ultimately believe.

