World’s most-visited websites put visitors at risk by exposing leftover files

The CyberNews Investigations team scanned the internet’s 35,000 most-visited websites from the Alexa top one million list for common leftover files.
82 of the top 35,000 websites had multiple types of leftover files exposed on their servers.
It is estimated that websites with exposed dangerous leftover files get around 17 million monthly visits.
Exposed leftover files can potentially result in malware injections, stolen user data, and full website takeovers by threat actors.

CyberNews security researchers found exposed leftover files on dozens of the world’s most-visited websites. Leaving these files publicly accessible is incredibly dangerous and can lead to full website takeovers by threat actors.

For web administrators, keeping an eye on every single file might sometimes seem like a real hassle. However, the most severe server breaches are often caused by the least noticeable security holes in their websites.

According to CyberNews researcher Martynas Vareikis, leftover files are one such inconspicuous opening for threat actors, even on the biggest websites on the internet.

“From overlooked database history and DS_STORE files to GIT repositories, even a single exposed item can open millions, if not billions, of visitors to a plethora of potential dangers, including data breaches, phishing attacks, identity theft, or worse.”
Martynas Vareikis, CyberNews

In light of this, we at CyberNews decided to analyze the 35,000 top websites and see how many were storing their files insecurely.

What our Investigations team discovered was eye-opening: 82 top Alexa-ranked websites had leftover files exposed to anyone and accessible without authorization, potentially putting countless numbers of visitors at risk of cyberattacks.

The reach of the affected sites is massive: CyberNews estimates that they get around 17 million total visits a month. The list includes sites from all over the world, including domains from the US, Russia, Japan, China, Germany, France, Korea, the Netherlands, and more. Government and education sites also link to them.

In order to conduct our investigation, CyberNews researchers scanned the 35,000 most-visited websites on the internet for exposed DS_STORE, ENV, and MYSQL_HISTORY files, as well as GIT repositories. They then analyzed the output and removed any false positives. For website popularity metrics, we used the Alexa top one million list.
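The article does not publish the scanner itself, so the following is only a minimal Python sketch of how such a check and its false-positive filtering might look. The probe paths in `CANDIDATE_PATHS`, the function names, and the content heuristics are all assumptions made for illustration, not the method CyberNews used. The sketch deliberately performs no network requests: it only builds the URLs to check and decides whether a response body looks like the real file rather than a custom "not found" page.

```python
import re
from urllib.parse import urljoin

# Hypothetical probe paths: the article names the file types, not the exact URLs.
CANDIDATE_PATHS = {
    "ds_store": "/.DS_Store",
    "env": "/.env",
    "mysql_history": "/.mysql_history",
    "git": "/.git/HEAD",
}

DS_STORE_MAGIC = b"\x00\x00\x00\x01Bud1"  # leading bytes of a .DS_Store file
GIT_HEAD_RE = re.compile(rb"(ref: refs/\S+|[0-9a-f]{40})")
ENV_LINE_RE = re.compile(rb"^\s*(export\s+)?[A-Za-z_][A-Za-z0-9_]*\s*=")
SQL_KEYWORD_RE = re.compile(
    rb"\b(select|insert|update|delete|show|use|grant|create)\b", re.IGNORECASE
)


def candidate_urls(base_url: str) -> dict[str, str]:
    """Map each leftover-file type to the URL that would be probed."""
    return {kind: urljoin(base_url, path) for kind, path in CANDIDATE_PATHS.items()}


def looks_like_leftover(kind: str, body: bytes) -> bool:
    """Heuristically decide whether a response body is the real file."""
    if kind == "ds_store":
        return body.startswith(DS_STORE_MAGIC)
    # Many servers answer unknown paths with an HTML page and status 200.
    head = body[:4096].lower()
    if b"<html" in head or b"<!doctype" in head:
        return False
    if kind == "git":
        return GIT_HEAD_RE.fullmatch(body.strip()) is not None
    # Look at the first non-empty, non-comment lines only.
    lines = [
        ln for ln in body.splitlines()[:50]
        if ln.strip() and not ln.lstrip().startswith(b"#")
    ]
    if not lines:
        return False
    if kind == "env":
        return all(ENV_LINE_RE.match(ln) for ln in lines)
    if kind == "mysql_history":
        # Some clients write a "_HiStOrY_V2_" marker; otherwise look for SQL.
        return body.startswith(b"_HiStOrY_V2_") or any(
            SQL_KEYWORD_RE.search(ln) for ln in lines
        )
    return False
```

A site owner could fetch each URL from `candidate_urls()` for their own domain and pass the response body to `looks_like_leftover()`. Any hit that survives the filter is still only a candidate and would need a manual look, much like the false-positive review described above.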

Here’s what we found.

What we discovered

Desktop Services Store files

Desktop Services Store (DS_STORE) files top the list with over 81 exposed instances overall, beating the other types by a large margin.

DS_STORE files were more prevalent on bigger websites: on average, the sites exposing them were the largest among the affected sites, so these exposures potentially affect the greatest number of visitors.

By analyzing exposed DS_STORE files, malicious actors can gather information about the contents of folders stored in web servers, which can lead them to unprotected files containing sensitive data and access credentials.

GIT directories

Exposed GIT directories come second, with 24 instances left out in the open on the world’s top websites.

GIT repositories are hidden folders inside websites’ working GIT directories (workspaces) and contain multiple files that are required for a web application’s GIT client to function properly. They can often include highly sensitive information, such as configuration files, object databases, and cached files.

According to Zur Ulianizky, Head of Security Research at XM Cyber, attackers who manage to gain access to GIT repositories are easily able to extract sensitive information stored inside them. “Extracting the full repository is pretty easy since there are a lot of open-source tools that achieve exactly this goal, which could lead to further attacks,” cautions Ulianizky.

“In addition, if the attacker is lucky, the repository represents the application. Using the code they found, the attacker might be able to find common web application vulnerabilities or business logic issues within the application.”

Ashu Savani, the co-founder of cybersecurity training firm TryHackMe, adds that GIT directories will also have committed code histories that reveal a web application’s source code. Not only that, exposed GIT directories can lead to threat actors getting their hands on credentials like keys for cloud environments, database connection strings, or third-party API keys.

“Attackers can use these credentials to either get a foothold into the organization's environments or access sensitive data. With GIT directories, [attackers] can proactively hunt for security issues within the source code and use this to exploit the application,” notes Savani.

“Depending on the motivation of the threat actor, they could also release this information to the public, which could cause reputational and financial damage to the organization.”
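To make the credential risk concrete, here is a small Python sketch of the kind of check a team might run against its own source code before committing it. The three patterns and the name `find_secrets` are illustrative assumptions, not a rule set recommended by the experts quoted here; dedicated secret scanners use far more extensive rules.

```python
import re

# Illustrative patterns only; real secret scanners ship much larger rule sets.
SECRET_PATTERNS = {
    # AWS access key IDs start with "AKIA" followed by 16 characters.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # scheme://user:password@host style database or service URLs.
    "connection_string": re.compile(
        r"\b[a-z][a-z0-9+]*://[^\s:/@]+:[^\s@]+@[^\s/]+", re.IGNORECASE
    ),
    # Assignments such as api_key = "..." with a quoted value of 8+ characters.
    "hardcoded_secret": re.compile(
        r"(?i)\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Running `find_secrets()` over each file before a commit flags lines worth reviewing. A clean result does not prove the code is free of secrets, and anything already committed stays in the repository history until it is rewritten and the credential is rotated.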

MYSQL_HISTORY and ENV files

MYSQL_HISTORY and ENV files round out the top four, with four exposed instances of each file type discovered during the investigation.

MYSQL_HISTORY files are used by certain MySQL database clients to log and keep track of executed SQL queries, and they serve as a reference for web administrators in case something goes wrong. These files are essential for database management and often contain highly sensitive information, such as database names, table and column names, as well as account passwords.

Similarly, ENV files are used by many web applications to store access credentials and are some of the worst files to have exposed for any website.

Leaving such files publicly accessible can be catastrophic for a company, says Stephen Curry, CEO of secure E-Sign service provider CocoSign, adding that “unsecured data can put customers' personal information at risk.”

“If this data is breached, any customer that has their records hacked will start receiving spam mail or undesirable phone calls,” Curry explains.

Needless to say, both MYSQL_HISTORY and ENV files are highly valuable targets for threat actors, as their public exposure can lead to personal data theft, malware or ransomware injections, and even full website takeovers.

Protecting against unauthorized access via leftover files

According to Sam Jadali, the founder of cyber defense and threat detection service provider Melurna, widespread malicious botnets have enabled bad actors to scan domains for frequently used file names that contain sensitive information.

“The ubiquitous and pervasive nature of these bots makes it increasingly easy to compromise servers. Web and app developers may forget to delete backups, application environment or MySQL history files,” Jadali told CyberNews.

“When left in publicly accessible locations, bad actors use the data to discover credentials, map server infrastructure, perform lateral attacks, inject malware, or infect servers with ransomware. Using today’s advanced technology, hackers can scan the global internet IPv4 range in less than 5 minutes.”
Sam Jadali, founder of Melurna

When it comes to web infrastructure, Jadali says that prevention is the best preparation. He recommends filtering malicious bots by using properly configured web application firewalls.

“However, the future of safeguarding our data requires the sustained collaboration of developers and hosting providers. Web users and developers must practice proper security hygiene, and hosting providers must stay abreast of the latest attack vectors,” concludes Jadali.

Meanwhile, Zur Ulianizky recommends the following security best practices to web server administrators:

Make sure that the team is aware of secure development practices. In addition, it's important that developers are aware of the OWASP Top 10 vulnerabilities and how to prevent them during development.
Any input by the user must be validated. In addition, make sure that the output is sanitized.
Exceptions must be handled. Make sure that every exception is handled properly. Attackers use exceptions to gather information that can aid in additional attacks.
Use browser security headers, such as HSTS, X-Frame-Options, and X-XSS-Protection.
Implement Identity and Access Management in order to follow the least privilege principle.
Run automatic security products to reveal vulnerabilities during the development, testing, and deployment of new versions of the application.
Perform manual penetration testing assessment on a regular basis.
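
As one example of the automated checks mentioned above, the following Python sketch walks a local web root before deployment and reports the leftover file types covered in this investigation. The function name `find_leftovers` and the exact name lists are assumptions made for illustration. A real deployment pipeline would likely combine a check like this with web server rules that deny access to hidden files.

```python
import os

# File and directory names taken from the leftover types discussed in the article.
LEFTOVER_FILES = {".DS_Store", ".env", ".mysql_history"}
LEFTOVER_DIRS = {".git"}


def find_leftovers(web_root: str) -> list[str]:
    """Return paths (relative to web_root) of leftover files and directories."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(web_root):
        for name in list(dirnames):
            if name in LEFTOVER_DIRS:
                hits.append(os.path.relpath(os.path.join(dirpath, name), web_root))
                # Report the repository once; do not descend into it.
                dirnames.remove(name)
        for name in filenames:
            if name in LEFTOVER_FILES:
                hits.append(os.path.relpath(os.path.join(dirpath, name), web_root))
    return sorted(hits)
```

Calling `find_leftovers()` on the directory that is about to be published, and failing the deployment when the returned list is not empty, keeps these files from reaching the public server in the first place.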