8.7 billion records spilled: Inside the massive Chinese data leak

The exposed Elasticsearch cluster, which contained over 160 indices, held billions of primarily Chinese records, ranging from national citizen ID numbers to various business records. The massive leak is among the largest single Elasticsearch exposures ever recorded.

Key takeaways:

Cybernews researchers discovered 8.7 billion exposed Chinese records on an unsecured Elasticsearch cluster, one of history's largest data leaks.
The leaked data includes national ID numbers, home addresses, plaintext passwords, and social media identifiers, creating severe identity theft risks.
The exposed database remained publicly accessible for over three weeks before being closed, giving attackers ample time to scrape data.
Researchers believe the dataset was intentionally aggregated on bulletproof hosting, suggesting data broker activity or malicious intent.

Key Takeaways by nexos.ai, reviewed by Cybernews staff.

While anything China-related tends to yield extremely high numbers, the latest data leak is massive even by Chinese standards. On January 1st, 2026, the Cybernews research team discovered 8.73 billion Chinese records exposed online.

The leaked data ranges from national ID numbers and home addresses to social media identifiers and email addresses, severely increasing identity theft and account takeover risks for individuals involved.

The exposed data was stored on a massive Elasticsearch cluster. Organizations and businesses use Elasticsearch because it supports rapid sorting, near-real-time data searching, and high scalability. For example, the cluster our team discovered contained 163 indices, housing billions upon billions of records.
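To make the scale concrete: Elasticsearch's `_cat/indices` API, when called with `format=json`, lists each index along with its document count as a string in the `docs.count` field. The sketch below is not the researchers' tooling; it only illustrates how per-index counts add up to a cluster-wide total. The index names and counts are hypothetical placeholders, not values from the exposed cluster.

```python
from typing import Iterable, Mapping, Optional


def total_documents(cat_indices: Iterable[Mapping[str, Optional[str]]]) -> int:
    """Sum the docs.count field across rows of a _cat/indices JSON listing."""
    total = 0
    for row in cat_indices:
        # Counts arrive as strings; skip rows where the count is missing or null.
        count = row.get("docs.count")
        if count:
            total += int(count)
    return total


# Hypothetical rows shaped like `GET _cat/indices?format=json` output.
sample_listing = [
    {"index": "example_phone", "docs.count": "1200"},
    {"index": "example_id", "docs.count": "800"},
    {"index": "example_no_count", "docs.count": None},
]

print(total_documents(sample_listing))  # 2000
```

Summing document counts this way gives a record total, not a head count, since the same person can appear in more than one index.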

The massive data cluster was discovered in the first days of 2026 and remained open for over three weeks. While there are no indications that the data was abused by malicious actors, if our researchers managed to find it, there’s no reason others couldn’t too.

“Despite the short exposure window, the scale of the dataset means that automated scraping during this period could have resulted in widespread secondary dissemination,” our researchers said.

Billions of Chinese records leaked online — Sample of the leaked data. Image by Cybernews.

Bob Diachenko, a Cybernews contributor, cybersecurity researcher, and owner of SecurityDiscovery.com, is behind this major discovery. According to him, the cluster's metadata across multiple datasets shows that data was imported as recently as late 2025.

“The presence of timestamps and import dates points to a long-running aggregation effort rather than a single historical breach,” the team explained.

What information has the major Chinese data leak exposed?

Because the exposed records are spread across multiple indices, their contents vary widely, ranging from full names and poorly protected account passwords to messaging and social media identifiers.

According to the team, the exposed data aggregates personal identifiers, contact information, government-style identifiers, online account references, and credentials at an unprecedented scale.

The geographic distribution of the leaked records is limited, predominantly focusing on mainland China, with regional metadata spanning multiple Chinese provinces and cities.

Major Chinese data leak uncovered — Sample of the leaked data. Image by Cybernews.

Our researchers grouped the details into four categories: personally identifiable information (PII), account and platform data, authentication data, as well as corporate and business records. The exposed records include:

Personally identifiable information (PII):

Full names
Mobile phone numbers
National ID numbers
Home addresses
Date and place of birth
Gender and demographic attributes

Account and platform data:

Messaging and social media identifiers
Email addresses
Usernames
Platform-specific account references

Authentication data:

Plaintext and weakly protected passwords in multiple datasets

Corporate and business records:

Company registration details
Legal representatives
Business contact information
Registration addresses and licensing metadata

Sample of the major Chinese data leak — Sample of the leaked data. Image by Cybernews.

Researchers note that the exposed cluster was highly organized and segmented, with thematic indices grouped by data type. For example, the team observed phone-centric, ID-centric, account-centric, and other types of datasets.

As the database contained no banner, no organization names, and no operator identifiers, the team could not confirm the identity of the data owner. At the same time, no public claim of ownership has emerged.

“The infrastructure was hosted on a bulletproof hosting provider, commonly associated with high-risk or non-compliant data operations. Moreover, the dataset structure and scale suggest intentional aggregation, not accidental logging or misconfiguration by a single consumer service,” our researchers said.

Interestingly, the data types present in the cluster matched the types of data that data brokers collect. At the same time, other services hosted on the server suggest that the personal and company information could have been abused by a malicious actor for financial fraud.

The team could not accurately evaluate how many individuals were exposed. While different indices contained duplicate data, the sheer volume of exposed records still suggests the number of exposed individuals could be in the hundreds of millions.
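The gap between records and individuals comes from duplication: the same person can appear in several datasets. A minimal sketch of the idea, assuming each record carries some stable identifier (here a hypothetical `national_id` field; the real field names are not public), counts distinct identifiers instead of rows. The dataset names and values are toy placeholders.

```python
from typing import Dict, List


def count_unique_individuals(
    indices: Dict[str, List[dict]], key: str = "national_id"
) -> int:
    """Count distinct values of `key` across all indices, ignoring records without it."""
    seen = set()
    for records in indices.values():
        for record in records:
            value = record.get(key)
            if value is not None:
                seen.add(value)
    return len(seen)


# Hypothetical toy data: 5 records, but only 3 distinct people.
toy_indices = {
    "phone_centric": [{"national_id": "A"}, {"national_id": "B"}],
    "id_centric": [{"national_id": "A"}, {"national_id": "C"}],
    "account_centric": [{"national_id": "B"}],
}

total_records = sum(len(records) for records in toy_indices.values())
print(total_records, count_unique_individuals(toy_indices))  # 5 3
```

In practice, records often lack a shared identifier or spell it inconsistently, which is one reason an exact head count is hard to produce.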

Chinese data leak exposes billions of records: sample of the leaked data. Image by Cybernews.

Largest Chinese data leak: What are its implications?

Even though the 8.7 billion-record-strong dataset is no longer accessible, it was open for over three weeks, giving malicious actors ample time to scrape it. Our researchers believe attackers could utilize the data for multiple purposes.

For one, the exposed records included credentials, with passwords stored either in plaintext or with weak protection. This type of data is extremely useful for account takeovers, which in turn give cybercriminals access to additional user details. Password information also enables cybercrooks to carry out credential stuffing attacks, as users often reuse the same passwords for multiple accounts.

Another major risk for individuals is identity theft. Since the dataset included tremendous amounts of PII, together with national identifiers, malicious actors may attempt to set up fraudulent accounts. ID numbers are often the key identifier that organizations and businesses require when an account is opened.

“This exposure demonstrates how large-scale personal data aggregation can persist outside regulatory oversight when hosted in permissive environments. Even without a confirmed owner, the dataset represents a systemic privacy risk affecting potentially hundreds of millions of individuals,” our researchers explained.

Other major Chinese data leaks

Like any major global player, China has suffered major data leaks in recent years. Last September, an anonymous source leaked over 500GB of internal documents from the Chinese internet censorship program, known as the Great Firewall of China.

Meanwhile, the last “biggest ever” data leak we saw came from China and was also discovered by Cybernews researchers. It occurred in May 2025, when over 4 billion documents containing financial data, WeChat and Alipay details, and other sensitive personal data were exposed to the public.

In 2024, the team uncovered a COMB – compilation of many breaches – targeting Chinese individuals with over 1.2 billion records. The data mostly covered details from the Chinese social media app QQ, microblogging platform Weibo, courier services provider ShunFeng, and many other organizations.

Likely one of the most damaging data leaks to plague China took place in 2022, after malicious actors shared a massive dataset weighing 23 terabytes, supposedly covering information about a billion Chinese nationals. The database was allegedly stolen from the Shanghai police.

Vilius Petkauskas

Deputy Editor

Vilius Petkauskas is the deputy editor at Cybernews, covering information security, data breaches, ransomware, and cybercrime. With 13 years of journalism experience, he specializes in investigating large-scale data breaches and explaining how cybercriminals operate. Before joining Cybernews, Vilius worked as an economics and politics journalist at IQ magazine and as a journalist and fact-checker at 15min.lt. His work has been cited by Forbes, Fox, Time, and the Guardian.

What data was exposed in the massive 8.7B record leak?

The exposed Elasticsearch cluster contained 8.73 billion records, including highly sensitive PII such as national ID numbers, full names, home addresses, mobile phone numbers, and passwords.

Who owns the database that leaked the Chinese records?

The owner of the database remains unknown. The cluster had no organization name or identifiers and was hosted on a "bulletproof" hosting provider often used for risky or illicit operations.

How long was the data exposed online?

The database was discovered by Cybernews researchers on January 1, 2026, and remained open and accessible to anyone on the internet for over three weeks until it was closed on January 26, 2026.
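The "over three weeks" figure follows directly from the two dates. As a quick check using Python's standard `datetime` module (the variable names are illustrative):

```python
from datetime import date

discovered = date(2026, 1, 1)
closed = date(2026, 1, 26)

# Subtracting two dates yields a timedelta; .days is the whole-day difference.
exposure_days = (closed - discovered).days
print(exposure_days)          # 25
print(exposure_days > 3 * 7)  # True
```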

What are the risks of this data leak for individuals?

The exposure of national ID numbers combined with authentication data such as passwords creates a severe risk of identity theft and account takeovers.

Leak discovered: January 1st, 2026
Leak closed: January 26th, 2026