
The exposed Elasticsearch cluster, which contained over 160 indices, held billions of primarily Chinese records, ranging from national citizen ID numbers to various business records. The massive leak is among the largest single Elasticsearch exposures ever recorded.
- Cybernews researchers discovered 8.7 billion exposed Chinese records on an unsecured Elasticsearch cluster, one of the largest data leaks in history.
- The leaked data includes national ID numbers, home addresses, plaintext passwords, and social media identifiers, creating severe identity theft risks.
- The exposed database remained publicly accessible for over three weeks before being closed, giving attackers ample time to scrape data.
- Researchers believe the dataset was intentionally aggregated and stored with a bulletproof hosting provider, suggesting data broker activity or malicious intent.
While anything China-related tends to yield extremely high numbers, the latest data leak is massive even by Chinese standards. On January 1, 2026, the Cybernews research team discovered 8.73 billion Chinese records exposed online.
The leaked data ranges from national ID numbers and home addresses to social media identifiers and email addresses, severely increasing identity theft and account takeover risks for the individuals involved.
The exposed data was stored on a massive Elasticsearch cluster. Organizations and businesses use Elasticsearch because it supports rapid sorting, near-real-time search, and high scalability. The cluster our team discovered contained 163 indices, housing billions of records.
The data cluster was discovered in the first days of 2026 and remained open for over three weeks. While there are no indications that the data was abused by malicious actors, if our researchers managed to find it, there's no reason others couldn't too.
“Despite the short exposure window, the scale of the dataset means that automated scraping during this period could have resulted in widespread secondary dissemination,” our researchers said.
Bob Diachenko, a Cybernews contributor, cybersecurity researcher, and owner of SecurityDiscovery.com, is behind this major discovery. According to him, the cluster's metadata across multiple datasets shows that data was imported as recently as late 2025.
“The presence of timestamps and import dates points to a long-running aggregation effort rather than a single historical breach,” the team explained.
What information has the major Chinese data leak exposed?
Because the exposed records are spread across multiple indices, their contents vary widely, ranging from full names and poorly protected account passwords to messaging and social media identifiers.
According to the team, the exposed data aggregates personal identifiers, contact information, government-style identifiers, online account references, and credentials at an unprecedented scale.
The geographic distribution of the leaked records is limited, predominantly focusing on mainland China, with regional metadata spanning multiple Chinese provinces and cities.
Our researchers grouped the details into four categories: personally identifiable information (PII), account and platform data, authentication data, and corporate and business records. The exposed records include:
Personally identifiable information (PII):
- Full names
- Mobile phone numbers
- National ID numbers
- Home addresses
- Date and place of birth
- Gender and demographic attributes
Account and platform data:
- Messaging and social media identifiers
- Email addresses
- Usernames
- Platform-specific account references
Authentication data:
- Plaintext and weakly protected passwords in multiple datasets
Corporate and business records:
- Company registration details
- Legal representatives
- Business contact information
- Registration addresses and licensing metadata
Researchers note that the exposed cluster was highly organized and segmented, with indices grouped by data type. For example, the team observed phone-centric, ID-centric, account-centric, and other types of datasets.
As the database contained no banner, no organization names, and no operator identifiers, the team could not confirm the identity of the data owner. At the same time, no public claim of ownership has emerged.
“The infrastructure was hosted on a bulletproof hosting provider, commonly associated with high-risk or non-compliant data operations. Moreover, the dataset structure and scale suggest intentional aggregation, not accidental logging or misconfiguration by a single consumer service,” our researchers said.
Interestingly, the data types present in the cluster matched the types of data that data brokers collect. At the same time, other services hosted on the server suggest that the personal and company information could have been abused by a malicious actor for financial fraud.
The team could not accurately estimate how many individuals were exposed. While different indices contained duplicate data, the sheer volume of exposed records suggests the number of affected individuals could be in the hundreds of millions.
Largest Chinese data leak: What are its implications?
Even though the 8.7-billion-record dataset is no longer accessible, it was open for over three weeks, giving malicious actors ample time to scrape it. Our researchers believe attackers could use the data for multiple purposes.
For one, the exposed records included account credentials, with passwords stored in plaintext or only weakly protected. This type of data is extremely useful for account takeovers, allowing cybercriminals to access additional user details. Password information also enables cybercrooks to carry out credential stuffing attacks, as users often reuse the same passwords across multiple accounts.
Another major risk for individuals is identity theft. Since the dataset included tremendous amounts of PII together with national identifiers, malicious actors may attempt to open fraudulent accounts. National ID numbers are often the key identifier that organizations and businesses require when setting up accounts.
“This exposure demonstrates how large-scale personal data aggregation can persist outside regulatory oversight when hosted in permissive environments. Even without a confirmed owner, the dataset represents a systemic privacy risk affecting potentially hundreds of millions of individuals,” our researchers explained.
Other major Chinese data leaks
Like any major global player, China has suffered several significant data leaks over the past year. Last September, an anonymous source leaked over 500GB of internal documents from the Chinese internet censorship program known as the Great Firewall of China.
Meanwhile, the previous "biggest ever" data leak we saw also came from China and was also discovered by Cybernews researchers. It occurred in May 2025, when over 4 billion documents containing financial data, WeChat and Alipay details, and other sensitive personal data were exposed to the public.
In 2024, the team uncovered a COMB – compilation of many breaches – targeting Chinese individuals with over 1.2 billion records. The data mostly covered details from the Chinese social media app QQ, microblogging platform Weibo, courier services provider ShunFeng, and many other organizations.
Likely one of the most damaging data leaks to plague China took place in 2022, after malicious actors shared a massive dataset weighing 23 terabytes, supposedly covering information about a billion Chinese nationals. The database was allegedly stolen from the Shanghai police.
Vilius Petkauskas is a deputy editor at Cybernews. Vilius brings over a decade of experience in journalism to his role at Cybernews. He oversees content quality, topic pitching and research article development. Before joining Cybernews, Vilius sharpened his pen as a journalist for both print and online media, covering a diverse range of topics from local business to international politics.
What data was exposed in the massive 8.7B record leak?
The exposed Elasticsearch cluster contained 8.73 billion records, including highly sensitive PII such as national ID numbers, full names, home addresses, mobile phone numbers, and passwords.
Who owns the database that leaked the Chinese records?
The owner of the database remains unknown. The cluster had no organization name or identifiers and was hosted on a "bulletproof" hosting provider often used for risky or illicit operations.
How long was the data exposed online?
The database was discovered by Cybernews researchers on January 1, 2026, and remained open and accessible to anyone on the internet for over three weeks until it was closed on January 26, 2026.
What are the risks of this data leak for individuals?
The exposure of national ID numbers combined with authentication data such as passwords creates a severe risk of identity theft and account takeovers.
- Leak discovered: January 1st, 2026
- Leak closed: January 26th, 2026