8.7 billion records spilled: Inside the massive Chinese data leak


The exposed Elasticsearch cluster, which contained over 160 indices, held billions of primarily Chinese records, ranging from national citizen ID numbers to various business records. The massive leak is among the largest single Elasticsearch exposures ever recorded.

While data leaks involving China tend to run to extremely high numbers, the latest one is massive even by Chinese standards. On January 1st, 2026, the Cybernews research team discovered 8.73 billion Chinese records exposed online.

The leaked data ranges from national ID numbers and home addresses to social media identifiers and email addresses, severely increasing identity theft and account takeover risks for individuals involved.

The exposed data was stored on a massive Elasticsearch cluster. Organizations and businesses use Elasticsearch because it supports rapid sorting, near-real-time data searching, and high scalability. The cluster our team discovered contained 163 indices, housing billions of records.
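For readers curious how such totals are typically tallied: Elasticsearch's `_cat/indices` API can return per-index document counts as JSON. The minimal sketch below sums those counts from a response that has already been fetched. The function name, sample payload, and index names are invented for illustration and are not taken from the exposed cluster or from the researchers' tooling.

```python
import json


def total_documents(cat_indices_json: str) -> tuple[int, int]:
    """Return (number_of_indices, total_document_count).

    Expects the JSON body of `GET /_cat/indices?format=json`,
    i.e. a list of objects that each carry a `docs.count` field.
    """
    rows = json.loads(cat_indices_json)
    total = 0
    for row in rows:
        # `docs.count` is reported as a string and can be null
        # (for example, for a closed index), so skip missing values.
        count = row.get("docs.count")
        if count is not None:
            total += int(count)
    return len(rows), total


# Hypothetical payload; index names and counts are placeholders.
sample = json.dumps([
    {"index": "example-phone", "docs.count": "1200"},
    {"index": "example-id", "docs.count": "800"},
    {"index": "example-closed", "docs.count": None},
])

index_count, record_count = total_documents(sample)
```

A raw total like this counts every stored document, so it says nothing about how many distinct people are covered.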

The massive data cluster was discovered in the first days of 2026 and remained open for over three weeks. While there are no indications that the data was abused by malicious actors, if our researchers managed to find it, there’s no reason others couldn’t too.

“Despite the short exposure window, the scale of the dataset means that automated scraping during this period could have resulted in widespread secondary dissemination,” our researchers said.

Sample of the leaked data. Image by Cybernews.

Bob Diachenko, a Cybernews contributor, cybersecurity researcher, and owner of SecurityDiscovery.com, is behind this major discovery. According to him, the cluster's metadata across multiple datasets shows that data was imported as recently as late 2025.

“The presence of timestamps and import dates points to a long-running aggregation effort rather than a single historical breach,” the team explained.

What information has the major Chinese data leak exposed?

Since exposed records are spread across multiple indices, they vary widely. The exposed records range from full names and poorly protected account passwords to messaging and social media identifiers.

According to the team, the exposed data aggregates personal identifiers, contact information, government-style identifiers, online account references, and credentials at an unprecedented scale.

The geographic distribution of the leaked records is narrow: they predominantly concern mainland China, with regional metadata spanning multiple Chinese provinces and cities.

Our researchers grouped the details into four categories: personally identifiable information (PII), account and platform data, authentication data, as well as corporate and business records. The exposed records include:

Personally Identifiable Information (PII):

  • Full names
  • Mobile phone numbers
  • National ID numbers
  • Home addresses
  • Date and place of birth
  • Gender and demographic attributes

Account and platform data:

  • Messaging and social media identifiers
  • Email addresses
  • Usernames
  • Platform-specific account references

Authentication data:

  • Plaintext and weakly protected passwords in multiple datasets

Corporate and Business Records:

  • Company registration details
  • Legal representatives
  • Business contact information
  • Registration addresses and licensing metadata

Researchers note that the exposed cluster was highly organized and segmented, with indices arranged thematically by data type. For example, the team observed phone-centric, ID-centric, account-centric, and other types of datasets.

As the database contained no banner, no organization names, and no operator identifiers, the team could not confirm the identity of the data owner. At the same time, no public claim of ownership has emerged.

“The infrastructure was hosted on a bulletproof hosting provider, commonly associated with high-risk or non-compliant data operations. Moreover, the dataset structure and scale suggest intentional aggregation, not accidental logging or misconfiguration by a single consumer service,” our researchers said.

Interestingly, the data types present in the cluster matched the types of data that data brokers collect. At the same time, other services hosted on the server suggest that the personal and company information could have been abused by a malicious actor for financial fraud.

The team could not accurately evaluate how many individuals were exposed. While different indices contained duplicate data, the sheer volume of exposed records still suggests the number of exposed individuals could be in the hundreds of millions.
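To illustrate why duplicates make the headcount hard to pin down, here is a minimal sketch of how distinct identifier values could be counted across several datasets. It is not the researchers' method. The field name `national_id`, the function name, and the sample records are all hypothetical, and the identifiers are hashed so that raw values are not held in memory longer than needed.

```python
import hashlib


def count_unique_individuals(indices, key_field="national_id"):
    """Count distinct values of `key_field` across all datasets.

    `indices` maps a dataset name to a list of record dicts.
    Records without the key are skipped, so the result only
    reflects records that carry the identifier.
    """
    seen = set()
    for records in indices.values():
        for record in records:
            value = record.get(key_field)
            if not value:
                continue
            # Normalize before hashing so trivial formatting
            # differences do not create false "new" individuals.
            normalized = str(value).strip().upper()
            digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
            seen.add(digest)
    return len(seen)


# Placeholder data: five records that describe two people.
sample_indices = {
    "example-phone": [
        {"national_id": "ID-0001", "phone": "000-0001"},
        {"national_id": "ID-0002", "phone": "000-0002"},
    ],
    "example-account": [
        {"national_id": " id-0001 ", "username": "user_a"},
        {"national_id": "ID-0002", "username": "user_b"},
        {"username": "user_c"},  # no identifier, skipped
    ],
}

unique_people = count_unique_individuals(sample_indices)
```

In practice many records would lack a shared identifier altogether, which is one reason a record count cannot simply be converted into a number of people.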

Largest Chinese data leak: What are its implications?

Even though the 8.7 billion-record-strong dataset is no longer accessible, it was open for over three weeks, giving malicious actors ample time to scrape it. Our researchers believe attackers could utilize the data for multiple purposes.

For one, the exposed records included credentials, some in plaintext and others with weakly protected passwords. This type of data is extremely useful for account takeovers, which give cybercriminals access to additional user details. Password information also enables cybercrooks to carry out credential stuffing attacks, as users often reuse the same passwords for multiple accounts.

Another major risk for individuals is identity theft. Since the dataset included tremendous amounts of PII, together with national identifiers, malicious actors may attempt to set up fraudulent accounts. ID numbers are often the key identifier that organizations and businesses demand when setting up accounts.

“This exposure demonstrates how large-scale personal data aggregation can persist outside regulatory oversight when hosted in permissive environments. Even without a confirmed owner, the dataset represents a systemic privacy risk affecting potentially hundreds of millions of individuals,” our researchers explained.

Other major Chinese data leaks

Like any major global player, China has suffered from major data leaks over the past year. Last September, an anonymous source leaked over 500GB of internal documents from the Chinese internet censorship program, known as the Great Firewall of China.

Meanwhile, the last “biggest ever” data leak we saw came from China and was also discovered by Cybernews researchers. It occurred in May 2025, when over 4 billion documents containing financial data, WeChat and Alipay details, and other sensitive personal data were exposed to the public.

In 2024, the team uncovered a COMB – compilation of many breaches – targeting Chinese individuals with over 1.2 billion records. The data mostly covered details from the Chinese social media app QQ, microblogging platform Weibo, courier services provider ShunFeng, and many other organizations.

Likely one of the most damaging data leaks to plague China took place in 2022, after malicious actors shared a massive dataset weighing 23 terabytes, supposedly covering information about a billion Chinese nationals. The database was allegedly stolen from the Shanghai police.

Vilius Petkauskas
Deputy Editor

  • Leak discovered: January 1st, 2026
  • Leak closed: January 26th, 2026