Mysterious actor spills over 1.2B records on Chinese users


An unknown actor is building a COMB (a compilation of many breaches) targeting Chinese individuals, and it already holds over 1.2 billion records. Each record contains at least a phone number but often includes other sensitive data, such as an address or ID card number. The compilation is also leaking online.

On May 6th, the Cybernews research team discovered a colossal dataset focused solely on citizens of China. The entity behind it likely misconfigured its Elasticsearch instance (a data storage and search tool) by mistake, leaving the data exposed on the internet.

The actor started building the COMB only recently: the first entry was uploaded on April 29th. A week later, the endpoint held 1,230,703,487 personal data records of Chinese citizens, and counting.


China's population is roughly 1.4 billion, so the record count is equivalent to about 87% of it. Each record in the 100-gigabyte leak contains at least a phone number.

Most of the data is aggregated from previous public leaks. However, the compilation also includes some private and previously unseen datasets.

“Such an immense collection of personal information suggests the individuals behind it likely have ulterior motives,” the Cybernews research team warns. “The complete dataset is likely to contain duplicates, but that may be by design. It allows threat actors to view all the leaked data about a person, tying together different data points from different leaks and breaches.”
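To illustrate the researchers' point about duplicates, the sketch below groups records from different leaks by phone number, the one field every COMB record carries, so that scattered data points collapse into a single profile. It is a simplified illustration only: the record layout, the field names, and the `link_by_phone` helper are hypothetical, and the data is invented.

```python
from collections import defaultdict

# Hypothetical, made-up records from two imaginary leaks. The field names are
# illustrative assumptions, not the COMB's actual schema.
records = [
    {"source": "leak_a", "phone": "555-0100", "qq_id": "10001"},
    {"source": "leak_b", "phone": "555-0100", "name": "Zhang San", "address": "Example Rd 1"},
    {"source": "leak_a", "phone": "555-0199", "qq_id": "10002"},
]


def link_by_phone(rows):
    """Merge every record that shares a phone number into one profile (hypothetical helper)."""
    profiles = defaultdict(lambda: {"sources": set()})
    for row in rows:
        profile = profiles[row["phone"]]
        # Remember which leaks contributed to this profile.
        profile["sources"].add(row["source"])
        for field, value in row.items():
            if field not in ("source", "phone"):
                # Keep the first value seen; later records only fill in missing fields.
                profile.setdefault(field, value)
    return dict(profiles)


profiles = link_by_phone(records)
for phone, profile in profiles.items():
    print(phone, profile)
```

In this toy example, the QQ ID from one leak and the name and address from another end up attached to the same phone number, which is why a duplicate-heavy compilation can still be valuable to whoever holds it.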

This COMB is the second-largest leak this year, after the 26-billion-record Mother of All Breaches (MOAB) compilation left open by a data breach search engine.


What’s inside the stash?

The COMB is hosted in a data center in Germany. The leaky Kibana instance, the dashboard interface for viewing the data, was set to Simplified Chinese, suggesting that the administrator could also be of Chinese origin.


The data chest includes the following:

  • 668,304,162 records containing QQ account numbers and phone numbers. QQ is a hugely popular messaging app in China, similar to WhatsApp.
  • 502,852,106 records containing Weibo account IDs and phone numbers. Weibo is a Chinese microblogging platform, similar to a hybrid of Twitter and Facebook.
  • 50,557,417 records in the ShunFeng sub-dataset, including phone numbers, names, and addresses. ShunFeng provides logistics and courier services in China.
  • 8,064,215 records in the Siyaosu sub-dataset, exposing names, phone numbers, addresses, and ID card numbers.
  • 746,310 records in the Chezhu sub-dataset, exposing names, phone numbers, email addresses, addresses, and ID card numbers.
  • 100,790 records in the Pingan sub-dataset, containing names, phone numbers, email addresses, home addresses, ordered services, card numbers, and amounts paid. Ping An is an insurance company in China.
  • 78,487 records in the Jiedai sub-dataset, exposing names, phone numbers, addresses, ID card numbers, workplaces, education levels, and partners' names and phone numbers.
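The sub-dataset counts above can be cross-checked against the overall total with a few lines of Python. This is a simple sketch: the figures are copied from the list, and the dictionary keys are just labels.

```python
# Record counts per sub-dataset, as listed above.
SUB_DATASETS = {
    "QQ": 668_304_162,
    "Weibo": 502_852_106,
    "ShunFeng": 50_557_417,
    "Siyaosu": 8_064_215,
    "Chezhu": 746_310,
    "Pingan": 100_790,
    "Jiedai": 78_487,
}

# Total reported for the endpoint a week after the first upload.
REPORTED_TOTAL = 1_230_703_487

total = sum(SUB_DATASETS.values())
print(f"Sum of sub-datasets: {total:,}")
print("Matches reported total:", total == REPORTED_TOTAL)

# Share contributed by the two account-ID-plus-phone sets (QQ and Weibo).
social_share = (SUB_DATASETS["QQ"] + SUB_DATASETS["Weibo"]) / total
print(f"QQ + Weibo share: {social_share:.1%}")
```

The seven sub-datasets add up exactly to the 1,230,703,487 records reported, and the QQ and Weibo sets alone account for just over 95% of them.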

The intent may be malicious

At the time of discovery, the leak could not be definitively attributed, as no individual or group had openly claimed responsibility for it.

“The discovered data was likely obtained illegally and is possibly intended to be used for illegal purposes. The data likely belongs to an individual threat actor or a group of individuals,” Cybernews researchers said.

Large data compilations like this are typically used on the black market to power illegal lookup services that let buyers search previously leaked information. The collection could also be preparation for large-scale robocalling, scams, or phishing campaigns targeting Chinese citizens.

The data hoarder appears to have no interest in passwords, as none are present, even though many past leaks exposed them.

“The choice of Elasticsearch as a data repository is also telling, as it is a go-to tool for both storing large amounts of data and searching that data quickly,” researchers said.

Elasticsearch supports fast sorting and near real-time search, and it is highly scalable.


At the time of discovery, the personal information stash was still in its early stages, and more data may be added in the future.

The Cybernews research team has informed the German cloud provider about the apparently illegally stored and openly accessible data repository.

China has suffered massive data leaks in the past. In 2022, hackers claimed to have breached the Shanghai police and stolen data on one billion Chinese citizens. Last year, the data of 630 million Chinese citizens, including bank card details, was exposed on a Russian-linked forum.


The dangers ahead

People in China should be aware that their data is being compiled for potential future campaigns where threat actors may attempt widespread targeting.

Most of the data is not new. However, that matters little to malicious actors as long as the data is still valid.

Users whose personal information has been included in the COMB may be targeted in spear-phishing attacks and receive higher-than-usual volumes of spam emails, calls, or text messages from fraudsters.

While there are no passwords in the datasets, cybercrooks can still try to gain unauthorized access to accounts by pairing the exposed identifiers with usernames and passwords from other leaks.

Because phone numbers are often used for authentication or account recovery, attackers could attempt to steal identities or access online accounts without authorization.


“Scamming operates on a percentages basis. Armed with a billion phone numbers, attackers can attempt social engineering attacks to gain trust, impersonate individuals, or manipulate victims into revealing more sensitive information,” our researchers warn.

Other possibilities include using data for surveillance and tracking, especially if state-sponsored entities are involved.