Meta’s data scraping: against the rules yet impossible to stop?

While we enjoy so many “free” online services like social media, our privacy becomes the price we have to pay.

Every one of us has suffered a data breach. If you’ve used our personal data leak checker or password leak checker, you may have found that your phone number, email address, or even password has been leaked at some point.

In some cases, leaks occur due to a cyberattack, a malicious insider, or simply unintentional loss or exposure of data. However, threat actors don’t always have to penetrate the company’s network to obtain our sensitive details.

Just last week, Facebook, long criticized for trading user data, was fined €265 million ($277m at the time) by Ireland’s data privacy regulator over a leak that exposed over 533 million Facebook user records. Roughly a quarter of its users’ phone numbers, names, genders, occupations, email addresses, locations, and even marital statuses are circulating on the web for free.

Threat actors are no longer even charging for that data – it’s out there for anyone to take advantage of. Facebook said it took action against data scraping, but is that enough?

GDPR violations

Ireland launched an inquiry last year after a massive dataset scraped from Facebook was made available online.

Ireland's privacy watchdog, which consulted all the data protection supervisory authorities within the EU before the decision, said that Facebook violated the General Data Protection Regulation (GDPR), namely Articles 25(1) and 25(2).

The aforementioned rules discuss the necessity of data minimization and pseudonymization to protect data and ensure "personal data are not made accessible without the individual's intervention."

Scraping is precisely that – someone harvesting data available about us online, from our user names to emails and phone numbers to any other data that can be obtained from publicly available sources.

This is the second such fine for Meta in just a couple of months. In September, Ireland had already fined Meta-owned Instagram €405 million (about $402 million) after examining the public disclosure of children's emails and phone numbers.

Anti-scraping policy

As scraping continues to be an internet-wide challenge, Facebook opened up two new research areas for its bug bounty community and now rewards scraping bugs submitted by its Gold+ Hacker Plus researchers.

Meta also says it rewards reports of unprotected or openly public data sets containing at least 100,000 unique Facebook user records that include information such as email, phone number, physical address, and religious or political affiliation.

In July, Meta filed separate actions in federal court against Octopus, a US subsidiary of a Chinese national high-tech enterprise, and Ekrem Ateş for scraping data from Facebook and Instagram.

The company accused Octopus of building a cloud-based platform that gives paying customers access to on-demand scraping software and services. Ekrem Ateş, a Turkey-based defendant, is being sued for allegedly using automated Instagram accounts to scrape data from the profiles of over 350,000 Instagram users.

Less is known about the anti-scraping protections of WhatsApp, Meta's end-to-end encrypted messaging service. Officially, scraping violates its Terms of Service.

However, WhatsApp hasn't issued an official comment following reports about an alleged massive data leak, leaving us wondering whether such a dataset of user phone numbers could be obtained by scraping.

"While companies may have terms that forbid it, they really need technical controls in place to help prevent it. Any data that is accessible can be scraped," John Earle, President of cybersecurity consulting firm Protocol 86, told Cybernews.

Can scraping be prevented?

The fact that companies don't allow scraping does not deter bad actors from abusing an application's native application programming interface (API), according to Kyle Kurdziolek, senior cloud security manager at data management company BigID.

"We see this across multiple social media and communication services, Twitter, LinkedIn, Instagram, Reddit, etc., where it violates their respective Terms of Service, but when it comes to abusing APIs especially surrounding mobile apps, it's problematic. There is little to no care or focus to implement such defenses given the lack of security inherently to mobile applications," Kurdziolek said.

Stopping unwanted scraping is extremely difficult because, as Sam Crowther, CEO of bot mitigation company Kasada, explained, you only get a single chance to determine whether a request originates from a human or a bot – there's no time to observe user behaviors using machine learning (ML) or other means.

"Scraping bots are very difficult to detect because they look and act just like humans – hiding behind residential proxy networks and leveraging highly customized automation tools. It's entirely possible that WhatsApp didn't notice these scraping bot requests," Crowther said.

Bots are, in fact, quite hard to stop since requests are not usually coming from the same IP or the same session ID.

"Scrapers now have the ability to break up the scraping work into chunks and send them to different bots. I don't mean 1 or 2, more like thousands to 10s of thousands of bots. That activity is harder to spot. I know this is accurate because security researchers use the same techniques to scrape threat actor forums and channels," David Maynor, Head of Cybrary Threat Intelligence Group (CTIG), told Cybernews.
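To illustrate why splitting the work across many bots is hard to spot, here is a minimal Python sketch of a simple per-IP rate limit, one of the most common technical controls. Everything in it is an assumption made for illustration: the `PerIpRateLimiter` class, the `simulate` helper, the limit of 100 requests per minute, and the bot addresses are hypothetical and do not describe how Meta, WhatsApp, or any vendor quoted here actually detects scraping.

```python
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Sliding-window limiter: at most `limit` requests per IP per `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of recently allowed requests

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Forget requests that have fallen out of the time window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # this IP has used up its quota
        hits.append(now)
        return True


def simulate(num_bots: int, total_requests: int,
             limit: int = 100, window: float = 60.0) -> int:
    """Count how many requests get through when spread evenly over `num_bots` IPs."""
    limiter = PerIpRateLimiter(limit, window)
    allowed = 0
    for i in range(total_requests):
        bot = i % num_bots
        ip = f"10.0.{bot // 256}.{bot % 256}"  # hypothetical bot addresses
        now = i * window / total_requests      # every request lands inside one window
        if limiter.allow(ip, now):
            allowed += 1
    return allowed


print(simulate(num_bots=1, total_requests=10_000))      # 100
print(simulate(num_bots=5_000, total_requests=10_000))  # 10000
```

In this toy setup, a single scraper is cut off after 100 requests, while the same 10,000 requests spread over 5,000 addresses all pass, because each address sends only two. A real defense would need signals beyond the IP address, which is exactly what residential proxies and customized automation tools are designed to blur.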

Secure your WhatsApp

A private phone number that belongs to an individual, as opposed to the contact details of government agencies and corporations, is considered personally identifiable information (PII).

Therefore, companies must protect the information you share with them. However, due to some security flaws or simple scraping that some companies turn a blind eye to, your data, like your email address or phone number, can get leaked.

There are a few things you can do to make sure that your disclosed information will not benefit threat actors:

Do not answer calls and text messages from unknown numbers. Block anyone who raises suspicion.

Enable 2FA as soon as possible – head to WhatsApp Settings > Account and turn the feature on.

Check that your profile information is not publicly visible. Go to Settings > Privacy and choose who can see your profile picture, “about” information, and other account details. Make sure you share those only with a small group of people.

Don’t fall for scam support messages. We’ve noticed scammers offering their “help” by redirecting WhatsApp users to experts who allegedly can help get the hacked account back. The only way to recover a hacked account is by contacting official support.
