How Meta is using our personal data for AI (and how to opt out)


Generative AI has exploded in popularity since the release of ChatGPT in November 2022, with the chatbot demonstrating remarkable new capabilities in language generation. However, this kind of advanced AI requires vast datasets to train on, sparking a heated debate around data privacy and ethics. Tech giants are hungry for more data to fuel ever-improving AI systems, while users and campaigners are all too aware of the risks of giving their data away.

This is why a recent move by Meta, the parent company of Facebook, Instagram, and WhatsApp, is so interesting.

Users of Meta apps may not realize it, but the tech giant has been collecting some of their personal data to train artificial intelligence models. This includes public information scraped from social media as well as licensed data from third parties.

The ethics of AI

While Meta claims in a blog post that this data is necessary to “unlock advancements” in AI, privacy advocates argue the practice raises ethical concerns. Users have limited ability to opt out under current policies.

In August, data protection agencies from several countries published a joint statement reminding tech companies of their obligation to protect user data from scraping. The statement was addressed directly to Meta, Alphabet, and Microsoft, among others.

Meta recently added a “Generative AI Data Subject Rights” form to its website, allowing users to access, correct, or delete certain personal data used for AI training. However, it only applies to third-party data, not content posted directly on Meta's own platforms.

What kind of data does Meta have?

The form allows users to “access, download or correct any personal information from third parties used for generative AI,” to “delete any personal information from third parties used for generative AI,” or to “object or restrict the processing of my personal information from third parties used for generative AI.”

Meta's latest AI model, Llama 2, was not trained directly on first-party user data, a spokesperson told CNBC, but it is unknown whether the company will launch consumer AI features in the future that are. For now, the opt-out form is limited to publicly available information scraped from the internet, as well as data that Meta licenses from organizations and bodies that collect it.

Facebook and Instagram contain vast troves of personal data, including private posts visible only to the poster and their friends. Meta has not confirmed whether this content could also be used to train its AI models in the future; the opt-out form suggests the company currently sticks to publicly available sources.

What happens next?

But with generative AI advancing rapidly, the pressure on tech firms to leverage more private data may grow. Stricter privacy laws in some countries constrain how personal information can be processed; US federal legislation, as it currently stands, lacks comparable safeguards.

For US users, understanding how Meta uses their data for AI likely requires reading between the lines. The opt-out form covers only third-party information gathered by organizations other than Meta. To remove more of their data, users would likely have to delete their accounts entirely.

Meta's move toward greater transparency is likely an attempt to head off criticism from that same group of data protection agencies, whose joint statement warned large tech companies to safeguard user data from scraping. “Individuals can also take steps to protect their personal information from data scraping, and social media companies have a role to play in enabling users to engage with their services in a privacy-protective manner,” the statement said.