Meta admits to scraping public Facebook and Instagram posts to train new AI bot

Meta executives have revealed that the company’s new AI virtual assistant was trained on public posts scraped from Facebook and Instagram, the social media platforms owned by the tech giant.

Meta founder and CEO Mark Zuckerberg introduced the new Meta AI assistant on Wednesday at the company’s annual product event in California, along with its new-generation Ray-Ban smart glasses and updated Quest mixed reality headset, among other products.

The new custom AI assistant is said to leverage technology from the Llama 2 open-source large language model (LLM) and the company's other current LLM research.

What Zuckerberg conveniently omitted from the announcement was that the data used to train Meta’s new AI tool was scraped from millions of text posts and images publicly shared on Facebook and Instagram, without users’ knowledge or approval.

In fact, Meta only gave Facebook users the ability to choose whether their personal data could be used to train third-party AI models in August. To opt out, a user must fill out a specific “Generative AI Data Subject Rights” form, which seems to be buried somewhere on the Facebook platform.

Ironically, the form does not give Facebook users a way to opt out of having their data used to train Meta’s own AI assistant, only third-party models.

In an apparent effort to appease the masses, Meta, the parent company of both Facebook and Instagram, graciously said it chose not to scrape private posts out of respect for its users.

Meta President of Global Affairs Nick Clegg told Reuters on Wednesday that the tech giant also excluded users’ private Facebook Messenger and Instagram chats and took additional measures to filter private details from the datasets. (We would hope so.)

"We've tried to exclude datasets that have a heavy preponderance of personal information," Clegg said, adding that the "vast majority" of the data used by Meta for the training was already publicly available.

Clegg said Meta deliberately chose not to use data from LinkedIn, the more business-oriented social media platform owned by Microsoft, because of privacy concerns.

Furthermore, Clegg said the company has placed restrictions on what content the AI assistant can generate, such as blocking the tool from creating AI-generated lookalikes of public figures.

Data scraping woes

The new chatbot, launched to compete with OpenAI’s ChatGPT and Google’s Bard, will be integrated with WhatsApp, Messenger, and Instagram, as well as the smart glasses and Quest headsets.

To ‘teach’ their AI models, tech companies ‘scrape’ massive amounts of data from the internet and use it to train the models behind their chatbots, which can then summarize information and generate imagery.

Big tech companies have been widely criticized for scraping data from the internet, often ingesting copyrighted material without permission to train their AI models.

Several well-known American writers have sued OpenAI, accusing the AI company of copyright infringement. These include a joint suit filed last week by Game of Thrones author George R.R. Martin, novelist John Grisham, and novelist Jodi Picoult, among others.

Just this week, Hollywood’s 146-day writers’ strike ended with a tentative deal, part of which will allow movie studios to train their own AI models on material created by union writers, a bone of contention the writers had fought to keep out of the contracts.

Meta AI still in beta

Meta calls its new chatbot, currently in beta, an “advanced conversational assistant” that can generate text, audio, and imagery and has real-time access to information through a partnership with Microsoft’s Bing search engine.

The company states that, beyond what it retrieves through Bing, the chatbot’s knowledge will be largely limited to information that existed before 2023.

Eventually, Meta plans to launch an AI creator studio where businesses and developers can build their own personalized AI chatbots.

The company also said it will launch 28 more AIs in beta over the coming weeks, each with its own interests and personality, that users can message and interact with on WhatsApp, Messenger, and Instagram.

Some of the more famous names behind these AI personalities include Snoop Dogg, Tom Brady, Kendall Jenner, and Paris Hilton.

For now, Meta’s new AI tool is available only in the US.
