AI training bans: hogwash in, hogwash out


News media organizations have blocked AI firms from harvesting their content for training. No one wants to see intellectual property theft, of course, but what kind of information will that leave for the machines to learn from?

In January 2023, The New York Times published a piece by Kevin Roose, a technology columnist, who urged a rethink of the idea of banning ChatGPT, OpenAI’s powerful chatbot, in schools.

Roose said he believed schools should embrace ChatGPT as a teaching aid – not only because bans probably aren’t going to work but also because AI can “unlock student creativity.”


The times they are a-changin’, though. In August, the celebrated daily newspaper changed its terms of service to forbid its content from being used to train AI models without its prior consent.

The updated rules say that content produced by The New York Times cannot be used in the development of “any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system.”

Other major media organizations, such as The Associated Press, a news agency, and the News Media Alliance, a trade group, soon called for new laws to stop their content from being used to train AI tools. The Guardian, a British daily, followed with its own ban, blocking OpenAI from trawling its pages.
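Technically, such blocks are simple. They are typically implemented in a site’s robots.txt file, and OpenAI itself documents GPTBot as the user agent publishers can disallow. A minimal sketch of a site-wide block looks like this:

    User-agent: GPTBot
    Disallow: /

The file is advisory rather than enforceable: crawlers that honor the robots exclusion protocol will stay away, but nothing in it physically prevents scraping.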

At first glance, the media industry’s concerns are valid. Media content, and indeed any creative content, should enjoy copyright protection, while generative AI technology, promising as it is, can still produce a lot of false information and confuse the public.

But, surely, a middle way needs to be found. That’s because if major and respected media organizations and authors refuse to allow their content to be used for AI training, the quality of information left up for grabs will indeed be worrying.

One needs to realize that the use of generative AI is not going to be banned or even severely restricted – that genie is already out of the bottle.

Sure, the technology is overhyped, and more intellectuals are voicing their wish that the chatter about near-magical machine intelligence be replaced with ideas about machine usefulness – to humans, that is.

But generative AI will still be there, in one form or another. It will augment and complement how content is created. Even Amazon, the global retail giant, now allows self-published authors to use AI assistance and sell their work without disclosing it, though it has taken a somewhat firmer stance on what it calls AI-generated content.


In other words: AI is not going away, ever. But it’s up to us, as individual consumers and corporate entities alike, to at least try to steer this phenomenon’s direction of travel.

If public discourse were a boat, then preventing AI firms from training their models on respected and trusted information would be akin to letting it drift on the open sea. It’s especially bizarre to see major media organizations, which usually campaign for truth and justice, moving this way.

If ChatGPT and other models are denied access to The New York Times, The Washington Post, or The Guardian, they will be left with sensationalistic, prejudiced, and often simply second-rate scraps from The Daily Mail and The New York Post. Why would we want to help the Murdochs of the world take over the media once and for all?

Sure, today more people are getting their news and other information from social media platforms, and the tech bosses are always eager to tell us the algorithms are perfect at showing the world as it is.

Well, no, they are not – far from it. Have you browsed Elon Musk’s X lately? Facebook? The quality of content on these apps has gone down massively, and, by the way, there’s already a lot of AI-generated stuff on them – as well as tons of misinformation, spread either by weird conspiracy theorists or by nation-states pursuing an agenda.

Do we want mainstream generative AI to be trained on spammy bile? I know I don’t. This is, again, why a sort of compromise is needed.

Let’s call it a mutual understanding, though. AI companies should undoubtedly pay for training their large language models on publicly available data, including the news. Most news websites are already behind paywalls – if consumers have to pay up, AI developers should, too.

However, media organizations should also understand that they can help shape the future of public discourse in the AI era by becoming machine educators and ensuring that the information the computers are fed is accurate, balanced, and, most importantly, factual.

Several steps in the right direction have already been taken. For example, in July, OpenAI announced it had signed a deal with The Associated Press (AP) to license the news agency’s collection of news stories. OpenAI will use AP’s news stories to train and develop its chatbot tool.


No financial details of the deal were announced, but it’s quite obvious OpenAI is paying AP – as it should. In exchange, the news agency will “leverage OpenAI’s technology and product expertise.”

It’s not entirely clear what kind of expertise AP is getting, but AI could help journalists choose a better headline, say, or polish their style. As long as the essential role of journalism is respected, and AI bots are perceived as what they actually are – tools – it should be possible to find a smoother way forward.

These AI systems should be allowed to train on honest, reputable content. Let’s not kid ourselves – the purveyors of fake news are definitely not blocking the machine-learning trawlers. On the contrary, propagandists and liars are probably glad that the defenders of the facts are stepping out of their way.

