OpenAI responds to New York Times copyright lawsuit, sees manipulation

OpenAI, the creator of ChatGPT, is in the headlines again. The AI start-up has reacted to a copyright lawsuit filed by The New York Times and called the case “without merit.”

On December 27th, just as OpenAI was emerging from its leadership turmoil, The New York Times sued the company and its main investor, Microsoft, accusing them of using millions of the newspaper’s articles without permission to help train AI models, including those underpinning ChatGPT.

In the lawsuit (PDF), the famous newspaper called for OpenAI and Microsoft to “destroy” models and training data containing the offending material and to be held responsible for “billions of dollars in statutory and actual damages” related to the “unlawful copying and use of The Times’s uniquely valuable works.”

Now, OpenAI has publicly responded to The New York Times with a dedicated blog post, saying that the daily was “not telling the full story.”

“While we disagree with the claims in The New York Times lawsuit, we view it as an opportunity to clarify our business, our intent, and how we build our technology,” said OpenAI.

Very polite, indeed. The company then moved to attack The Times quite aggressively, saying that both parties had been engaged in constructive discussions about their partnership until “their lawsuit on December 27th – which we learned about by reading The New York Times – came as a surprise and disappointment to us.”

Naturally, OpenAI does not agree with claims that ChatGPT reproduced The Times’ stories verbatim. The firm said the daily had manipulated prompts, including lengthy excerpts of its own articles, in order to get the model to regurgitate them.

“It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts,” the company said.

According to OpenAI, The Times had mentioned seeing some regurgitation of its content but repeatedly refused to share any examples.

It added: “Because models learn from the enormous aggregate of human knowledge, any one sector – including news – is a tiny slice of overall training data, and any single data source – including The New York Times – is not significant for the model’s intended learning.”

For what it’s worth, the newspaper’s lawyers said in the lawsuit that The Times had attempted to reach an agreement with OpenAI “for months.” A resolution has not been found, and the daily’s representatives say OpenAI’s blog post changes nothing.

“The blog concedes that OpenAI used The Times’s work, along with the work of many others, to build ChatGPT. As The Times’s complaint states, ‘Through Microsoft’s Bing Chat (recently rebranded as ‘Copilot’) and OpenAI’s ChatGPT, Defendants seek to free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment.’ That’s not fair use by any measure,” said Susman Godfrey partner Ian Crosby, lead counsel for The New York Times, in a statement sent to Cybernews.

OpenAI has long maintained its position that access to huge amounts of data available on the internet should be widely tolerated because, otherwise, AI models will not be able to learn and solve new problems. The firm has already signed partnership deals with Axel Springer and The Associated Press.

To news organizations, artists, authors, musicians, and filmmakers, however, copyright and authenticity matter more. More and more industries want credit and compensation from tech companies that use their work to build generative AI tools.

Since August, at least 598 news organizations, including The Times, The Washington Post, and Reuters, have installed blockers on their websites to prevent tech companies from scraping their articles.