The New York Times bans use of its content to train AI


The paper of record has changed its terms of service to forbid its content from being used to train artificial intelligence models without its prior consent.

The updated rules say that content produced by The New York Times cannot be used in the development of “any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system.”

The document says the ban covers, but is not limited to, text, photographs, images, illustrations, designs, audio clips, video clips, “look and feel,” metadata, data, or compilations.

Updated on August 3rd, the terms of service also ban the use of automated tools like website crawlers to collect and use such content without the publication’s prior written consent.

The New York Times warns that engaging in the prohibited use of the services could result in penalties, fines, or sanctions against the user or those assisting the user, but does not specify further.

According to Adweek, which first reported the news, The New York Times is taking a novel approach in explicitly mentioning AI training in its terms of service.

While publishers can see web crawlers visiting their websites, they have no way of knowing whether those crawlers are being used for search engine optimization or for training AI models, Adweek said.

Updating service terms could give newsrooms more control over how their content is used.

To address some of these concerns, OpenAI recently announced that publishers will be able to block GPTBot, its web crawler that gathers data for improving AI models, from scraping their websites.
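
For readers curious how such a block might look in practice, here is a minimal sketch. It assumes the block is expressed as a robots.txt rule addressed to the GPTBot user agent, which the reporting above does not spell out. The rules, the example.com URL, and the can_crawl helper are all illustrative, and the check only shows how a crawler that honors robots.txt would read the rules, since compliance is voluntary.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve (an assumption, not
# taken from any real site): GPTBot is barred from the whole site,
# every other crawler is allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative URL and crawler names only.
STORY_URL = "https://example.com/news/story.html"
gptbot_allowed = can_crawl(ROBOTS_TXT, "GPTBot", STORY_URL)
other_allowed = can_crawl(ROBOTS_TXT, "SomeOtherBot", STORY_URL)
print("GPTBot allowed:", gptbot_allowed)
print("Other crawler allowed:", other_allowed)
```

A rule like this only signals the publisher's wishes. It does not technically prevent a crawler that ignores robots.txt from fetching pages, which is one reason contractual terms of service remain relevant.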

Neither Microsoft nor Google, other significant players in the field, has taken similar steps.

Alongside Meta, OpenAI was recently sued by comedian Sarah Silverman and two other authors over allegations that the two companies illegally trained their AI models on copyrighted material.