Midjourney AI developers caught copying artists to train generator

Midjourney AI, one of the most popular generative AI tools for creating art, used a database of 16,000 artists to train its model, and its developers discussed online how to avoid legal problems, court evidence shows.

The new year has started off with new problems for Midjourney AI. Company executives breathed a sigh of relief last year when a judge in California largely dismissed copyright claims by three artists in a case brought against Midjourney AI and two other firms that build text-to-image tools, DeviantArt and Stability AI.

The three artists – Sarah Andersen, Kelly McKernan, and Karla Ortiz – sued the aforementioned startups, accusing the businesses of using people's copyrighted artwork without permission to build text-to-image AI tools.

The judge dismissed copyright violation claims since the artists hadn’t registered their work with the US Copyright Office. However, the case wasn’t thrown out entirely: the judge allowed the artists to amend their claims, and they have now done so.

In November, a group of visual artists added seven more names and additional details about the alleged infringement to the copyright lawsuit. They claimed that companies such as Midjourney have produced systems that create art in the style of the artists when the artists’ names are used as prompts fed to the AI.

As a result, the artists say, users have generated art that is “indistinguishable” from the artists’ own work. And now artists have revealed a leaked Google Sheet allegedly showing how Midjourney specifically developed a database of time periods, styles, genres, movements, mediums, techniques, and thousands of artists to train its AI text-to-image generator.

Jon Lam, a senior storyboard artist at Riot Games, posted on the social media platform X several screenshots of Midjourney software developers discussing the creation of the database.

Lam said that both the list and the screenshots were included as part of a lawsuit amendment to a class-action complaint. “Prompt engineers, your ‘skills’ are not yours,” added the artist.

The screenshots – which are from early 2022 – also reveal how Midjourney employees, including the CEO David Holz, discuss ways to “launder” the dataset in order to avoid legal trouble.

“All you have to do is just use those scraped datasets and then conveniently forget what you used to train the model. Boom, legal problems solved forever,” wrote one employee.

The 24-page list of artists’ names includes notable artists such as Andy Warhol, Frida Kahlo, Banksy, Vincent van Gogh, and specific styles from companies like Disney or Nintendo.

Access to the Google file is now restricted, but a version has been uploaded to the Internet Archive.

The artists are not the only ones battling AI companies. Stock image platform Getty Images also filed a lawsuit in the High Court of Justice in London against Stability AI, claiming it ‘unlawfully’ scraped millions of images from its site.

Harm to individual artists can be more painful, though. Back in January 2023, Kelly McKernan, one of the artists who originally sued the AI image generation companies, said that the wave of AI “art” has real consequences for a real human creator struggling to make ends meet.
