Breakdancer's never-ending viral nightmare deepens with AI in the mix


Hilarious footage from the Australian breakdancer who competed at this year's Paris Olympics has been transformed into artificial intelligence (AI) nightmare fuel.

Rachael Gunn, who competed at the 2024 Paris Olympic Games, went viral online, spawning a wave of memes across the web.

Now, users have gotten hold of the video and transformed the footage into a harrowing display of body horror using text-to-video generation tools.


One video, uploaded to X, shows the breakdancer morphing into ungodly, inhuman figures, including something vaguely resembling a mushroom, and even transforming into a man with two heads.

Text-to-video generation, which transforms text prompts into video clips, is still in development and has a long way to go.

The video was created by X user 69420digits and shared by the user ‘kimmonismus,’ who argued that text-to-video models just aren’t there yet.

“Text to Video has already developed pretty well so far. However, the models are clearly not a simulation of the world. Calculating physical laws does not work, the architecture is not designed for this. So, we still need a few more breakthroughs.”

As we’ve seen with various models that turn text into images or video, generating convincing depictions of real people remains a challenge.

Stable Diffusion 3 Medium, a text-to-image model created by Stability AI, has struggled to accurately depict real people.

Instead, the model has produced images that belong in horror movies, including women with multiple limbs, grotesque figures with extra hands and fingers, and uncanny depictions of human faces.

According to Stability AI, the new model “comprehends long and complex prompts involving spatial reasoning, compositional elements, actions, and styles.” Yet it cannot reliably generate an image of a person with the correct number of fingers.

Image by Reddit user Another_one

Users on Reddit speculated that text-to-image and video models struggle to emulate human anatomy because of the “not safe for work” (NSFW) restrictions placed on their training data.

One user said, “It works fine as long as there are no humans in the picture. I think their improved NSFW filter for filtering training data decided anything humanoid is NSFW.”

Another user supported this theory, saying, “Believe it or not, heavily censoring a model also removes human anatomy.”