Alibaba gives portraits a voice

The Alibaba Group has created a portrait video generation framework called EMO that allows individuals to transform portraits into vocal avatar videos with artificial intelligence (AI).

On Tuesday the 27th, the Institute for Intelligent Computing, an organization within the Alibaba Group, released a paper titled ‘EMO: Emote Portrait Alive – Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions.’

This paper outlines a new technology that allows users to create talking and singing avatars from still portraits.

Users can input a single reference photo alongside vocal audio (either singing or talking), and EMO can generate vocal avatar videos with realistic facial expressions and different head poses.

The organization claims that “we can generate videos with any duration, depending on the length of input audio.” This is potentially a nod to OpenAI's Sora, which can only create videos up to 60 seconds long.

Videos generated by EMO have been circulating widely on X. One of them features a character from Sora's digitally generated demo footage singing a song by Dua Lipa.

The software is strikingly realistic and allows you to transform any picture of a person, dead or alive, into a talking or singing character with a range of facial expressions and head movements.

Another video shows a young Leonardo DiCaprio singing and rapping to Eminem's "Godzilla." The results can be unsettlingly uncanny, but they show how quickly companies like Alibaba are pushing the capabilities of AI.

Many more videos have surfaced showing EMO's capabilities. One shows an AI-generated image of the Mona Lisa singing "Flowers" by Miley Cyrus.

This new technology comes shortly after the reveal of OpenAI's Sora, a large-scale AI model that can turn a user's text prompt into up to 60 seconds of high-fidelity video.

OpenAI teased the software by providing a range of examples to demonstrate the power of Sora.

However, unlike Alibaba’s new software, Sora can only produce short, silent clips in which the characters do not speak.

As Big Tech continues to release software like EMO and Sora, it threatens to upend Hollywood.

Although EMO and Sora have the potential to democratize creativity, they could also threaten the livelihoods of individuals in the creative industry while raising concerns surrounding copyright infringement, fair use, and the proliferation of deepfake videos.
