OpenAI's 'Sora' wows by transforming text prompts into video


ChatGPT-maker OpenAI previews its latest software protege ‘Sora’ – a generative AI that can take a short text description and turn it into a vividly realistic AI video clip – and the crowds are going wild.

Meet OpenAI’s Sora, a large-scale AI model capable of transforming a user’s text prompt into up to 60 seconds of high-fidelity video.

The Microsoft-backed leader in AI technology announced the latest software development on Thursday. “Introducing Sora, our text-to-video model,” OpenAI posted on X.

“Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions,” it said.

To provide an example of Sora’s capabilities, the company posted a detailed text prompt, followed by a bright, colorful, and lifelike video clip that Sora had generated directly from it.

“Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes,” the prompt states.

Next, the video clip, which looks like a movie set, gives the viewer a bird’s-eye view of a couple holding hands, wearing winter clothes, and walking past shops on a snowy street lined with cherry blossom trees, with a city in the distance.

On the technical side, “we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions and aspect ratios,” OpenAI explained in its blog.

“Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world,” the company said.

Still in development, Sora is also capable of generating video from other prompt inputs besides text, such as pre-existing images or video.

“This capability enables Sora to perform a wide range of image and video editing tasks—creating perfectly looping video, animating static images, extending videos forwards or backwards in time, etc.,” OpenAI said.

Social media users couldn’t get enough of the dozens of samples OpenAI supplied on its X profile and blog.

“I am often critical of OpenAI but the Sora video previews have left me speechless,” said X user and AI software developer @BenjaminDEKR. “This is excellent, world-changing work,” he added.

Tech industry creative and X user @bilawalsidhu was also impressed. “OpenAI just dropped their Sora research paper. As expected, the video-to-video results are flipping spectacular,” he posted along with a video-to-video generated sample.

Still, Sora currently exhibits numerous limitations as a simulator, the company warned.

“Sora does not accurately model the physics of many basic interactions, like glass shattering,” OpenAI said, and other interactions, such as eating food, can also appear wonky.

Other “common failure modes” that developers are working through, which OpenAI lists on its landing page, include spontaneous appearances of objects and incoherencies in longer-duration samples.

OpenAI also stressed it would make sure to address safety concerns before the software becomes available in other OpenAI products, such as ChatGPT.

To help do that, the company said it has employed red team experts to “adversarially” test the model, focusing on “misinformation, hateful content, and bias.”