OpenAI Unveils Shocking Video-Generating AI

Have you seen the videos created by OpenAI's Sora? Just type in a simple prompt and you'll see a woman walking through a colorful Tokyo city center or a mammoth charging through a snowy meadow. Here are three of the most impressive features of Sora, a video-generating AI that has shocked many with its quality.

Consistency

The first and foremost feature is consistency. In the three frames below, taken from one of Sora's videos, you can see a woman walking past a sign. The woman briefly hides the text on the sign, but when the sign reappears after she has passed, the text is unchanged. Sora keeps track of the sign and the woman as separate objects and restores the background that was hidden.

Before the figure walks past the sign
While the figure walks past the sign
After the figure walks past the sign

This ability to keep track of objects is also why Sora can maintain 3D consistency in dynamic footage, even when the camera is moving and rotating. That is remarkable considering that other AI video tools tend to stick to shots where the camera is fixed straight ahead, which minimizes distorted objects, unstable backgrounds, and noise.

3D consistency
Civit.ai (example AI-generated video of a frontal view)

High-level expressiveness

When I first saw Sora's footage, I thought, "This is real." The crisp high definition, the realistic and intricate backgrounds, and the convincing reflections in water and on glass windows made it hard to believe it was generated. Take a look at the video captures below and you'll see why.

HD quality
Reflections in water
Reflections in windows

Versatility

Sora is also incredibly versatile. It can generate videos from text, like the examples shown so far, but it can also take an image or a video as input and turn it into a natural-looking video. You can even extend an existing video, or create a video that connects two different videos.

What's amazing about Sora is that it is not limited to live-action-style footage: it can also produce videos of digital art, illustrations, watercolors, and even the digital world of a video game. The description that accompanies the Minecraft video also suggests that a video-generating AI like Sora has the potential to be a simulator of sorts.

Image to video
Video to video
Creating a middle video that connects the video on the left with the video on the right
Creating a video of a painter painting watercolors on a canvas
Creating a visualization of the digital world of Minecraft

Limitations and implications

While Sora's videos have shocked people with quality that is hard to believe for AI-generated video, some rough edges remain. It doesn't model certain basic interactions well, such as glass breaking, and in scenes with many objects it sometimes places them unnaturally.

Generating a video also requires a lot of computing resources, and training the model is a complex process that requires a lot of data, so it will be a while before Sora is ready for public use.

Hard-to-model interactions: breaking glass

But even considering these shortcomings, it's clear that Sora is a tremendous achievement for AI video creation. Its significance for the AI industry and beyond is that it gives us confidence in the direction we're headed. Sora suggests that the approach taken so far was not wrong, and that earlier AI videos fell short because the models lacked scale. So far, "Scale is all you need" has not been proven wrong.

The process of teaching video to AI
The process of creating AI video

When will Sora be available?

There's no specific information about Sora's release date yet. OpenAI says it is currently available only to select artists and others so that they can provide feedback, and that it is sharing early results and progress. With the recent increase in fake news, criminal cases, and ethical issues involving AI, it seems Sora will need safeguards against these problems before a wider release.

Link to OpenAI website for Sora
Link to OpenAI research website for Sora