SORA AI’s Problems [And Solutions]

In a Reddit thread from three years ago, a user sparked a discussion about the future of AI imagery, imagining a time when photorealistic videos could be created from just a few sentences. The post was downvoted and dismissed, but today AI-generated videos are a reality. OpenAI recently introduced SORA, a tool that can transform text into photorealistic video clips within minutes.

SORA can not only create videos from scratch but also combine separate videos into one scene, animate still images, seamlessly modify non-AI videos, and more. This article explores what SORA can do, how it came to be, its limitations, and its potential implications for society.

What Can SORA Do?

With SORA, users type in a short text prompt, and within minutes the AI generates a video clip of up to 60 seconds. The videos are visually impressive, with objects that stay coherent and stable from frame to frame. SORA can also animate still images such as cartoons and seamlessly combine two videos into one scene. It can even generate different camera angles for a single scene from a single prompt.

How Did SORA Come to Be?

SORA is based on technology similar to that behind OpenAI's GPT-3, which was built on Google's Transformer architecture. Google researchers introduced the Transformer in 2017. It improved text generation, and later modified versions showed that the same approach could identify patterns in images and video. OpenAI built on this technology to create GPT-3 and later, using insights from the modified Transformer, developed SORA.

While the specifics of SORA's training data are not known, OpenAI has partnered with Shutterstock to license a vast collection of images and video for training its AI. That collaboration hints at the kind of data SORA was trained on.

Limitations of SORA

Although SORA produces impressive results, it has some limitations. It can confuse left and right, and it struggles with logical concepts and cause-and-effect relationships. Some outputs still contain unrealistic elements or outright failures, though even these failures can have a surreal appeal.

Additionally, generating videos with SORA requires significant computational power, making it less accessible for everyday use. However, with advancements in technology, it is likely that these limitations will diminish over time.

Implications and Solutions

The advent of SORA and similar AI video tools has several implications for society. One likely consequence is a reduced need for stock footage, since AI can generate a wide variety of scenes on demand. On the darker side, AI-generated videos can contribute to misinformation, fake news, and manipulated evidence.

To address these challenges, robust systems for authenticating and verifying AI-generated videos need to be developed. The C2PA standard, a technical specification for embedding provenance metadata into media, is a step in the right direction. However, further advancements are necessary, such as automatic AI detection built into social media platforms to prevent the spread of AI-generated videos.
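To make the idea of embedded provenance metadata concrete, here is a minimal Python sketch. It is a simplified illustration, not the real standard: actual C2PA manifests are signed with certificate-based credentials and stored inside the media file, whereas this toy version uses a shared secret key and a separate record. The names make_manifest, verify_manifest, and DEMO_KEY are hypothetical and exist only for this example.

```python
import hashlib
import hmac
import json

# Hypothetical shared key, for this sketch only. Real provenance standards
# sign with certificate-based credentials rather than a shared secret.
DEMO_KEY = b"demo-signing-key"


def make_manifest(media: bytes, generator: str, key: bytes = DEMO_KEY) -> dict:
    """Build a toy provenance record bound to the exact media bytes."""
    claim = {
        "generator": generator,
        "content_sha256": hashlib.sha256(media).hexdigest(),
    }
    # Serialize deterministically so signing and verifying see the same bytes.
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def verify_manifest(media: bytes, manifest: dict, key: bytes = DEMO_KEY) -> bool:
    """Return True only if the claim is untampered and matches the media."""
    payload = json.dumps(manifest["claim"], sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # Reject the record if the claim was altered after signing.
    if not hmac.compare_digest(expected, manifest["signature"]):
        return False
    # Reject the record if the media bytes changed after the claim was made.
    actual_hash = hashlib.sha256(media).hexdigest()
    return manifest["claim"]["content_sha256"] == actual_hash


if __name__ == "__main__":
    video = b"placeholder video bytes"
    manifest = make_manifest(video, generator="example-text-to-video-model")
    print(verify_manifest(video, manifest))              # original file
    print(verify_manifest(video + b"edited", manifest))  # modified file
```

The sketch shows the two checks any such scheme relies on: the record must not have been altered, and the file must still match the hash stored in the record. It also shows the main weakness, which is that metadata of this kind can simply be stripped from a file, which is why detection on the platform side is still needed.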

Keywords

AI imagery, SORA, text-to-video, photorealistic videos, limitations, implications, misinformation, authenticating, automatic AI detection

FAQ

What is SORA? SORA is a tool developed by OpenAI that can transform text into photorealistic video clips within minutes.
Can SORA combine separate videos into one scene? Yes, SORA has the capability to combine separate videos into one scene and even generate different camera angles using a single prompt.
What are the limitations of SORA? SORA struggles to distinguish left from right and has difficulty with logical concepts and causal relationships. It also requires significant computational power.
How can the authenticity of AI-generated videos be verified? The C2PA standard, which embeds provenance metadata into media, is being adopted to verify the origin of videos. However, more advanced systems for detecting AI-generated videos are still needed.