Yesterday, OpenAI, the creator of ChatGPT, unveiled Sora, a new AI model that generates videos from text prompts. The release has sparked a frenzy of excitement online, with high-quality videos flooding Twitter feeds and other social media platforms. Notable figures such as Marques Brownlee, MrBeast, and Elon Musk have joined in on the action.
While Sora's possibilities are exhilarating, questions about the model remain, so it is worth taking a closer look at what we know about Sora and its capabilities.
So, what exactly is Sora? If ChatGPT is OpenAI's chat-based model, then Sora is the company's AI model for generating "realistic and imaginative scenes based on text commands."
Essentially, Sora is a text-to-video model. Users provide prompts, and Sora generates videos that can run up to a minute long and that OpenAI says are of high quality.
OpenAI's website describes the company's intentions for Sora this way: "We are teaching AI to understand and simulate the physical world in motion, with the goal of training models to help people solve problems that require real-world interaction."
However, OpenAI acknowledges that Sora has flaws in its current iteration. The model may struggle to accurately simulate the physics of complex scenes and may fail to understand specific instances of cause and effect. It may also confuse the spatial details of a prompt and struggle with complexity that the prompt does not explicitly describe.
The team also says it is building tools to help detect misleading content generated by Sora.
One of the burning questions on everyone's mind is whether Sora is available to the public. For now, the answer is no. OpenAI CEO Sam Altman said the text-to-video tool is currently in the hands of only a select few creators, with no clear timeline for a public release.