OpenAI released Sora 2 today, the second generation of its video and audio generation model. The system represents an advancement of the Sora model first introduced in February 2024 and is intended to offer improved physical accuracy, realism, and control.

With the second version of Sora, OpenAI is primarily countering Google, which recently landed a hit with the video/audio model Veo 3. Video model startups like Black Forest Labs and Runway will likely be watching very closely to see what Sam Altman’s company is now bringing to market.

Technical Capabilities

According to OpenAI, Sora 2 can depict complex movement sequences such as Olympic gymnastics routines, backflips on paddleboards, or triple axel jumps with realistic physics. A key difference from earlier video generation models is that the system better accounts for physical laws. While earlier models deformed or teleported objects to fulfill a text instruction, Sora 2 is said to be able to model realistic failures: for example, when a player misses a shot, the basketball rebounds off the backboard instead of teleporting into the hoop.

The model offers expanded control capabilities and can follow complex instructions across multiple shots while maintaining a consistent world state. It supports various styles, including realistic, cinematic, and anime representations.

As an integrated video-audio system, Sora 2 generates background noises, speech, and sound effects. A new “Cameo” feature allows real people, animals, or objects to be inserted into generated scenes after a brief video recording, reproducing their appearance and voice. In a demo video, OpenAI showcased the feature with a likeness of CEO Sam Altman.

Limitations

OpenAI acknowledges that the model is “far from perfect and makes many mistakes.” The company frames the release as an intermediate step on the path to comprehensive world simulators.

Availability and Access

Sora 2 is being rolled out via a new iOS app, initially available in the US and Canada. Access is granted gradually through an invitation system. In parallel, the model is accessible via sora.com. Usage is initially free with generous limits, though these depend on available computing resources. ChatGPT Pro users receive access to an experimental “Sora 2 Pro” model. An API release is planned.

Safety Measures

OpenAI has implemented various safety mechanisms:

Control over likenesses: Users decide for themselves who may use their Cameo recordings and can revoke access or remove videos at any time.

Protection of minors: Default limits for daily visible generations and stricter permissions for Cameos.

Parental controls: Through ChatGPT, parents can set scroll limits, disable algorithm personalization, and manage direct message settings.

Feed algorithm: Language-controlled recommendation algorithms that are not optimized for dwell time but for creative use.

Moderation: Scaling of human moderation teams to review problematic content.

The company is not planning ad-based monetization for the time being. The only planned business model is to offer users the option to pay for additional generations during high demand.