OpenAI just released Sora 2, their text-to-video generation model that’s been in development for over a year since the original February 2024 announcement.
This isn’t a research preview anymore. It’s a full product launch with a standalone app, social feed, and a clear signal that OpenAI thinks AI-generated video is ready for everyday use.
## What Sora 2 can do
The core capability is simple: describe what you want to see, and Sora generates a video. But the improvements from version 1 to version 2 are substantial:
### Video quality and realism
- Better physics: Complex movements like gymnastics routines, dance sequences, and intricate anime battles look more natural
- Improved object permanence: Things stay consistent as the camera moves or objects interact
- Higher fidelity: More realistic lighting, textures, and motion
### Audio generation
This is the big new feature. Sora 2 doesn’t just create video; it generates synchronized audio to go with it:
- Dialogue: Characters can speak with lip-sync
- Sound effects: Footsteps, ambient noise, environmental sounds
- Music: Background scores that match the scene
### Length and control
- Videos up to 20 seconds (or longer with storyboard mode)
- Remix existing videos: Upload a video and have Sora modify it
- Storyboard mode: Chain multiple clips together for extended sequences (there’s a DIY stitching sketch after this list)
- Loop creation: Generate seamless video loops
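If you need to go past those limits without storyboard mode, you can also stitch downloaded clips together locally. Here’s a minimal Python sketch using ffmpeg’s concat demuxer; it assumes ffmpeg is installed and that the clips share the same codec, resolution, and frame rate (downloads generated with the same settings usually do, but that’s my assumption, not something the docs guarantee).

```python
# stitch_clips.py -- concatenate downloaded Sora clips into one longer video.
# Assumes ffmpeg is on PATH and all clips share codec/resolution/frame rate,
# so stream copy (-c copy) works without re-encoding.
import subprocess
import tempfile
from pathlib import Path

def stitch(clips: list[str], output: str = "stitched.mp4") -> None:
    # The concat demuxer reads a text file listing the inputs in order.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        for clip in clips:
            f.write(f"file '{Path(clip).resolve()}'\n")
        list_path = f.name

    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", list_path, "-c", "copy", output],
        check=True,
    )

if __name__ == "__main__":
    stitch(["clip_01.mp4", "clip_02.mp4", "clip_03.mp4"])
```

If the clips don’t share identical encoding parameters, stream copy will fail and you’d need ffmpeg’s concat filter with a re-encode instead.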
## Where you can use it
### Sora.com
New standalone web platform where you create and explore videos. Three subscription tiers:
- Free: 500 credits per month, watermarked videos
- Plus ($20/month): 5,000 credits, some non-watermarked downloads
- Pro ($200/month): 10,000 credits, priority generation, higher resolution
### iOS app
Native mobile app for creating videos on your phone. An Android version is coming later.
### ChatGPT integration
Generate videos directly in ChatGPT Plus and Pro conversations.
## The Sora Feed
Here’s where it gets interesting. OpenAI built a social feed into Sora where users can:
- Discover videos other people have created
- Remix and extend existing creations
- Follow creators and see trending content
- Steer the algorithm based on your preferences
OpenAI’s philosophy document explains this as “creating a space to inspire creative participation, not passive consumption.”
They’re explicitly positioning it as different from TikTok or Instagram Reels. The feed is designed to show you what’s possible and encourage you to create, not just scroll.
My take: This is smart. Generative video is hard to understand in the abstract. Seeing what real people make helps you grasp the capabilities and limitations much faster than any feature list.
But it also means OpenAI is building a content platform, not just a tool. That comes with all the moderation, algorithmic, and social dynamics that make traditional platforms challenging.
## Safety and guardrails
The system card details the extensive safety work OpenAI did:
### Content restrictions
- No photorealistic people uploads: To prevent deepfakes, you can’t upload videos or images of real people
- Minor protection: Strict policies against generating content involving minors
- Violence and explicit content: Filters for graphic, sexual, or violent imagery
- Public figures: Restrictions on generating videos of identifiable celebrities or politicians
### Technical safeguards
- C2PA metadata: All videos include digital provenance information showing they were AI-generated (see the inspection sketch after this list)
- Watermarking: Free tier videos are watermarked; paid tiers can remove it but metadata remains
- Red teaming: Extensive testing with external experts to find edge cases and failure modes
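That provenance claim is something you can at least partially check yourself. The sketch below is not OpenAI’s tooling: it just shells out to exiftool (assumed to be installed) and searches the metadata dump for C2PA/JUMBF markers. It only tells you whether provenance metadata appears to be present; cryptographically verifying the manifest is a job for the official C2PA tools.

```python
# check_provenance.py -- quick look for C2PA provenance markers in a video.
# Assumes exiftool is installed and on PATH. This only checks whether
# C2PA/JUMBF-style metadata shows up at all; it does NOT verify signatures.
import subprocess
import sys

def has_c2pa_markers(path: str) -> bool:
    result = subprocess.run(
        ["exiftool", path], capture_output=True, text=True, check=True
    )
    dump = result.stdout.lower()
    return any(marker in dump for marker in ("c2pa", "jumbf", "contentauth"))

if __name__ == "__main__":
    video = sys.argv[1] if len(sys.argv) > 1 else "sora_clip.mp4"
    print("Provenance markers found" if has_c2pa_markers(video)
          else "No obvious provenance markers (they may still be present)")
```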
### Moderation approach
- Automated filters: Content screening before generation and after completion
- Human review: Flagged content reviewed by moderation teams
- User reporting: Community-driven flagging system
My take: The inability to upload photos of people is a blunt but necessary tool. Deepfakes are a real problem, and OpenAI is taking a conservative approach rather than trying to solve it with detection alone.
The trade-off is that legitimate use cases (animating family photos, creating personalized content) are blocked. That will frustrate some users, but it’s probably the right call for a public release.
## What’s impressive technically
Generating coherent video is much harder than generating images. Video has to maintain:
- Temporal consistency: Objects stay the same across frames
- Physical plausibility: Movement follows real-world physics
- Camera motion: Perspective shifts need to make sense
- Audio synchronization: Sound has to match what’s happening visually
Sora 2 handles all of this reasonably well based on the example videos OpenAI shared. Not perfectly, but well enough to be useful.
The fact that it generates synchronized audio alongside video is particularly notable. Most text-to-video systems generate silent clips and leave audio as a separate step.
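Temporal consistency is also the easiest of these properties to poke at yourself. Here’s a rough sketch (assuming opencv-python and numpy are installed) that measures the mean absolute difference between consecutive frames of a clip; sharp spikes often line up with flicker, objects popping in and out, or hard cuts. It’s a crude proxy, not how OpenAI evaluates the model.

```python
# frame_consistency.py -- crude temporal-consistency probe for a video clip.
# Computes mean absolute difference between consecutive grayscale frames;
# sudden spikes often correspond to flicker or objects popping in and out.
# Assumes opencv-python and numpy are installed.
import cv2
import numpy as np

def frame_diffs(path: str) -> list[float]:
    cap = cv2.VideoCapture(path)
    diffs, prev = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        if prev is not None:
            diffs.append(float(np.mean(np.abs(gray - prev))))
        prev = gray
    cap.release()
    return diffs

if __name__ == "__main__":
    d = frame_diffs("sora_clip.mp4")
    if d:
        print(f"{len(d)} transitions, mean diff {np.mean(d):.2f}, "
              f"max diff {np.max(d):.2f}")
```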
## What’s still limited
From the documentation and examples:
- Length: The 20-second cap feels short for many creative projects
- Slow generation: Videos can take minutes to create
- Prompt sensitivity: Small wording changes can produce very different results
- Cost: Pro tier at $200/month is expensive for casual use
- Artifacts: Videos still have telltale signs of being AI-generated
## What this means for creators
Immediate use cases:
- Rapid prototyping: Visualize concepts quickly for pitches or storyboards
- Stock footage alternative: Generate specific scenes that are hard or expensive to shoot
- Social media content: Short-form video creation without filming
- Concept art: Visualize scenes for larger projects
Limitations:
- Quality isn’t production-ready for most professional video work
- Lack of control: Can’t specify exact camera angles, timing, or detailed actions
- Artifacts: AI-generated look is still visible in most outputs
My take: This is a tool for iteration and exploration, not for final output. Think of it like sketching versus finished illustration. It’s fast and flexible for testing ideas, but you probably won’t ship the raw Sora output for serious projects.
That said, the pace of improvement is striking. Compare this to what was possible two years ago, and it’s clear the gap is closing quickly.
## The bigger questions
What happens to traditional video production? Stock footage companies, small video agencies, and certain types of commercial work face real disruption.
Who owns AI-generated content? The legal framework is still unclear. OpenAI grants commercial rights, but derivative work and training data questions remain.
How do we verify authenticity? When anyone can generate realistic video, how do we know what’s real? C2PA metadata helps but isn’t a complete solution.
What about consent and likeness? Even with restrictions on uploading photos, models are trained on video data. Who consented to being in that training set?
## What comes next
OpenAI positioned this as the beginning, not the endpoint:
- Longer videos: Storyboard mode is the first step toward extended content
- Better control: More precise editing and direction capabilities
- API access: Developers will eventually be able to build on Sora
- Integration: Expect Sora to show up in more OpenAI products
The speed of improvement is what strikes me most. From research preview to shipping product in under two years. From silent videos to synchronized audio. From limited beta to public iOS app.
This isn’t the future of video. It’s the present. And it’s available now if you’re willing to pay $20/month.
Try it yourself: Head to sora.com to create your first video, or download the iOS app. Read the full system card for technical details and safety measures.