
Create Cinematic AI Videos with Sora 2 in Kaiber Superstudio

A guide to creating and prompting with Sora 2, OpenAI's second-generation video model, in Kaiber Superstudio.

Updated this week

Whether you are making short films, product demos, social content, music videos, concept reels, or dialogue-driven scenes: if you can describe it, Sora 2 can generate it. Create clips of up to 20 seconds with synchronized dialogue, sound effects, ambient noise, and music baked into the video.

Sora 2 features

  • Longer clip length. Up to 20 seconds per clip.

  • Text to video. Describe a scene and Sora 2 generates it.

  • Image to video. Upload an image as a visual reference; your text prompt defines the scene.

  • Multi-shot videos. Add multiple shots to one generation, each with its own prompt. Build short sequences without generating each clip separately.

  • Native audio. Generates dialogue with natural lip movement, background sound, ambient noise, and music. Request specific audio elements in the prompt.

  • Realistic physics. Improved gravity, weight, motion, collisions, and cause-and-effect. Objects have weight, light behaves properly, and motion looks real.

Specs

  • Lengths: 4, 8, 12, 16, or 20 seconds

  • Aspect ratios: 16:9 or 9:16

  • Resolution: 720p

  • Access: Select Sora 2 from the Model Menu in the Create Video flow
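
If you script or batch your prompt planning, the specs above can be captured as a small pre-flight check. This is only a sketch: the values come from the list above, while the function name `check_settings` and its shape are hypothetical and not part of Kaiber Superstudio.

```python
# Values copied from the Specs list; everything else here is illustrative.
ALLOWED_LENGTHS_S = (4, 8, 12, 16, 20)
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16")


def check_settings(length_s, aspect_ratio):
    """Return a list of problems; an empty list means the settings match the specs."""
    problems = []
    if length_s not in ALLOWED_LENGTHS_S:
        problems.append(f"length {length_s}s is not one of {ALLOWED_LENGTHS_S}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(
            f"aspect ratio {aspect_ratio!r} is not one of {ALLOWED_ASPECT_RATIOS}"
        )
    return problems


print(check_settings(12, "9:16"))  # no problems reported
print(check_settings(10, "1:1"))   # both the length and the ratio are flagged
```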

What Sora 2 does well

Product videos

Upload an image of your product as a reference and let Sora 2 bring it to life. Describe the scene shot by shot, or just describe the concept and let the model take over. It handles lighting, reflections, and material textures well, so things like glass, metal, fabric, and packaging look convincing. Great for quick product demos, social ads, or hero shots without a full production setup.

Music video scenes

Sora 2's cinematic motion makes it a strong fit for music video content. You can prompt for specific visual styles, camera movements, and moods that match the energy of a track. Think neon-lit cityscapes, slow-motion close-ups, dramatic lighting shifts, silhouette animations, or surreal dreamlike sequences. Prompt for cuts, montage, or dynamic music video scenes to create a multi-scene clip.

Cinematic scenes

This is where Sora 2 really shines. It handles realistic human motion, natural lighting, and physics-accurate environments better than most models. Good fits include period dramas with candlelit interiors, drone shots over dramatic coastlines, documentary-style portraits with shallow depth of field, and macro close-ups with natural light refraction.

Creative and experimental content

Sora 2 doesn't just do realism. It handles stylized and surreal content just as well. Stop-motion clay animation, papercraft worlds, underwater bioluminescent scenes, tilt-shift miniatures, retro VHS aesthetics, anime, silhouette animation. Describe the style you want in the prompt and the model adapts. Some of the most interesting results come from mixing realistic physics with impossible scenarios.

How to prompt Sora 2

Upload a clear image of a person, product, or scene as a reference (optional).

Prompt for the type of video you want. You can be specific and describe the scene shot by shot or let Sora 2 call the shots.


The more specific you are, the closer the output matches your vision. Leaving some details open gives the model more freedom, which can lead to creative results.


Detailed prompts = control and consistency.
Lighter prompts = AI creative freedom.

A solid prompt includes:

  • Scene description. Setting, subject, what's happening.

  • Camera direction. Framing, angle, lens, movement.

  • Lighting and color. Time of day, light source, palette.

  • Style. Cinematic, documentary, anime, vintage, etc.

  • Audio cues. Dialogue, ambient sound, music style.

You don't need all of these every time. But the more you include, the more the video will align with your vision.

Set the style early. "1970s film" or "handheld documentary" sets the tone for everything else.

Be specific. Instead of "a beautiful street," try "wet asphalt, zebra crosswalk, neon signs reflecting in puddles."

Time your actions in beats. "Character takes four steps to the window, pauses, and pulls the curtain" works better than "character walks across the room."
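
If you draft prompts in a script or notes tool, the components above can be assembled with a small helper. This is a minimal sketch, not a Kaiber or Sora 2 API: `build_prompt` is a hypothetical name, and the "Camera:", "Lighting:", and "Audio:" labels are assumptions chosen for readability. It puts the style first and treats every component except the scene as optional.

```python
def build_prompt(scene, style=None, camera=None, lighting=None, audio=None):
    """Join the supplied components into one prompt string, style first.

    Only `scene` is required. The labels are illustrative assumptions,
    not a format the model is known to require.
    """
    parts = []
    if style:
        parts.append(f"Style: {style}.")
    parts.append(f"{scene}.")
    if camera:
        parts.append(f"Camera: {camera}.")
    if lighting:
        parts.append(f"Lighting: {lighting}.")
    if audio:
        parts.append(f"Audio: {audio}.")
    return " ".join(parts)


prompt = build_prompt(
    scene="Wet asphalt, zebra crosswalk, neon signs reflecting in puddles",
    style="Handheld documentary",
    audio="distant traffic hum",
)
print(prompt)
```

Paste the resulting text into the prompt field. Leaving a component out simply drops it from the prompt, which matches the lighter-prompt approach described above.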

Prompting for audio and dialogue

Separate dialogue from the visual description. Keep spoken lines in their own section so the model can tell the difference between what's visual and what's spoken.

  • Keep dialogue proportionate to the length of your video. A 4-second clip fits one or two lines. Don't overdo it.

  • Label speakers consistently. Use "Detective" and "Suspect" throughout, not "the man" in one line and "Detective" in the next.

  • Add ambient sound cues. Even in a quiet shot, something like "distant traffic hum" adds to the scene.

Example:

A cramped, windowless room. A single bare bulb hangs from the ceiling, pooling light onto a metal table. The Detective sits with sharp, unblinking eyes. Across from him, the Suspect slouches, cigarette smoke curling toward the ceiling.

Dialogue:

- Detective: "You're lying. I can hear it in your silence."

- Suspect: "Or maybe I'm just tired of talking."
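
The layout of the example above (visual description first, then a separate "Dialogue:" section with one labeled line per speaker) can be generated programmatically. The sketch below is an illustration only: `format_dialogue_prompt` is a hypothetical helper, and the consistency check simply rejects any speaker label that is not in the set you declare up front.

```python
def format_dialogue_prompt(visual, lines, speakers):
    """Build a prompt with the visual description and dialogue kept separate.

    `lines` is a list of (speaker, text) pairs; `speakers` is the set of
    allowed labels. An unexpected label raises ValueError so that a stray
    "the man" does not slip in next to "Detective".
    """
    unknown = sorted({speaker for speaker, _ in lines} - set(speakers))
    if unknown:
        raise ValueError(f"inconsistent speaker labels: {unknown}")
    dialogue = "\n".join(f'- {speaker}: "{text}"' for speaker, text in lines)
    return f"{visual}\n\nDialogue:\n{dialogue}"


prompt = format_dialogue_prompt(
    visual="A cramped, windowless room. A single bare bulb hangs from the ceiling.",
    lines=[
        ("Detective", "You're lying. I can hear it in your silence."),
        ("Suspect", "Or maybe I'm just tired of talking."),
    ],
    speakers={"Detective", "Suspect"},
)
print(prompt)
```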

Using a reference image

Upload a single image and Sora 2 uses it as a visual reference. This locks in things like character appearance, clothing, environment, and aesthetic without the image needing to be the first frame.

Tip: Don't have a reference image? Use the Create Image Flow to generate one with Nano Banana Pro or Nano Banana 2.

Creating multi-shot videos

Add multiple shots to one generation, each with its own prompt. Useful for short narratives, product demos, or social content.

  • Set the style and then lay out each shot in your prompt.

  • Use an image as a character or product reference (optional).

  • Keep each shot focused. One camera setup, one action per shot.

  • Name characters consistently. "The Traveler" in shot 1 should stay "the Traveler" in shot 2.

Multi-shot prompt formula

Style: [visual style], [lighting], [depth of field], [film stock or format]

Shot 1: [Camera framing + angle]. [Subject description] [action]. [Setting/environment detail]. [Camera movement].

Shot 2: [Camera framing + angle]. [Subject] [new action]. [Key visual detail]. [Camera movement].

Shot 3: [Camera framing + angle]. [Subject] [action]. [Setting change or reveal]. [Camera movement].

Background sound: [Ambient sound], [secondary sound], [atmosphere].

Example prompt using the formula:

Style: Cinematic, warm golden hour lighting, shallow depth of field, shot on 35mm film.

Shot 1: Wide establishing shot. A woman in a red dress walks along a quiet cobblestone street lined with old European buildings. Late afternoon sun casts long shadows. Camera tracks slowly alongside her.

Shot 2: Medium close-up. She stops at a flower stand, picks up a bunch of sunflowers and smells them. Soft smile. The vendor nods in the background.

Shot 3: Over-the-shoulder shot from behind her. The street opens up to reveal a wide piazza with a fountain. Camera holds steady as she walks into the distance.

Background sound: Distant church bells, faint footsteps on stone, ambient city hum.
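
The formula can also be filled in from structured data, which helps when you reuse one style across several sequences. The following is a minimal sketch under stated assumptions: the `Shot` dataclass and `build_multishot_prompt` function are hypothetical names, and the only thing taken from this guide is the layout (a "Style:" line, numbered "Shot N:" lines, and a "Background sound:" line).

```python
from dataclasses import dataclass


@dataclass
class Shot:
    framing: str               # camera framing + angle
    action: str                # subject and what they do
    detail: str = ""           # setting or key visual detail (optional)
    camera_movement: str = ""  # optional


def _sentence(text):
    """Ensure the text ends with exactly one period."""
    return text.rstrip(".") + "."


def build_multishot_prompt(style, shots, background_sound):
    """Lay out style, numbered shots, and background sound per the formula."""
    blocks = [f"Style: {_sentence(style)}"]
    for number, shot in enumerate(shots, start=1):
        fields = [shot.framing, shot.action, shot.detail, shot.camera_movement]
        body = " ".join(_sentence(field) for field in fields if field)
        blocks.append(f"Shot {number}: {body}")
    blocks.append(f"Background sound: {_sentence(background_sound)}")
    return "\n\n".join(blocks)


prompt = build_multishot_prompt(
    style="Cinematic, warm golden hour lighting, shallow depth of field",
    shots=[
        Shot("Wide establishing shot",
             "A woman in a red dress walks along a quiet cobblestone street",
             camera_movement="Camera tracks slowly alongside her"),
        Shot("Medium close-up",
             "She stops at a flower stand and picks up a bunch of sunflowers"),
    ],
    background_sound="Distant church bells, faint footsteps on stone",
)
print(prompt)
```

Keeping one `Shot` per camera setup mirrors the advice above: one framing and one action per shot.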

FAQs

What's the difference between Sora 2 and Veo 3.1?

Both produce high-quality video with native audio. Sora 2 tends toward more natural human motion and physics, and is strong with cinematic single-shot clips. Sora 2 can create longer video clips of up to 20 seconds. Veo 3.1 leans toward controllable multi-shot sequences with start and end frame support. Both are in Kaiber Superstudio. Try both and see which fits your project.

Can Sora 2 generate audio?

Yes. Dialogue, ambient sound, effects, and music. Include audio cues in the prompt.

What's the longest video Sora 2 can generate?

Sora 2 can generate up to 20 seconds. Available lengths are 4, 8, 12, 16, and 20 seconds.

Can I use a reference image with Sora 2?

Yes. Upload a single image as a visual reference. It doesn't need to be the first frame.


Can I create multi-shot videos with Sora 2?

Yes. Add multiple shots to one generation, each with its own prompt.
