- Native synchronized video and audio generation
- Five multimodal inputs: text, image, video, mask, and audio
- 1080p output with 32 FPS cinematic motion
- Region-level inpainting for editing specific parts of a video
- Character reference inputs that keep characters consistent across shots
- Multilingual lip-sync and speech generation
- Beat-aware camera cuts for music-driven clips
- REST API and webhook access via APIMart
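REST plus webhook access generally means you submit a generation job over HTTP and receive a callback when rendering finishes. Below is a minimal sketch of the client side, assuming a hypothetical JSON schema. The helper names (`build_job_request`, `parse_webhook`) and the field names (`prompt`, `callback_url`, `task_id`, `status`, `video_url`, `error`) are placeholders, not APIMart's documented API. Check its reference for the real endpoint and payload.

```python
import json
from dataclasses import dataclass
from typing import Optional

# NOTE: every field name below is a hypothetical placeholder,
# not APIMart's documented request or callback schema.


@dataclass
class GenerationResult:
    task_id: str
    status: str
    video_url: Optional[str] = None
    error: Optional[str] = None


def build_job_request(prompt: str, resolution: str = "1080p", fps: int = 32,
                      callback_url: Optional[str] = None) -> dict:
    """Assemble the JSON body for a hypothetical text-to-video job."""
    body = {"prompt": prompt, "resolution": resolution, "fps": fps}
    if callback_url:
        # URL the service would POST to when the job completes (assumed field).
        body["callback_url"] = callback_url
    return body


def parse_webhook(raw: bytes) -> GenerationResult:
    """Decode a completion callback body into a typed result."""
    data = json.loads(raw)
    return GenerationResult(
        task_id=data["task_id"],
        status=data["status"],
        video_url=data.get("video_url"),  # absent if the job failed
        error=data.get("error"),          # absent if the job succeeded
    )


if __name__ == "__main__":
    req = build_job_request("A drummer on a rooftop at dusk",
                            callback_url="https://example.com/hooks/video")
    print(json.dumps(req))
    payload = b'{"task_id": "t-123", "status": "succeeded", "video_url": "https://example.com/out.mp4"}'
    print(parse_webhook(payload))
```

Keeping the webhook parser separate from any HTTP server lets you plug it into whichever framework receives the callback.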