Date: 2025-12-02

Kling AI, the AI-powered creative platform, announced the launch of its Video O1 and Image O1 models. The models are based on a unified architecture that integrates generation, editing, and understanding into one platform. By eliminating the need to switch between multiple models and tools, the rollout enables a seamless, end-to-end workflow for the creative industry.

Powered by advanced multimodal visual language, the new models bridge the gap between text semantics and visual signals. This allows creators to move from ideation to generation, and from generation to modification, all in one place.

Kling Video O1 Model: Full Creative Control Through Multimodal Input

The Kling Video O1 model is the first in the video generation field to integrate a wide range of tasks into one unified model, including reference-to-video, text-to-video, start and end frame generation, video content editing, modifications, transformations, restyling, and camera extension.

Leveraging the deep semantic understanding of the unified model, the Video O1 model interprets everything that users upload, whether it’s an image, video, subject, or text, as a prompt. Breaking the boundaries between modalities, the model can comprehensively understand a photo, video, or subject (a character from different perspectives) and generate precise details for the video.

Its key capabilities include:

Prompt-based Post-Production Editing: The Video O1 model turns post-production editing into a simple conversation. For example, users can type prompts like “remove bystanders,” “change daytime to dusk,” or “swap the main character’s outfit,” and the model uses visual context to execute these tasks precisely. Supported edits range from changing the subject, background, and video style to changing object colors, weather, location, and time in the video.

Multi-Subject Consistency: Addressing a long-standing industry challenge, Video O1 maintains consistency across characters, props, and scenes. It acts like a human director, ensuring that as the camera moves or the plot develops, the visual elements remain consistent, even in complex group scenes or interactive settings.

Image/Element/Video Reference for Generation: The model supports referencing uploaded assets (characters, props, scenes) to generate new content. It also supports uploading reference videos to extend shots and to apply specific camera movements to new generations.

More Creativity Packed in One Generation: The Video O1 model supports combining different tasks in one prompt, such as “adding a subject while modifying the background in the video” or “changing the style while using elements,” allowing users to bring multiple creative ideas to life at once.

The model now supports generating videos between 3 and 10 seconds long. In an internal blind test, professionals were invited to vote for their preferred output in each pairwise comparison. The evaluation results show that the Kling Video O1 model excels in overall performance and across multiple specific dimensions compared with industry-leading video generation models.

On image-reference video generation, the Kling Video O1 model achieves a performance win ratio of 247% against Google Veo 3.1 Fast’s Ingredients to Video. On instruction transformation, it achieves a performance win ratio of 230% against Runway Aleph.
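The announcement does not define how the "win ratio" is calculated. One possible reading, which is an assumption here and not confirmed by Kling AI, is that it divides the number of comparisons Video O1 won by the number it lost, with ties set aside. Under that reading a figure above 100% simply means more wins than losses. The short Python sketch below converts such a ratio into the implied share of decided votes; the function name `implied_win_share` is hypothetical and used only for illustration.

```python
def implied_win_share(win_ratio_percent: float) -> float:
    """Share of decided comparisons won.

    ASSUMPTION (not stated in the announcement): win ratio = wins / losses,
    with ties excluded from both counts.
    """
    ratio = win_ratio_percent / 100.0
    if ratio < 0:
        raise ValueError("win ratio cannot be negative")
    # If wins / losses = ratio, then wins / (wins + losses) = ratio / (1 + ratio).
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    reported = [
        ("Google Veo 3.1 Fast (Ingredients to Video)", 247),
        ("Runway Aleph", 230),
    ]
    for rival, pct in reported:
        share = implied_win_share(pct)
        print(f"vs. {rival}: about {share:.1%} of decided votes")
```

If the metric instead counts ties on both sides, as some good/same/bad style evaluations do, the implied share would differ, so these conversions should be read as illustrative only.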

Kling Image O1 Model: Editing at Your Will

Also launching is the Image O1 model, featuring a robust knowledge base and multimodal reasoning capabilities. Image O1 interprets user intent with high fidelity and can process up to ten reference images to rearrange elements, transfer styles, or extract features.

Building on enhanced prompt adherence, the model enables precise image editing. Without any prior professional editing skills, users can add, remove, or modify objects and characters in the image while preserving the original style, lighting, and texture.

The model is designed for complex workflows, such as generating realistic 3D renderings from interior design sketches or adjusting lighting based on directional arrows. Crucially, it maintains subject consistency across different generated images—an essential feature for IP character design and comic creation.

On the multi-image reference task, Image O1 was compared with Nano Banana and Dreamina Image 4.0 on the same dataset. It excelled in overall quality and sub-dimensions, achieving a 174% win rate vs. Nano Banana and 123% vs. Dreamina Image 4.0, placing it at the industry forefront.

Empowering the Creative Industry

The Video O1 and Image O1 models are poised to streamline production workflows across the creative industry, benefiting filmmakers, studios, advertisers, fashion designers, and influencers alike, whether for creating original narratives from scratch or reshaping existing stories.

For filmmaking, the models can boost post-production efficiency by enabling filmmakers to handle post-production tasks in one interface using natural-language prompts. By uploading reference elements/images, filmmakers can lock in characters and props, ensuring visual consistency and continuity across multiple generated scenes.

For fashion designers, the models reduce the time and costs associated with physical shoots. By generating photoshoots from multiple angles and against various backgrounds, designers can create a virtual runway to showcase products. Additionally, e-commerce advertisers can use these models to rapidly produce visual assets or virtual try-ons simply by uploading product and model images.

For more information on the Kling Video O1 model, see the Kling O1 Video release note: https://app.klingai.com/global/release-notes/vaxrndo66h?type=dialog

For more information on the Kling Image O1 model, visit: https://app.klingai.com/global/quickstart/klingai-image-o1-user-guide