Your next favorite YouTuber may not even speak the same language as you. Following the recent launch of its auto-dubbing feature, which uses AI to translate audio, YouTube is now experimenting with lip-sync technology so that a speaker’s facial movements better match the translated audio.
Modifying Pixels to Match Speech
Buddhika Kottahachchi, YouTube’s Product Lead for Autodubbing, explained that the streaming platform had to solve the complex problem of how to “modify the pixels on the screen to match the translated speech.” This involved developing specialized tools to deeply understand a speaker’s facial movements, lip shapes, teeth, and overall posture.
The feature is in an early testing phase with a select group of creators. In initial testing, YouTube reported that the technology works best on Full HD video and performs worse on 4K content, though this is expected to improve before a public launch. YouTube briefly teased the feature at a September event but has not yet announced a public launch date.
Language Expansion and Disclosures
An early version of the tool currently supports translations into English, French, German, Spanish, and Portuguese. The service’s ultimate goal is to expand the lip syncing feature across all languages covered by YouTube’s existing auto-dubbing system. This extensive list includes languages such as Bengali, Dutch, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Malayalam, Polish, Punjabi, Romanian, Russian, Tamil, Telugu, Turkish, Ukrainian, and Vietnamese.
Kottahachchi’s comments suggest that the feature may become a paid option for creators seeking access to new global markets. He noted, “We are not ready to make any broad statements about how broadly we will make it available, but we do want to make it available to more creators and understand the compute constraints and the quality.” On transparency, YouTube plans to disclose AI use by stating “both the audio and video in this video have been synthetically created or altered” in the description box, helping viewers identify when the lip-syncing technology has been applied. The original AI auto-dubbing feature launched on YouTube last December and had been used on more than 60 million videos by the end of August 2025.
What The Author Thinks
Mosseri is correctly identifying that the most critical defense against the proliferation of convincing AI-generated content is not technology—which is easily defeated—but fundamental digital literacy and human skepticism. By placing the onus on parents and society to teach children to actively question the source and incentives behind every piece of online content, he is advocating for a profound, and necessary, cultural shift. The platform’s attempt to use crowdsourced fact-checking for AI labeling is a practical recognition that human consensus, however messy, is the only sustainable guardrail against a technology that is designed to make objective truth seamless and cheap to produce.
Featured image credit: Collabstr via Unsplash