User blog:Zakawer2/ElevenLabs AI dubbing

ElevenLabs has recently launched a tool that lets users dub videos with its AI voice technology in seconds or minutes. You can check it out here (ElevenLabs subscription required).

Here's how I think it works:
 * First, it does its best to mute the original dialogue (in an attempt to create a clean music and effects, or M&E, track).
 * Then, it clones the spoken voices heard in the video.
 * After that, it uses speech recognition to detect the spoken dialogue.
 * And then, it machine-translates the spoken dialogue using AI-based neural translation methods.
 * Finally, it uses the AI-cloned versions of the original voices to speak the translated dialogue, replicating the speech patterns and intonation of the original voices to the best of its ability, while ensuring that each dubbed line starts and ends at roughly the same time as the original.
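
To make the guessed steps above a bit more concrete, here is a toy Python sketch of the last three stages (recognized lines in, translated and time-fitted lines out). Everything in it is hypothetical: the names `Segment`, `playback_rate` and `dub_segments`, the clamping limits of 0.8 and 1.25, and the idea of fitting a line by adjusting its playback rate are all my own assumptions for illustration. It is not ElevenLabs' code or API, and the translation and duration functions are stand-ins that you would have to supply yourself.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    speaker: str   # speaker label, e.g. "speaker_1"
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str      # recognized (or, after dubbing, translated) dialogue


def playback_rate(natural: float, slot: float,
                  lo: float = 0.8, hi: float = 1.25) -> float:
    """Rate needed to fit `natural` seconds of speech into `slot` seconds.

    The rate is clamped to [lo, hi] (assumed limits) so that the voice is
    never sped up or slowed down beyond what still sounds natural.
    """
    if natural <= 0 or slot <= 0:
        raise ValueError("durations must be positive")
    return max(lo, min(hi, natural / slot))


def dub_segments(segments: List[Segment],
                 translate: Callable[[str], str],
                 estimate_duration: Callable[[str], float]) -> List[Segment]:
    """Translate each recognized line and time it against its original slot."""
    dubbed = []
    for seg in segments:
        text = translate(seg.text)          # stand-in for machine translation
        natural = estimate_duration(text)   # stand-in for speech synthesis
        rate = playback_rate(natural, seg.end - seg.start)
        # The dubbed line keeps the original start time; its end only drifts
        # away from the original end when the rate had to be clamped.
        dubbed.append(Segment(seg.speaker, seg.start,
                              seg.start + natural / rate, text))
    return dubbed
```

With this sketch, a translated line that is only slightly longer than the original is simply sped up to fit, whereas a line that is twice as long gets sped up to the assumed limit and then overruns its slot, which is one plausible reason why dubbed lines end at "roughly" rather than exactly the same time.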

It is possible to specify the number of speakers heard in the video (between one and nine), or to let the AI figure that out by itself.
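
As a small illustration of that option, here is how such a setting could be validated in Python. The function name `resolve_speaker_count` and the convention that `None` means "detect automatically" are assumptions of mine, not something taken from ElevenLabs; only the one-to-nine range comes from the tool itself.

```python
from typing import Optional

MIN_SPEAKERS = 1
MAX_SPEAKERS = 9


def resolve_speaker_count(requested: Optional[int] = None) -> Optional[int]:
    """Return a validated speaker count, or None to ask for auto-detection."""
    if requested is None:
        return None  # hypothetical convention: let the AI work it out
    if isinstance(requested, bool) or not isinstance(requested, int):
        raise TypeError("speaker count must be an integer or None")
    if not MIN_SPEAKERS <= requested <= MAX_SPEAKERS:
        raise ValueError(
            f"speaker count must be between {MIN_SPEAKERS} and {MAX_SPEAKERS}"
        )
    return requested
```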

The ElevenLabs dubbing technology does not produce voice-over translations (i.e. something vaguely similar to a dub, except that the original audio is still clearly audible underneath the dubbed voices). Instead, it always attempts to fully mute the original voices, even without access to an official M&E track. It also always tries to replicate the original voices, rather than relying on voices that may sound entirely different.

Unlike a lot of professionally made lip-sync dubs, this AI dubbing technology does not make an extensive effort to match the lip movements of the original dialogue. It also relies on machine translation rather than professional human translation; the result can be more literal, but it is also potentially less creative and more vulnerable to translation errors and/or things getting lost in translation. And because it relies on speech recognition, it might sometimes mishear lines of spoken dialogue and translate them incorrectly as a result.

It supports a very large list of source languages, while the target languages for dubbing are the same ones supported by the ElevenLabs Multilingual v2 model.

Files on one's device can be automatically dubbed, as well as videos from YouTube, TikTok, X, Vimeo and elsewhere on the Internet.

The results can vary, ranging from solid and high-quality dubs to cursed and/or bizarre ones.

The implications that this AI dubbing tool could have for the dubbing industry worldwide are massive, so it is only natural that we discuss this pressing issue (and potentially also how to handle AI-generated dubs in the future) here on this wiki.