Google researchers unite to create VLOGGER


Six Google researchers have joined forces to create VLOGGER, a new artificial intelligence (AI) tool that can generate realistic talking heads.

VLOGGER aims to generate photorealistic videos of varying lengths that depict a specific person talking, moving, and gesturing.

“Our framework, which we call VLOGGER, is a two-stage pipeline based on stochastic diffusion models to model the one-to-many mapping from speech to video,” the GitHub project supplement reads.

Image by VLOGGER/Google

The first network takes an audio waveform as input and uses it to generate "body motion controls" that govern gaze, facial expression, and pose.

The second network is described as a "temporal image-to-image translation model that extends large image diffusion models, taking the predicted body controls to generate the corresponding frames."
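To make the two-stage data flow concrete, here is a toy sketch in Python. It is not Google's code and does not reproduce VLOGGER's models: the functions `stage1_audio_to_controls` and `stage2_controls_to_frames`, the `BodyControls` fields, the 25 fps frame rate, and the 16 kHz sample rate are all hypothetical stand-ins. Seeded random noise mimics the stochastic, one-to-many mapping the researchers describe, so the same audio can yield different motion.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical settings chosen for illustration only.
FPS = 25
SAMPLE_RATE = 16_000

Image = List[List[float]]  # toy grayscale image: rows of pixel values


@dataclass
class BodyControls:
    """Per-frame control signals (illustrative fields, not VLOGGER's real ones)."""
    gaze: float
    expression: float
    pose: float


def stage1_audio_to_controls(waveform: List[float], seed: int) -> List[BodyControls]:
    """Stand-in for the first network: audio -> one set of body controls per frame.

    The seeded noise imitates a stochastic model: the same audio with a
    different seed produces different motion (a one-to-many mapping).
    """
    rng = random.Random(seed)
    samples_per_frame = SAMPLE_RATE // FPS
    controls = []
    for start in range(0, len(waveform), samples_per_frame):
        chunk = waveform[start:start + samples_per_frame]
        energy = sum(abs(x) for x in chunk) / len(chunk)  # loudness drives "expression"
        controls.append(BodyControls(
            gaze=rng.gauss(0.0, 0.1),
            expression=energy + rng.gauss(0.0, 0.05),
            pose=rng.gauss(0.0, 0.1),
        ))
    return controls


def stage2_controls_to_frames(reference_image: Image, controls: List[BodyControls]) -> List[Image]:
    """Stand-in for the second network: turns each frame's controls into an image.

    A real system would run a temporal diffusion model here; this toy simply
    shifts the reference image's pixel values by an amount derived from the controls.
    """
    frames = []
    for c in controls:
        offset = c.expression + c.pose
        frames.append([[px + offset for px in row] for row in reference_image])
    return frames


def generate_video(waveform: List[float], reference_image: Image, seed: int = 0) -> List[Image]:
    """Chain the two hypothetical stages: audio -> controls -> frames."""
    controls = stage1_audio_to_controls(waveform, seed)
    return stage2_controls_to_frames(reference_image, controls)


if __name__ == "__main__":
    audio = [0.5] * SAMPLE_RATE            # one second of constant "audio"
    image = [[0.0] * 4 for _ in range(4)]  # tiny 4x4 reference image
    video = generate_video(audio, image, seed=1)
    print(len(video), "frames")
```

Calling `generate_video` twice on the same audio with different seeds returns different frame sequences, which is the one-to-many behavior the diffusion models are meant to capture. In the actual system each stage is a trained neural network rather than a few lines of arithmetic.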

The AI model is meant to function as an "embodied conversational agent" that pairs audio with animated visuals, including realistic, complex facial expressions and a high degree of body motion.

According to the researchers, VLOGGER is designed to "support natural conversations with a human user," and the tool could be used for presentations, education, narration, and more.

Beyond acting as an AI agent that users can talk to, the model can also edit existing videos.

“One of the main applications of our model is on editing existing videos…in this case, VLOGGER takes a video and changes the expression of the subject by e.g. closing the mouth or the eyes,” the project reads.

VLOGGER can also perform video translation: it takes an existing video in one language and alters the subject's lip movements and facial expressions to match new audio.

This lets users adapt an existing video to a different language. In one example provided by the Google researchers, an original English-language video is translated into Spanish.

With VLOGGER, Google researchers appear to be testing the boundaries of AI-generated video while also reshaping how people might use image-to-video tools.