VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis

Google’s desperate attempt to showcase “something”

Mandar Karhade, MD. PhD.
Published in Towards AI
6 min read · Mar 21, 2024

Enric Corona, Andrei Zanfir, Eduard Gabriel Bazavan, Nikos Kolotouros, Thiemo Alldieck, and Cristian Sminchisescu at Google Research introduce VLOGGER, a framework that generates photorealistic, temporally coherent videos of people talking and moving, all from a single input image and an audio sample. Sure, it is innovative, but I am really struggling…
