Deepfakes Are Getting Harder to Spot Than Ever with VASA-1: Artificial Intelligence Trends

Microsoft just introduced a new AI model that can generate hyper-realistic videos of talking human faces. With it, deepfakes are getting harder to spot.

This is what Microsoft said on its announcement page:

“We introduce VASA, a framework for generating lifelike talking faces of virtual characters with appealing visual affective skills (VAS), given a single static image and a speech audio clip. Our premiere model, VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio, but also capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness. The core innovations include a holistic facial dynamics and head movement generation model that works in a face latent space, and the development of such an expressive and disentangled face latent space using videos. Through extensive experiments including evaluation on a set of new metrics, we show that our method significantly outperforms previous methods along various dimensions comprehensively. Our method not only delivers high video quality with realistic facial and head dynamics but also supports the online generation of 512×512 videos at up to 40 FPS with negligible starting latency. It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.”


And it’s absolutely mind-blowing, as several of the clips on the announcement page illustrate. Some clips run roughly a minute; others are roughly 15 seconds. Microsoft also notes that “all portrait images on this page are virtual, non-existing identities generated by StyleGAN2 or DALL·E-3 (except for Mona Lisa).” That’s right: the Mona Lisa talks – actually raps – using VASA-1.

I learned about it this morning from Greg Bufithis, whose post about it was simply titled “We are doomed 😐”.

Fortunately, Microsoft says: “Currently, the videos generated by this method still contain identifiable artifacts, and the numerical analysis shows that there’s still a gap to achieve the authenticity of real videos”, though they don’t explain what those artifacts are. And they added: “We have no plans to release an online demo, API, product, additional implementation details, or any related offerings until we are certain that the technology will be used responsibly and in accordance with proper regulations.” We’ll see how long that lasts.

Greg notes that use cases that have been suggested include: “reviving dead actors for new movie roles” and “bringing back dead relatives to ‘chat’ with you”. Regardless, deepfakes are getting harder to spot than ever.


So, what do you think? Do you agree that deepfakes are getting harder to spot than ever? Please share any comments you might have, or let me know if you’d like to learn more about a particular topic.

Image Copyright © Microsoft

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by my employer, my partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

