A Microsoft sign is seen at the company's headquarters on March 19, 2023 in Seattle, Washington. (Ana Ryu/Visual China Group/Getty Images)
New York (CNN) — The Mona Lisa can now do more than just smile, thanks to new AI technology from Microsoft.
Last week, Microsoft researchers detailed a new AI model they've developed that can take a still image of a face and an audio clip of someone speaking and automatically create a realistic-looking video of that person speaking. The videos—which can be created from real-life faces, as well as caricatures or artwork—are complete with convincing lip syncs and natural facial and head movements.
In one demonstration video, the researchers showed how they animated the Mona Lisa to perform a comedic rap by actress Anne Hathaway.
The outputs from the AI model, called VASA-1, are both amusing and a bit jarring in their realism. Microsoft said the technology could be used for education or for “improving accessibility for individuals with communication challenges,” or perhaps to create virtual companions for humans. But it's also easy to see how the tool could be abused to impersonate real people.
It's a concern that goes beyond Microsoft: as more tools emerge to create compelling AI-generated images, videos, and audio clips, experts worry that their misuse could lead to new forms of misinformation. Some also worry that the technology could further disrupt creative industries, from film to advertising.
For now, Microsoft said it does not plan to release the VASA-1 model to the public. The move is similar to how Microsoft partner OpenAI is handling concerns around its own AI-generated video tool, Sora: OpenAI teased Sora in February, but so far has made it available only to some professional users and cybersecurity professionals for testing purposes.
“We are opposed to any behavior to create misleading or harmful content about real people,” Microsoft researchers said in a blog post. They added that the company “has no plans to release” the product publicly “until we ensure the technology is used responsibly and in accordance with appropriate regulations.”
The researchers said Microsoft's new AI model was trained on numerous videos of people's faces while speaking, and is designed to recognize natural facial and head movements, including “lip motion, (non-lip) expression, eye gaze, and blinking, among others.” The result is more lifelike video when VASA-1 animates a still image.
For example, in one test video set to a clip of someone appearing agitated, apparently while playing video games, the speaking face had furrowed brows and pursed lips.
The AI tool can also be directed to produce a video where the subject looks in a certain direction or expresses a certain emotion.
On close inspection, there are still signs that the videos are machine-generated, such as infrequent blinking and exaggerated eyebrow movements. But Microsoft said it believes its model “significantly outperforms” other similar tools and “paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.”