Synthesia’s hyperrealistic deepfakes will soon have full bodies
“It’s very impressive. No one else is able to do that,” says Jack Saunders, a researcher at the University of Bath, who was not involved in Synthesia’s work.
The full-body avatars he previewed are very good, he says, despite small errors such as hands “slicing” into each other at times. But “chances are you’re not really going to be looking that close to notice it,” Saunders says.
Synthesia launched the first version of its hyperrealistic AI avatars, also known as deepfakes, in April. These avatars use large language models to match expressions and tone of voice to the sentiment of the spoken text, while diffusion models, like those used in image- and video-generating AI systems, create the avatar’s look. However, this generation of avatars appears only from the torso up, which can detract from the otherwise impressive realism.
To create the full-body avatars, Synthesia is building an even bigger AI model. Users will have to go into a studio to record their body movements.
But before these full-body avatars become available, the company is launching another version of its AI avatars that have hands and can be filmed from multiple angles. Their predecessors were available only in portrait mode and visible only from the front.
Other startups, such as Hour One, have launched similar avatars with hands. Synthesia’s version, which I got to test in a research preview and which launches in late July, has slightly more realistic hand movements and lip-syncing.
Crucially, the coming update also makes it far easier to create your own personalized avatar. The company’s previous custom AI avatars required users to go into a studio to record their face and voice over the span of a couple of hours, as I reported in April.
This time, I recorded the material needed in just 10 minutes in the Synthesia office, using a digital camera, a lapel mic, and a laptop. An even more basic setup, such as a laptop camera, would also do. And while previously I had to record my facial movements and voice separately, this time the data was collected at the same time. The process also includes reading a script expressing consent to being recorded in this way and reading out a randomly generated security passcode.