If we’re ever going to effectively communicate with robots, they’ve got to get better at lip-syncing.
Intricate mouth movements are vital for human connection, especially in loud environments, where we gaze at a speaker's mouth up to half the time.
That makes them a key feature for robots we can comfortably chat with, but researchers have long struggled to build robots whose lips can skillfully synchronize with audio. Bots have mechanical constraints that limit the range of motion and speed of lip movements, for example, and their movements tend to lag behind commands.
To overcome this hurdle, researchers from Columbia University in New York harnessed artificial intelligence models inspired by the human brain, known as neural networks, enabling a humanoid robot to make smooth mouth motions that stay in sync with spoken words.
“The capability to form complex lip shapes … enhances overall more detailed speech synchronization, providing more lifelike interactions that mitigate some of the risks of the uncanny valley effect,” according to a new Science Robotics paper.
The team designed a human-like robot face with “skin” made of soft silicone. It has magnetic connectors that allow for 10 degrees of freedom, making all sorts of lip movements possible.
To train the models powering this bot, the team fed them recordings of their robot making various lip movements, like those associated with rounded vowels. Then, they incorporated AI-generated videos of “ideal” lip movements for certain sentences into their models.
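The article doesn't reproduce the paper's code, but the recipe it describes, learning from the robot's own recordings and then steering toward "ideal" video targets, can be sketched in broad strokes. Below is a minimal, hypothetical Python illustration (not the authors' implementation): a small neural network learns to predict lip-landmark positions from motor commands, and is then inverted to find commands that match a target frame from an AI-generated video. The network size, landmark count, and function names are all assumptions.

```python
# A minimal sketch (not the authors' code) of the general idea described
# above: learn a "forward model" that predicts lip-landmark positions from
# motor commands using the robot's own recordings, then invert it by
# gradient descent to find commands that reproduce target landmarks taken
# from the AI-generated "ideal" videos. Shapes and names are illustrative.
import torch
import torch.nn as nn

N_MOTORS = 10         # the face's 10 degrees of freedom
N_LANDMARKS = 2 * 20  # hypothetical: 20 lip landmarks, (x, y) each

# Forward model: motor commands -> predicted lip-landmark coordinates.
forward_model = nn.Sequential(
    nn.Linear(N_MOTORS, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, N_LANDMARKS),
)

def train_forward_model(commands, landmarks, epochs=100):
    """Fit the model on (command, observed-landmark) pairs recorded
    while the robot works through assorted lip movements."""
    opt = torch.optim.Adam(forward_model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(forward_model(commands), landmarks)
        loss.backward()
        opt.step()

def commands_for(target_landmarks, steps=200):
    """Invert the trained forward model: search for motor commands whose
    predicted landmarks match a target frame from the 'ideal' video."""
    cmd = torch.zeros(N_MOTORS, requires_grad=True)
    opt = torch.optim.Adam([cmd], lr=0.05)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((forward_model(cmd) - target_landmarks) ** 2).mean()
        loss.backward()
        opt.step()
    return cmd.detach().clamp(-1.0, 1.0)  # stay within actuator limits
```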
The system allows a robot’s lips to form the shapes associated with 24 consonants and 16 vowels, the researchers reported in the paper.
Using these “ideal” AI videos as a baseline, they compared their new system with existing techniques for shaping robot lip movements. Of all the methods, theirs showed the smallest mismatch with the mouth movements in the AI videos. The bot could also convincingly mouth speech in 10 different languages with varying phonetic structures, including Korean, French, and Arabic, and it even did a bit of karaoke.
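How might that "mismatch" be scored? The article doesn't give the paper's exact metric, but a common proxy for lip-sync error is the mean distance between corresponding lip landmarks in the robot's video and the reference video. A minimal sketch, assuming both videos have already been reduced to landmark trajectories:

```python
# Hypothetical mismatch score, assuming the robot video and the
# AI-generated reference have each been reduced to lip-landmark
# trajectories of shape (frames, landmarks, 2). This is a plausible
# proxy, not the paper's published metric.
import numpy as np

def lip_mismatch(robot_traj: np.ndarray, ideal_traj: np.ndarray) -> float:
    """Mean Euclidean distance between corresponding lip landmarks,
    averaged over every frame. Lower scores mean the robot's mouth
    tracked the "ideal" video more closely."""
    assert robot_traj.shape == ideal_traj.shape
    per_landmark = np.linalg.norm(robot_traj - ideal_traj, axis=-1)
    return float(per_landmark.mean())
```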
There’s still plenty of room for improvement, the researchers acknowledged, including incorporating more training data and adding more physical degrees of freedom. In the future, they think that their tool could be used in education and in caring for older adults experiencing cognitive decline, as it could help us connect with robots “on a human level.”
But they also caution that heightened emotional connection with robots could “be exploited to gain trust from unsuspecting users, especially children and the elderly,” and that designers should implement safeguards against these risks.
“The ability to create physical machines that are capable of connecting with humans at an emotional level is maturing rapidly,” the authors wrote. “The robots presented here are still far from natural, yet one step closer to crossing the uncanny valley.”
Lead image: Yuhang Hu
