Baidu has announced a new AI system that can mimic a subject’s voice after training itself on less than a minute of audio snippets.
It won’t necessarily make a perfect copy — in Digital Trends’ assessment, the synthetic voice “doesn’t sound completely convincing” — but it is good enough to spoof a voice recognition system over 95 percent of the time, after training on 10 five-second audio snippets of a subject’s speech, according to Baidu. And with more audio to train on, in theory a given voice clone should only get more convincing.
The AI system, based on Baidu’s Deep Voice text-to-speech platform, points to a troubling new vulnerability in voice-based authentication systems. Baidu hasn’t named the voice recognition program that its AI fooled so thoroughly, however, and it’s possible that the state of the art in voice recognition, and in presentation attack detection software for that matter, is still far enough ahead of voice reproduction that this is not yet a serious concern.
Of course, Baidu isn’t pitching the technology as a powerful new tool for fraudsters. But some of the potential applications offered by a Baidu spokesperson to Digital Trends still sound like something out of Black Mirror: “For example, a mom can easily configure an audiobook reader with her own voice,” the representative said.
Baidu has posted audio samples of its AI speech cloning in action online, so any readers who are excited — or concerned — about the technology can hear it for themselves.
Sources: Digital Trends, The Download