Google has drastically improved its speech recognition technology, the company’s Speech Team has announced. In a new blog post, the team details how it made Google’s voice analysis both more accurate and faster.
Essentially, the team has refined the deep neural networks (DNNs) that replaced the 30-year-old Gaussian Mixture Model (GMM) back in 2012. Now, the team has implemented more accurate recurrent neural networks (RNNs) with specialized extensions: sequence discriminative training techniques and Connectionist Temporal Classification (CTC).
What does it all mean? The short answer is that RNNs have a built-in feedback loop that lets them place each sound being analyzed into a context, so that rather than trying to identify any one sound (such as the first “m” in “museum”) in isolation from all the others, they can connect the dots right away to see how that sound fits into the larger word being spoken.
(The long answer can be found in the team’s blog post.)
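The feedback loop described above can be illustrated with a toy example. This is only a sketch of the general recurrent idea, not Google's actual model: the weights here are invented for illustration, and a real acoustic model learns thousands of parameters from data.

```python
# Minimal sketch of an RNN's feedback loop: the hidden state carries
# context from earlier inputs forward, so the same sound is interpreted
# differently depending on what preceded it. Weights are made up.
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.8):
    """One timestep: mix the current input with the previous hidden
    state, so earlier sounds influence how this one is read."""
    return math.tanh(w_x * x + w_h * h_prev)

def run_rnn(inputs):
    h = 0.0          # hidden state starts empty
    states = []
    for x in inputs: # e.g. a sequence of per-frame acoustic features
        h = rnn_step(x, h)  # feedback: h carries context forward
        states.append(h)
    return states

# The same current input yields different hidden states depending on
# what came before it -- the "context" the article describes.
a = run_rnn([1.0, 1.0])
b = run_rnn([0.0, 1.0])
print(a[1] != b[1])  # True: identical second input, different history
```

A feed-forward network, by contrast, would compute the same output for the second input in both sequences, since it has no state linking one timestep to the next.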
With the new system in place, voice command and search in the Google app and on Android devices will benefit from improved accuracy and speed. That could prove particularly advantageous going forward as such software is incorporated into the growing Internet of Things, in which voice command technology could play a crucial role, and fundamentally change the infrastructure of computing in the process.