Amazon has confirmed that it is using end-to-end models to improve the speech recognition capabilities of the Alexa platform. With an end-to-end model, the entire speech recognition process can be completed on the device itself, from audio input all the way through to the final transcription. That contrasts with previous versions of Alexa, which processed data in the cloud because the models were too big to install on a standalone device.
Those earlier iterations of Alexa broke speech recognition down into multiple stages, such as acoustic modeling and language modeling, each of which required a separate model. The new version, on the other hand, handles the entire task with a single unified network.
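The difference between the two approaches can be sketched roughly as follows. This is a loose conceptual illustration only; the function names, toy stage implementations, and simplified interfaces are assumptions made for the example, not Amazon's actual code.

```python
# Conceptual sketch: a multi-stage ASR pipeline versus a single end-to-end
# model. The stage functions below are toy stand-ins (hypothetical), each
# representing a separately trained component in the old-style pipeline.

def acoustic_model(audio):
    # Toy stand-in: map each audio "frame" to a phoneme-like label.
    return [frame.lower() for frame in audio]

def pronunciation_model(phonemes):
    # Toy stand-in: group phoneme labels into candidate words.
    return ["".join(phonemes)]

def language_model(words):
    # Toy stand-in: pick the most likely final transcription.
    return " ".join(words)

def pipeline_asr(audio):
    """Older approach: separate models chained together; each component is
    trained and deployed independently, inflating the total footprint."""
    return language_model(pronunciation_model(acoustic_model(audio)))

def end_to_end_asr(audio, network):
    """Newer approach: one neural network maps audio directly to text,
    which is what makes a compact on-device model feasible."""
    return network(audio)
```

Because the end-to-end version replaces several large, independently trained components with one combined network, the total size on disk can shrink dramatically, which is the reduction Mevawalla describes below.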
“With an end-to-end model, you end up getting away from having these separate pieces and end up with a combined neural network,” said Automatic Speech Recognition Head Shehzad Mevawalla in an interview with VentureBeat. “You’re going from gigabytes down to less than 100MB in size. That allows us to run these things in very constrained spaces.”
Despite the smaller footprint, the new Alexa model still needs to be paired with an on-device accelerator to deliver the expected performance speeds. With that in mind, Amazon has teamed up with MediaTek to develop the AZ1 Neural Edge processor, which has been deployed in the latest versions of Amazon’s various Echo devices.
According to Mevawalla, end-to-end models have also enhanced Alexa’s ability to identify individual speakers. The Natural Turn Taking feature is able to distinguish Alexa requests from regular background noise, and to use a camera to determine whether the speaker is directing their comments to Alexa or to another person or device in the room. The feature will still function without a camera, but is more accurate on devices that can capture video.
Mevawalla went on to claim that the use of end-to-end models has improved Alexa’s accuracy by as much as 25 percent. Natural Turn Taking, however, will be available only in English when it debuts in 2021.
Amazon recently accredited Kudelski IoT Labs to test products with built-in Alexa capabilities. The tech giant is one of several companies working toward on-device speech and voice recognition. Frost & Sullivan has predicted that car manufacturers will prioritize hybrid voice assistants, while NXP has released a new MCU that will support offline voice recognition in IoT devices.