Facebook has announced that it is making its wav2letter@anywhere online speech recognition framework available as an open source platform. The framework was developed by Facebook AI Research (FAIR), which claims it is the fastest open source automatic speech recognition (ASR) platform currently available.
“The system has almost three times the throughput of a well-tuned hybrid ASR baseline while also having lower latency and a better word error rate,” wrote a group of eight FAIR researchers in a recent paper.
The wav2letter@anywhere framework builds on FAIR's earlier wav2letter and wav2letter++ speech recognition projects. To achieve its performance gains, it uses time-depth separable (TDS) convolutional neural networks (CNNs) rather than recurrent neural networks (RNNs). Separable convolutions have traditionally been more common in computer vision applications, but the FAIR researchers argue that their approach outperforms standard RNN baselines.
If the jargon is somewhat opaque, the upshot is that Facebook may have delivered an accurate, lower-latency speech recognition platform that can be deployed either on edge devices or in the cloud. If that proves to be the case, it would make it much easier for smaller developers to incorporate speech recognition into their products.
Speech recognition has, of course, become increasingly common in the past few years, turning up in IoT products ranging from smart cars to smart appliances. Grand View Research previously predicted that the combined speech and voice recognition market would be worth $31.82 billion by 2025.
For now, it is not yet clear what plans Facebook has for the speech recognition platform. The social media giant has spent the past few years developing a slew of new technologies, and recently launched both a new payments platform and a tool that alters video content to thwart facial recognition.
Source: VentureBeat