Soniox has released a new self-learning Speech Recognition Platform that can teach itself new words without any human supervision. The company claims that its word error rate is 24 percent lower than those of competing solutions, and that it can therefore recognize a wider variety of unusual words in context in real-world settings.
According to Soniox, what sets its Speech Recognition Platform apart is its ability to learn from unlabeled audio and text data. As a result, the system continues to improve even if no one takes the time to manually transcribe and label new audio and text files. The company says that makes it far more efficient than other systems, which are usually trained on fully labeled datasets. The Soniox Platform, by contrast, is trained on publicly available audio pulled from the web, and becomes more accurate as it works through more files over time.
Now that the platform is commercially available, users can use it to transcribe their own audio and video files, or to transcribe live streams. It is available on the web and as an iOS app, and is free for the first five hours of audio each month. Five hours is likely to be enough for many individuals, though enterprise customers will presumably need to pay once they exceed that threshold.
Soniox has also released an API that allows developers to integrate the company’s speech recognition technology into their own apps with only a few lines of code. The company has published tutorials to make the process as intuitive as possible.
“Audio is becoming the prevalent medium for rapid, immersive communication,” said Soniox Founder and CEO Klemen Simonic. “With our self-learning AI platform, Soniox has built the industry’s strongest infrastructure and toolset to build advanced speech and audio understanding solutions.”
Soniox did not provide any details about the audio files it uses to train its system. That raises questions about what “publicly available” means in this context, if only because Clearview AI has made similar claims to justify its invasive and non-consensual data collection practices.
Soniox, at least, has placed a much greater emphasis on data privacy, and is offering on-premises and on-device versions of its platform. The former can be deployed within an existing corporate infrastructure, while the latter performs transcription entirely within a contained environment on the device itself. The on-device version does not require a network connection, which ensures that the audio file remains in the sole possession of its owner.
The global market for speech recognition products is expected to pass $29 billion by 2026.