Sensory, Inc. is a pioneer in AI-enhanced face, voice and speech recognition technologies, with biometric software that leverages advanced on-device machine learning. As we enter a new decade in identity, connectivity and mobility, Sensory’s technology is set to have an even greater impact on how we interact with each other and our devices in an increasingly automated and digital-dependent world.
That’s why now is such an excellent time to have Mobile ID World President Peter O’Neill interview Todd Mozer, CEO, Sensory. Their conversation begins with talk of Sensory’s 2019 milestones, including its acquisition of Vocalize, an independent testing house for speech and voice biometrics. The conversation goes on to detail the emergence of a new virtual avatar use case, and touches on the modern privacy landscape as it relates to virtual assistants. Mozer and O’Neill conclude on the fascinating topic of face and voice fusion in biometrics, and where Sensory is taking these contactless modalities in the coming decade.
Read about all this and more in our full interview with Todd Mozer, CEO, Sensory:
Peter O’Neill, President, Mobile ID World: I’d like to start off by talking with you about this past year. It’s been a busy one for our industry. Can you tell us about some of the highlights with regard to your company?
Todd Mozer, CEO, Sensory: Sure. It’s been a busy year, and the industry is obviously taking off. There’s a lot of dynamism going on in the industry as well as for Sensory. One of the big things Sensory did in 2019 was acquiring a company called Vocalize, which is a testing house for speech and voice biometric products. We have always done our own in-house testing, where we do a lot of recorded voices that are digitally recognized within a computer where we can model different kinds of noise environments and conditions like distances, echoes, reverbs, car noise, people talking, etc. We would do statistically significant volumes of testing to recommend appropriate search points and accuracy assessments. However, we found that when our customers were doing testing through speakers and through live audio channels, their results were not always correlated with what we were getting in-house.
And so, we started working with Vocalize, and after about six months we knew we wanted to continue with them, and we became their biggest customer. It was really useful to test both digitally and through live audio channels. One thing led to another and we ended up just acquiring them. They remain independently operated, but if they are working with a Sensory customer we offer a nice discount on their standard pricing. Vocalize follows ANSI and Google/Amazon testing standards and some more traditional approaches to testing, so it provides our customers, or anybody, an independent source for testing, which for Sensory is really valuable.
Peter O’Neill, President, Mobile ID World: I know you are constantly, continuously improving your face and voice products, and actually have been a world leader in machine learning and neural nets for years. I think you were probably the first person I ever spoke to in this industry that started to educate me about those areas, but can you tell us what’s the latest?
Todd Mozer, CEO, Sensory: The whole industry has really moved towards deep learning in a variety of approaches using neural nets, and as you say, we’ve been doing it for a long time. One of the interesting things that’s been happening is more and more specialized chips are appearing, and because we do everything on-device or on the edge, and embedded is our focus, this creates some interesting opportunities. More and more chips are emerging that specialize in running these neural net functions on-device in lower power and lower cost platforms. And we’ve started experimenting in porting to some of these platforms.
In 2019, for example, we announced that we partnered with Gyrfalcon, which has a very nice, efficient AI accelerator. It requires a custom net on their deep learning architecture, and we ported and quantized to their neural net. It took a bit of work, but once we got there, what we found is that we could run really, really powerful models much more efficiently than we ever could in something like an Android OS. We are also moving to a Syntiant chip which enables ultra-low power wake words and commands. These chips have been a nice development on the deep learning side. There’s a bit of an irony because Sensory’s first product, the RSC-164, was a low power microcomputer with a specialized neural net processor on chip.
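For readers unfamiliar with the term, "quantizing" a neural net generally means storing its floating-point weights as low-precision integers so that an accelerator can run the model with less memory and power. The following is a minimal sketch of that general idea, assuming symmetric 8-bit quantization with a single scale factor per tensor. It is not Gyrfalcon’s or Sensory’s actual scheme, and the function names and sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization (an assumed scheme, for illustration).

    Maps each float weight to an integer in [-127, 127] using one shared scale.
    """
    max_abs = max(abs(w) for w in weights)
    # Avoid division by zero when every weight is 0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


# Hypothetical weights from one layer of a model.
weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

for original, approx in zip(weights, restored):
    print(f"{original:+.4f} -> {approx:+.4f}")
```

Each restored weight differs from the original by at most half the scale factor, which is the precision traded away for the smaller footprint.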
Peter O’Neill, President, Mobile ID World: And in terms of your product portfolio, can you give our readers a little bit of an update as to what the latest is?
Todd Mozer, CEO, Sensory: Our roots are in speech recognition, and with the speech recognition side we do wake words and we do small vocabulary command and control that’s extremely robust to noise, and we can do large vocabulary, continuous speech recognition, and things that feel like dictation. We also have some really nice NLU engines, and have three flavors of NLU that vary in size, so we can do a relatively small footprint solution with NLU. We can do broad domain language models. We’re doing a lot more domain-specific assistant type functions and we have added a new technique for multi-wake words to support Amazon’s Voice Interoperability Initiative.
That’s what we’re doing on the speech recognition side. We’re also doing biometric voice where we can do text dependent or text independent speaker verification. On the computer vision side of things, we have the ability to do biometrics on a person’s face, where we have very good anti-spoofing technology. We have started to use the camera and we’re looking at the face to detect expressions and to understand demographics of the user. So we’re able to build a very interesting user profile combining face and voice together that gives us a lot of data for analytics of the customer, which can help the user in getting what they want faster and help the sellers in bringing them the things that they really want so that they can sell more and sell faster.
We’re finding this is a very interesting use case, and we’re putting it all together because we have all these different technologies that can have a common interface with an avatar that you talk to and it talks back to you. In essence, it’s kind of like a shopping assistant or a purchasing assistant that can help a person get what they want and help the entities that they’re interacting with better understand what they want without the person even saying it.
Peter O’Neill, President, Mobile ID World: And would the target market in that area, Todd, be the marketing and advertising folks? Is that for retail? Can you give me an example of how that would actually be used?
Todd Mozer, CEO, Sensory: Conceptually it’s for anybody that wants to sell something, but the initial demonstrations and the initial places that we’re targeting are quick service restaurants and large retail shops. There’s a millennial-driven phenomenon where people like to order ahead, and this is going on with restaurant food, groceries and more. The idea being that you call in with an app and fill out what you want and then you can show up and pick it up. And we take it one step further, where you can use your voice to do all the ordering and all the interfacing with the automated selling agent. And they get the benefit of all the data analytics that goes with a device that you’re talking to, that hears you and sees you while you’re communicating.
Peter O’Neill, President, Mobile ID World: Very cool. As we head into the next decade, with the current speed of advancement and deployment, we’re seeing a lot go on. What would you say are some of the key advances that you’ve seen, generally, in our industry over the past 10 years?
Todd Mozer, CEO, Sensory: Well, over the past 10 years, deep learning’s really taken off, the value of data has emerged, and we’ve seen the rise of the assistants. I think the assistant market, which is Google and Amazon and others, is a huge, huge phenomenon. They’re really taking over the whole consumer electronics space, and just over the Thanksgiving holiday, Amazon’s top-selling product was one of their speakers. And I had heard that they had priced some of the older Echo Dots at $10, so they’re just doing huge volumes and getting them out there to the market, which means more and more people getting used to interacting with things by voice. This huge growth will drive all these other sensor functions and products, including biometrics and peripheral devices like light switches or thermostats.
One of the phenomena that’s going on in this market is driven by governments that are saying, “Hey, there’s all this private data that should be kept private, but these guys are taking it.” There’s a lot of fear among consumers about what these companies are doing with their microphones and their cameras, and there have been literally dozens of reports coming out over the last year about people outside the company that are listening to transcripts of things they shouldn’t be listening to, in the bedroom and these kinds of things. So, there’s a movement towards privacy, which brings things more towards on-device, which is where Sensory is very focused. Here in California, in just less than a month, we will have new laws in place that protect the consumer, through the California Consumer Privacy Act (CCPA). And already it’s happened in Europe, and we think that’s going to spread pretty quickly across the States. CCPA will be a model for other states to follow.
Peter O’Neill, President, Mobile ID World: Which is not necessarily a bad thing. I guess you’re speaking about GDPR in the European community. If it makes consumers a little bit more confident that their privacy is protected, I think that’s good on all fronts.
Todd Mozer, CEO, Sensory: I think that’s right. There’s value in having these companies have the data, but there’s also a big, big risk. And there has been study after study saying that somewhere between 30 percent and 70 percent of users are concerned about the use of their private data.
Peter O’Neill, President, Mobile ID World: Can I follow up a little bit on what you said earlier about voice and how it is being utilized now? I know that you and I have talked for many years about the fact that voice would be utilized as the main communication tool with all the IoT, automotive, robotics, etc., and now that’s happening. What are we going to be looking at in 10 years’ time when it comes to frictionless travel, robotics, auto and IoT?
Todd Mozer, CEO, Sensory: I think it’s just going to get better and better. Just in the last couple of years, it’s amazing how much the smart speakers have advanced in their capabilities. When they first came out, basically they were music players and you could set alarms, and now you can actually ask a variety of questions and I’d say they get them right about half the time. And as we go forward in time, they will get more complex questions answered more of the time. We’ll be able to have more dialogue-oriented interactions rather than these kinds of momentary pieces of time, and I suspect we will see more proactive Assistant interfaces rather than just reactive.
They’ll know more and more about us, for better or worse, which in a utopian world means that they can help us more and more seamlessly. In a dystopian world, things can get a little creepy, so it’ll be interesting to see how those things play out. But I’m pretty much a believer in the fact that AI is not good or evil and it’s a matter of how it’s deployed. And I think it can be deployed to a whole lot of really, really good benefits, and we just need to be careful about the possible downsides.
Peter O’Neill, President, Mobile ID World: Right, and I think education is critical in that area. We’ve certainly been feeling that in our business. We’re constantly asked questions about these issues.
What can we expect to see from Sensory in the coming few years?
Todd Mozer, CEO, Sensory: Because we have so many different technologies, and actually I didn’t even talk about our ability to listen to scenes and sounds and identify what they are, but because we have this wide range of very powerful technologies that can run on-device, we’re combining them more and more together into applications that take advantage of multiple things in parallel. So, we’re going to see more fusion of technologies. We’re doing a lot of work right now with face and voice fusion. And if you think about some of the things that I mentioned, like whether it’s demographics or other kinds of things, they’re really hard to do just looking at a person’s face or just listening to their voice, but when you combine them together, you get added insights.
Right now we’ve done them discretely and done an algorithmic approach to combining face and voice data, and what we’re looking at doing is more of a deep learning fusion, where we’re looking at the face and the voice in parallel to detect different kinds of things that are going on. This could include improving speech recognition and becoming more robust to outside noise. If you can watch a person’s lips while they’re speaking, then you can disregard other people that are talking in the same spectrum. Humans are very, very good at that. We call it the cocktail party effect, and machines can be good at that too, and so we’re working on those sorts of things.
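One common form of the "algorithmic approach" to combining separately computed face and voice results is score-level fusion, where each modality produces its own match score and the two are blended before a decision is made. The sketch below assumes a simple weighted sum with a fixed acceptance threshold. The interview does not say which method Sensory uses, so the weights, threshold, and function names here are hypothetical.

```python
def fuse_scores(face_score, voice_score, face_weight=0.5):
    """Weighted-sum fusion of two match scores, each assumed to lie in [0, 1].

    face_weight is the share given to the face score; the voice score
    receives the remainder.
    """
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")
    return face_weight * face_score + (1.0 - face_weight) * voice_score


def accept(face_score, voice_score, threshold=0.7, face_weight=0.5):
    """Accept the identity claim if the fused score reaches the threshold."""
    return fuse_scores(face_score, voice_score, face_weight) >= threshold


# Strong face match, strong voice match: accepted.
print(accept(0.9, 0.8))
# Strong face match, weak voice match (for example, a noisy room): rejected
# with equal weights.
print(accept(0.9, 0.2))
# The same scores with more trust placed in the face: accepted.
print(accept(0.9, 0.2, face_weight=0.8))
```

Shifting the weight toward whichever modality is more reliable in the current conditions is one simple way a fused system can stay usable when one signal degrades. A deep learning fusion, by contrast, would learn from both inputs jointly rather than combining two finished scores.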
Peter O’Neill, President, Mobile ID World: Well, how exciting is that? I love talking with you because you’re on the cutting edge of our industry, and it’s always a pleasure to hear your thoughts as we continue to move rapidly forward. Thank you very much for your time today.
Todd Mozer, CEO, Sensory: Thank you, Peter. It’s always a pleasure talking to you. And yeah, we’re on the bleeding edge, which can be good or bad.