INTERVIEW: Sensory CEO Todd Mozer Talks 3D Facial Recognition, AI, and More

Sensory, Inc. closed out 2017 as the provider of one of the world’s most sophisticated biometrics platforms for mobile devices. Having demoed at the FinovateFall event a virtual bank teller AI assistant that uses facial and voice recognition to identify the user, the company saw its technology integrated into a major new smartphone from LG and, thanks to a partnership with Fujitsu, into the mobile app of Japan’s Mizuho Bank, in the closing months of 2017.

A year later, Sensory has launched a major upgrade of its TrulySecure platform, with version 4.0 supporting facial recognition based on 3D facial imaging, and an enhanced AI system that is more than 50 percent more accurate in applying 2D facial recognition.

It’s a big step forward, and one that Sensory CEO Todd Mozer was happy to dive into in a recent interview with FindBiometrics & Mobile ID World Managing Editor Peter Counter. The two started things off with an in-depth discussion of TrulySecure 4.0’s upgrades, and from there the talk touched on Apple’s Face ID, speech recognition technology and virtual assistants, on-device AI, and much more…

Read our full interview with Todd Mozer, CEO, Sensory, Inc.:

Peter Counter, Managing Editor, FindBiometrics & Mobile ID World (MIDW): Thanks for joining me today, Todd. Sensory recently upgraded its TrulySecure biometric authentication software. What is new with version 4.0?

Todd Mozer, CEO, Sensory: We have done all sorts of new things over the last year and 4.0 is the culmination of that. If you look at our TrulySecure offering there are two components; there is TrulySecure Speaker Verification which is the voice biometrics side and there is TrulySecure Face which is face authentication. So, let me quickly walk through some of the new things that we have added over the course of the year for each of those.

On the voice side we have incorporated wake words into TrulySecure, so now we can wake up with TrulySecure and something like “Alexa” or “Hey Google” can have the biometric built into it. We’ve added text-independent speaker verification so when you just talk in general without speaking specific words, we can identify the speaker. Also, a whole lot of things have been done to improve the accuracy of voice verification, making it perform better with lower signal-to-noise ratios, better in reverb, and better at a distance. Most of that was through improved feature extraction techniques and more data collected.

On the face side we have added 3D cameras. With 3D cameras, which are becoming more and more popular, performance in darkness improves, we get better angle coverage, and we further improve the anti-spoofing capabilities. On TrulySecure Face we have written code to take advantage of GPUs to be more efficient on-device. In fact, we are an on-device company; everything that we do is on-device, but one of the new features that we have offered in TrulySecure 4.0 is the ability to have options for cloud-based integration. We actually had customers demanding to implement our 2D anti-spoofing and other features in the cloud and so we have enabled hooks for that.

There are a lot of accuracy improvements on the face side, as well as on voice, and a lot of that comes from our ability to gather a whole lot of data on how people use their mobile phones to unlock. We have done that with an app that we have in the Google Play Store – it’s called “AppLock by Sensory” and it has been a huge runaway hit. It has a 4.3 rating which is one of the highest rated biometric apps in the Google Play Store, and we have had just under a million downloads so we are getting terabytes of data every couple of weeks which gives us really good coverage of different faces, different environments, different angles – and that really helps out our accuracy improvements by building better and more robust models.

Also, with 4.0 we started to introduce things beyond face authentication where we can use the data that we are learning about people’s faces to judge things like whether your eyes are open; we’ve had specific requests for eyes-open. Let’s say if you were having a nap on your desk, people shouldn’t be able to pick up your phone and unlock it while you are sleeping, so we now have an eyes-open detection. Other things like emotions and demographic ID are also in development.

MIDW: It is really incredible how far that has come in a relatively short amount of time. I’m fascinated by how many of these new features are really on trend with a lot of what is going on in the mainstream. Among the new features that you mentioned was the 3D camera support. Have you found a greater demand for these features in the wake of Face ID?

Sensory: Absolutely! Since Apple introduced Face ID, all of a sudden all the mobile guys wanted to introduce face biometrics into their phones and so we immediately got traction. We were out selling 2D approaches for face authentication to the mobile phone companies and as soon as people knew Apple was doing it, we started signing up mobile phone manufacturers left and right to implement our 2D technology. One of the luxuries that we have in being heavily deployed is that we get feedback from our customers about what they want, so we started hearing from our 2D customers that they wanted to introduce 3D mobile phones with 3D face authentication. That really spurred our move in that direction, so it is very much market and customer driven innovations in these areas.

MIDW: And it seems to be coming at a really good time. There is some new motion in terms of getting VCSEL technology on Android phones, so it seems to be perfectly timed as well. So congratulations on that. Sensory also has speech recognition solutions. What is the latest for your TrulyHandsfree and your TrulyNatural products?

Sensory: TrulyHandsfree is a small-footprint solution for wake words and small vocabulary command and control. We have a ton of traction in that area and the solutions keep getting better and better there too. A lot of the work that we do is on accuracy improvements and with both TrulyNatural and TrulyHandsfree that has really been our main focus. One of the big things that has happened in the last 10 years is that Amazon introduced the Echo and Alexa and all of a sudden far field became a real thing. And so a lot of the accuracy improvements we have been doing over the last five years really have been focused on doing a better job on far field. So, not just recognizing when you are talking an arm’s length away but when you are talking across the room, which brings into effect things like room echo and signal-to-noise ratio. Also, we have been incorporating different techniques to improve the performance by adding acoustic echo cancellation into our technology stack – so a mobile phone, for example, can be playing a ring tone but still be able to hear you when you say “Answer the phone.” We have some really great 3rd party partners that have amazing noise cancellation approaches that make our technologies work even better.

TrulyNatural is our large vocabulary natural language recognizer, and the work there has also been on accuracy improvements and making it smaller. We are down to about 20-30 MB which ten years ago seemed like a huge amount of memory but today given the sophistication of our natural language offerings it is quite impressive that we can be that small.

MIDW: Absolutely. I see such a strong need for these solutions specifically as I’m an early adopter. Having virtual assistants and smart home devices, there is a real uncanny nature to them when they don’t work properly, so it is really great that we are seeing such innovation in these areas.

Sensory: And it’s going to keep happening. It’s just getting usable and good enough but it has to get a lot better and so we are trying to push things so that we keep advancing, and so that our smartphones can really get smart, and our assistants really can assist.

MIDW: Sort of in that line: there used to be some skepticism around voice command – and actually even going back to the TrulySecure area of authentication, there was skepticism about using voice as password replacement. Have you noticed a growing acceptance for these sorts of technologies specifically, and how much of that is related to the rise of virtual assistants?

Sensory: That is a good question and I would say that the use of the biometrics is somewhat bifurcated in the sense that a lot of our voice traction is with mobile phone OEMs, and a lot of our face traction is with banking applications. We have several dozen banks now that are using our biometrics for face and even though we offer our voice to them as well, I think it is only one or two that are doing both or combining both. On the mobile phone side, it is more mixed; we have both voice traction and face traction but we are not getting a lot of customers that are using our fusion approach of face and voice together, which is surprising and interesting to me.
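
For readers unfamiliar with the term, one common way to combine two biometrics is score-level fusion, where each modality produces a match score and the scores are blended before a single accept/reject decision. The sketch below is only an illustration of that general idea. It is not Sensory's fusion method, which the interview does not describe, and the function names, the 0.6 face weight, and the 0.7 threshold are all hypothetical values chosen for the example.

```python
def fuse_scores(face_score: float, voice_score: float,
                face_weight: float = 0.6) -> float:
    """Blend two match scores (each in [0, 1]) with a weighted average.

    face_weight is a hypothetical tuning value, not a vendor setting.
    """
    for name, score in (("face_score", face_score),
                        ("voice_score", voice_score),
                        ("face_weight", face_weight)):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {score}")
    return face_weight * face_score + (1.0 - face_weight) * voice_score


def accept(face_score: float, voice_score: float,
           threshold: float = 0.7, face_weight: float = 0.6) -> bool:
    """Accept the user only if the fused score reaches the threshold."""
    return fuse_scores(face_score, voice_score, face_weight) >= threshold
```

With these illustrative settings, a strong face match paired with a weak voice match can still be rejected, which is the sense in which fusion raises the bar compared with either modality alone.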

MIDW: That is really interesting especially considering that use case you would expect the high-risk transactions to be more of a multi-factor approach.

Going forward, artificial intelligence plays a role in all of Sensory’s technology and I think it is safe to say you are a pioneer of using neural networks to enhance biometrics. How has AI changed authentication as it’s become more mainstream?

Sensory: AI is a very broad concept which means a lot of different things to different people. We have certainly been working on machine learning and neural net algorithms for 25 years so, yes, thank you for calling us pioneers – we were very early in that space, especially in applying it to consumer electronics. What has really changed a lot is that the algorithms are getting more and more access to data. What we have come to find in deep learning is that the more data you can get the better the models can be, and the more data you get, the more you start having more complex nets that can better take advantage of the data. So, it is sort of the same general technologies that have gone from computer machine learning to deep learning, it’s just gotten deeper and wider in the process. We are seeing more and more movement from the traditional programmed approaches to AI to more deep learned approaches, and deep learning has really proven to be better. I don’t think anybody doubts today, for computer vision certainly, that deep learning outperforms any of the traditional approaches where you get experts programming the way the computer should think. The challenge with deep learning is that you get bigger and bigger models and require more complex processing. Sensory has a number of proprietary techniques to make big models smaller and more efficient.

MIDW: The deep learning seems to be the easiest answer to certain controversial ideas about computer vision and face recognition in terms of ethnicity and gender biases, so I completely agree there.

You mentioned TrulySecure was an on-device technology until recently when you had these cloud options, but relying heavily on AI that seems to add a certain challenge. I was wondering: is there actually more of a challenge implementing neural networks in AI in an on-device scenario?

Sensory: It is certainly more challenging doing it on-device but there are huge advantages that make it worthwhile. It is more challenging just because you are constrained in MIPS and memory, so you have to do more with less or at least do the same amount with less. But the advantage – and I think this is really going to become apparent in the years ahead – is that on-device gets around all of these privacy and security issues where all of a sudden people are realizing all of these big companies have been monetizing our data and personal information in ways that we didn’t expect. And so there is a big push-back happening all over the world. It seems like every day you open the paper and read some story about Facebook or Google or any giant company, and how our data has been abused or stolen. That is leading to a movement to the edge where your personal information isn’t sent off to some cloud where it can be shared with other people or sold to other people, or be hacked by other people. So, the embedded approach just for privacy and security is really important.

I think there is an additional point that is starting to emerge and people are becoming more aware of, and it’s that if you are doing your AI on-device you can really take advantage of that device’s knowledge about the sensors that are on that device, and interact with your AI to adapt to the individual user’s behaviors and usage, so that it really improves for those individuals, rather than sending it off to some sort of general cloud that improves for everybody on the whole but for certain people it might get worse. If you are doing it all on-device you can really cater to each individual that is using it.

MIDW: That, in addition, seems to me like a natural anti-spoofing liveness detection synergy. So the benefits of that make a lot of sense to me. That on-device embedded FIDO approach, there seems to be a lot of demand, given that you are in charge of your data, and biometrics and AI can enhance your privacy.

We were both at Money20/20 in October and I ran into you on the exhibition floor, and we talked about how biometrics were everywhere at that show specifically, and contactless biometrics like voice and face seemed to be highly represented at all of the fintech booths – financial services, banking, payments, you name it. Sensory’s tech alone is integrated into 24 or more banking apps. What do you think is driving these specific modalities in financial services?

Sensory: It is really important that when a bank releases an app that it works across all their customers’ phones. So, this sort of broad-based use of technology is really driving the types of technologies used. People would be pretty happy with fingerprint as a biometric but fingerprint isn’t on every phone, especially when you move into some of the third world markets where phones are cheaper. But cameras and microphones are really on every phone, so if you can have algorithms that take advantage of cameras and take advantage of microphones then you can really be cross-platform and that is what is needed.

MIDW: Yes, you are in a situation where they are deploying it on their end and bringing it to the customers so that device agnosticism is really important. Are there other vertical markets that will gravitate towards that sort of software-based approach? We at FindBiometrics and Mobile ID World have had our eye on the healthcare industry for some time but it seems there is also a lot of potential in consumer biometrics, IoT, automotive and other sectors.

Sensory: Yes, I think you nailed it. Medical is probably one of the big areas that will be taking more and more advantage of machine intelligence in general over the coming years. The idea that you can have devices at home and not go into a doctor’s office and get feedback on your health and how you are doing is really powerful, and AI will drive that. And access to your personal information through biometrics is key once again, pushing for on-device for privacy. So, there are all of these things that will play a role on the medical side of things.

I agree with you, automotive is an area that we are seeing a lot of interest in biometrics as well, and it’s not just for pure security, it’s for shared devices like a car where different people can sit in different spots, and the car knows who is who so it can adjust to your preferences automatically and just make the whole experience nicer and more convenient. When I sit in the driver’s seat why isn’t all the radio programmed in the car to my radio stations so I can play what I want, but when my son sits down it will be his music – those sorts of things.

MIDW: Additionally, the radio station thing just brought to mind the noise-cancelling in far-field, that is the perfect deployment when you are driving a car and there is plenty of noise to have that advanced far-field speech recognition happening.

Sensory: Yes, automotive is interesting: a lot of people think that the automotive is sort of the worst environment because of noise, but actually it is a pretty nice environment for doing speech recognition as long as you only have one person talking at once. The radio is very easy to get rid of with AEC and in general you have a pretty good signal to noise ratio because you can’t be that far away from the mike. So the only problem is the road noise, and because you are essentially in a soundproof little booth, the noise from outside comes in as a lot of white noise. It might be high amplitude but it is generally predictable. So, it is actually a fairly easy environment to deal with especially since the user is in a fairly stationary position in the driver’s seat, so if you want to use things like beamforming you can point the beam at the driver. Automotive is kind of a nice environment in that respect, especially in comparison with a smart speaker or home device that will have more varied noise, acoustics, and signal to noise ratios.
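
The point about microphone distance can be made concrete with a rough calculation. The sketch below assumes idealized free-field conditions, where speech level drops by about 6 dB each time the distance to the microphone doubles, and a constant noise floor. Real cars and rooms add reflections and reverberation, so this is only a simplified illustration; the function name and the example numbers are hypothetical.

```python
import math


def snr_at_distance(snr_ref_db: float, ref_distance_m: float,
                    distance_m: float) -> float:
    """Estimate SNR (in dB) at a new talker-to-microphone distance.

    Assumes free-field spreading (level falls 20*log10 of the distance
    ratio) and a noise floor that does not change with distance.
    """
    if ref_distance_m <= 0 or distance_m <= 0:
        raise ValueError("distances must be positive")
    return snr_ref_db - 20.0 * math.log10(distance_m / ref_distance_m)


# Hypothetical example: 30 dB SNR measured half a metre from the mic.
in_car_db = snr_at_distance(30.0, 0.5, 0.5)       # driver stays close
across_room_db = snr_at_distance(30.0, 0.5, 4.0)  # far-field talker
```

Under these assumptions the talker four metres away loses roughly 18 dB relative to the driver, which is consistent with the remark that a car cabin keeps the speaker close enough for a good signal-to-noise ratio.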

MIDW: That is really fascinating. That was not what my instinct was, but when you started talking about it, it is like being in some sort of a sound studio, and it is a nice closed environment. Plus there are going to be places where there is no signal, so that embedded approach would also seem to be pretty ideal for connected cars.

Sensory: Absolutely. The automotive players are interested in hybrid approaches that use cloud and on-device, but you have to have the on-device take over when the cloud is not available.
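
As a rough illustration of the hybrid pattern described here, the sketch below tries a cloud recognizer first and hands the request to an on-device recognizer when the connection fails. It is a minimal sketch under stated assumptions, not a description of any automotive or Sensory product: both recognizers are hypothetical stand-in functions, and a real system would also need timeouts, retries, and a policy for which requests the smaller on-device model can handle.

```python
from typing import Callable, Tuple

Recognizer = Callable[[bytes], str]


def recognize_hybrid(audio: bytes, cloud: Recognizer,
                     device: Recognizer) -> Tuple[str, str]:
    """Return (transcript, source), preferring the cloud when reachable."""
    try:
        return cloud(audio), "cloud"
    except (ConnectionError, TimeoutError):
        # No usable connection: the on-device recognizer takes over.
        return device(audio), "device"


# Hypothetical stand-ins used only to exercise the fallback path.
def offline_cloud(audio: bytes) -> str:
    raise ConnectionError("no network coverage")


def online_cloud(audio: bytes) -> str:
    return "navigate to the nearest charging station"


def small_device_model(audio: bytes) -> str:
    return "answer the phone"


no_signal = recognize_hybrid(b"\x00\x01", offline_cloud, small_device_model)
with_signal = recognize_hybrid(b"\x00\x01", online_cloud, small_device_model)
```

Returning the source alongside the transcript lets the caller know which path answered, which can matter if the on-device vocabulary is smaller than the cloud's.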

MIDW: Yes, in a non-automotive sense Google had to deal with that initially with their Drive Suite of office apps – the idea of having their own hardware with all of their word processing and spreadsheet programs and all of that, which was fine as long as you lived in a city, but as soon as you left it didn’t work.

Sensory: I’m here in the heart of Silicon Valley and if I drive up one of the most popular freeways in the world my 4G connection goes in and out so it’s crazy.

MIDW: And when you are in a car, that distraction is a pretty high-risk thing. I know we were talking about changing radio stations and things like that but when technology doesn’t work properly that is when you get distracted and it is dangerous. That area out there is infamous for traffic problems so you want to have your eyes on the road.

Moving to a broader idea, what are some of the highlights for Sensory in 2018? You mentioned you had an exceptional 2018 that culminated in TrulySecure but what were some of the individual highlights?

Sensory: Just from a straight business perspective we had a great year. Our sales grew in the high double digits, we were very profitable and we got a lot of new customers. I think we had record revenues and near-record profits. We took over a hundred people to Hawaii on a company trip, so it was a pretty special year for us from a business perspective. We shipped in hundreds of different products that probably shipped in half a billion units. So, a lot of cool products, and a lot of volumes, and a lot of expansion into new markets – like we mentioned banking, which is a pretty new space for us, but we are doing pretty well in it. So, it’s exciting times for Sensory.

MIDW: Fantastic. What can we expect from you in 2019?

Sensory: Our goal is continued growth, continued profitability. We are doing a lot of interesting things and we are taking on a lot of new initiatives with both face and voice. We want to expand our face technologies beyond just the biometrics. And I mentioned things like eyes-open, but we are starting to locate expressions and judge things like, if you are driving a car, are you distracted, are you falling asleep. These kinds of things can be very important for safety and for other purposes. Sometimes it’s good to know if this person is happy or angry, and we are looking at the face and doing those sorts of things. Even though a lot of our customers aren’t using the fusion approach, I’m a big believer that it will take over. So we continue to invest in the fusion of face and voice, and try to improve things by combining them together. This is a big thing for our future.

Another area where we are putting some focus is identifying sounds. Because we are in so many homes with our TrulyHandsfree and other products, we are starting to listen to the environment and not just listen to the wake words but identify things that can help the homeowner in security situations. Is there glass breaking, is the dog barking, is somebody screaming for help – and we want to be able to do that sort of security monitoring function through sound and probably eventually through the fusion of sound and vision.

MIDW: That’s fantastic. Computer vision is the hot word right now but it’s really cool to hear about the idea of computer hearing.

We are currently conducting our 2018 Year in Review. From a general perspective, what have been some of the more important industry trends in the past 12 months?

Sensory: Certainly, the growth of the smart assistant, the voice assistant and the smart speaker, and its expansion into other types of products, is a driving thing going on in our space, and how that will play out nobody really knows. Are people going to settle on a single assistant they use for everything or are they going to have different assistants for different use cases? It is evolving, the market is really taking off but it is a challenging market. We have a lot of customers that are introducing products with Alexa or with Google or some other assistants, but then they are finding that Google and Amazon compete against them so it is a challenging area in that sense. Outside of voice assistants, there is the rise of cameras and 3D cameras, embedded AI computing, privacy issues and legislation. And of course China/US relations will have a big effect on the consumer electronics and AI industries.

MIDW: Well Todd, this has been a great conversation and I’m super excited about everything that Sensory has going on. I think your technology is really fantastic and, again, you have been a real pioneer in AI and biometrics. Thanks for taking the time to talk with me today.

Sensory: Thank you Peter, I’ve enjoyed it.