Jumio is detailing some of the steps developers can take to minimize the amount of bias present in their AI algorithms. In that regard, the company emphasized the importance of the datasets used to train those algorithms, noting that a database that does not reflect the entire population has blind spots, and that any AI model trained on that database will reproduce them.
With that in mind, Jumio argues that AI algorithms should only be trained on large datasets that are representative of the population they will be applied to. Developers who fail to take that precaution will end up with biased algorithms that perform poorly for many members of the public. Speech recognition is a case in point: an algorithm trained on voice samples from white, upper-class Americans will struggle to identify the accents and speech patterns of anyone who falls outside that narrow demographic.
In plain terms, that means that developers who want to eliminate bias need to create and implement a plan for doing so long before they start training their algorithm. If they are getting their data from a third party, they need to evaluate the origins and integrity of that dataset; if they are collecting their own, they need to make sure it is representative.
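That kind of representativeness check can be automated before training begins. The sketch below is a minimal illustration, not anything Jumio has published: it assumes hypothetical demographic group labels for each sample and a reference distribution (e.g., from census data), and flags groups whose share of the training data deviates from the reference by more than a tolerance.

```python
from collections import Counter

def representation_gaps(group_labels, reference, tolerance=0.05):
    """Flag demographic groups whose share of the training data deviates
    from a reference population share by more than `tolerance`.

    Returns {group: (observed_share, expected_share)} for flagged groups.
    """
    counts = Counter(group_labels)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference.items():
        observed = counts.get(group, 0) / total
        if abs(observed - expected) > tolerance:
            gaps[group] = (observed, expected)
    return gaps

# Hypothetical accent labels for a speech dataset, heavily skewed
# toward one group relative to the reference population.
samples = ["general_american"] * 80 + ["southern"] * 10 + ["aave"] * 10
reference = {"general_american": 0.45, "southern": 0.25, "aave": 0.30}
print(representation_gaps(samples, reference))  # flags all three groups
```

A real audit would use finer-grained demographics and a statistical test rather than a fixed tolerance, but the principle is the same: measure the gap between the dataset and the population before the model ever sees the data.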
They also need to apply the same level of rigor to the labeling process. If problems like glare and blur are mislabeled in the database, the algorithm will learn and apply the wrong labels when it gets turned loose on the real world. Jumio advises developers to handle labeling internally rather than outsourcing it or using an automated program, and to introduce quality controls so that any errors can be caught and corrected.
Finally, Jumio noted that the team developing the algorithm and doing the labeling should be just as diverse as the population in the dataset, with members of different races, nationalities, genders, ages, and professional backgrounds. A diverse team is less likely to recreate the biased assumptions that go overlooked when everyone on the team has similar life experiences.
The takeaway is that biased AI is created when a biased dataset gets encoded into an algorithm. Left unaddressed, that bias produces an algorithm that performs poorly, which can hurt business and lead to discrimination and other legal issues. Jumio believes that many organizations will start to demand AI solutions that minimize demographic bias moving forward, so developers who fail to adapt could fall behind competitors that are more careful with their data.