Understanding AI bias for technical leaders

Andy is an experienced CTO, technical architect and leader. Over the last 15 years he has built and scaled multiple early-stage companies into successful SaaS businesses. In this article he gives his perspective on AI bias.

Data is the new oil, according to an article in The Economist. Thanks to smartphones, connected devices and the web services we all use every day, the amount of data collected by the start of 2020 was estimated at around 463 exabytes. In the same way that oil changed day-to-day life through a wide array of products and forms of transport, data is changing the way businesses operate and automate their processes.

The development of AI and Machine Learning promises to unlock value from this vast amount of data, but it often does so using black-box models that offer little explanation of their inner workings. As a C-level executive or technologist, you need to understand the impact, both positive and negative, on your business and reputation when deploying AI/ML. One of the key things to understand is the effect that bias has on AI.

In the last couple of years, there have been some high-profile cases where AI/ML has gone wrong, resulting in bad PR for the companies involved. When Apple launched the Apple Card, it was reported that its AI was heavily biased against women. This led to an investigation by the New York financial regulator. In 2019, a team from the University of California discovered bias within an AI system used to allocate healthcare services to 200 million people. The bias meant that Black patients suffered reduced access and lower standards of care compared with white patients. The algorithm assigned Black patients lower risk scores than white patients because it used healthcare costs as a stand-in for health need rather than a more accurate measure of health status. Because Black people are significantly under-diagnosed and under-treated in the US, less money is spent on their care, and the algorithm read that lower spending as lower risk. In fact, the opposite was the case: Black patients were statistically more likely to suffer from simultaneous conditions and to experience higher levels of risk.

How can the models that increasingly determine access to financial services and healthcare contain such bias? The causes include, but are not limited to, historical data that is incomplete and data entered by humans, which can carry bias against certain groups of people.
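To make the healthcare example concrete, here is a minimal Python sketch of how a cost-based label can hide real need. The patient records, the group names "A" and "B", and both thresholds are invented purely for illustration; they are not taken from the study or from any real system.

```python
from statistics import mean

# Hypothetical, made-up records for illustration only.
# Each tuple: (group, chronic_conditions, annual_cost_in_dollars)
# Both groups have identical health need, but less is spent on group B.
patients = [
    ("A", 4, 8000),
    ("A", 2, 4000),
    ("A", 1, 2000),
    ("B", 4, 5000),
    ("B", 2, 2500),
    ("B", 1, 1250),
]


def mean_by_group(records, value_index):
    """Return the mean of the chosen column for each group."""
    values_by_group = {}
    for record in records:
        values_by_group.setdefault(record[0], []).append(record[value_index])
    return {group: mean(values) for group, values in values_by_group.items()}


mean_need = mean_by_group(patients, 1)   # actual health need
mean_cost = mean_by_group(patients, 2)   # proxy label: money spent on care

# Assumed thresholds: who would be flagged for extra care under each label?
COST_THRESHOLD = 4000
NEED_THRESHOLD = 2
flagged_by_cost = [(g, n) for g, n, spend in patients if spend >= COST_THRESHOLD]
flagged_by_need = [(g, n) for g, n, _ in patients if n >= NEED_THRESHOLD]

print("Mean conditions per group:", mean_need)
print("Mean cost per group:", mean_cost)
print("Flagged when ranking by cost:", flagged_by_cost)
print("Flagged when ranking by need:", flagged_by_need)
```

In this toy data both groups are equally sick, yet a rule driven by cost misses the group B patient with two chronic conditions, while a rule driven by need would flag them. A model trained to predict cost would learn the same pattern.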

As a leader, how do you minimise the risk of bias in your AI? The first thing to consider is whether you can defend your AI and explain how your models generate their responses. Before starting to build out AI, you should have the right culture, processes and people in place to ensure success. Hire an experienced Chief Privacy Officer or Chief Data Officer. They should be accountable for the processes and tools used to build your AI and for the governance of its suitability and correctness. They should also author the ethical framework, tailored to the specific technology you are building.

By introducing a framework such as ETIQ during your model's build stage, your data science team will be able to perform Exploratory Data Analysis, generate Fairness Metrics and review Safeguarding reports, which together provide a set of tools to explain your models in detail. These tools can report on why certain predictions are made and on the correlations between inputs and outputs. Because they integrate directly into the data scientists' notebooks and workflow, the team can continuously report on the effects of their changes while developing the models.
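As a rough idea of what a fairness metric measures, the sketch below computes two common ones, the demographic parity difference and the disparate impact ratio, in plain Python. It is not the ETIQ API; the function names, the group labels and the sample predictions are all assumptions made up for this illustration.

```python
def selection_rates(groups, predictions):
    """Share of positive predictions (1 = approved) for each group."""
    totals, positives = {}, {}
    for group, prediction in zip(groups, predictions):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if prediction == 1 else 0)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(rates):
    """Gap between the highest and lowest selection rate (0 means parity)."""
    return max(rates.values()) - min(rates.values())


def disparate_impact_ratio(rates):
    """Lowest selection rate divided by the highest (1 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Hypothetical model output for ten applicants, invented for illustration.
groups = ["men"] * 5 + ["women"] * 5
predictions = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

rates = selection_rates(groups, predictions)
parity_gap = demographic_parity_difference(rates)
impact_ratio = disparate_impact_ratio(rates)

print("Selection rates:", rates)
print("Demographic parity difference:", parity_gap)
print("Disparate impact ratio:", impact_ratio)
```

With this sample data the model approves 80% of men and 40% of women, giving a parity gap of 0.4 and an impact ratio of 0.5. Tracking numbers like these on every model change is the kind of continuous reporting described above, whichever tool produces them.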

The need to provide transparent and accountable AI is not just a matter of your company’s reputation. As the adoption of AI has increased, so has the discussion around how it could or should be regulated. In April 2021 the EU proposed new regulations for AI. If signed into law, the changes could be as wide-sweeping as GDPR. Given that they would affect around 450 million EU citizens, they would be impossible to ignore, even for non-EU companies.

While Data Science has seen huge growth over the last five years, it is still immature compared with other forms of computer programming. The possibilities are endless, but real risks exist and need to be considered. In the same way that tools and processes have developed to reduce the risk of defective production code, proving and reporting on the fairness and correctness of your data models is becoming a top priority. In doing so you increase your users' trust in your technology, protect your reputation and prepare yourself for the regulation that is on its way.