Fair and Explainable AI: Explaining Automated Decisions and Influencing Them

Fairness in AI – Part II: Explaining automated decisions

In the second of four articles about fairness in AI, Paolo Fantinel says understanding ML-driven choices can help companies and their customers.

More and more businesses are relying on artificial intelligence (AI) and machine learning (ML) to make – or help make – decisions more quickly and efficiently. The previous article considered how companies can ensure that such automated decision-making is fair. This one looks at the connected issue of “explainability”, or the ability to understand ML-driven choices that might seem odd. Explainability enables businesses to identify and remove biases lurking in their algorithms, and it helps people on the receiving end of unwelcome decisions to potentially remedy the situation.

Take a person deemed creditworthy, but refused a loan. Using tools developed by DAIN Studios, an ML model crunched through the German Credit Dataset of 1,000 loan applicants to award them good (1) or bad (0) credit scores. One woman received a score of 1, but was not given a loan in real life. A closer look shows that the model calculated the probability of her repaying the loan at 54%, putting her on the right side of the model’s 50% threshold for rounding scores up to 1, although her underlying score was obviously still deemed too risky.

What-if analysis

“Explainable AI” (XAI) in this case also allows the bank or the loan applicant to see how the ML model calculated this probability of loan repayment. Shapley values – named after the Nobel Prize-winning economist and game-theory pioneer Lloyd Shapley – show how each of the loan applicant’s characteristics contributed to the model’s final outcome. Each calculation begins by assigning the applicant a base prediction, the average prediction for the dataset. It then looks at a range of individual features that add to or take away from the initial probability score.
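
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the tooling used for the article: the scoring function `predict`, its weights, the three features and the background rows are all hypothetical. The sketch computes exact Shapley values by averaging each feature's marginal contribution over all coalitions of the other features, with "missing" features filled in from the background data so that the empty coalition yields the base prediction.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, background):
    """Exact Shapley values for one instance (feasible only for few features)."""
    n = len(instance)

    def value(coalition):
        # Features in the coalition take the instance's values; the others
        # are filled from each background row and the predictions averaged.
        total = 0.0
        for row in background:
            mixed = [instance[i] if i in coalition else row[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    base = value(frozenset())  # average prediction over the background data
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return base, phi


# Hypothetical model: credit history class, savings class, loan duration (months)
def predict(x):
    return 0.5 + 0.1 * x[0] + 0.05 * x[1] - 0.01 * x[2]


background = [[0, 1, 12], [2, 3, 24], [4, 2, 36]]  # made-up applicants
instance = [1, 4, 12]                              # made-up applicant to explain

base, phi = shapley_values(predict, instance, background)
print("base prediction:", round(base, 3))
print("contributions:", [round(p, 3) for p in phi])
```

The base prediction plus the contributions adds up to the model's prediction for the instance, which is the property that makes the bar chart discussed next readable as a running total. Exact enumeration grows exponentially with the number of features, so practical tools rely on approximations.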

In the case of the loan application, the chart shows blue bars for features that improved her chance of getting a loan and orange bars for those that decreased her credit score, relative to the grey bar of the base prediction. It shows that the applicant’s credit history is the biggest factor that speaks against a loan, single-handedly reducing her score from the 0.7 base to 0.56. While her savings and the duration of the loan would more than counterbalance this first hit to her credit score, all the other features combine to bring the score down again, to 0.54.
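
The arithmetic behind the chart can be written out using the additivity of Shapley values. The symbols are notation introduced here for illustration: \phi_0 is the base prediction, \phi_i the contribution of feature i, and f(x) the model's output for the applicant. Only the numbers quoted above are used.

```latex
\begin{align*}
f(x) &= \phi_0 + \sum_{i=1}^{n} \phi_i = 0.54, \qquad \phi_0 = 0.70 \\
\phi_{\text{credit history}} &= 0.56 - 0.70 = -0.14 \\
\sum_{i \neq \text{credit history}} \phi_i &= 0.54 - 0.56 = -0.02
\end{align*}
```

In other words, credit history accounts for most of the 0.16 gap between the base prediction and her score, while the gains from savings and loan duration are slightly outweighed by the remaining features.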

The applicant’s many negative Shapley values suggest the bank was justified in deciding against giving her a loan. But the model can also be used to help her in the form of a what-if analysis that shows how changes in her characteristics would change the prediction. Improving her credit history by one class increases the probability of her paying back the loan to 0.69. She could also change other factors more readily – applying with a partner or parent (“applied alone=1”) or increasing her income (“account status” 3 instead of 2) would increase her chances to 0.6.
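
A what-if analysis of this kind amounts to re-scoring a modified copy of the applicant. The sketch below assumes a hypothetical logistic scorer: the weights, the bias, the feature names and the encoding (`applied_alone` set to 1 meaning a solo application) are invented for illustration and will not reproduce the probabilities quoted above.

```python
from math import exp

# Hypothetical weights, for illustration only (not the article's model)
WEIGHTS = {"credit_history": 0.6, "account_status": 0.4, "applied_alone": -0.5}
BIAS = -1.0


def repay_probability(applicant):
    """Logistic score standing in for the real model's predicted probability."""
    score = BIAS + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)
    return 1.0 / (1.0 + exp(-score))


def what_if(applicant, **changes):
    """Re-score a copy of the applicant with some features changed."""
    modified = {**applicant, **changes}  # the original dict is left untouched
    return repay_probability(modified)


applicant = {"credit_history": 2, "account_status": 2, "applied_alone": 1}
baseline = repay_probability(applicant)

scenarios = {
    "credit history one class better": {"credit_history": 3},
    "account status 3 instead of 2": {"account_status": 3},
    "apply with a partner": {"applied_alone": 0},
}
results = {name: what_if(applicant, **changes) for name, changes in scenarios.items()}

print(f"baseline: {baseline:.2f}")
for name, probability in results.items():
    print(f"{name}: {probability:.2f} ({probability - baseline:+.2f})")
```

Each scenario changes one feature at a time, which keeps the effect of every change easy to attribute. Combined changes can be tested by passing several keyword arguments at once.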

Rather than this approach of trial and error, it is often more efficient to ask which factors should be changed to flip the model’s prediction or raise it to the desired level. In XAI, these are called counterfactual explanations, as they consider what the result would have been had the facts been different. There are two methods. The more straightforward one is to look for prototypes, or other instances in the data that are similar to the case under examination but different in outcome. The other uses a genetic algorithm to generate data points and results based on that case.

Counterfactual explanations

Prototype counterfactuals are useful when confronted by fairness issues. This applicant did not get a loan, but is there one with similar characteristics who did? If so, what features did the other person have that the first one didn’t? It turns out there are two instances in our dataset that have characteristics very similar to our loan applicant – except that loans were given in these cases. Comparing each to the case under examination offers insights on whether the decision criteria make sense. The Shapley values show credit history is the single most important factor.

The closest prototype counterfactual shows a woman with a better credit history got a loan – even though she had lower savings, could only afford smaller repayments and was employed for a shorter time. But the second counterfactual shows a worse credit history is not always a deal breaker. This successful applicant had a credit history similar to the base case, but could afford to repay in higher installments, owned more property and was resident for longer – features that made up for her having less disposable income and requesting a longer loan horizon.
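
Searching for prototypes is essentially a nearest-neighbour lookup restricted to records with the desired outcome. The sketch below is one simple way to do it, under stated assumptions: the distance measure (range-scaled absolute differences), the feature names and the five records are all hypothetical and are not taken from the German Credit Dataset.

```python
def prototype_counterfactuals(instance, dataset, outcomes, wanted=1, k=2):
    """Return the k records with the wanted outcome that are closest to instance."""
    n = len(instance)
    # Scale each feature by its observed range so that no feature dominates.
    spans = []
    for i in range(n):
        column = [row[i] for row in dataset]
        spans.append((max(column) - min(column)) or 1.0)

    def distance(row):
        return sum(abs(row[i] - instance[i]) / spans[i] for i in range(n))

    candidates = [row for row, outcome in zip(dataset, outcomes) if outcome == wanted]
    return sorted(candidates, key=distance)[:k]


def differing_features(instance, prototype, names):
    """Map each differing feature to (applicant's value, prototype's value)."""
    return {
        name: (own, other)
        for name, own, other in zip(names, instance, prototype)
        if own != other
    }


# Made-up records: credit history class, savings class, installment rate
names = ["credit_history", "savings", "installment_rate"]
dataset = [[3, 2, 2], [2, 3, 4], [0, 0, 1], [4, 4, 4], [2, 3, 1]]
outcomes = [1, 1, 0, 1, 0]  # 1 = loan granted, 0 = refused
instance = [2, 3, 2]        # the refused applicant under examination

prototypes = prototype_counterfactuals(instance, dataset, outcomes)
for prototype in prototypes:
    print(prototype, differing_features(instance, prototype, names))
```

Listing only the differing features mirrors the comparison made above: it shows at a glance what the successful applicants had that the refused one did not.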

Genetic counterfactuals

If no prototype counterfactual can be found, XAI adopts the genetic approach. Using mutation, crossover and other methods known from biology, it generates synthetic data and calculates outcomes. It relies on simultaneous optimization of multiple functions, which place constraints on the outcomes resulting from predicting synthetic data. This ensures generated data are similar to the original instance, lie within its data distribution, differ in features as little as possible and produce a predicted value that is in a desired range close to the original value.
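
The following is a deliberately simplified sketch of the idea, not the method used for the article. Instead of optimizing several objectives simultaneously, it folds them into a single weighted fitness score: shortfall from the target probability, distance from the original instance and number of changed features. The scoring function `predict`, the feature bounds and all weights are hypothetical.

```python
import random
from math import exp

# Hypothetical ordinal features: credit history, savings, employment length
BOUNDS = [(0, 4), (0, 4), (0, 4)]


def predict(x):
    """Hypothetical logistic scorer standing in for the real model."""
    score = -2.0 + 0.8 * x[0] + 0.5 * x[1] + 0.4 * x[2]
    return 1.0 / (1.0 + exp(-score))


def fitness(candidate, instance, target):
    """Lower is better: reach the target, stay close, change few features."""
    shortfall = max(0.0, target - predict(candidate))
    distance = sum(
        abs(c - v) / (hi - lo) for c, v, (lo, hi) in zip(candidate, instance, BOUNDS)
    )
    changed = sum(c != v for c, v in zip(candidate, instance))
    return 10.0 * shortfall + distance + 0.1 * changed


def mutate(candidate, rng):
    """Resample one randomly chosen feature within its bounds."""
    child = list(candidate)
    i = rng.randrange(len(child))
    lo, hi = BOUNDS[i]
    child[i] = rng.randint(lo, hi)
    return child


def crossover(a, b, rng):
    """Take each feature from one of the two parents at random."""
    return [rng.choice(pair) for pair in zip(a, b)]


def genetic_counterfactuals(instance, target, generations=30, size=40, seed=0):
    rng = random.Random(seed)
    population = [mutate(instance, rng) for _ in range(size)]

    def rank(candidate):
        return fitness(candidate, instance, target)

    for _ in range(generations):
        population.sort(key=rank)
        parents = population[: size // 2]  # the fitter half survives
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(size - len(parents))
        ]
        population = parents + children
    # Keep distinct candidates that reach the target, best first.
    valid = {tuple(c) for c in population if predict(c) >= target}
    return sorted(valid, key=rank)


instance = [1, 1, 1]  # made-up applicant scoring below the target
found = genetic_counterfactuals(instance, target=0.6)
for candidate in found[:3]:
    print(candidate, round(predict(candidate), 2))
```

Because mutation resamples within `BOUNDS`, every candidate stays inside the allowed feature ranges, a crude stand-in for the requirement that synthetic data lie within the data distribution. Returning several candidates rather than one matches the practice of comparing multiple counterfactuals for feasibility.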

The first genetic counterfactual for the customer under examination shows that improving her credit history by one class would improve the probability of her getting a loan to 0.7. But, as has been shown previously, credit history is not all. The second counterfactual shows that more savings, being employed longer and providing for fewer people would increase her chances to 0.6. While credit histories change only slowly, the applicant could in the space of a year build up her savings and arrange to care for one less person, improving her chances for a loan.

The third genetic counterfactual presents a scenario that would make the applicant a strong candidate for a loan. Increasing her income (“account status”), being employed for longer, finding a spouse (“fam. status”), buying a house and being over 25 are factors that would more than make up for her not having many savings, raising her credit score to 0.8. This scenario suggests that being older in general makes banks think people are wiser – and that some genetic counterfactual scenarios are more realistic than others as a result.

As not all paths are equally feasible, working with multiple genetic counterfactuals is useful. In this instance, the first example suggested an improvement in credit history, which can be done relatively easily (say, by reducing her debt) and would be enough to make her a stronger candidate. The latter two examples would require a lot more time and things to happen that aren’t wholly in anyone’s control, like starting a family. But they would still be invaluable for the bank to explain clearly to the loan applicant why it had decided not to give her any credit.

The ability to explain ML models benefits both the people who use them and the people whose cases the models decide. It allows businesses to better explain their decisions, improving communication with customers and avoiding lawsuits about choices that appear odd. The people being analyzed can improve their outcomes – even finding within the AI “black box” winning patterns and strategies in certain feature combinations. Lastly, data scientists can test hypotheses and debug their models, ensuring that fairer and more accurate solutions enter commercial use in the first place.
