<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="review-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Artif. Intell.</journal-id>
<journal-title>Frontiers in Artificial Intelligence</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Artif. Intell.</abbrev-journal-title>
<issn pub-type="epub">2624-8212</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">681915</article-id>
<article-id pub-id-type="doi">10.3389/frai.2021.681915</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Artificial Intelligence</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Facing the Challenges of Developing Fair Risk Scoring Models</article-title>
<alt-title alt-title-type="left-running-head">Szepannek and L&#x00FC;bke</alt-title>
<alt-title alt-title-type="right-running-head">Developing Fair Risk Scoring Models</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Szepannek</surname>
<given-names>Gero</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1199410/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>L&#x00FC;bke</surname>
<given-names>Karsten</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1304765/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>Institute of Applied Computer Science, Stralsund University of Applied Sciences, <addr-line>Stralsund</addr-line>, <country>Germany</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>Institute for Empirical Research and Statistics, FOM University of Applied Sciences, <addr-line>Dortmund</addr-line>, <country>Germany</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/600304/overview">Jochen Papenbrock</ext-link>, NVIDIA GmbH, Germany</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/598962/overview">Laura Vana</ext-link>, Vienna University of Economics and Business, Austria</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1357783/overview">Henry Penikas</ext-link>, National Research University Higher School of Economics, Russia</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Gero Szepannek, <email>gero.szepannek@hochschule-stralsund.de</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Artificial Intelligence in Finance, a section of the journal Frontiers in Artificial Intelligence</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>14</day>
<month>10</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>4</volume>
<elocation-id>681915</elocation-id>
<history>
<date date-type="received">
<day>17</day>
<month>03</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>02</day>
<month>08</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Szepannek and L&#x00FC;bke.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Szepannek and L&#x00FC;bke</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>Algorithmic scoring methods have been widely used in the finance industry for several decades to mitigate risk and to automate and optimize decisions. Regulatory requirements, as given by the Basel Committee on Banking Supervision (BCBS) or the EU data protection regulations, have led to increasing interest and research activity on understanding black box machine learning models by means of explainable machine learning. Even though this is a step in the right direction, such methods are not able to guarantee fair scoring, as machine learning models are not necessarily unbiased and may discriminate with respect to certain subpopulations such as a particular race, gender, or sexual orientation&#x2014;even if the variable itself is not used for modeling. This is also true for white box methods like logistic regression. In this study, a framework is presented that allows analyzing and developing models with regard to fairness. The proposed methodology is based on techniques of causal inference, and some of the methods can be linked to methods from explainable machine learning. A definition of counterfactual fairness is given, together with an algorithm that results in a fair scoring model. The concepts are illustrated by means of a transparent simulation and a popular real-world example, the German Credit data, using traditional scorecard models based on logistic regression and a weight-of-evidence variable pre-transformation. In contrast to previous studies in the field, a corrected version of the data is presented and used in our study. With the help of the simulation, the trade-off between fairness and predictive accuracy is analyzed. The results indicate that it is possible to remove unfairness without a strong decrease in performance, as long as the correlation of the discriminatory attributes with the other predictor variables in the model is not too strong. In addition, the challenge of explaining the resulting scoring model and the associated fairness implications to users is discussed.</p>
</abstract>
<kwd-group>
<kwd>scoring</kwd>
<kwd>machine learning</kwd>
<kwd>causal inference</kwd>
<kwd>German credit data</kwd>
<kwd>algorithm fairness</kwd>
<kwd>explainable machine learning</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>The use of algorithmic scoring methods has been very common in the finance industry for several decades, in order to mitigate risk and to automate and optimize decisions (<xref ref-type="bibr" rid="B6">Crook et&#x20;al., 2007</xref>). Regulatory requirements, as given by the Basel Committee on Banking Supervision (BCBS) (<xref ref-type="bibr" rid="B10">European Banking Authority, 2017</xref>) or the EU data protection regulations (<xref ref-type="bibr" rid="B12">Goodman and Flaxman, 2017</xref>), have led to increasing interest and research activity on understanding black box machine learning models by means of explainable machine learning (cf. e.g., <xref ref-type="bibr" rid="B3">B&#xfc;cker et&#x20;al., 2021</xref>). Even though this is a step in the right direction, such methods are not able to guarantee fair scoring, as machine learning models are not necessarily unbiased and may discriminate with respect to certain subpopulations such as a particular race, gender, or sexual orientation&#x2014;even if the variable itself is not used for modeling. This is also true for white box methods like logistic regression.</p>
<p>In the study by <xref ref-type="bibr" rid="B23">O&#x2019;Neil (2016)</xref>, several popular examples are listed of how algorithmic decisions enter and potentially negatively impact everyday lives. An expert group on AI set up by the European Commission has worked out an assessment list for trustworthy artificial intelligence (ALTAI), where one requirement consists in diversity, non-discrimination, and fairness (<xref ref-type="bibr" rid="B9">EU Expert Group on AI, 2019</xref>).</p>
<p>There are different definitions of algorithmic fairness. An overview is given by <xref ref-type="bibr" rid="B33">Verma and Rubin (2018)</xref> and is summarized in <xref ref-type="sec" rid="s2">Section 2</xref>. In the remainder of that section, the framework of counterfactual fairness is introduced, as well as an algorithm that allows developing fair models based on techniques of causal inference (<xref ref-type="bibr" rid="B24">Pearl et&#x20;al., 2016</xref>). In the study by <xref ref-type="bibr" rid="B18">Kusner and Loftus (2020)</xref>, three tests of algorithmic fairness are presented.</p>
<p>Subsequently, in <xref ref-type="sec" rid="s3">Section 3</xref>, fairness is discussed from the usage context of risk scoring models: as opposed to existing crisp fairness definitions, a group unfairness index is introduced to quantify the degree of fairness of a given model. This allows for a fairness comparison of different models. Furthermore, it is shown how partial dependence profiles (<xref ref-type="bibr" rid="B11">Friedman, 2001</xref>), as they are popular in the field of explainable AI, can be adapted in order to enable a visual fairness analysis of a&#x20;model.</p>
<p>Within the scope of the financial application context, the aforementioned algorithm is applied to real-world data of credit risk scoring: the German Credit data, which is publicly available from the UCI machine learning data repository (<xref ref-type="bibr" rid="B8">Dua and Graff, 2017</xref>). The data are very popular and have been used in numerous studies (cf. e.g., <xref ref-type="bibr" rid="B21">Louzada et&#x20;al., 2016</xref>). In contrast to past publications, we used a corrected version of the data in our study, as it has turned out that the original data were erroneous (<xref ref-type="bibr" rid="B13">Groemping, 2019</xref>). The latter observation has to be highlighted, as the data from the UCI repository have been frequently used in credit scoring research over the last decades and have thus strongly influenced research results. The data and their correction are described in <xref ref-type="sec" rid="s4-1">Section 4.1</xref>. In <xref ref-type="sec" rid="s4-2">Section 4.2</xref>, the design of a simulation study based on the corrected German Credit data is set up, and its results are presented in <xref ref-type="sec" rid="s5">Section 5</xref>: both a traditional scorecard model using weights of evidence and logistic regression and a fairness-corrected version of it are compared on the simulated data with regard to the trade-off between fairness and predictive accuracy. Finally, a summary of our results is presented in <xref ref-type="sec" rid="s6">Section&#x20;6</xref>.</p>
</sec>
<sec id="s2">
<title>2 Fairness Definitions</title>
<sec id="s2-1">
<title>2.1 Overview</title>
<p>In the literature, different attempts have been made in order to define fairness. An overview together with a discussion is given in <xref ref-type="bibr" rid="B33">Verma and Rubin (2018)</xref>. In this section, a brief summary of important concepts is given using the following notation:<list list-type="simple">
<list-item>
<p>&#x2022; <italic>Y</italic> is the observed outcome of an individual. Credit risk scoring is typically a binary classification problem, that is, <italic>Y</italic>&#x20;&#x3d;&#x20;1 denotes good and <italic>Y</italic>&#x20;&#x3d; 0 denotes bad performance of a credit.</p>
</list-item>
<list-item>
<p>&#x2022; <italic>P</italic> is a set of one or more protected attributes. Fairness should be ensured with regard to these attributes.</p>
</list-item>
<list-item>
<p>&#x2022; <italic>X</italic> are the remaining attributes used for the model (<italic>X</italic>&#x20;&#x2229;&#x20;<italic>P</italic>&#x20;&#x3d; &#x2205;).</p>
</list-item>
<list-item>
<p>&#x2022; <italic>S</italic> is the risk score, typically a strictly monotonic function in the posterior probability <italic>Pr</italic>(<italic>Y</italic>&#x20;&#x3d; <italic>y</italic>&#x7c;<italic>X</italic>&#x20;&#x3d; <italic>x</italic>, <italic>P</italic>&#x20;&#x3d; <italic>p</italic>). Without loss of generality, in the context of this study both are chosen to be identical.</p>
</list-item>
<list-item>
<p>&#x2022; <inline-formula id="inf1">
<mml:math id="m1">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> is a decision based on the score usually given by <inline-formula id="inf2">
<mml:math id="m2">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2265;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> where <italic>s</italic>
<sub>0</sub> is the cutoff.</p>
</list-item>
</list>
</p>
<p>Typical examples of protected attributes are gender, race, or sexual orientation. An intuitive requirement of fairness is as follows: 1) use only variables of <italic>X</italic> but no variables of <italic>P</italic> for the risk score model (<italic>unawareness</italic>). Note that while it is unrealistic that an attribute like sexual orientation directly enters the credit application process, it has been demonstrated that this information is indirectly available from our digital footprint, such as our Facebook profile (<xref ref-type="bibr" rid="B35">Youyou et&#x20;al., 2015</xref>), and recent research proposes to extend credit risk modeling by including such alternative data sources (<xref ref-type="bibr" rid="B7">De Cnudde et&#x20;al., 2019</xref>). From this it is easy to see that the fairness definition of unawareness is not sufficient.</p>
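<p>The insufficiency of unawareness can be made concrete with a small simulation. The following sketch is our own illustration (not part of the original study; all coefficients are assumed): a score built only on a legal predictor <italic>X</italic> still differs between the protected groups when <italic>X</italic> is correlated with <italic>P</italic>.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Protected attribute P (e.g., group membership); never used by the score.
p = rng.integers(0, 2, n)
# A "legal" predictor X that acts as a proxy because it is correlated with P.
x = 0.8 * p + rng.normal(size=n)

# An "unaware" score using only X (any monotone model in X behaves similarly).
scores = 1.0 / (1.0 + np.exp(-x))

# The average score still differs between the protected groups,
# because X carries information about P.
gap = scores[p == 1].mean() - scores[p == 0].mean()
print(f"mean score gap between groups: {gap:.3f}")
```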
<p>Many fairness definitions are based on the confusion matrix as it is given in <xref ref-type="table" rid="T1">Table&#x20;1</xref>: Confusion matrices are computed depending on <italic>P</italic> and resulting measures are compared.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Confusion matrix and measures derived from&#x20;it.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="center">
<italic>Y</italic>&#x20;&#x3d; 1</th>
<th align="center">
<italic>Y</italic>&#x20;&#x3d; 0</th>
<th align="center">
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">
<inline-formula id="inf3">
<mml:math id="m3">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:math>
</inline-formula>
</td>
<td align="left">True positives</td>
<td align="left">False positives</td>
<td align="left">Precision: <inline-formula id="inf4">
<mml:math id="m4">
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf5">
<mml:math id="m5">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:math>
</inline-formula>
</td>
<td align="left">False negatives</td>
<td align="left">True negatives</td>
<td align="left">
</td>
</tr>
<tr>
<td align="left">
</td>
<td align="left">Sensitivity: <inline-formula id="inf6">
<mml:math id="m6">
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="left">Specificity: <inline-formula id="inf7">
<mml:math id="m7">
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="left"/>
</tr>
</tbody>
</table>
</table-wrap>
<p>Fairness definitions related to the acceptance rate <inline-formula id="inf8">
<mml:math id="m8">
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> are as follows: 2) <italic>Group fairness</italic> <inline-formula id="inf9">
<mml:math id="m9">
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> requires the acceptance rate to be independent of the protected attributes <italic>P</italic>. In addition, 3) <italic>conditional statistical parity</italic> <inline-formula id="inf10">
<mml:math id="m10">
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> requires this independence to hold for any combination of realizations of a set of legal predictors <italic>X</italic>. 4) <italic>Equal opportunity</italic> <inline-formula id="inf11">
<mml:math id="m11">
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> and 5) <italic>predictive equality</italic> <inline-formula id="inf12">
<mml:math id="m12">
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> are based on the sensitivity and specificity, while <italic>balance</italic> for the positive and negative classes requires the expected scores 6) <italic>E</italic> (<italic>S</italic>&#x7c;<italic>Y</italic>&#x20;&#x3d; 1, <italic>P</italic>) &#x3d; <italic>E</italic> (<italic>S</italic>&#x7c;<italic>Y</italic>&#x20;&#x3d; 1) and 7) <italic>E</italic> (<italic>S</italic>&#x7c;<italic>Y</italic>&#x20;&#x3d; 0, <italic>P</italic>) &#x3d; <italic>E</italic> (<italic>S</italic>&#x7c;<italic>Y</italic>&#x20;&#x3d; 0) to be independent of the protected attributes.</p>
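<p>As a minimal sketch (not from the article; the function name and toy data are our own illustration), definitions 2), 4), and 5) can be checked empirically by computing the acceptance rate, sensitivity, and specificity per protected group:</p>

```python
import numpy as np

def fairness_report(y_true, y_pred, p):
    """Per-group measures underlying group fairness (acceptance rate),
    equal opportunity (sensitivity), and predictive equality (specificity)."""
    out = {}
    for g in np.unique(p):
        m = p == g
        out[g] = {
            # Pr(Yhat = 1 | P = g): acceptance rate
            "acceptance_rate": y_pred[m].mean(),
            # Pr(Yhat = 1 | Y = 1, P = g): sensitivity
            "sensitivity": y_pred[m & (y_true == 1)].mean(),
            # Pr(Yhat = 0 | Y = 0, P = g): specificity
            "specificity": 1 - y_pred[m & (y_true == 0)].mean(),
        }
    return out

# Toy example with two protected groups (0 and 1).
y_true = np.array([1, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])
p      = np.array([0, 0, 0, 1, 1, 1])
report = fairness_report(y_true, y_pred, p)
```

<p>Equality of a measure across the groups then corresponds to the respective fairness definition holding exactly on the given sample.</p>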
<p>Fairness definitions based on the predicted posterior probability <italic>Pr</italic> (<italic>Y</italic>) are as follows: 8) <italic>Predictive parity</italic> <inline-formula id="inf13">
<mml:math id="m13">
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> ensures the same precision independent of the protected attributes. In addition, 9) <italic>conditional use accuracy equality</italic> <inline-formula id="inf14">
<mml:math id="m14">
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> extends this definition to all levels of <italic>Y</italic>. In contrast, 10) <italic>calibration</italic> <italic>Pr</italic>(<italic>Y</italic>&#x20;&#x3d; 1&#x7c;<italic>S</italic>&#x20;&#x3d; <italic>s</italic>, <italic>P</italic>) &#x3d; <italic>Pr</italic>(<italic>Y</italic>&#x20;&#x3d; 1&#x7c;<italic>S</italic>&#x20;&#x3d; <italic>s</italic>) ensures the same predicted posterior probabilities given a score, independently of the protected attributes.</p>
<p>An alternative yet intuitive fairness definition is given by 11) <italic>individual fairness</italic>: similar individuals <italic>i</italic> and <italic>j</italic> should be assigned similar scores, independently of the protected attributes: <italic>d</italic>
<sub>1</sub> (<italic>S</italic>(<italic>x</italic>
<sub>
<italic>i</italic>
</sub>), <italic>S</italic> (<italic>x</italic>
<sub>
<italic>j</italic>
</sub>)) &#x2264; <italic>d</italic>
<sub>2</sub> (<italic>x</italic>
<sub>
<italic>i</italic>
</sub>, <italic>x</italic>
<sub>
<italic>j</italic>
</sub>) where <italic>d</italic>
<sub>1</sub> (.,.) and <italic>d</italic>
<sub>2</sub> (.,.) are distance metrics in the space of the scores and the predictor variables, respectively. 12) <italic>Causal discrimination</italic> requires a credit decision <inline-formula id="inf15">
<mml:math id="m15">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> for two individuals with identical values in the attributes <italic>X</italic>&#x20;&#x3d; <italic>x</italic> to be constant independently of the protected attributes <italic>P</italic>. 13) <italic>Counterfactual fairness</italic> additionally requires that <inline-formula id="inf16">
<mml:math id="m16">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> does not depend on any descendant of <italic>P</italic>; it will be explained in detail in <xref ref-type="sec" rid="s2-3">Section 2.3</xref>. For an in-depth overview and discussion of the different fairness definitions, the reader is referred to <xref ref-type="bibr" rid="B33">Verma and Rubin (2018)</xref>.</p>
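<p>Definition 11) can be checked directly on a sample by counting pairs that violate the Lipschitz-type condition. The following sketch is our own illustration, assuming the absolute difference as score distance <italic>d</italic><sub>1</sub> and the Euclidean distance as predictor distance <italic>d</italic><sub>2</sub> (with a constant <italic>L</italic> relating the two):</p>

```python
import numpy as np

def individual_fairness_violations(scores, X, L=1.0):
    """Count pairs (i, j) with d1(S(x_i), S(x_j)) > L * d2(x_i, x_j),
    i.e., violations of individual fairness for the chosen distances."""
    n = len(scores)
    violations = 0
    for i in range(n):
        for j in range(i + 1, n):
            d1 = abs(scores[i] - scores[j])   # distance of scores
            d2 = np.linalg.norm(X[i] - X[j])  # distance of predictors
            if d1 > L * d2:
                violations += 1
    return violations

X_pts = np.array([[0.0], [1.0], [2.0]])
flat_scores = np.array([0.0, 0.5, 1.0])  # varies slowly in X: no violations
print(individual_fairness_violations(flat_scores, X_pts))
```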
<p>It should be noted that these criteria can be mutually incompatible, so that it may be impossible to create a model that is fair with respect to all of them simultaneously (<xref ref-type="bibr" rid="B4">Chouldechova, 2016</xref>).</p>
</sec>
<sec id="s2-2">
<title>2.2 Causal Inference</title>
<p>
<xref ref-type="bibr" rid="B25">Pearl (2019)</xref> distinguishes three levels of causal inference as follows:<list list-type="simple">
<list-item>
<p>1) <italic>Association</italic>: <italic>Pr</italic> (<italic>y</italic>&#x7c;<italic>x</italic>): Seeing: &#x201c;What is?,&#x201d; that is, the probability of <italic>Y</italic>&#x20;&#x3d; <italic>y</italic> given that we observe <italic>X</italic>&#x20;&#x3d;&#x20;<italic>x</italic>.</p>
</list-item>
<list-item>
<p>2) <italic>Intervention</italic>: <italic>Pr</italic> (<italic>y</italic>&#x7c;<italic>do</italic>(<italic>x</italic>)): Manipulation: &#x201c;What if?,&#x201d; that is, the probability of <italic>Y</italic>&#x20;&#x3d; <italic>y</italic> given that we intervene and set the value of <italic>X</italic> to&#x20;<italic>x</italic>.</p>
</list-item>
<list-item>
<p>3) <italic>Counterfactuals</italic>: <italic>Pr</italic> (<italic>y</italic>
<sub>
<italic>x</italic>
</sub>&#x7c;<italic>x</italic>&#x2032;, <italic>y</italic>&#x2032;): Imagining: &#x201c;What if I had acted differently?,&#x201d; that is, the probability of <italic>Y</italic>&#x20;&#x3d; <italic>y</italic> if <italic>X</italic> had been <italic>x</italic> given that we actually observed <italic>x</italic>&#x2032;, <italic>y</italic>&#x2032;.</p>
</list-item>
</list>
</p>
<p>For levels 2 and 3, subject matter knowledge about the causal mechanism that generates the data is needed. This structural causal model can be encoded in a directed acyclic graph (DAG). The basic elements of such a graph reveal whether adjustment for a variable <italic>C</italic> may introduce or remove bias in the causal effect of <italic>X</italic> on <italic>Y</italic> (see e.g., <xref ref-type="bibr" rid="B24">Pearl et&#x20;al., 2016</xref>):<list list-type="simple">
<list-item>
<p>&#x2022; <italic>Chain</italic>: <italic>X</italic>&#x20;&#x2192; <italic>C</italic>&#x20;&#x2192; <italic>Y</italic>, where <italic>C</italic> is a mediator between <italic>X</italic> and <italic>Y</italic> and adjusting for <italic>C</italic> would mask the causal effect of <italic>X</italic> on&#x20;<italic>Y</italic>.</p>
</list-item>
<list-item>
<p>&#x2022; <italic>Fork</italic>: <italic>X</italic>&#x20;&#x2190; <italic>C</italic>&#x20;&#x2192; <italic>Y</italic>, where <italic>C</italic> is a common cause of <italic>X</italic> and <italic>Y</italic> and adjusting for <italic>C</italic> would block the noncausal path between <italic>X</italic> and&#x20;<italic>Y</italic>.</p>
</list-item>
<list-item>
<p>&#x2022; <italic>Collider</italic>: <italic>X</italic>&#x20;&#x2192; <italic>C</italic>&#x20;&#x2190; <italic>Y</italic>, where <italic>C</italic> is a common effect of <italic>X</italic> and <italic>Y</italic> and adjusting for <italic>C</italic> would open a biasing path between <italic>X</italic> and&#x20;<italic>Y</italic>.</p>
</list-item>
</list>
</p>
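<p>The collider case is the least intuitive of the three and can be illustrated by a short simulation (our own sketch, not from the article): two independent variables become correlated once their common effect is adjusted for.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# X and Y are independent causes of the collider C (X -> C <- Y).
x = rng.normal(size=n)
y = rng.normal(size=n)
c = x + y + 0.1 * rng.normal(size=n)

# Marginally, X and Y are (nearly) uncorrelated.
r_marginal = np.corrcoef(x, y)[0, 1]

# "Adjusting" for C, here by restricting to a narrow slice of C,
# opens a biasing path and induces a strong negative correlation.
m = np.abs(c) < 0.2
r_conditional = np.corrcoef(x[m], y[m])[0, 1]

print(round(r_marginal, 3), round(r_conditional, 3))
```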
<p>
<xref ref-type="bibr" rid="B22">Luebke et&#x20;al. (2020)</xref> provide easy-to-follow examples to illustrate these. In order to calculate the counterfactual (level 3), the assumed structural causal model and the observed data are used in a three-step process as follows:<list list-type="simple">
<list-item>
<p>1) <italic>Abduction</italic>: Use the evidence, that is, the data, to determine the exogenous variables (for example, the error term) in a given structural causal model. For example, assume a causal model with an additive exogenous term, <italic>Y</italic>&#x20;&#x3d; <italic>f</italic>(<italic>X</italic>) &#x2b; <italic>U</italic>, and calculate <italic>u</italic> for a given observation <italic>x</italic>&#x2032;, <italic>y</italic>&#x2032;.</p>
</list-item>
<list-item>
<p>2) <italic>Action</italic>: In the causal model, substitute the value of <italic>X</italic> with the counterfactual <italic>x</italic> (instead of <italic>x</italic>&#x2032;).</p>
</list-item>
<list-item>
<p>3) <italic>Prediction</italic>: Calculate <italic>Y</italic> based on the previous&#x20;steps.</p>
</list-item>
</list>
</p>
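<p>The three steps above can be sketched in a few lines of code; the structural equation <italic>f</italic> below is a hypothetical assumption chosen only for illustration:</p>

```python
# Hypothetical structural causal model Y = f(X) + U; f(x) = 2*x is our
# assumption for illustration only.

def f(x):
    return 2.0 * x  # assumed structural equation

x_obs, y_obs = 1.0, 3.0  # observed evidence (x', y')

# 1) Abduction: recover the exogenous term u from the evidence.
u = y_obs - f(x_obs)

# 2) Action: replace the observed x' with the counterfactual value x.
x_cf = 0.0

# 3) Prediction: evaluate the model under the counterfactual x with the
#    recovered u held fixed.
y_cf = f(x_cf) + u

print(u, y_cf)  # 1.0 1.0
```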
<p>For a more detailed introduction, the reader is referred to <xref ref-type="bibr" rid="B24">Pearl et&#x20;al. (2016)</xref>.</p>
</sec>
<sec id="s2-3">
<title>2.3 Counterfactual Fairness</title>
<p>Causal counterfactual thinking enables the notion of counterfactual fairness as follows:<disp-formula id="equ1">
<mml:math id="m17">
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>&#x0176;</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>&#x0176;</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<disp-quote>
<p>&#x201c;Would the credit decision have been the same if the protected attribute had taken a different value (e.g. if the applicant had been male instead of female)?&#x201d;</p>
</disp-quote>
<p>
<xref ref-type="bibr" rid="B19">Kusner et&#x20;al. (2017)</xref> present a FairLearning algorithm which can be considered a preprocessing debiaser in the sense of <xref ref-type="bibr" rid="B1">Agrawal et&#x20;al. (2020)</xref>, that is, a transformation of the attributes before modeling with the subsequent machine learning algorithm. It consists of three levels as follows:<list list-type="simple">
<list-item>
<p>1) Prediction of <italic>Y</italic> is only based on non-descendants of&#x20;<italic>P</italic>.</p>
</list-item>
<list-item>
<p>2) Use of postulated background variables.</p>
</list-item>
<list-item>
<p>3) Fully deterministic model with latent variables where the error term can be used as an input for the prediction of&#x20;<italic>Y</italic>.</p>
</list-item>
</list>
</p>
<p>It should be noted that causal modeling is in general non-parametric and therefore any machine learning method may be employed. In order to illustrate the concept, we use a (simple) linear model: a least squares regression of the attributes <italic>X</italic> (e.g., status in the example below) on the protected attributes <italic>P</italic> (e.g., gender), assuming an independent error. The resulting residuals (<italic>E</italic>) are subsequently used to model <italic>Y</italic> (e.g., default) instead of the original attributes <italic>X</italic>, which may depend on the protected attributes <italic>P</italic>. Our algorithm can be summarized as follows:<list list-type="simple">
<list-item>
<p>1) Regress X on&#x20;P.</p>
</list-item>
<list-item>
<p>2) Calculate residuals <inline-formula id="inf17">
<mml:math id="m18">
<mml:mi>E</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
<list-item>
<p>3) Model <italic>Y</italic> by&#x20;<italic>E</italic>.</p>
</list-item>
</list>
</p>
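<p>A minimal sketch of these three steps on simulated data (our illustration; the data-generating process is an assumption, and for a binary protected attribute the least squares fit of <italic>X</italic> on <italic>P</italic> reduces to the group means):</p>

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical data: the protected attribute P influences X, which drives Y.
P = rng.integers(0, 2, size=n)                 # e.g., gender encoded as 0/1
X = 1.0 * P + rng.normal(size=n)               # X depends on P
Y = (rng.random(n) < 1 / (1 + np.exp(1.0 - 1.5 * X))).astype(int)

# 1) Regress X on P (for binary P this is just the two group means).
X_hat = np.where(P == 1, X[P == 1].mean(), X[P == 0].mean())

# 2) The residuals E are linearly purged of the protected information.
E = X - X_hat

# 3) Model Y by E instead of X (with any classifier); as a check, the
#    residuals are uncorrelated with P by construction.
print(abs(np.corrcoef(E, P)[0, 1]) < 1e-8)  # True
```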
</sec>
</sec>
<sec id="s3">
<title>3 Analyzing Fairness of Credit Risk Scoring Models</title>
<sec id="s3-1">
<title>3.1 Quantifying Fairness</title>
<p>The definitions presented in <xref ref-type="sec" rid="s2">Section 2</xref> are crisp in the sense that a model is either fair or not. It might, however, be desirable to quantify the degree of fairness of a model. The different competing definitions of fairness can sometimes even be mutually exclusive: for example, <xref ref-type="bibr" rid="B4">Chouldechova (2016)</xref> shows that for a calibrated model (i.e.,&#x20;<italic>Pr</italic>(<italic>Y</italic>&#x20;&#x3d; 1&#x7c;<italic>S</italic>&#x20;&#x3d; <italic>s</italic>, <italic>P</italic>) &#x3d; <italic>Pr</italic>(<italic>Y</italic>&#x20;&#x3d; 1&#x7c;<italic>S</italic>&#x20;&#x3d; <italic>s</italic>), cf. above) not both equal opportunity (i.e.,&#x20;<inline-formula id="inf18">
<mml:math id="m19">
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>) and predictive equality (i.e.,&#x20;<inline-formula id="inf19">
<mml:math id="m20">
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>) can hold as long as the prior probabilities <italic>Pr</italic>(<italic>Y</italic>&#x20;&#x3d; 1&#x7c;<italic>P</italic>&#x20;&#x3d; <italic>p</italic>) differ with respect to the protected attributes. Although each of these definitions can be motivated in the credit scoring business context, group fairness, which takes the acceptance rates into account, seems to be of major relevance. For this reason, we concentrate on group fairness in order to quantify the fairness of credit scoring models: By <inline-formula id="inf20">
<mml:math id="m21">
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> the distribution of <inline-formula id="inf21">
<mml:math id="m22">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> with regard to the protected attributes is given. If <italic>P</italic>&#x20;&#x2208; {0, 1} is binary, like gender, a popular measure from scorecard development can be adapted: the population stability index (PSI, cf. e.g., <xref ref-type="bibr" rid="B29">Szepannek, 2020</xref>). Moreover, there are rules of thumb available from the literature that allow for an interpretation: <italic>PSI</italic> &#x3e; 0.25 is considered unstable (<xref ref-type="bibr" rid="B28">Siddiqi, 2006</xref>). For our purpose, a group unfairness index (GUI) is defined as follows:<disp-formula id="e1">
<mml:math id="m23">
<mml:mi>G</mml:mi>
<mml:mi>U</mml:mi>
<mml:mi>I</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:munderover>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mspace width="0.17em"/>
<mml:mi>log</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
<mml:mo>.</mml:mo>
</mml:math>
<label>(1)</label>
</disp-formula>
</p>
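<p>The index of Eq. 1 can be computed directly from predicted classes and a binary protected attribute. The helper below is a hypothetical sketch (note that it is undefined if an acceptance rate is exactly zero in one of the groups):</p>

```python
import numpy as np

def group_unfairness_index(y_hat, p):
    """PSI-style group unfairness index for binary predictions y_hat and a
    binary protected attribute p (hypothetical example implementation)."""
    y_hat, p = np.asarray(y_hat), np.asarray(p)
    gui = 0.0
    for y in (0, 1):
        pr1 = np.mean(y_hat[p == 1] == y)  # Pr(Y_hat = y | P = 1)
        pr0 = np.mean(y_hat[p == 0] == y)  # Pr(Y_hat = y | P = 0)
        gui += (pr1 - pr0) * np.log(pr1 / pr0)
    return gui

# Equal acceptance rates in both groups give GUI = 0 ...
print(group_unfairness_index([0, 1, 0, 1], [0, 0, 1, 1]))        # 0.0
# ... while unequal acceptance rates yield a positive index.
print(group_unfairness_index([0, 0, 1, 1, 0, 1, 1, 1],
                             [0, 0, 0, 0, 1, 1, 1, 1]) > 0)      # True
```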
<p>Analogously a similar index can be defined for other fairness definitions based on the acceptance rate <inline-formula id="inf22">
<mml:math id="m24">
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> such as equal opportunity (cf. <xref ref-type="sec" rid="s2-1">Section 2.1</xref>). Nonetheless, for the purpose of this study and the application context of credit application risk scoring, we restrict ourselves to group fairness.</p>
</sec>
<sec id="s3-2">
<title>3.2 Visual Analysis of Fairness</title>
<p>A risk score <italic>S</italic>&#x2254;<italic>Pr</italic>(<italic>Y</italic>&#x20;&#x3d; <italic>y</italic>&#x7c;<italic>X</italic>&#x20;&#x3d; <italic>x</italic>, <italic>P</italic>&#x20;&#x3d; <italic>p</italic>) that is independent of <italic>P</italic> necessarily results in group fairness as <inline-formula id="inf23">
<mml:math id="m25">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2265;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>. Partial dependence profile (<italic>PDP</italic>) plots (<xref ref-type="bibr" rid="B11">Friedman, 2001</xref>) are among the most popular model-agnostic approaches from the field of explainable machine learning for understanding feature effects. The idea of PDPs can be adapted to visualize a model&#x2019;s partial profile with respect to the protected attributes <italic>P</italic> even if they are not necessarily among the predictors <italic>X</italic>:<disp-formula id="e2">
<mml:math id="m26">
<mml:mi>P</mml:mi>
<mml:mi>D</mml:mi>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x222b;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mi>d</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:math> <label>(2)</label>
</disp-formula>that is, the average prediction given that the protected attributes <italic>P</italic> take the value <italic>p</italic>. For our purpose, for a data set with <italic>n</italic> observations (<italic>x</italic>
<sub>
<italic>i</italic>
</sub>, <italic>p</italic>
<sub>
<italic>i</italic>
</sub>) a protected attribute dependence profile can be estimated by the conditional average<disp-formula id="e3">
<mml:math id="m27">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>D</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mo>&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>n</mml:mi>
<mml:mi>p</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
</mml:mrow>
</mml:munderover>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:math>
<label>(3)</label>
</disp-formula>
</p>
<p>In the case of a fair model, the protected attribute dependence profile should be constant.</p>
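<p>The estimate of Eq. 3 amounts to averaging the model scores within each level of the protected attribute. A minimal sketch with hypothetical scores and protected levels:</p>

```python
import numpy as np

def pdp_protected(score, p):
    """Protected attribute dependence profile: the average predicted score
    S(Y=1 | x_i, p_i) within each level of the protected attribute p."""
    score, p = np.asarray(score, dtype=float), np.asarray(p)
    return {str(level): float(score[p == level].mean())
            for level in np.unique(p)}

# For a (group-)fair model the profile is (roughly) constant across levels:
profile = pdp_protected([0.30, 0.31, 0.29, 0.30], ["f", "f", "m", "m"])
print(profile)
```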
</sec>
</sec>
<sec id="s4">
<title>4 Simulation Experiment</title>
<sec id="s4-1">
<title>4.1 From German Credit Data to South German Credit Data</title>
<p>Credit scoring research has traditionally suffered from a lack of available real-world data, as credit institutes are typically not willing to share their internal data. The German credit data were collected by the StatLog project (<xref ref-type="bibr" rid="B15">Henery and Taylor, 1992</xref>) and go back to <xref ref-type="bibr" rid="B16">Hoffmann (1990)</xref>. They are freely available from the UCI machine learning repository (<xref ref-type="bibr" rid="B8">Dua and Graff, 2017</xref>) and consist of 21 variables: 7 numeric as well as 13 categorical predictors and a binary target variable where the predicted event denotes the default of a loan. The default rate has been oversampled to 0.3 in the available UCI data, while the original sources report a prevalence of bad credits of around&#x20;0.05.</p>
<p>In the recent past, a few data sets have been made publicly available, for example, by the peer-to-peer lending company LendingClub<xref ref-type="fn" rid="fn1">
<sup>1</sup>
</xref> or FICO<xref ref-type="fn" rid="fn2">
<sup>2</sup>
</xref> but still a huge number of studies rely on the German credit data (<xref ref-type="bibr" rid="B21">Louzada et&#x20;al., 2016</xref>). This is even more notable as <xref ref-type="bibr" rid="B13">Groemping (2019)</xref> found that the data available in the UCI machine learning repository are erroneous; for example, the percentage of foreign workers in the UCI data is 0.963 (instead of 0.037) because the labels have been swapped. In total, eleven of the 20 predictor variables had to be corrected. For six of them (A1: Status of existing checking account, A3: Credit history, A6: Savings account/bonds, A12: Property, A15: Housing, and A20: Foreign worker) the label assignments were wrong, and for one of them (A9: Personal status and sex) even two of the levels (female non-singles and male singles) had to be merged as they cannot be distinguished anymore. In addition, four other variables (A8: Installment rate in percentage of disposable income, A11: Present residence since, A16: Number of existing credits at this bank, and A18: Number of people being liable to provide maintenance for) originally represent numeric attributes that are only available after binning, such that their numeric values are nothing but group indexes. For this reason the values of these variables are replaced by the corresponding bin labels. A table of all the changes can be found in the <xref ref-type="sec" rid="s11">Supplementary Table S1</xref>. The corrected data set has been made publicly available on the UCI machine learning repository under the name South German credit data<xref ref-type="fn" rid="fn3">
<sup>3</sup>
</xref> (<xref ref-type="bibr" rid="B13">Groemping, 2019</xref>).</p>
<p>For further modeling in this study, the data have been randomly split into 70% training and 30% test data. Note that the data set is rather small, but as traditional scorecard development requires a manual plausibility check of the binning, cross-validation is not an option here (cf. also <xref ref-type="bibr" rid="B29">Szepannek, 2020</xref>).</p>
</sec>
<sec id="s4-2">
<title>4.2 Simulation of the Protected Attribute</title>
<p>A simulation study is conducted in order to compare both a traditional and a fair scoring model under different degrees of influence of the protected attributes. Note that the original variable personal status and sex (A9) does not allow for a unique distinction between men and women (cf. previous subsection) and cannot be used for this purpose. For this reason this variable has been removed from the&#x20;data.</p>
<p>In traditional scorecard modeling, information values (IVs, <xref ref-type="bibr" rid="B28">Siddiqi, 2006</xref>) are often considered to assess the ability of single variables to discriminate good and bad customers. <xref ref-type="table" rid="T2">Table&#x20;2</xref> shows the IVs of the scorecard variables. In order to analyze the impact of building fair scoring models, the data have been extended by an artificial protected variable Gender to mimic A9. For this purpose the variable Status, which has the largest IV, has been selected to construct the new protected variable. As shown in <xref ref-type="table" rid="T3">Table&#x20;3</xref>, in a first step two of the status levels are assigned to women and the other two levels to men. As a consequence, in the artificial data women inherit the lower risk of their corresponding status levels compared to men. The resulting graph is<disp-formula id="equ2">
<mml:math id="m28">
<mml:mtext>Gender</mml:mtext>
<mml:mo>&#x2192;</mml:mo>
<mml:mtext>Status</mml:mtext>
<mml:mo>&#x2192;</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo>.</mml:mo>
</mml:math>
</disp-formula>
</p>
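<p>Information values weight, for each level of a predictor, the difference of the class-conditional frequencies by the corresponding log-ratio (the weight of evidence). The helper below is our own sketch on hypothetical data, not the scorecard implementation used in the study:</p>

```python
import numpy as np

def information_value(x, y):
    """Information value of a categorical predictor x for a binary target y
    (1 = default): IV = sum_x (f(x|1) - f(x|0)) * log(f(x|1) / f(x|0))."""
    x, y = np.asarray(x), np.asarray(y)
    n_bad, n_good = (y == 1).sum(), (y == 0).sum()
    iv = 0.0
    for level in np.unique(x):
        f_bad = ((x == level) & (y == 1)).sum() / n_bad    # f(x | y = 1)
        f_good = ((x == level) & (y == 0)).sum() / n_good  # f(x | y = 0)
        iv += (f_bad - f_good) * np.log(f_bad / f_good)
    return iv

# Hypothetical predictor with 50 observations per level:
x = ["a"] * 50 + ["b"] * 50
y = [1] * 30 + [0] * 20 + [1] * 10 + [0] * 40
print(round(information_value(x, y), 3))  # 0.747
```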
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Information values of the scorecard variables and the removed variable personal status and&#x20;sex.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Variable</th>
<th align="center">Status</th>
<th align="center">Credit history</th>
<th align="center">Duration</th>
<th align="center">Purpose</th>
<th align="center">Savings</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">IV</td>
<td align="char" char=".">0.672</td>
<td align="char" char=".">0.298</td>
<td align="char" char=".">0.254</td>
<td align="char" char=".">0.238</td>
<td align="char" char=".">0.154</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Construction of the variable gender for the training&#x20;data.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Status</th>
<th align="center">Female</th>
<th align="center">Male</th>
<th align="center">Default rate</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">No checking account</td>
<td align="center">0</td>
<td align="center">197</td>
<td align="char" char=".">0.452</td>
</tr>
<tr>
<td align="left">&#x2026; &#x3c; 0 DM</td>
<td align="center">0</td>
<td align="center">184</td>
<td align="char" char=".">0.397</td>
</tr>
<tr>
<td align="left">0&#x20;&#x3c; &#x3d; &#x2026; &#x3c; 200 DM</td>
<td align="center">44</td>
<td align="center">0</td>
<td align="char" char=".">0.205</td>
</tr>
<tr>
<td align="left">&#x2026; &#x3e; &#x3d; 200 DM/salary for at least 1&#xa0;year</td>
<td align="center">275</td>
<td align="center">0</td>
<td align="char" char=".">0.105</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Note that the model uses neither the ethically critical variables Personal status and sex and Foreign worker nor the new simulated protected variable Gender (cf. <xref ref-type="sec" rid="s5">Section 5</xref>) and thus fulfills the definition of fairness through <italic>unawareness</italic>. But as can be seen in the results of the next section, simply preventing protected variables from entering the model is not sufficient to obtain a fair scoring model (cf. also <xref ref-type="bibr" rid="B19">Kusner et&#x20;al., 2017</xref>).</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>As Lemma 1 of <xref ref-type="bibr" rid="B19">Kusner et&#x20;al. (2017)</xref> states <inline-formula id="inf24">
<mml:math id="m29">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> can only be counterfactually fair if it is a function of the non-descendants of <italic>P</italic>, we illustrate the concept using a chain where, by construction, the attribute <italic>X</italic> (Status) is a descendant of the protected attribute <italic>P</italic>. The idea can be generalized to larger sets of <italic>X</italic>,&#x20;<italic>P</italic>.</p>
<p>In a second step, the strength of the effect of gender on status is varied by randomly switching between 0<italic>%</italic> and 50<italic>%</italic> of the males to females and vice versa. As a result, the designed effect of gender on status is disturbed to some extent and only holds for the remaining observations. The degree of dependence between both categorical variables is measured using Cram&#xe9;r&#x2019;s V (<xref ref-type="bibr" rid="B5">Cram&#xe9;r, 1946</xref>,&#x20;282).</p>
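<p>Cram&#xe9;r&#x2019;s V can be computed from the gender-status contingency table. A minimal sketch (our illustration, without the small-sample bias correction):</p>

```python
import numpy as np

def cramers_v(a, b):
    """Cramer's V for two categorical variables via the plain chi-square
    statistic of their contingency table (hypothetical helper)."""
    a, b = np.asarray(a), np.asarray(b)
    table = np.array([[np.sum((a == i) & (b == j)) for j in np.unique(b)]
                      for i in np.unique(a)], dtype=float)
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return np.sqrt(chi2 / (n * (min(table.shape) - 1)))

# Perfect dependence of gender and status gives V = 1; random label
# switching (as in the simulation) weakens the association.
gender = ["m"] * 50 + ["f"] * 50
status = ["s1"] * 50 + ["s2"] * 50
print(round(cramers_v(gender, status), 3))  # 1.0
```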
</sec>
</sec>
<sec sec-type="results|discussion" id="s5">
<title>5 Results and Discussion</title>
<p>Logistic regression still represents the gold standard for credit risk scorecard modeling (<xref ref-type="bibr" rid="B6">Crook et&#x20;al., 2007</xref>; <xref ref-type="bibr" rid="B29">Szepannek, 2020</xref>) even if in the recent past many studies have demonstrated potential benefits from using modern machine learning algorithms (cf. e.g., <xref ref-type="bibr" rid="B20">Lessmann et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B2">Bischl et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B21">Louzada et&#x20;al., 2016</xref>). For this reason, a traditional scorecard using logistic regression is created as a baseline model for the simulation study. The model is built using preliminary automatic binning (based on the <italic>&#x3c7;</italic>
<sup>2</sup> statistic and a maximum of six bins per variable) with subsequent assignment of weights of evidence (<inline-formula id="inf25">
<mml:math id="m30">
<mml:mi>W</mml:mi>
<mml:mi>O</mml:mi>
<mml:mi>E</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>log</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">&#x2223;</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula>) to the bins (<xref ref-type="bibr" rid="B34">Xie, 2020</xref>). For plausibility reasons the bins of four of the variables (Duration, Employment duration, Amount, as well as Age) are manually updated; only one of them (Duration) entered the final model after BIC-based stepwise forward variable selection (cf. <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>).</p>
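<p>The weight of evidence assignment can be sketched as follows (hypothetical bins and counts; the study itself uses the scorecard implementation of Xie, 2020):</p>

```python
import numpy as np

def woe(x, y):
    """Weight of evidence per bin, WOE(x) = log(f(x|y=1) / f(x|y=0)),
    matching the formula above (hypothetical example data below)."""
    x, y = np.asarray(x), np.asarray(y)
    n1, n0 = (y == 1).sum(), (y == 0).sum()
    return {str(b): float(np.log((((x == b) & (y == 1)).sum() / n1) /
                                 (((x == b) & (y == 0)).sum() / n0)))
            for b in np.unique(x)}

# 40 loans in bin "low" (20 defaults) and 60 in bin "high" (10 defaults):
x = ["low"] * 40 + ["high"] * 60
y = [1] * 20 + [0] * 20 + [1] * 10 + [0] * 50
w = woe(x, y)
print(round(w["low"], 3), round(w["high"], 3))  # 0.847 -0.762
```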
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Automatically created bins <bold>(A)</bold> for the variable duration and manual update <bold>(B)</bold>: a plausible trend of increasing risk with increasing duration.</p>
</caption>
<graphic xlink:href="frai-04-681915-g001.tif"/>
</fig>
<p>For plausibility reasons (i.e.,&#x20;the observed default rates for the different levels), the variable Property has been removed from the data, as from a business point of view there is no plausible reason for the observed increase in risk for owners of cars, life insurance, or real estate (cf. <xref ref-type="table" rid="T4">Table&#x20;4</xref>). After forward variable selection using BIC on the training data, the resulting scorecard model uses the five input variables listed in <xref ref-type="table" rid="T2">Table&#x20;2</xref>. The equation of the resulting logistic regression model is given in <xref ref-type="table" rid="T5">Table&#x20;5</xref>. The corresponding scorecard model with frequencies and default rates for all classes can be found in <xref ref-type="table" rid="T6">Table&#x20;6</xref>.</p>
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>Default rates of the variable property on the training&#x20;data.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Property</th>
<th align="center">Non-default</th>
<th align="center">Default</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Unknown/no property</td>
<td align="char" char=".">0.79</td>
<td align="char" char=".">0.21</td>
</tr>
<tr>
<td align="left">Car or other</td>
<td align="char" char=".">0.72</td>
<td align="char" char=".">0.28</td>
</tr>
<tr>
<td align="left">Building soc. savings agr./life insurance</td>
<td align="char" char=".">0.73</td>
<td align="char" char=".">0.27</td>
</tr>
<tr>
<td align="left">Real estate</td>
<td align="char" char=".">0.54</td>
<td align="char" char=".">0.46</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T5" position="float">
<label>TABLE 5</label>
<caption>
<p>Coefficients of the logistic regression&#x20;model.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Variable</th>
<th align="center">
<inline-formula id="inf26">
<mml:math id="m31">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3b2;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Intercept</td>
<td align="char" char=".">&#x2212;0.910</td>
</tr>
<tr>
<td align="left">WOE (Status)</td>
<td align="char" char=".">0.830</td>
</tr>
<tr>
<td align="left">WOE (Duration)</td>
<td align="char" char=".">1.063</td>
</tr>
<tr>
<td align="left">WOE (Purpose)</td>
<td align="char" char=".">1.098</td>
</tr>
<tr>
<td align="left">WOE (Credit history)</td>
<td align="char" char=".">0.838</td>
</tr>
<tr>
<td align="left">WOE (Savings)</td>
<td align="char" char=".">0.758</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T6" position="float">
<label>TABLE 6</label>
<caption>
<p>Resulting scorecard model. In practice, it is common to assign scorecard points to the posterior probabilities given by the score. Here, a calibration with 500 points at odds of 1/19 and 20 points to double the odds is&#x20;used.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Variable</th>
<th align="center">Bin</th>
<th align="center">Points</th>
<th align="center">Default rate</th>
<th align="center">Distribution</th>
<th align="center">&#x23;Good</th>
<th align="center">&#x23;Bad</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Basepoints</td>
<td align="center">&#x2014;</td>
<td align="center">441</td>
<td align="center">&#x2014;</td>
<td align="center">&#x2014;</td>
<td align="center">&#x2014;</td>
<td align="center">&#x2014;</td>
</tr>
<tr>
<td align="left">Status</td>
<td align="center">No checking account</td>
<td align="center">&#x2013;17</td>
<td align="char" char=".">0.45</td>
<td align="char" char=".">0.28</td>
<td align="center">108</td>
<td align="center">89</td>
</tr>
<tr>
<td align="left">Status</td>
<td align="center">&#x2026; &#x3c; 0 DM</td>
<td align="center">&#x2013;12</td>
<td align="char" char=".">0.40</td>
<td align="char" char=".">0.26</td>
<td align="center">111</td>
<td align="center">73</td>
</tr>
<tr>
<td align="left">Status</td>
<td align="center">0&#x20;&#x3c; &#x3d; &#x2026; &#x3c; 200 DM</td>
<td align="center">11</td>
<td align="char" char=".">0.20</td>
<td align="char" char=".">0.06</td>
<td align="center">35</td>
<td align="center">9</td>
</tr>
<tr>
<td align="left">Status</td>
<td align="center">&#x2026; &#x3e; &#x3d; 200 DM/salary for at least 1&#xa0;year</td>
<td align="center">29</td>
<td align="char" char=".">0.11</td>
<td align="char" char=".">0.39</td>
<td align="center">246</td>
<td align="center">29</td>
</tr>
<tr>
<td align="left">Duration</td>
<td align="center">(&#x2212;Inf, 8)</td>
<td align="center">32</td>
<td align="char" char=".">0.12</td>
<td align="char" char=".">0.09</td>
<td align="center">57</td>
<td align="center">8</td>
</tr>
<tr>
<td align="left">Duration</td>
<td align="center">(8, 18)</td>
<td align="center">10</td>
<td align="char" char=".">0.22</td>
<td align="char" char=".">0.35</td>
<td align="center">189</td>
<td align="center">54</td>
</tr>
<tr>
<td align="left">Duration</td>
<td align="center">(18, 44)</td>
<td align="center">&#x2013;5</td>
<td align="char" char=".">0.32</td>
<td align="char" char=".">0.49</td>
<td align="center">233</td>
<td align="center">109</td>
</tr>
<tr>
<td align="left">Duration</td>
<td align="center">(44, Inf)</td>
<td align="center">&#x2013;38</td>
<td align="char" char=".">0.58</td>
<td align="char" char=".">0.07</td>
<td align="center">21</td>
<td align="center">29</td>
</tr>
<tr>
<td align="left">Purpose</td>
<td align="center">Repairs</td>
<td align="center">&#x2013;25</td>
<td align="char" char=".">0.47</td>
<td align="char" char=".">0.05</td>
<td align="center">19</td>
<td align="center">17</td>
</tr>
<tr>
<td align="left">Purpose</td>
<td align="center">Domestic appliances, business, others, radio/television</td>
<td align="center">&#x2013;16</td>
<td align="char" char=".">0.40</td>
<td align="char" char=".">0.29</td>
<td align="center">124</td>
<td align="center">81</td>
</tr>
<tr>
<td align="left">Purpose</td>
<td align="center">Retraining</td>
<td align="center">&#x2013;2</td>
<td align="char" char=".">0.30</td>
<td align="char" char=".">0.10</td>
<td align="center">49</td>
<td align="center">21</td>
</tr>
<tr>
<td align="left">Purpose</td>
<td align="center">Car (used)</td>
<td align="center">2</td>
<td align="char" char=".">0.27</td>
<td align="char" char=".">0.17</td>
<td align="center">89</td>
<td align="center">33</td>
</tr>
<tr>
<td align="left">Purpose</td>
<td align="center">Furniture/equipment</td>
<td align="center">17</td>
<td align="char" char=".">0.19</td>
<td align="char" char=".">0.27</td>
<td align="center">151</td>
<td align="center">35</td>
</tr>
<tr>
<td align="left">Purpose</td>
<td align="center">Car (new), vacation</td>
<td align="center">23</td>
<td align="char" char=".">0.16</td>
<td align="char" char=".">0.12</td>
<td align="center">68</td>
<td align="center">13</td>
</tr>
<tr>
<td align="left">Credit history</td>
<td align="center">Delay in paying off in the past, critical account/other credits elsewhere</td>
<td align="center">&#x2013;32</td>
<td align="char" char=".">0.60</td>
<td align="char" char=".">0.08</td>
<td align="center">22</td>
<td align="center">33</td>
</tr>
<tr>
<td align="left">Credit history</td>
<td align="center">No credits taken/all credits paid back duly</td>
<td align="center">&#x2013;3</td>
<td align="char" char=".">0.31</td>
<td align="char" char=".">0.54</td>
<td align="center">261</td>
<td align="center">116</td>
</tr>
<tr>
<td align="left">Credit history</td>
<td align="center">Existing credits paid back duly till now</td>
<td align="center">0</td>
<td align="char" char=".">0.29</td>
<td align="char" char=".">0.09</td>
<td align="center">45</td>
<td align="center">18</td>
</tr>
<tr>
<td align="left">Credit history</td>
<td align="center">All credits at this bank paid back duly</td>
<td align="center">18</td>
<td align="char" char=".">0.16</td>
<td align="char" char=".">0.29</td>
<td align="center">172</td>
<td align="center">33</td>
</tr>
<tr>
<td align="left">Savings</td>
<td align="center">Unknown/no savings account</td>
<td align="center">&#x2013;5</td>
<td align="char" char=".">0.34</td>
<td align="char" char=".">0.60</td>
<td align="center">276</td>
<td align="center">141</td>
</tr>
<tr>
<td align="left">Savings</td>
<td align="center">&#x2026; &#x3c; 100 DM</td>
<td align="center">&#x2013;3</td>
<td align="char" char=".">0.32</td>
<td align="char" char=".">0.11</td>
<td align="center">52</td>
<td align="center">24</td>
</tr>
<tr>
<td align="left">Savings</td>
<td align="center">100&#x20;&#x3c; &#x3d; &#x2026; &#x3c; 500 DM</td>
<td align="center">13</td>
<td align="char" char=".">0.18</td>
<td align="char" char=".">0.06</td>
<td align="center">37</td>
<td align="center">8</td>
</tr>
<tr>
<td align="left">Savings</td>
<td align="center">500&#x20;&#x3c; &#x3d; &#x2026; &#x3c; 1000 DM, &#x2026; &#x3e; &#x3d; 1000 DM</td>
<td align="center">15</td>
<td align="char" char=".">0.17</td>
<td align="char" char=".">0.23</td>
<td align="center">135</td>
<td align="center">27</td>
</tr>
</tbody>
</table>
</table-wrap>
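The calibration mentioned in the caption of Table 6 can be sketched in a few lines. This is a minimal illustration under the usual scorecard convention, which we assume here: the odds of 1/19 are bad-to-good odds (i.e., a default probability of 0.05) and doubling these odds costs 20 points.

```python
import math

def calibrate_points(p_default, points0=500.0, odds0=1.0 / 19.0, pdo=20.0):
    """Map a posterior default probability to calibrated scorecard points.

    Assumes odds0 are the bad-to-good odds at which points0 is reached and
    that doubling these odds costs pdo points (higher points = lower risk).
    """
    factor = pdo / math.log(2.0)
    offset = points0 + factor * math.log(odds0)
    odds = p_default / (1.0 - p_default)  # bad-to-good odds
    return offset - factor * math.log(odds)
```

Under these assumptions, a default probability of 0.05 maps to exactly 500 points, and the portfolio default rate of 0.3 maps to roughly 440 points, close to the basepoints reported in Table 6.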
<p>In addition to the traditional scorecard baseline model, fair models are developed according to the algorithm presented in <xref ref-type="sec" rid="s2-3">Section 2.3</xref>: <italic>WOE</italic> (<italic>Status</italic>) is regressed on the protected attribute Gender, and the residuals are used as a new input variable in place of the original variable Status. This corresponds to a level 3 assumption for a causal model in <xref ref-type="bibr" rid="B19">Kusner et&#x20;al. (2017)</xref>.</p>
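The correction step described above can be sketched as follows; a minimal version assuming a single categorical protected attribute, for which least-squares regression on the group indicators reduces to centering the WOE values within each group:

```python
def fair_residuals(woe, protected):
    """Residuals of regressing a WOE variable on a categorical protected
    attribute: with only group indicators as regressors, the least-squares
    fit is the group mean, so the residuals are within-group centered WOEs."""
    group_mean = {}
    for g in set(protected):
        values = [w for w, p in zip(woe, protected) if p == g]
        group_mean[g] = sum(values) / len(values)
    return [w - group_mean[p] for w, p in zip(woe, protected)]
```

The residuals then replace WOE(Status) as model input, so that the new variable has mean zero within each gender group by construction.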
<p>
<xref ref-type="fig" rid="F2">Figure&#x20;2</xref> shows the results of the simulation study for different levels of dependence (measured by Cramer&#x2019;s V) between the protected variable Gender and the variable Status, in terms of both the performance of the model (measured by the Gini coefficient) and the group unfairness index calculated on the training data. Note that, depending on a company&#x2019;s business strategy and the corresponding acceptance rates, it can be more suitable to put more emphasis on other performance measures such as the partial AUC (<xref ref-type="bibr" rid="B26">Robin et&#x20;al., 2011</xref>) or the expected maximum profit (<xref ref-type="bibr" rid="B32">Verbraken et&#x20;al., 2014</xref>). Nonetheless, for the purpose of this study we decided to use the Gini coefficient <inline-formula id="inf27">
<mml:math id="m32">
<mml:mo>&#x3d;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>U</mml:mi>
<mml:mi>C</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> as it represents the most commonly used performance measure in credit scoring. In order to compute the GUI, a cut-off <italic>s</italic>
<sub>0</sub> for the score <italic>S</italic> has to be defined: for this simulation study, the cut-off has been set to the portfolio default rate of 0.3, that is, an application is rejected if <inline-formula id="inf28">
<mml:math id="m33">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>0.3</mml:mn>
</mml:math>
</inline-formula>. In practice, rejecting all customers with a risk above average would lead to an unrealistically high rejection rate. Therefore, the GUI is also computed for a second cut-off value of <italic>s</italic>
<sub>0</sub> &#x3d; 0.5. Both results are given in <xref ref-type="table" rid="T7">Table&#x20;7</xref>.</p>
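The performance measure used here follows directly from the formula Gini = 2(AUC − 1/2); a small self-contained sketch, with the AUC computed from the Mann-Whitney statistic (scores are assumed to be oriented such that higher values mean higher default risk):

```python
def gini_coefficient(defaults, score):
    """Gini = 2 * (AUC - 1/2); the AUC is computed from the Mann-Whitney
    statistic, counting ties as 1/2. `defaults` is 1 for defaulted and
    0 for non-defaulted observations."""
    pos = [s for s, y in zip(score, defaults) if y == 1]  # defaulted
    neg = [s for s, y in zip(score, defaults) if y == 0]  # non-defaulted
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return 2.0 * (auc - 0.5)
```

A perfectly separating score yields a Gini of 1, a random one a Gini of 0.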
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Fairness-performance trade-off: performance (solid) and unfairness (dashed) for the traditional (blue) and the fairness-corrected (green) model for different levels of correlation between the protected variable Gender and the predictor variable Status. The red dotted line indicates the rule-of-thumb threshold for unfairness.</p>
</caption>
<graphic xlink:href="frai-04-681915-g002.tif"/>
</fig>
<table-wrap id="T7" position="float">
<label>TABLE 7</label>
<caption>
<p>Results of the simulation study: Group unfairness index (GUI) of both the traditional as well as the fair model and performance on the test data of the fair model for different levels of dependence (Cramer&#x2019;s V) between the protected attribute and the variable status.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Cramer&#x2019;s V</th>
<th align="center">
<italic>GUI</italic>
<sub>
<italic>trad</italic>.</sub>(0.3)</th>
<th align="center">
<italic>GUI</italic>
<sub>
<italic>fair</italic>
</sub> (0.3)</th>
<th align="center">
<italic>GUI</italic>
<sub>
<italic>trad</italic>.</sub>(0.5)</th>
<th align="center">
<italic>GUI</italic>
<sub>
<italic>fair</italic>
</sub> (0.5)</th>
<th align="center">
<italic>Gini</italic>
<sub>
<italic>fair</italic>, <italic>test</italic>
</sub>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="char" char=".">1.00</td>
<td align="char" char=".">2.620</td>
<td align="char" char=".">0.141</td>
<td align="char" char=".">1.477</td>
<td align="char" char=".">0.078</td>
<td align="char" char=".">0.491</td>
</tr>
<tr>
<td align="char" char=".">0.90</td>
<td align="char" char=".">2.014</td>
<td align="char" char=".">0.145</td>
<td align="char" char=".">1.130</td>
<td align="char" char=".">0.120</td>
<td align="char" char=".">0.500</td>
</tr>
<tr>
<td align="char" char=".">0.80</td>
<td align="char" char=".">1.160</td>
<td align="char" char=".">0.112</td>
<td align="char" char=".">0.837</td>
<td align="char" char=".">0.057</td>
<td align="char" char=".">0.538</td>
</tr>
<tr>
<td align="char" char=".">0.67</td>
<td align="char" char=".">0.732</td>
<td align="char" char=".">0.053</td>
<td align="char" char=".">0.384</td>
<td align="char" char=".">0.005</td>
<td align="char" char=".">0.563</td>
</tr>
<tr>
<td align="char" char=".">0.57</td>
<td align="char" char=".">0.530</td>
<td align="char" char=".">0.050</td>
<td align="char" char=".">0.271</td>
<td align="char" char=".">0.002</td>
<td align="char" char=".">0.557</td>
</tr>
<tr>
<td align="char" char=".">0.47</td>
<td align="char" char=".">0.321</td>
<td align="char" char=".">0.040</td>
<td align="char" char=".">0.149</td>
<td align="char" char=".">0.001</td>
<td align="char" char=".">0.560</td>
</tr>
<tr>
<td align="char" char=".">0.38</td>
<td align="char" char=".">0.243</td>
<td align="char" char=".">0.037</td>
<td align="char" char=".">0.128</td>
<td align="char" char=".">0.000</td>
<td align="char" char=".">0.559</td>
</tr>
<tr>
<td align="char" char=".">0.26</td>
<td align="char" char=".">0.158</td>
<td align="char" char=".">0.049</td>
<td align="char" char=".">0.063</td>
<td align="char" char=".">0.003</td>
<td align="char" char=".">0.560</td>
</tr>
<tr>
<td align="char" char=".">0.17</td>
<td align="char" char=".">0.030</td>
<td align="char" char=".">0.014</td>
<td align="char" char=".">0.018</td>
<td align="char" char=".">0.003</td>
<td align="char" char=".">0.561</td>
</tr>
<tr>
<td align="char" char=".">0.09</td>
<td align="char" char=".">0.000</td>
<td align="char" char=".">0.001</td>
<td align="char" char=".">0.005</td>
<td align="char" char=".">0.001</td>
<td align="char" char=".">0.560</td>
</tr>
<tr>
<td align="char" char=".">0.07</td>
<td align="char" char=".">0.011</td>
<td align="char" char=".">0.000</td>
<td align="char" char=".">0.000</td>
<td align="char" char=".">0.015</td>
<td align="char" char=".">0.548</td>
</tr>
</tbody>
</table>
</table-wrap>
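The GUI values in Table 7 are evaluated on the accept/reject decision at the given cut-off. As a rough illustration only, a PSI-type index of this kind can be computed as below; this is an assumption-based sketch (we assume the GUI compares the decision distributions of the two protected groups in the style of the population stability index with its 0.25 rule of thumb; the exact definition is given in the methodology section of the paper):

```python
import math

def group_unfairness_index(rejected, group, eps=1e-12):
    """PSI-style sketch: compare the reject/accept distribution at a
    given cut-off between two protected groups. Assumption-based
    reconstruction, not necessarily the paper's exact definition."""
    g0, g1 = sorted(set(group))
    r0 = [r for r, g in zip(rejected, group) if g == g0]
    r1 = [r for r, g in zip(rejected, group) if g == g1]
    gui = 0.0
    for outcome in (True, False):
        p = max(sum(r == outcome for r in r0) / len(r0), eps)
        q = max(sum(r == outcome for r in r1) / len(r1), eps)
        gui += (p - q) * math.log(p / q)
    return gui
```

Identical decision distributions give a GUI of 0; the index grows as the groups' rejection rates diverge.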
<p>The solid lines indicate performance on the test data for the traditional (blue) and the fair model (green). The traditional model is unaffected by the protected variable and thus shows constant performance with a Gini coefficient of 0.554. Remarkably, for some of the simulated data sets the fair model even slightly outperforms the traditional one. This might be explained by the small number of observations in the data, which results in a large 95<italic>%</italic> (bootstrap) confidence interval (<xref ref-type="bibr" rid="B26">Robin et&#x20;al., 2011</xref>) of (0.437, 0.667), together with the only small performance decrease caused by the fairness correction. The corresponding dashed lines show the group unfairness index of both models, where the additional dotted red line represents the rule-of-thumb threshold of 0.25 indicating unfairness. For the traditional model, the threshold is already exceeded for dependencies as small as Cramer&#x2019;s V &#x3d; 0.3, while for the fair model it always stays below the threshold. Even more interestingly, for small and moderate levels of dependence (Cramer&#x2019;s V &#x2264; 0.6), no performance decrease is observed while at the same time fairness is increased.</p>
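A bootstrap confidence interval of the kind quoted above can in principle be obtained with a generic percentile bootstrap; a stdlib-only sketch (the `statistic` argument is a placeholder for a Gini or AUC function; note that the pROC package cited in the text implements its own, more refined bootstrap):

```python
import random

def bootstrap_ci(y, s, statistic, n_boot=1000, level=0.95, seed=42):
    """Percentile-bootstrap confidence interval for a performance
    statistic(y, s), resampling observations with replacement and
    skipping resamples that contain only one class (AUC-type
    statistics are undefined otherwise)."""
    rng = random.Random(seed)
    n = len(y)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue
        stats.append(statistic(yb, [s[i] for i in idx]))
    stats.sort()
    lo = stats[int((1 - level) / 2 * (n_boot - 1))]
    hi = stats[int((1 + level) / 2 * (n_boot - 1))]
    return lo, hi
```

With few observations, such intervals become wide, which is consistent with the fair model occasionally appearing to outperform the traditional one.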
<p>
<xref ref-type="fig" rid="F3">Figure&#x20;3</xref> shows an example of the partial dependence profiles (cf. <xref ref-type="sec" rid="s3-2">Section 3.2</xref>) for the traditional and the fairness-corrected model on one of the simulated data sets (Cramer&#x2019;s V &#x3d; 0.47, cf. <xref ref-type="table" rid="T7">Table&#x20;7</xref>). For these data, no strong differences in performance are observed (0.554 vs. 0.560), but the GUI of 0.321 for the traditional model indicates unfairness. This is also reflected by the profile plots, where a shift in the score point distributions can be noticed for the traditional model (<inline-formula id="inf29">
<mml:math id="m34">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mi>&#xa0;</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
</mml:math>
</inline-formula> 463.96 vs. <inline-formula id="inf30">
<mml:math id="m35">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mi>&#xa0;</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
</mml:math>
</inline-formula> 438.32) in contrast to the fair model (<inline-formula id="inf31">
<mml:math id="m36">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mi>&#xa0;</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
</mml:math>
</inline-formula> 453.57 vs. <inline-formula id="inf32">
<mml:math id="m37">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mi>&#xa0;</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
</mml:math>
</inline-formula> 447.01), while at the same time the standard deviation of the points, which is often an indicator of the predictive power of a model, remains quite similar for both models: <inline-formula id="inf33">
<mml:math id="m38">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
</mml:math>
</inline-formula> 39.28 vs. <inline-formula id="inf34">
<mml:math id="m39">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
</mml:math>
</inline-formula>&#x20;38.68.</p>
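The group-wise means and standard deviations quoted above can be obtained with a small helper; a stdlib-only sketch of this protected attribute dependence check:

```python
import math

def groupwise_points_summary(points, protected):
    """Mean and sample standard deviation of calibrated scorecard points
    per protected group, as used for the protected attribute dependence
    check: a group-wise shift in means hints at unfairness, while similar
    standard deviations suggest comparable predictive power."""
    summary = {}
    for g in sorted(set(protected)):
        vals = [x for x, p in zip(points, protected) if p == g]
        mean = sum(vals) / len(vals)
        var = sum((x - mean) ** 2 for x in vals) / (len(vals) - 1)
        summary[g] = (mean, math.sqrt(var))
    return summary
```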
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Protected attribute dependence plot of the traditional scorecard <bold>(A)</bold> vs. the fairness-corrected scorecard <bold>(B)</bold>. Note that the calibrated versions of the scores are used with 500 points at odds of 1/19 and 20 points to double the&#x20;odds.</p>
</caption>
<graphic xlink:href="frai-04-681915-g003.tif"/>
</fig>
<p>Along with these promising results, another side effect can be noticed: as a consequence of the gender-wise fairness correction there are different WOEs for the two genders in all bins and, consequently, also different scorecard points for each gender, as can be seen in <xref ref-type="table" rid="T8">Table&#x20;8</xref>. Thus, the price of a fair scoring model is different points with respect to the protected attribute (here: Gender). This can be difficult to explain to technically less experienced users and, moreover, can even be critical under the regulatory constraints of the customers&#x2019; right to an explanation of algorithmic decisions (<xref ref-type="bibr" rid="B12">Goodman and Flaxman, 2017</xref>). Furthermore, a traditional plausibility check during the scorecard modeling process concerns the monotonicity of the WOEs with respect to the default rates (<xref ref-type="bibr" rid="B29">Szepannek, 2020</xref>), which now has to be carried out for all levels of the protected attribute and which is no longer necessarily satisfied after the fairness correction. Note that in our example, too, the order of the default rates of the two highest-risk bins of the variable Status has changed for the female customers.</p>
<table-wrap id="T8" position="float">
<label>TABLE 8</label>
<caption>
<p>Comparison of the variable Status for the traditional model (left) and female (center) and male (right) gender in the fair&#x20;model.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Bin</th>
<th align="center">Dist</th>
<th align="center">%Bad</th>
<th align="center">Woe</th>
<th align="center">Points</th>
<th align="center">Dist&#x7c;f</th>
<th align="center">%Bad&#x7c;f</th>
<th align="center">Woe&#x7c;f</th>
<th align="center">Points&#x7c;f</th>
<th align="center">Dist&#x7c;m</th>
<th align="center">%Bad&#x7c;m</th>
<th align="center">Woe&#x7c;m</th>
<th align="center">Points&#x7c;m</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">No checking account</td>
<td align="char" char=".">0.281</td>
<td align="char" char=".">0.452</td>
<td align="char" char=".">0.723</td>
<td align="center">&#x2212;17</td>
<td align="char" char=".">0.142</td>
<td align="char" char=".">0.467</td>
<td align="char" char=".">1.343</td>
<td align="center">&#x2212;34</td>
<td align="char" char=".">0.397</td>
<td align="char" char=".">0.447</td>
<td align="char" char=".">0.527</td>
<td align="center">&#x2212;13</td>
</tr>
<tr>
<td align="left">&#x2026; &#x3c; 0 DM</td>
<td align="char" char=".">0.263</td>
<td align="char" char=".">0.397</td>
<td align="char" char=".">0.497</td>
<td align="center">&#x2212;12</td>
<td align="char" char=".">0.145</td>
<td align="char" char=".">0.522</td>
<td align="char" char=".">1.117</td>
<td align="center">&#x2212;28</td>
<td align="char" char=".">0.360</td>
<td align="char" char=".">0.355</td>
<td align="char" char=".">0.301</td>
<td align="center">&#x2212;8</td>
</tr>
<tr>
<td align="left">0&#x20;&#x3c; &#x3d; &#x2026; &#x3c; 200 DM</td>
<td align="char" char=".">0.063</td>
<td align="char" char=".">0.205</td>
<td align="char" char=".">&#x2212;0.442</td>
<td align="center">11</td>
<td align="char" char=".">0.098</td>
<td align="char" char=".">0.194</td>
<td align="char" char=".">0.178</td>
<td align="center">&#x2212;4</td>
<td align="char" char=".">0.034</td>
<td align="char" char=".">0.231</td>
<td align="char" char=".">&#x2212;0.638</td>
<td align="center">16</td>
</tr>
<tr>
<td align="left">&#x2026; &#x3e; &#x3d; 200 DM/salary</td>
<td align="char" char=".">0.393</td>
<td align="char" char=".">0.105</td>
<td align="char" char=".">&#x2212;1.222</td>
<td align="center">29</td>
<td align="char" char=".">0.615</td>
<td align="char" char=".">0.103</td>
<td align="char" char=".">&#x2212;0.602</td>
<td align="center">15</td>
<td align="char" char=".">0.209</td>
<td align="char" char=".">0.112</td>
<td align="char" char=".">&#x2212;1.418</td>
<td align="center">36</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Although in general the presented methodology can be applied to arbitrary machine learning models, the changes in the data induced by the fairness correction put even more emphasis on a deep understanding of the resulting model, and the corresponding methodology of interpretable machine learning can help to achieve this goal (cf., e.g., <xref ref-type="bibr" rid="B3">B&#xfc;cker et&#x20;al., 2021</xref>, for an overview in the credit risk scoring context). Further note that, as demonstrated in <xref ref-type="bibr" rid="B30">Szepannek (2019)</xref>, the obtained interpretations bear the risk of being misleading. For this reason, other authors such as <xref ref-type="bibr" rid="B27">Rudin (2019)</xref> suggest restricting model choice to interpretable models; in summary, a proper analysis of the benefits of using more complex models should be carried out in any specific situation (<xref ref-type="bibr" rid="B31">Szepannek, 2017</xref>).</p>
<p>For the simulations in this study, only one protected attribute has been created, which impacts only one of the predictor variables in a comparatively simple graph structure (Gender &#x2192; Status &#x2192; <italic>Y</italic>). For more complex data situations, causal search algorithms can be used to identify potential causal relationships between the variables that are in line with the observed data (<xref ref-type="bibr" rid="B14">Hauser and B&#xfc;hlmann, 2012</xref>; <xref ref-type="bibr" rid="B17">Kalisch et&#x20;al., 2012</xref>). All descendants of the protected attributes must then be corrected accordingly.</p>
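This last point can be made concrete with a small helper: given a causal graph (here a hypothetical dict-of-lists encoding of the simulation set-up), the variables requiring the residual-based correction are the descendants of the protected attribute, excluding the target itself, which is predicted rather than corrected:

```python
from collections import deque

def descendants(graph, node):
    """All variables reachable from `node` in a dict-of-lists DAG (BFS)."""
    seen, queue = set(), deque(graph.get(node, []))
    while queue:
        v = queue.popleft()
        if v not in seen:
            seen.add(v)
            queue.extend(graph.get(v, []))
    return seen

# Hypothetical encoding of the simulated structure: Gender -> Status -> Y,
# with further predictors pointing at Y only.
causal_graph = {"Gender": ["Status"], "Status": ["Y"],
                "Duration": ["Y"], "Savings": ["Y"]}

def variables_to_correct(graph, protected, target="Y"):
    """Descendants of the protected attribute that need fairness correction."""
    return sorted(descendants(graph, protected) - {target})
```

For the graph above, only Status needs correction; predictors that are not descendants of Gender can enter the model unchanged.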
</sec>
<sec id="s6">
<title>6 Summary</title>
<p>In this study, different definitions of fairness are presented from the credit risk scoring point of view, together with a fairness correction algorithm based on the concept of counterfactual fairness. Furthermore, the idea of population stability is transferred into a new group unfairness index, which allows quantifying and comparing the degree of group fairness of different scoring models. In addition, partial dependence plots are proposed to visualize the fairness of a model with respect to a protected attribute. Based on these measures, a simulation study has been set up which makes use of a corrected version of the well-known German credit data. The results of the study are quite promising: up to some degree, fairness corrections are possible without a strong loss in predictive accuracy as measured by the Gini coefficient on independent test data. Nonetheless, as an inherent consequence, the scores of fairness-corrected models will typically differ with respect to the protected attributes, which may result in a new kind of problem from the perspective of the customers&#x2019; regulatory right to an explanation of algorithmic decisions. The explanation of algorithmic decisions thus becomes even more complicated, and future work is needed to investigate the effects observed in this study for other classes of machine learning models such as random forests, gradient boosting, support vector machines, or neural networks.</p>
</sec>
</body>
<back>
<sec id="s7">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="sec" rid="s11">Supplementary Material</xref>; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s8">
<title>Author Contributions</title>
<p>GS: summary of fairness definitions, methodology for quantifying and visualizing fairness of credit risk scoring models, South German credit data, and simulation study. KL: causal inference and counterfactual fairness.</p>
</sec>
<sec sec-type="COI-statement" id="s9">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ack>
<p>The authors would like to thank the reviewers for their valuable feedback and acknowledge the support of the Institute of Applied Computer Science at Stralsund University of Applied Sciences for funding open access publication.</p>
</ack>
<sec id="s11">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/frai.2021.681915/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/frai.2021.681915/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material>
<label>Supplementary Table S1</label>
<caption>
<p>Summary of the differences between the German credit data and the South German credit data.</p>
</caption>
</supplementary-material>
<supplementary-material xlink:href="Table1.pdf" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="DataSheet1.ZIP" id="SM2" mimetype="application/ZIP" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<fn-group>
<fn id="fn1">
<label>1</label>
<p>
<ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.lendingclub.com/">https://www.lendingclub.com/</ext-link>.</p>
</fn>
<fn id="fn2">
<label>2</label>
<p>
<ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://community.fico.com/s/explainable-machine-learning-challenge">https://community.fico.com/s/explainable-machine-learning-challenge</ext-link>.</p>
</fn>
<fn id="fn3">
<label>3</label>
<p>
<ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://archive.ics.uci.edu/ml/datasets/South+German+Credit+%28UPDATE%29">https://archive.ics.uci.edu/ml/datasets/South+German+Credit+%28UPDATE%29</ext-link>
</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Agrawal</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Pfisterer</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Bischl</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Sood</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Shah</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Debiasing Classifiers: Is Reality at Variance with Expectation?</article-title> <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/2011.02407">http://arxiv.org/abs/2011.02407</ext-link>
</comment>. </citation>
</ref>
<ref id="B2">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Bischl</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>K&#xfc;hn</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Szepannek</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>On Class Imbalance Correction for Classification Algorithms in Credit Scoring</article-title>,&#x201d; in <source>Operations Research Proceedings 2014, Selected Papers of the Annual International Conference of the German Operations Research Society (GOR)</source>. Editors <person-group person-group-type="editor">
<name>
<surname>L&#xfc;bbecke</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Koster</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Letmathe</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Madlener</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Peis</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Walther</surname>
<given-names>G.</given-names>
</name>
</person-group>, <fpage>37</fpage>&#x2013;<lpage>43</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-28697-6_6</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>B&#xfc;cker</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Szepannek</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Gosiewska</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Biecek</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Transparency, Auditability and eXplainability of Machine Learning Models in Credit Scoring</article-title>. <source>J.&#x20;Oper. Res. Soc.</source> <comment>in print</comment>. <pub-id pub-id-type="doi">10.1080/01605682.2021.1922098</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Chouldechova</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1610.07524">http://arxiv.org/abs/1610.07524</ext-link>
</comment>. </citation>
</ref>
<ref id="B5">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Cram&#xe9;r</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>1946</year>). <source>Mathematical Methods of Statistics</source>. <publisher-name>Princeton University Press</publisher-name>. </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Crook</surname>
<given-names>J.&#x20;N.</given-names>
</name>
<name>
<surname>Edelman</surname>
<given-names>D. B.</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>L. C.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Recent Developments in Consumer Credit Risk Assessment</article-title>. <source>Eur. J.&#x20;Oper. Res.</source> <volume>183</volume>, <fpage>1447</fpage>&#x2013;<lpage>1465</lpage>. <pub-id pub-id-type="doi">10.1016/j.ejor.2006.09.100</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>De Cnudde</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Moeyersoms</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Stankova</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Tobback</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Javaly</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Martens</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>What Does Your Facebook Profile Reveal about Your Creditworthiness? Using Alternative Data for Microfinance</article-title>. <source>J.&#x20;Oper. Res. Soc.</source> <volume>70</volume>, <fpage>353</fpage>&#x2013;<lpage>363</lpage>. <pub-id pub-id-type="doi">10.1080/01605682.2018.1434402</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Dua</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Graff</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>UCI Machine Learning Repository</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://archive.ics.uci.edu/ml/index.php">https://archive.ics.uci.edu/ml/index.php</ext-link>
</comment>. </citation>
</ref>
<ref id="B9">
<citation citation-type="book">
<collab>EU Expert Group on AI</collab> (<year>2019</year>). <source>Ethics Guidelines for Trustworthy AI</source>. </citation>
</ref>
<ref id="B10">
<citation citation-type="book">
<collab>European Banking Authority</collab> (<year>2017</year>). <source>Guidelines on PD Estimation, LGD Estimation and the Treatment of Defaulted Exposures</source>. </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Friedman</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>Greedy Function Approximation: A Gradient Boosting Machine</article-title>. <source>Ann. Stat.</source> <volume>29</volume>, <fpage>1189</fpage>&#x2013;<lpage>1232</lpage>. <pub-id pub-id-type="doi">10.1214/aos/1013203451</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goodman</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Flaxman</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>European Union Regulations on Algorithmic Decision-Making and a "Right to Explanation"</article-title>. <source>AIMag</source> <volume>38</volume>, <fpage>50</fpage>&#x2013;<lpage>57</lpage>. <pub-id pub-id-type="doi">10.1609/aimag.v38i3.2741</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Groemping</surname>
<given-names>U.</given-names>
</name>
</person-group> (<year>2019</year>). <source>South German Credit Data: Correcting a Widely Used Data Set</source>. <publisher-name>Department II, Beuth University of Applied Sciences Berlin</publisher-name>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://www1.beuth-hochschule.de/FB_II/reports/Report-2019-004.pdf">http://www1.beuth-hochschule.de/FB_II/reports/Report-2019-004.pdf</ext-link>
</comment>. </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hauser</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>B&#xfc;hlmann</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs</article-title>. <source>J.&#x20;Machine Learn. Res.</source> <volume>13</volume>, <fpage>2409</fpage>&#x2013;<lpage>2464</lpage>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://jmlr.org/papers/v13/hauser12a.html">https://jmlr.org/papers/v13/hauser12a.html</ext-link>
</comment>. </citation>
</ref>
<ref id="B15">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Henery</surname>
<given-names>R. J.</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>C. C.</given-names>
</name>
</person-group> (<year>1992</year>). &#x201c;<article-title>StatLog: An Evaluation of Machine Learning and Statistical Algorithms</article-title>,&#x201d; in <source>Computational Statistics</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Dodge</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Whittaker</surname>
<given-names>J.</given-names>
</name>
</person-group> (<publisher-loc>Heidelberg</publisher-loc>: <publisher-name>Physica</publisher-name>), <fpage>157</fpage>&#x2013;<lpage>162</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-662-26811-7_23</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hoffmann</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>1990</year>). <article-title>Die Anwendung des CART-Verfahrens zur statistischen Bonit&#xe4;tsanalyse</article-title> [The application of the CART method to statistical credit rating analysis]. <source>Z. f&#xfc;r Betriebswirtschaft</source> <volume>60</volume>, <fpage>941</fpage>&#x2013;<lpage>962</lpage>. </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kalisch</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>M&#xe4;chler</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Colombo</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Maathuis</surname>
<given-names>M. H.</given-names>
</name>
<name>
<surname>B&#xfc;hlmann</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Causal Inference Using Graphical Models with the R Package Pcalg</article-title>. <source>J.&#x20;Stat. Softw.</source> <volume>47</volume>, <fpage>1</fpage>&#x2013;<lpage>26</lpage>. <pub-id pub-id-type="doi">10.18637/jss.v047.i11</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kusner</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Loftus</surname>
<given-names>J.&#x20;R.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>The Long Road to Fairer Algorithms</article-title>. <source>Nature</source> <volume>578</volume>, <fpage>34</fpage>&#x2013;<lpage>36</lpage>. <pub-id pub-id-type="doi">10.1038/d41586-020-00274-3</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Kusner</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Loftus</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Russell</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Silva</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Counterfactual Fairness</article-title>,&#x201d; in <conf-name>Proc. 31st int. Conf. Neural Information Processing Systems NIPS&#x2019;17</conf-name>, <conf-loc>Red Hook, NY, USA</conf-loc> (<publisher-name>Curran Associates Inc.</publisher-name>), <fpage>4069</fpage>&#x2013;<lpage>4079</lpage>. </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lessmann</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Baesens</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Seow</surname>
<given-names>H.-V.</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>L. C.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Benchmarking State-Of-The-Art Classification Algorithms for Credit Scoring: An Update of Research</article-title>. <source>Eur. J.&#x20;Oper. Res.</source> <volume>247</volume>, <fpage>124</fpage>&#x2013;<lpage>136</lpage>. <pub-id pub-id-type="doi">10.1016/j.ejor.2015.05.030</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Louzada</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Ara</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Fernandes</surname>
<given-names>G. B.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Classification Methods Applied to Credit Scoring: Systematic Review and Overall Comparison</article-title>. <source>Surv. Operations Res. Manage. Sci.</source> <volume>21</volume>, <fpage>117</fpage>&#x2013;<lpage>134</lpage>. <pub-id pub-id-type="doi">10.1016/j.sorms.2016.10.001</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>L&#xfc;bke</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Gehrke</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Horst</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Szepannek</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Why We Should Teach Causal Inference: Examples in Linear Regression with Simulated Data</article-title>. <source>J.&#x20;Stat. Educ.</source> <volume>28</volume>, <fpage>133</fpage>&#x2013;<lpage>139</lpage>. <pub-id pub-id-type="doi">10.1080/10691898.2020.1752859</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>O&#x2019;Neil</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2016</year>). <source>Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy</source>. <publisher-loc>New York, NY, USA</publisher-loc>: <publisher-name>Crown Publishing Group</publisher-name>. </citation>
</ref>
<ref id="B24">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Pearl</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Glymour</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Jewell</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2016</year>). <source>Causal Inference in Statistics &#x2013; a Primer</source>. <publisher-loc>Chichester, UK</publisher-loc>: <publisher-name>Wiley</publisher-name>. </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pearl</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>The Seven Tools of Causal Inference, with Reflections on Machine Learning</article-title>. <source>Commun. ACM</source> <volume>62</volume>, <fpage>54</fpage>&#x2013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.1145/3241036</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Robin</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Turck</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Hainard</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Tiberti</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Lisacek</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Sanchez</surname>
<given-names>J.-C.</given-names>
</name>
<etal/>
</person-group> (<year>2011</year>). <article-title>pROC: An Open-Source Package for R and S&#x2b; to Analyze and Compare ROC Curves</article-title>. <source>BMC Bioinformatics</source> <volume>12</volume>, <fpage>77</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-12-77</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rudin</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead</article-title>. <source>Nat. Mach Intell.</source> <volume>1</volume>, <fpage>206</fpage>&#x2013;<lpage>215</lpage>. <pub-id pub-id-type="doi">10.1038/s42256-019-0048-x</pub-id> </citation>
</ref>
<ref id="B28">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Siddiqi</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2006</year>). <source>Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring</source>. <edition>Second edition</edition>. <publisher-name>Wiley</publisher-name>. </citation>
</ref>
<ref id="B29">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Szepannek</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>An Overview on the Landscape of R Packages for Credit Scoring</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/2006.11835">http://arxiv.org/abs/2006.11835</ext-link>
</comment>. </citation>
</ref>
<ref id="B30">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Szepannek</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>How Much Can We See? A Note on Quantifying Explainability of Machine Learning Models</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1910.13376">http://arxiv.org/abs/1910.13376</ext-link>
</comment>. </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Szepannek</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>On the Practical Relevance of Modern Machine Learning Algorithms for Credit Scoring Applications</article-title>. <source>WIAS Rep. Ser.</source> <volume>29</volume>, <fpage>88</fpage>&#x2013;<lpage>96</lpage>. <pub-id pub-id-type="doi">10.20347/wias.report.29</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Verbraken</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Bravo</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Weber</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Baesens</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Development and Application of Consumer Credit Scoring Models Using Profit-Based Classification Measures</article-title>. <source>Eur. J.&#x20;Oper. Res.</source> <volume>238</volume>, <fpage>505</fpage>&#x2013;<lpage>513</lpage>. <pub-id pub-id-type="doi">10.1016/j.ejor.2014.04.001</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Verma</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Rubin</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Fairness Definitions Explained</article-title>,&#x201d; in <conf-name>Proc. Int. Workshop on software fairness FairWare &#x2019;18</conf-name>, <conf-loc>New York, NY, USA</conf-loc>. (<publisher-name>ACM</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1145/3194770.3194776</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Xie</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Scorecard: Credit Risk Scorecard &#x2013; R Package</article-title>. <comment>version 0.3.1</comment>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=scorecard">https://CRAN.R-project.org/package&#x3d;scorecard</ext-link>
</comment>. </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Youyou</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Kosinski</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Stillwell</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Computer-based Personality Judgments Are More Accurate Than Those Made by Humans</article-title>. <source>Proc. Natl. Acad. Sci. USA.</source> <volume>112</volume>, <fpage>1036</fpage>&#x2013;<lpage>1040</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1418680112</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>