Navigating Fairness in Machine Learning: Definitions, Types, and Metrics

Introduction

Machine learning (ML) has revolutionized various industries by enabling systems to make predictions, automate processes, and assist in decision-making. However, the deployment of machine learning models comes with ethical and societal concerns, particularly when it comes to fairness. Fairness in machine learning refers to ensuring that the outcomes and predictions generated by these models do not discriminate against certain groups or individuals based on their characteristics, such as race, gender, age, or socioeconomic status. To address these concerns, researchers and practitioners have developed various fairness metrics that play a crucial role in evaluating, monitoring, and mitigating bias in machine learning systems.

The Importance of Fairness in Machine Learning

Fairness in machine learning is essential for several reasons. First, biased algorithms can perpetuate and even exacerbate existing social inequalities. For example, a biased model used in a hiring process might unfairly disadvantage candidates from underrepresented groups, thereby perpetuating discrimination in the workplace. Second, fairness is crucial for legal and ethical compliance. This is especially true in high-stakes domains such as health, where ensuring that ML models are safe, effective, and equitable across all patients is critical for clinical decision-making and for preventing the amplification of existing health disparities. This article examines how fairness is conceptualized in machine learning, why models may lead to unfair decisions, how fairness has been measured in real-world applications, and the commonly used fairness notions within group, individual, and causal frameworks.

Sources of Bias in Machine Learning

AI systems can behave unfairly for a variety of reasons. Bias can arise at each stage of the model development process, from data collection to deployment. Each stage influences the next, with bias being potentially perpetuated and compounded throughout the development process. After deployment, an unfair model can also introduce or reinforce societal bias. These reasons can be broadly categorized into issues arising from the data itself, decisions made during the development and deployment of these systems, characteristics of the systems themselves, or pre-existing societal biases. These categories are not mutually exclusive and often exacerbate one another.

Types of Bias

Several types of bias can affect machine learning models:

  • Historical bias: Arises from pre-existing inequalities and prejudices in the data used to train ML models.
  • Representation bias: Occurs when the data used to train an ML model does not accurately represent the population it is intended to serve.
  • Measurement bias: Happens when the features used to train the model are imperfect proxies for the actual concepts they are meant to capture.
  • Aggregation bias: Results from applying a one-size-fits-all model to a diverse population.
  • Evaluation bias: Arises when the metrics used to assess the model’s performance are not equally valid for all groups.
  • Deployment bias: Occurs when the context in which the ML model is deployed differs from the context in which it was trained.
  • Language bias: A form of statistical sampling bias tied to the language of a query, in which retrieved or generated information systematically deviates from the true coverage of topics and views available in the underlying repository.
  • Gender bias: The tendency of these models to produce outputs that are unfairly prejudiced towards one gender over another.
  • Political bias: The tendency of algorithms to systematically favor certain political viewpoints, ideologies, or outcomes over others.

Examples of Bias in Real-World Applications

  • Sentencing Disparities: AI models are increasingly used in the criminal justice system to aid in sentencing decisions. However, studies have shown that AI-powered algorithms can perpetuate racial biases. For instance, an AI model may unknowingly be trained on historical data that disproportionately criminalize certain racial groups. Consequently, the model may recommend harsher sentences for individuals from these groups, exacerbating existing disparities.
  • Biased Hiring Practices: AI-powered tools often screen and shortlist job applicants in recruitment processes. However, if the training data used to develop these models reflects biased hiring patterns, the AI system can inadvertently discriminate against specific demographics. For example, if historically male-dominated industries predominantly feature male employees, the AI model may learn to favor male candidates, perpetuating gender biases.
  • Predatory Lending Algorithms: In the financial sector, AI algorithms are employed to assess creditworthiness and determine loan approvals. However, if these models are built using incomplete or biased historical lending data, they can inadvertently discriminate against underprivileged communities. For instance, if the training data indicates that certain minority groups have been unfairly denied loans, the AI model may adopt this discriminatory behavior, perpetuating the cycle of financial exclusion.
  • Healthcare Disparities: AI models are increasingly utilized in healthcare settings to aid in diagnosing diseases and recommending treatments. These models can exhibit fairness bias if trained on incomplete or biased healthcare data. If historically marginalized communities have faced inadequate healthcare access, an AI model trained on such data may perpetuate these disparities.

Approaches to Conceptualizing Fairness

There are many approaches to conceptualizing fairness in classification and regression. These approaches can be broadly categorized into three main frameworks: group fairness, individual fairness, and causal fairness.

Group Fairness

Group fairness criteria require ML models to perform similarly across groups defined by a sensitive attribute (A), such as age or race. Sensitive attributes define specific protected classes, which should be treated equitably by the model. Group fairness criteria are commonly used in health and deem a model as fair if its predictions are similarly accurate or calibrated across a predefined set of groups. Other attributes used in the health‐focused literature include disability, marital status, national origin, sex, and socioeconomic status.

The criteria primarily fall into three categories: independence, separation, and sufficiency.

Independence

Under independence, an ML model is said to be fair if its decisions do not depend on the protected attribute (i.e., D ⊥ A). Statistical (or demographic) parity is a common measure of independence that requires that the model classify individuals into the positive class at the same rate in each group. Conditional statistical parity relaxes this concept by requiring the rate of positive classifications to be the same within more granular groups defined by the protected attribute and other relevant factors.
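Statistical parity can be checked directly by comparing positive-classification rates. A minimal sketch with hypothetical hard-coded decisions and group labels (all names here are illustrative):

```python
def positive_rate(decisions, groups, group):
    """Share of individuals in `group` receiving a positive decision."""
    members = [d for d, a in zip(decisions, groups) if a == group]
    return sum(members) / len(members)

# Hypothetical binary decisions and protected-attribute values
decisions = [1, 0, 1, 1, 0, 1, 0, 0]
groups    = ["x", "x", "x", "x", "y", "y", "y", "y"]

# Statistical parity holds when the difference is (close to) zero.
spd = positive_rate(decisions, groups, "x") - positive_rate(decisions, groups, "y")
print(spd)  # 0.5
```

Here group "x" receives positive decisions at a rate of 0.75 versus 0.25 for group "y", so the model fails statistical parity by a wide margin.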

Independence‐based fairness metrics, such as statistical parity, are infrequently used in health‐focused applications as the prevalence of clinical outcomes often differs across groups defined by protected attributes (e.g., multiple sclerosis is more common in females than males). Enforcing independence may also prevent a model from learning a genuine association between the protected attribute and the outcome, potentially leading to an overall reduction in performance. However, independence‐based metrics may still be informative when the goal is to assess whether a model disproportionately assigns high‐risk predictions to specific groups.

Separation

Separation requires that the model's decisions do not depend on the protected attribute within the positive and negative classes (i.e., D ⊥ A | Y). This implies that, among individuals in the positive (or negative) class, the rate of making a positive (or negative) decision is consistent across groups. Common separation-based metrics therefore aim to equalize error rates across the groups, including the false negative rate (FNR, known as equal opportunity), false positive rate (FPR, known as predictive equality), or both (known as equalized odds).
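These error-rate comparisons can be sketched with hypothetical labels and predictions for two groups (the data here is illustrative, not from any real system):

```python
def error_rates(y_true, y_pred):
    """Return (FNR, FPR) for one group's labels and predictions."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return fn / pos, fp / neg

# Hypothetical labels and predictions, split by protected group
fnr_a, fpr_a = error_rates([1, 1, 0, 0], [1, 0, 0, 1])
fnr_b, fpr_b = error_rates([1, 1, 0, 0], [1, 1, 0, 0])

equal_opportunity_gap = abs(fnr_a - fnr_b)    # 0.5: unequal FNRs
predictive_equality_gap = abs(fpr_a - fpr_b)  # 0.5: unequal FPRs
# Equalized odds would require both gaps to be zero.
```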

Separation-based metrics have been widely used in health-focused applications, although the specific choice of metric depends on the context. When false negatives have the most severe consequences, equal opportunity may be preferred. There are also situations where balancing both the FNR and FPR is more appropriate and equalized odds should be prioritized.

Sufficiency

Sufficiency requires that the protected attribute A is independent of the true label Y given the prediction S (i.e., A ⊥ Y | S). This means that the prediction S contains all the information needed to determine the true label Y, and knowing the protected attribute A does not provide any additional information. In practice, sufficiency corresponds to the model being equally well calibrated across groups: among individuals receiving the same score, the observed outcome rate should not differ by group.
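One way to probe sufficiency is to compare observed outcome rates within score buckets across groups. A minimal sketch with hypothetical scores and labels:

```python
from collections import defaultdict

def outcome_rate_by_score(scores, labels):
    """Observed positive-outcome rate within each (rounded) score bucket."""
    buckets = defaultdict(list)
    for s, y in zip(scores, labels):
        buckets[round(s, 1)].append(y)
    return {s: sum(ys) / len(ys) for s, ys in buckets.items()}

# Hypothetical scores and true labels for two groups
rates_a = outcome_rate_by_score([0.2, 0.2, 0.8, 0.8], [0, 0, 1, 1])
rates_b = outcome_rate_by_score([0.2, 0.2, 0.8, 0.8], [0, 1, 1, 0])

# Sufficiency holds when the rates match for every score; here group b's
# scores are poorly calibrated relative to group a's.
print(rates_a)  # {0.2: 0.0, 0.8: 1.0}
print(rates_b)  # {0.2: 0.5, 0.8: 0.5}
```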

Individual Fairness

In contrast to group fairness, individual fairness is a less frequently used framework that posits that similar individuals should receive similar outcomes: the model should provide similar predictions to similar individuals. Measuring individual fairness therefore requires a user-defined distance metric that captures similarity between individuals.
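This is often formalized as a Lipschitz-style condition: the gap between two predictions should be bounded by a constant times the distance between the individuals. A minimal sketch, assuming a hypothetical scoring function and a simple normalized Manhattan distance:

```python
def distance(x1, x2):
    """Normalized Manhattan distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(x1, x2)) / len(x1)

def is_individually_fair(predict, x1, x2, lipschitz=1.0):
    """Lipschitz-style check: similar inputs must receive similar predictions."""
    return abs(predict(x1) - predict(x2)) <= lipschitz * distance(x1, x2)

# Hypothetical scoring function and individuals
model = lambda x: 0.5 * x[0] + 0.5 * x[1]

print(is_individually_fair(model, [2, 4], [3, 4]))  # True
```

The choice of distance metric is itself a normative decision: it encodes which differences between individuals are considered relevant to the outcome.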

Causal Fairness

Causal fairness criteria utilize causal estimands to quantify unfairness and link disparities in model performance to their underlying cause. Causal fairness goes beyond observational statistics to ask whether sensitive attributes causally influence outcomes. If race influences a loan decision through a legitimate credit history channel, the causal effect may be justified. But if race influences outcomes directly, the model is unfair.

Fairness Metrics

Fairness metrics are numbers that quantify whether an ML model treats different groups consistently (for example, across gender or other protected characteristics). They help you spot bias, compare models, and track progress as you mitigate unfair outcomes.

Common Fairness Metrics

  • Statistical Parity Difference (SPD): Measures the difference in positive-outcome rates between demographic groups; a value of zero indicates parity. It is closely related to disparate impact, which compares the same rates as a ratio rather than a difference.
  • Demographic Parity: A model satisfies demographic parity if the probability of a positive outcome does not depend on the sensitive attribute. Comparing positive-outcome rates across groups makes disparities and biases visible.
  • Equal Opportunity: Requires equal true positive rates (equivalently, equal false negative rates) across groups, so that qualified individuals have the same chance of a positive decision regardless of protected attributes.
  • Average Odds Difference: This metric averages the differences in FPR and TPR between groups.
  • PPV and NPV Parity: Positive predictive value (PPV) parity and negative predictive value (NPV) parity require equal PPV and NPV across groups.
  • Recall Parity & False Positive Rate Parity: Recall (or true positive rate) parity ensures that recall is equal across groups; FPR parity demands equal false positive rates.
  • Theil Index & other disparity measures: Some frameworks compute the Theil index (inequality measure) or Gini coefficient to quantify unfairness.
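The Theil index from the last bullet can be computed over per-individual "benefit" values. A minimal sketch with hypothetical benefits, following the generalized-entropy form of the index:

```python
import math

def theil_index(benefits):
    """Inequality measure over positive benefit values; 0 means perfect equality."""
    mu = sum(benefits) / len(benefits)
    return sum((b / mu) * math.log(b / mu) for b in benefits) / len(benefits)

print(theil_index([1, 1, 1, 1]))            # 0.0: everyone receives the same benefit
print(round(theil_index([1, 1, 1, 5]), 3))  # 0.313: benefits are unequally distributed
```

Unlike group-level gap metrics, inequality indices like this summarize unfairness across the whole population in a single number.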

Strategies for Ensuring Fairness in Machine Learning

Ensuring fairness in machine learning involves several steps, from the initial stages of data collection to the final deployment of the model.

  1. Collecting data that accurately represents all demographic groups is crucial. Ethical data sourcing and diverse representation are critical. Before training, evaluate group statistics. Compute the base rates of positive outcomes across sensitive groups and look for disparities.
  2. Engaging diverse teams in the development process can help identify and address potential biases early on.
  3. Regularly auditing models for bias using fairness metrics is essential.
  4. Incorporating fairness constraints and regularization techniques during model training can help ensure that the model does not disproportionately harm any group. Incorporate fairness metrics into the objective function.
  5. Developing interpretable models that provide clear and understandable explanations for their decisions can help detect and mitigate biases. Strive to develop models that are transparent and explainable. This helps in understanding how the algorithm arrives at its decisions and facilitates the identification and mitigation of any unfair biases.

Bias Mitigation Techniques

  • Pre-processing: Involves modifying the training data to remove biases before feeding it into the model. Pre-processing algorithms remove information that might lead to unfair decisions while altering the data as little as possible. One approach is to map each individual in the initial dataset to an intermediate representation from which it is impossible to identify membership in a particular protected group, while preserving as much other information as possible. Reweighing is another example: it assigns each training example a weight, lower for favored group-label combinations and higher for unfavored ones, to compensate for the bias.
  • In-processing: Methods adjust the learning algorithm itself to reduce bias. This can be done by adding constraints to the optimization objective of the algorithm. These constraints force the algorithm to improve fairness, by keeping the same rates of certain measures for the protected group and the rest of individuals.
  • Post-processing: Involves modifying the model's predictions after training to achieve fairness. Given a classifier that returns a score for each individual, a post-processing method decides how to convert those scores into binary decisions, for example by choosing group-specific thresholds.
  • Learning Fair Representations: This strategy focuses on learning a representation of the data that is invariant to sensitive attributes.
  • Adversarial Debiasing: Involves training the model in conjunction with an adversary that tries to detect bias. The predictor minimizes prediction error while the adversary tries to predict the sensitive attribute from the predictor’s output. The predictor learns representations that obfuscate sensitive attributes, reducing bias.
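The reweighing scheme mentioned above can be sketched as follows. In the standard formulation (after Kamiran and Calders), each (group, label) combination receives weight P(group) × P(label) / P(group, label), which makes the protected attribute and the outcome independent in the weighted data. The data below is hypothetical:

```python
from collections import Counter

def reweighing(groups, labels):
    """Weight each (group, label) pair by P(group) * P(label) / P(group, label)."""
    n = len(labels)
    p_g = Counter(groups)
    p_y = Counter(labels)
    p_gy = Counter(zip(groups, labels))
    return {(g, y): (p_g[g] / n) * (p_y[y] / n) / (p_gy[(g, y)] / n)
            for (g, y) in p_gy}

# Hypothetical data where group "x" receives positive labels more often
groups = ["x", "x", "x", "y", "y", "y"]
labels = [1, 1, 0, 1, 0, 0]
weights = reweighing(groups, labels)

print(round(weights[("x", 1)], 3))  # 0.75: favored combination down-weighted
print(round(weights[("y", 1)], 3))  # 1.5: unfavored combination up-weighted
```

A learner that supports per-example weights can then be trained on the reweighted data without changing the features or labels themselves.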

Tools for Fairness Assessment and Mitigation

Several tools and toolkits are available to help developers assess and mitigate fairness issues in machine learning models:

  • Model Card Toolkit (Google): Simplifies and automates the creation of Model Cards, which are informative documents that offer insights and transparency into the performance of machine learning models.
  • AI Fairness 360 (AIF360) (IBM): A Python toolkit specifically designed to promote algorithmic fairness. It facilitates the integration of fairness research algorithms into industrial settings and provides a common framework for fairness researchers to collaborate, evaluate, and share their algorithms.
  • Fairlearn (Microsoft): A freely available toolkit aimed at evaluating and enhancing the fairness of AI systems. The toolkit comprises two main elements: an interactive visualization dashboard and algorithms for mitigating unfairness.
  • Responsible AI dashboard: The model overview component of the Responsible AI dashboard contributes to the identification stage of the model lifecycle by generating model performance metrics for your entire dataset and your identified cohorts of data. The fairness assessment capabilities of this component come from the Fairlearn package.

Trade-offs Between Fairness and Performance

Fairness rarely comes for free. Improving one fairness metric often worsens another or reduces accuracy. For example, achieving demographic parity may require lowering the threshold for unprivileged groups, increasing false positives. Equalized odds may sacrifice calibration or predictive parity. There is also tension between group fairness and individual fairness. Demographic parity treats groups equally, but individuals within groups may differ widely; equalized odds ensures equal error rates but may still treat similar individuals differently if they belong to different groups.

One way to manage trade‑offs is to use a fairness decision tree: identify the type of harm you want to mitigate (allocation vs quality of service), decide whether misclassification costs are symmetric or asymmetric, and choose metrics accordingly. When organizations analyze trade-offs, they evaluate how different algorithmic adjustments, feature selections, or model architectures impact both fairness and performance.
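The demographic-parity trade-off described above can be made concrete with a small sketch. The scores below are hypothetical; lowering one group's threshold equalizes positive rates but admits lower-scoring cases:

```python
def positive_rate(scores, threshold):
    """Share of scores at or above the decision threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

# Hypothetical model scores for two groups
scores_a = [0.9, 0.8, 0.7, 0.2]
scores_b = [0.6, 0.5, 0.3, 0.1]

# A shared threshold of 0.65 yields very different positive rates.
print(positive_rate(scores_a, 0.65), positive_rate(scores_b, 0.65))  # 0.75 0.0
# Lowering group B's threshold to 0.25 restores demographic parity, but the
# extra positives are low-scoring cases, so false positives may rise.
print(positive_rate(scores_a, 0.65), positive_rate(scores_b, 0.25))  # 0.75 0.75
```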

Fairness and Legal/Ethical Considerations

Fairness metrics interface with law and ethics. The emerging EU AI Act proposes risk‑based classifications requiring providers of high‑risk AI systems to conduct impact assessments, ensure transparency and include human oversight. Organizations may need to demonstrate that their models satisfy specific fairness metrics and provide explanations. Transparency and explainability are thus essential companions to fairness.

tags: #fairness #machine-learning #definitions #types #metrics