Machine learning models extract patterns from large volumes of data. When such data reflect historical or social inequalities, algorithms tend to reproduce them in their predictions. This risk is particularly relevant in sensitive domains such as criminal justice, healthcare, employment, and finance, where algorithmic decisions can directly affect people's lives.
Although several techniques exist to mitigate unfairness, the appropriate degree of intervention remains underexplored, especially because it requires balancing fairness against predictive performance. This dissertation investigates how traditional algorithms behave when trained on biased data without any mitigation mechanism, through a systematic analysis of their performance under progressively unfair conditions.
To this end, this dissertation proposes the Systematic Label Flipping for Fairness Stress Testing methodology, which introduces controlled bias into the training data by systematically flipping class labels. This approach makes it possible to assess the robustness of classifiers and to observe, step by step, how performance and fairness metrics evolve as data bias increases.
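To make the procedure concrete, the sketch below illustrates one way such controlled label flipping could be implemented. It is a minimal illustration, not the dissertation's actual code: the pandas DataFrame layout, the column names `label` and `group`, the choice of flip rates, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

def flip_labels(df, flip_rate, label_col="label", group_col="group",
                unprivileged=0, favorable=1, seed=42):
    """Return a copy of df in which a fraction `flip_rate` of the favorable
    labels in the unprivileged group are flipped to the unfavorable class,
    injecting a controlled amount of bias into the training data."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    # Candidate rows: unprivileged individuals with a favorable outcome.
    candidates = out.index[(out[group_col] == unprivileged) &
                           (out[label_col] == favorable)]
    n_flip = int(round(flip_rate * len(candidates)))
    flipped = rng.choice(candidates, size=n_flip, replace=False)
    out.loc[flipped, label_col] = 1 - favorable
    return out

# Illustrative synthetic data standing in for a real training split.
rng = np.random.default_rng(0)
train_df = pd.DataFrame({
    "feature_1": rng.normal(size=500),
    "feature_2": rng.normal(size=500),
    "group": rng.integers(0, 2, size=500),   # binary protected attribute
    "label": rng.integers(0, 2, size=500),   # binary outcome
})

# Build progressively unfair training sets: 0% to 50% of the favorable
# labels in the unprivileged group are flipped to the unfavorable class.
biased_versions = {r: flip_labels(train_df, r)
                   for r in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5)}
```

Because only the training data are perturbed while the evaluation data stay untouched, any shift in the measured metrics can be attributed to the injected bias.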
The models analyzed were Decision Tree, Random Forest, Logistic Regression, and Neural Network. Results were broadly similar across models; the main exception was Logistic Regression, which on the COMPAS dataset suffered a sharper drop in performance accompanied by increased unfairness. Decision Trees proved slightly more stable, but the differences across algorithms remained modest.
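Continuing the sketch above, the comparison across classifiers could be run along the following lines. The scikit-learn estimators correspond to the four model families named here, but the hyperparameters, the use of statistical parity difference as the fairness metric, and the synthetic test split are illustrative assumptions rather than details drawn from the dissertation.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Classifiers matching the four model families; hyperparameters are illustrative.
MODELS = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Neural Network": MLPClassifier(max_iter=500, random_state=0),
}

def statistical_parity_difference(y_pred, groups, favorable=1, unprivileged=0):
    """P(favorable | unprivileged) - P(favorable | privileged); 0 means parity."""
    rate_unpriv = (y_pred[groups == unprivileged] == favorable).mean()
    rate_priv = (y_pred[groups != unprivileged] == favorable).mean()
    return rate_unpriv - rate_priv

# Hypothetical clean (unflipped) test split, generated like `train_df` above.
test_rng = np.random.default_rng(1)
test_df = pd.DataFrame({
    "feature_1": test_rng.normal(size=200),
    "feature_2": test_rng.normal(size=200),
    "group": test_rng.integers(0, 2, size=200),
    "label": test_rng.integers(0, 2, size=200),
})
features = ["feature_1", "feature_2", "group"]

# Track how accuracy and the fairness metric evolve as the flip rate grows.
for rate, train in biased_versions.items():
    for name, model in MODELS.items():
        model.fit(train[features], train["label"])
        y_pred = model.predict(test_df[features])
        acc = accuracy_score(test_df["label"], y_pred)
        spd = statistical_parity_difference(y_pred, test_df["group"].to_numpy())
        print(f"flip_rate={rate:.1f} | {name}: accuracy={acc:.3f} SPD={spd:+.3f}")
```

In this sketch, statistical parity difference merely stands in for whichever group fairness metrics are studied; other metrics, such as equalized odds differences, would plug into the same loop.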
The contributions of this dissertation are twofold: a reproducible methodology for fairness stress testing and empirical evidence on the robustness of traditional models when subjected to increasingly biased scenarios.