GESD: Beyond Outcome-Oriented Fairness
Pith reviewed 2026-05-19 16:26 UTC · model grok-4.3
The pith
GESD measures fairness by tracking how consistently machine learning models explain their predictions across demographic subgroups.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GESD is an explainer-agnostic and model-agnostic metric that computes group-wise disparities in explanation stability, robustness, and sensitivity for a protected category. When incorporated into the FEU optimization framework, it jointly improves utility, outcome-based fairness, and explanation-based fairness on benchmark datasets, extending fairness analysis from decisions alone to the underlying explanations.
What carries the argument
Group-level Explanation Stability Disparity (GESD), which aggregates differences in explanation quality metrics (stability, robustness, sensitivity) across subgroups of a protected attribute.
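This page gives no formula for GESD, so the following is only a minimal sketch of how a GESD-style disparity could be computed. It assumes that stability is measured as the mean L2 change in attributions under small isotropic Gaussian input noise, and that the disparity is the gap between the largest and smallest subgroup means. The functions `perturbation_instability` and `gesd_sketch`, the toy tanh model, and its gradient-times-input explainer are all hypothetical. Robustness and sensitivity terms could be aggregated across groups in the same way.

```python
import numpy as np


def perturbation_instability(explain, X, noise_scale, n_perturb, rng):
    """Per-sample mean L2 change in attributions under Gaussian input noise.

    Hypothetical stability proxy (an assumption, not the paper's formula):
    larger values mean less stable explanations.
    """
    base = explain(X)
    changes = np.zeros(len(X))
    for _ in range(n_perturb):
        X_pert = X + rng.normal(0.0, noise_scale, size=X.shape)
        changes += np.linalg.norm(explain(X_pert) - base, axis=1)
    return changes / n_perturb


def gesd_sketch(explain, X, groups, noise_scale=0.1, n_perturb=10, seed=0):
    """Hypothetical GESD-style score: largest gap between subgroup mean instability."""
    rng = np.random.default_rng(seed)
    scores = perturbation_instability(explain, X, noise_scale, n_perturb, rng)
    group_means = {int(g): float(scores[groups == g].mean()) for g in np.unique(groups)}
    disparity = max(group_means.values()) - min(group_means.values())
    return disparity, group_means


def explain_tanh(X):
    # Gradient-times-input attributions for the toy model f(x) = sum_j tanh(x_j).
    return X * (1.0 - np.tanh(X) ** 2)


rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(200, 4)),  # group 0: steep region of tanh
    rng.normal(5.0, 0.5, size=(200, 4)),  # group 1: saturated region of tanh
])
groups = np.array([0] * 200 + [1] * 200)

disparity, group_means = gesd_sketch(explain_tanh, X, groups)
print(group_means, disparity)
```

Because `explain` can be any callable that maps inputs to attributions, the sketch is model- and explainer-agnostic in the same spirit as the core claim. In this toy setup, group 1 sits where tanh is saturated, so its attributions barely move under noise and group 0 drives the disparity.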
If this is right
- Practitioners can now diagnose bias in the reasoning steps of a model rather than only in its final classifications.
- Optimization routines can be extended to penalize large explanation disparities without sacrificing predictive accuracy.
- Audits of deployed systems gain a concrete, quantitative handle on whether explanations themselves are equitable.
- The same framework can be applied to any black-box model and any post-hoc explainer.
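Penalizing explanation disparity alongside accuracy and outcome fairness turns model selection into a multi-objective problem. The sketch below assumes that all three objectives are minimized and keeps only the Pareto-nondominated candidates. The objective tuples are made-up illustrative numbers, not results from the paper, and the helpers `dominates` and `pareto_front` are hypothetical. This page does not specify FEU's actual search procedure. Evolutionary optimizers such as NSGA-II rank a population by applying this kind of nondominated sorting repeatedly.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Indices of the points that no other point dominates."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical candidates: (error rate, statistical parity difference, GESD-style score).
candidates = [
    (0.18, 0.10, 0.05),
    (0.15, 0.12, 0.07),
    (0.20, 0.11, 0.06),  # dominated by candidate 0
    (0.16, 0.04, 0.09),
    (0.19, 0.10, 0.05),  # dominated by candidate 0
]
print(pareto_front(candidates))  # → [0, 1, 3]
```

No single candidate on the front is best on all three axes. Choosing among candidates 0, 1, and 3 is a trade-off decision rather than an optimization result.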
Where Pith is reading between the lines
- If GESD becomes standard, regulators could require explanation-stability reports alongside accuracy and demographic-parity checks.
- The metric might surface new failure modes where a model achieves outcome fairness only by giving unstable or contradictory reasons to one group.
- Extending GESD to longitudinal settings could reveal whether explanation fairness drifts over time as models are retrained.
Load-bearing premise
Disparities in how stably, robustly, and sensitively a model explains its outputs for different groups serve as a direct signal of procedural unfairness.
What would settle it
A controlled study in which models with large measured GESD values produce explanations that human experts rate as equally trustworthy and consistent across groups, while models with low GESD show no such advantage.
Original abstract
Machine learning (ML) algorithms are increasingly deployed in high-stakes decision-making domains such as loan approvals, hiring, and recidivism prediction. While existing fairness metrics (e.g., statistical parity, equal opportunity) effectively quantify outcome-oriented disparities, they offer limited insight into the procedure or explanation behind biased decisions. To address this gap, we propose Group-level Explanation Stability Disparity (GESD), a procedural-oriented fairness metric that measures disparities in the stability, robustness, and sensitivity of model explanations across different subgroups in a protected category. GESD is explainer-agnostic and model-agnostic, and it extends the scope of fairness analyses to the level of explainability. We further integrate GESD into a multi-objective optimization framework, called FEU (Fairness-Explainability-Utility), that jointly optimizes utility, outcome-based fairness, and explanation-based fairness. Empirical results on multiple benchmark datasets show that GESD effectively captures group-wise discrepancies in explanation quality, and that FEU improves both utility and fairness over state-of-the-art methods. By bridging outcome-based and explanation-based fairness, GESD offers a comprehensive tool for diagnosing and mitigating bias in predictive modeling. Our code and datasets are available on GitHub: https://github.com/horlahsunbo/GESD
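The two outcome-oriented metrics that the abstract names can be sketched as gaps in prediction rates, which makes the contrast with GESD concrete. Statistical parity difference compares positive-prediction rates across groups. Equal opportunity difference compares true positive rates across groups. The function names and toy arrays below are hypothetical and serve only as illustration.

```python
import numpy as np


def statistical_parity_difference(y_pred, groups, a, b):
    """P(y_hat = 1 | group a) - P(y_hat = 1 | group b)."""
    return float(y_pred[groups == a].mean() - y_pred[groups == b].mean())


def equal_opportunity_difference(y_true, y_pred, groups, a, b):
    """TPR(group a) - TPR(group b): positive-prediction rate among true positives."""
    def tpr(g):
        positives = (groups == g) & (y_true == 1)
        return y_pred[positives].mean()
    return float(tpr(a) - tpr(b))


# Toy binary labels and predictions for two groups of four people each.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])

spd = statistical_parity_difference(y_pred, groups, 0, 1)
eod = equal_opportunity_difference(y_true, y_pred, groups, 0, 1)
print(spd, eod)  # → 0.5 0.5
```

Both functions read only labels and predictions. Neither inspects how the model arrived at its decisions, and that is the gap GESD is meant to fill.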
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Group-level Explanation Stability Disparity (GESD), a procedural-oriented fairness metric measuring disparities in explanation stability, robustness, and sensitivity across protected subgroups. It integrates GESD into the FEU multi-objective optimization framework that jointly optimizes utility, outcome-based fairness, and explanation-based fairness. The authors claim that GESD captures group-wise discrepancies in explanation quality and that FEU improves both utility and fairness over state-of-the-art methods on benchmark datasets, with code and data released on GitHub.
Significance. If the central claims hold after addressing the noted concerns, the work would be significant for extending fairness analysis beyond outcome metrics to procedural aspects of explanations in high-stakes domains. The public release of code and datasets is a clear strength supporting reproducibility.
major comments (2)
- [Abstract] Abstract: The claims that 'GESD effectively captures group-wise discrepancies in explanation quality' and that 'FEU improves both utility and fairness over state-of-the-art methods' are asserted without any quantitative results, error bars, data-split details, or baseline comparisons visible in the text. Because this empirical validation is load-bearing for the paper's contribution, its absence leaves the central claims unsupported.
- [Section 3] Section 3 (GESD definition): GESD is defined directly via disparities in stability, robustness, and sensitivity, which are typically computed through input perturbations or sampling sensitive to feature statistics. No explicit correction for inter-group marginal distribution differences (e.g., variances or supports) is described, so observed disparities may reflect data heterogeneity rather than explanation quality. This assumption is load-bearing for the explainer- and model-agnostic claims and the assertion that GESD isolates procedural fairness.
minor comments (1)
- [Abstract] Abstract: The GitHub link is provided; confirm that the repository includes full experimental scripts, hyperparameter settings, and exact dataset preprocessing steps to enable reproduction.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below with clarifications and planned revisions to improve the manuscript's rigor and clarity.
Referee: [Abstract] Abstract: The claims that 'GESD effectively captures group-wise discrepancies in explanation quality' and that 'FEU improves both utility and fairness over state-of-the-art methods' are asserted without any quantitative results, error bars, data-split details, or baseline comparisons visible in the text. Because this empirical validation is load-bearing for the paper's contribution, its absence leaves the central claims unsupported.
Authors: We agree that the abstract would benefit from more explicit ties to the quantitative evidence. The full manuscript reports these results in Sections 4 and 5, including tables with mean performance metrics and standard deviations across multiple data splits (e.g., 5-fold cross-validation) as well as direct comparisons to baselines. To address the concern, we will revise the abstract to include concise references to key empirical outcomes and pointers to the relevant tables and sections, making the support for the central claims more immediately visible. revision: yes
Referee: [Section 3] Section 3 (GESD definition): GESD is defined directly via disparities in stability, robustness, and sensitivity, which are typically computed through input perturbations or sampling sensitive to feature statistics. No explicit correction for inter-group marginal distribution differences (e.g., variances or supports) is described, so observed disparities may reflect data heterogeneity rather than explanation quality. This assumption is load-bearing for the explainer- and model-agnostic claims and the assertion that GESD isolates procedural fairness.
Authors: This is a substantive point. The current GESD formulation applies uniform perturbation strategies across groups without explicit normalization for differences in feature marginals, which could allow data heterogeneity to influence the measured disparities. We will revise Section 3 to explicitly discuss this potential confounding factor, introduce a normalized variant of GESD that accounts for group-specific variances and supports, and add supporting experiments to demonstrate that the reported disparities remain after such adjustments. These changes will strengthen the justification for the procedural fairness interpretation and the model-agnostic claims. revision: yes
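A toy setup can reproduce the referee's confound and then test one possible normalization against it. The sketch assumes two things: the perturbation noise is scaled to each group's own feature standard deviation, and the model is linear with gradient-times-input attributions, so it treats both groups identically. One hypothetical normalized variant divides the attribution change by the perturbation norm, which gives a Lipschitz-style ratio similar to local-Lipschitz robustness estimates. This is not the variant the authors plan to introduce. `W`, `group_scores`, and `disparity` are illustrative names.

```python
import numpy as np

W = np.array([1.0, -2.0, 0.5, 3.0, -1.0])  # hypothetical linear model weights


def explain_linear(X):
    # Gradient-times-input attributions for the linear score W @ x.
    return X * W


def group_scores(X, rel_noise, n_perturb, rng, normalize):
    """Per-sample explanation change, with noise scaled to this group's feature std.

    Assumption: perturbations follow group-specific feature statistics, which is
    the data-dependent sampling the referee flags as a possible confound.
    """
    sigma = X.std(axis=0)  # group-specific feature spread
    base = explain_linear(X)
    scores = np.zeros(len(X))
    for _ in range(n_perturb):
        delta = rng.normal(0.0, 1.0, size=X.shape) * rel_noise * sigma
        change = np.linalg.norm(explain_linear(X + delta) - base, axis=1)
        if normalize:
            # Hypothetical normalization: Lipschitz-style ratio that cancels the noise scale.
            change = change / np.linalg.norm(delta, axis=1)
        scores += change
    return scores / n_perturb


rng = np.random.default_rng(0)
X_a = rng.normal(0.0, 1.0, size=(500, 5))  # group A: unit feature spread
X_b = rng.normal(0.0, 3.0, size=(500, 5))  # group B: same model, 3x wider features


def disparity(normalize):
    a = group_scores(X_a, 0.1, 10, rng, normalize).mean()
    b = group_scores(X_b, 0.1, 10, rng, normalize).mean()
    return abs(a - b)


raw, normalized = disparity(False), disparity(True)
print(f"raw: {raw:.3f}  normalized: {normalized:.3f}")
```

For a linear model, the attribution change is `W * delta` regardless of the input. Any raw disparity in this example therefore comes entirely from the wider noise applied to group B. The ratio largely cancels that scale, so the normalized disparity stays close to zero.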
Circularity Check
No significant circularity: GESD and FEU are definitional proposals evaluated empirically
The paper defines GESD directly as a metric that quantifies disparities in explanation stability, robustness, and sensitivity across protected subgroups, then embeds it in the FEU multi-objective optimizer. No equations or claims reduce a derived quantity back to a fitted input or self-referential definition by construction. Empirical results on benchmark datasets are presented as external validation rather than a closed loop. The derivation chain is therefore self-contained: the metric is introduced by definition, the framework combines it with existing utility and fairness terms, and performance is assessed on held-out data without the central claims collapsing into the inputs.