
arxiv: 2603.13452 · v3 · pith:YUEHXSRFnew · submitted 2026-03-13 · 💻 cs.AI · cs.CY · cs.LG

MESD: A Risk-Sensitive Metric for Explanation Fairness Across Intersectional Subgroups

Pith reviewed 2026-05-21 11:23 UTC · model grok-4.3

classification 💻 cs.AI · cs.CY · cs.LG
keywords procedural fairness · intersectionality · explanation stability · MESD · CVaR · empirical Bayes · multi-objective optimization · fairness gerrymandering

The pith

MESD quantifies how stable a model's explanations are across intersectional demographic subgroups, exposing procedural fairness gaps that outcome metrics miss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MESD to measure whether a model applies systematically different reasoning to different combinations of protected attributes such as race and gender. Outcome fairness checks like demographic parity can pass while the underlying explanations vary sharply for intersectional groups, a problem called fairness gerrymandering. MESD combines label-aware aggregation of explanation quality, shrinkage to handle small subgroups, and CVaR weighting to focus on the worst disparities. When embedded in a multi-objective optimizer, it produces models that trade off accuracy, group-level prediction parity, and consistent explanation behavior. A sympathetic reader would care because this gives auditors a concrete way to check procedural justice rather than only final results.
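The gerrymandering effect described above is easiest to see on an outcome metric. The following minimal sketch uses invented counts (the group names and numbers are assumptions, not data from the paper) in which positive-prediction rates match for each attribute on its own but diverge sharply once the attributes are crossed.

```python
from itertools import product

# Hypothetical toy data: (race, gender) -> (n_individuals, n_positive_predictions).
# The numbers are invented so that marginal rates match while intersections differ.
counts = {
    ("A", "F"): (100, 80),
    ("A", "M"): (100, 20),
    ("B", "F"): (100, 20),
    ("B", "M"): (100, 80),
}

def positive_rate(keys):
    """Positive-prediction rate pooled over the given (race, gender) cells."""
    n = sum(counts[k][0] for k in keys)
    pos = sum(counts[k][1] for k in keys)
    return pos / n

races = sorted({r for r, _ in counts})
genders = sorted({g for _, g in counts})

# Single-attribute (marginal) rates.
race_rates = {r: positive_rate([(r, g) for g in genders]) for r in races}
gender_rates = {g: positive_rate([(r, g) for r in races]) for g in genders}
# Rates for every cell of the Cartesian product race x gender.
cell_rates = {k: positive_rate([k]) for k in product(races, genders)}

def gap(rates):
    """Largest difference between any two group rates."""
    return max(rates.values()) - min(rates.values())

marginal_gap = max(gap(race_rates), gap(gender_rates))
intersectional_gap = gap(cell_rates)
print(marginal_gap, intersectional_gap)  # marginal gap is 0.0, intersectional gap is about 0.6
```

The same audit pattern would apply to an explanation-quality score in place of the prediction rate, which is the setting MESD targets.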

Core claim

MESD, or Multi-category Explanation Stability Disparity, is a procedural fairness metric that quantifies disparities in explanation quality across intersectional subgroups formed by the Cartesian product of multiple protected attributes. It integrates label-aware aggregation aligned with outcome-conditional fairness, empirical-Bayes shrinkage to stabilize estimates for small groups, and Conditional Value-at-Risk weighting to emphasize worst-case subgroup disparities. The metric is placed inside the UEF framework, which jointly optimizes utility, outcome fairness, and procedural fairness via NSGA-II, and experiments on benchmark datasets show it detects procedural disparities invisible to outcome metrics alone.
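This page does not reproduce the paper's equations. As a reading aid, here is one plausible way the three components could be combined. All symbols are assumptions introduced for illustration, the expectation is taken uniformly over subgroups, and the paper's actual definition may differ.

```latex
% Assumed notation, not taken from the paper:
%   \mathcal{G} = A_1 \times \dots \times A_k   intersectional subgroups
%   s_{g,y}, n_{g,y}   mean stability and count for subgroup g among instances with label y
%   \bar{s}_y          pooled mean stability for label y
%   \lambda \ge 0      shrinkage strength, \alpha \in [0,1) CVaR level
\begin{align}
\tilde{s}_{g,y} &= \frac{n_{g,y}\, s_{g,y} + \lambda\, \bar{s}_y}{n_{g,y} + \lambda}
  && \text{(empirical-Bayes shrinkage)} \\
d_{g,y} &= \bigl|\tilde{s}_{g,y} - \bar{s}_y\bigr|
  && \text{(subgroup disparity)} \\
\mathrm{CVaR}_\alpha(d_{\cdot,y}) &= \mathbb{E}\bigl[\, d_{g,y} \mid d_{g,y} \ge \mathrm{VaR}_\alpha(d_{\cdot,y}) \,\bigr]
  && \text{(worst-case tail)} \\
\mathrm{MESD} &= \frac{1}{|\mathcal{Y}|} \sum_{y \in \mathcal{Y}} \mathrm{CVaR}_\alpha(d_{\cdot,y})
  && \text{(label-aware aggregation)}
\end{align}
```

Under this reading, a small subgroup (small n) is pulled toward the pooled mean, so a noisy estimate cannot dominate the tail average on its own.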

What carries the argument

Multi-category Explanation Stability Disparity (MESD), a composite metric that aggregates explanation stability with label awareness, empirical-Bayes stabilization, and CVaR risk weighting over all Cartesian combinations of protected attributes.
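A rough Python sketch of such a composite follows. It is not the authors' implementation: the function name `mesd_sketch`, the pseudo-count form of shrinkage, the absolute-gap disparity, and the default values of `shrink` and `alpha` are all assumptions chosen to make the three ingredients concrete.

```python
import numpy as np
from itertools import product

def mesd_sketch(scores, attrs, labels, shrink=10.0, alpha=0.8):
    """Illustrative MESD-style score; NOT the paper's exact definition.

    scores : per-instance explanation-stability values (higher = more stable)
    attrs  : list of per-instance attribute sequences, e.g. [race, gender]
    labels : per-instance true labels
    shrink : assumed empirical-Bayes pseudo-count
    alpha  : assumed CVaR level; the worst (1 - alpha) share of gaps is averaged
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    attrs = [np.asarray(a) for a in attrs]
    per_label = []
    for y in np.unique(labels):  # label-aware: handle each true label separately
        in_y = labels == y
        pooled = scores[in_y].mean()
        gaps = []
        # Cartesian product of the observed attribute values.
        for combo in product(*[np.unique(a) for a in attrs]):
            mask = in_y.copy()
            for a, v in zip(attrs, combo):
                mask &= a == v
            n = int(mask.sum())
            if n == 0:
                continue  # empty intersection: nothing to estimate
            # Shrink the subgroup mean toward the pooled mean for this label.
            shrunk = (n * scores[mask].mean() + shrink * pooled) / (n + shrink)
            gaps.append(abs(shrunk - pooled))
        gaps = np.sort(np.asarray(gaps))[::-1]  # largest disparity first
        k = max(1, int(np.ceil((1.0 - alpha) * len(gaps))))
        per_label.append(gaps[:k].mean())  # CVaR-style tail average
    return float(np.mean(per_label))

# Hypothetical toy data: 16 instances, two attributes, two labels.
race = ["A", "A", "B", "B"] * 4
gender = ["F", "M", "F", "M"] * 4
labels = [0] * 8 + [1] * 8
uniform = [0.9] * 16
skewed = list(uniform)
for i in range(16):
    if race[i] == "B" and gender[i] == "M":
        skewed[i] = 0.5  # one intersection gets less stable explanations

print(mesd_sketch(uniform, [race, gender], labels))
print(mesd_sketch(skewed, [race, gender], labels))
```

With identical scores everywhere the sketch returns essentially zero. Lowering the scores for a single intersection raises it, even though that intersection holds only a quarter of the data.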

If this is right

  • Audits can now check whether explanation stability holds for every race-by-gender combination even when single-attribute checks pass.
  • The UEF optimizer yields models that simultaneously improve predictive utility, demographic parity, and explanation consistency.
  • Regulatory reports can include MESD scores as evidence that reasoning processes do not gerrymander across intersections.
  • Training loops can penalize explanation disparity directly rather than relying only on prediction parity.
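The trade-off among utility, outcome fairness, and explanation consistency rests on Pareto dominance, the relation NSGA-II uses to rank candidates. The sketch below shows only that dominance filter on invented objective values. It omits the rest of NSGA-II (successive front ranking, crowding distance, genetic operators), and the candidate numbers are hypothetical.

```python
import numpy as np

def non_dominated(points):
    """Return indices of Pareto-optimal rows, assuming every column is minimized."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        # q dominates p if it is no worse everywhere and strictly better somewhere.
        dominated = any(
            np.all(q <= p) and np.any(q < p)
            for j, q in enumerate(pts) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

# Hypothetical candidates: (error rate, demographic-parity gap, explanation disparity).
candidates = [
    (0.15, 0.10, 0.08),
    (0.18, 0.04, 0.05),
    (0.20, 0.12, 0.09),  # worse than the first row on every objective
    (0.16, 0.09, 0.02),
]
front = non_dominated(candidates)
print(front)  # [0, 1, 3]
```

The three surviving candidates each win on a different objective, which is why a multi-objective search returns a front rather than a single model.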

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same shrinkage and CVaR ideas could be applied to other procedural signals such as feature importance stability or counterfactual consistency.
  • Deployed systems in lending or hiring might be required to publish MESD values alongside demographic parity to satisfy intersectional equity rules.
  • If MESD remains stable across different explanation generators, it could serve as a model-agnostic diagnostic layer for any black-box system.

Load-bearing premise

Explanation quality can be quantified and compared across subgroups in a manner that genuinely reflects procedural fairness and remains independent of the particular method used to generate the explanations.

What would settle it

A dataset and model in which human raters judge explanations as procedurally unfair for an intersectional subgroup while MESD reports no disparity, or conversely a case where MESD flags disparity but outcome metrics and human review both find the explanations equivalent.

Figures

Figures reproduced from arXiv: 2603.13452 by Gideon Popoola, John Sheppard.

Figure 1: Pareto fronts of each algorithm on the Adult Income dataset.
Figure 2: MESD variants from each algorithm on the Recidivism dataset.
Original abstract

Fairness in machine learning is predominantly evaluated through outcome-oriented metrics, such as Demographic parity, which measure whether predictions are statistically consistent across protected groups. However, these metrics cannot detect whether a model uses systematically different reasoning for different demographic groups, which violates procedural fairness principles. This problem is compounded by intersectionality, where models may appear fair on individual attributes (e.g., race) while exhibiting significant disparities for intersectional subgroups (e.g., race $\times$ gender), a phenomenon known as fairness gerrymandering. In this work, we introduce Multi-category Explanation Stability Disparity (MESD), a procedural fairness metric that quantifies disparities in explanation quality across intersectional subgroups formed by the Cartesian product of multiple protected attributes. MESD integrates three components, which are label-aware aggregation aligned with outcome-conditional fairness, empirical-Bayes shrinkage to stabilize estimates for small intersectional groups, and Conditional Value-at-Risk (CVaR) weighting to emphasize worst-case subgroup disparities. We integrate MESD within a multi-objective optimization framework (UEF) that jointly optimizes utility, outcome fairness, and procedural fairness using NSGA-II. We evaluated MESD and UEF on three benchmark datasets along with four state-of-the-art methods in several experiments, and we demonstrate that MESD reveals procedural disparities invisible to outcome metrics alone. We position our contribution within procedural justice theory and discuss implications for regulatory compliance and intersectional equity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Multi-category Explanation Stability Disparity (MESD), a procedural fairness metric for quantifying disparities in explanation quality across intersectional subgroups formed by the Cartesian product of protected attributes. MESD integrates label-aware aggregation aligned with outcome-conditional fairness, empirical-Bayes shrinkage to stabilize small-group estimates, and Conditional Value-at-Risk (CVaR) to focus on worst-case disparities. The metric is embedded in a multi-objective optimization framework (UEF) optimized via NSGA-II to jointly consider utility, outcome fairness, and procedural fairness. Evaluations on three benchmark datasets alongside four state-of-the-art methods demonstrate that MESD identifies procedural disparities not visible through outcome metrics alone.

Significance. If the assumptions hold, this work provides a valuable tool for assessing procedural fairness in explanations, particularly addressing intersectionality and fairness gerrymandering. The combination of empirical Bayes and CVaR offers a risk-sensitive approach suitable for small subgroups. The multi-objective framework allows balancing competing objectives. This could have implications for regulatory compliance and advancing procedural justice in AI systems. The empirical evaluation on benchmarks adds practical relevance.

major comments (2)
  1. [MESD definition and integration] The claim that label-aware aggregation is independent of the particular XAI method used and aligns with procedural fairness principles lacks an explicit invariance proof or ablation study. Without showing that explanation quality measures (such as stability) remain comparable when swapping explainers or that label-conditioning removes outcome correlations, the interpretation of MESD as revealing procedural disparities rests on an unverified assumption. This is central to the paper's contribution.
  2. [Experiments section] The experimental results should include controls or ablations demonstrating that MESD detects disparities due to procedural differences rather than artifacts of the chosen XAI methods or aggregation. The current evaluation on four methods does not isolate whether the label-aware component truly decouples from outcome parity.
minor comments (2)
  1. [Abstract] The abstract describes the three components but provides no equations or high-level pseudocode; adding a brief mathematical sketch of how label-aware aggregation, empirical Bayes, and CVaR are combined would improve accessibility.
  2. [Method] Clarify the exact definition of 'explanation quality' (e.g., stability, fidelity) and how it is computed for each subgroup in the Cartesian product.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and will incorporate revisions to strengthen the theoretical grounding and empirical validation of MESD.

Point-by-point responses
  1. Referee: [MESD definition and integration] The claim that label-aware aggregation is independent of the particular XAI method used and aligns with procedural fairness principles lacks an explicit invariance proof or ablation study. Without showing that explanation quality measures (such as stability) remain comparable when swapping explainers or that label-conditioning removes outcome correlations, the interpretation of MESD as revealing procedural disparities rests on an unverified assumption. This is central to the paper's contribution.

    Authors: We thank the referee for highlighting this point. The label-aware aggregation is constructed by design to condition stability estimates on the true label, which aligns with outcome-conditional fairness and removes direct dependence on model predictions (hence on outcome parity). This property holds independently of the specific explainer because stability is computed within label-stratified groups. While our multi-method experiments provide supporting evidence, we agree an explicit argument would clarify the contribution. In revision we will add a short formal subsection deriving the invariance under label conditioning and an ablation that fixes the aggregation while varying XAI methods to confirm consistent procedural disparity signals. revision: yes

  2. Referee: [Experiments section] The experimental results should include controls or ablations demonstrating that MESD detects disparities due to procedural differences rather than artifacts of the chosen XAI methods or aggregation. The current evaluation on four methods does not isolate whether the label-aware component truly decouples from outcome parity.

    Authors: We appreciate the call for targeted controls. Our existing evaluation already applies MESD to four distinct XAI methods across three datasets and shows disparities invisible to outcome metrics, but we agree this does not fully isolate the label-aware component. In the revised experiments we will add (i) a direct comparison of MESD versus its non-label-aware counterpart and (ii) correlation analysis between MESD scores and standard outcome-fairness metrics to quantify decoupling. These controls will be reported alongside the existing results. revision: yes
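The correlation analysis proposed in this response could take roughly the following form. The per-model scores are invented for illustration, and using Spearman rank correlation is an assumption, since the rebuttal does not name a specific statistic.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation for data without ties (plain NumPy)."""
    rx = np.argsort(np.argsort(x))  # rank of each element of x
    ry = np.argsort(np.argsort(y))  # rank of each element of y
    return float(np.corrcoef(rx, ry)[0, 1])

# Hypothetical per-model scores, invented for illustration only.
mesd_scores = [0.02, 0.11, 0.05, 0.09, 0.01, 0.07]
dp_gaps = [0.08, 0.03, 0.10, 0.02, 0.06, 0.05]

rho = spearman(mesd_scores, dp_gaps)
print(round(rho, 3))  # roughly -0.714 for these invented numbers
```

A coefficient near zero across many trained models would be consistent with MESD carrying information that the outcome metric does not. A coefficient near 1 would suggest that the two metrics largely rank models the same way.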

Circularity Check

0 steps flagged

No significant circularity in MESD definition or components

full rationale

The paper defines MESD as a composite of label-aware aggregation, empirical-Bayes shrinkage, and CVaR weighting applied to explanation stability measures across intersectional subgroups. These components are standard statistical techniques drawn from external literature and applied to the new procedural fairness setting; no equation reduces the metric to a self-defined quantity, a fitted parameter renamed as prediction, or a load-bearing self-citation chain. The central claim that MESD reveals procedural disparities rests on empirical evaluation and alignment with outcome-conditional fairness principles rather than tautological construction, rendering the derivation self-contained.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

MESD depends on the choice of explanation method, the definition of explanation quality, and tunable parameters for shrinkage and risk weighting that are not derived from first principles.

free parameters (2)
  • CVaR alpha level
    Risk parameter that controls emphasis on worst-case subgroup disparities
  • empirical-Bayes shrinkage strength
    Hyperparameter controlling stabilization for small intersectional groups
axioms (2)
  • domain assumption Explanation quality is a well-defined, comparable scalar that can be aggregated in a label-aware manner
    Invoked when defining the label-aware aggregation component of MESD
  • domain assumption Protected attributes can be combined via Cartesian product to form meaningful intersectional subgroups without introducing new confounding
    Used to define the subgroups over which disparities are measured
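To see why the two free parameters in the ledger matter, the sketch below sweeps each one on invented numbers. The pseudo-count form of shrinkage and the tail-average form of CVaR are assumed formulations, not the paper's own.

```python
import math

def shrink_mean(group_mean, n, pooled_mean, strength):
    """Pull a subgroup mean toward the pooled mean; `strength` acts as a pseudo-count."""
    return (n * group_mean + strength * pooled_mean) / (n + strength)

def cvar(values, alpha):
    """Average of the worst ceil((1 - alpha) * len) values (larger = worse)."""
    ordered = sorted(values, reverse=True)
    k = max(1, math.ceil((1.0 - alpha) * len(ordered)))
    return sum(ordered[:k]) / k

# Hypothetical subgroup of 5 people with mean stability 0.40; pooled mean 0.80.
for strength in (0, 5, 50):
    print(strength, round(shrink_mean(0.40, 5, 0.80, strength), 3))

# Hypothetical disparities for five intersectional subgroups.
gaps = [0.01, 0.02, 0.03, 0.05, 0.30]
for alpha in (0.0, 0.5, 0.8):
    print(alpha, round(cvar(gaps, alpha), 3))
```

Stronger shrinkage moves the small subgroup's estimate from 0.40 toward the pooled 0.80, which can hide a real disparity. A higher alpha moves the summary from the plain mean of the gaps toward the single worst gap. Reported scores are therefore hard to compare unless both settings are disclosed.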

pith-pipeline@v0.9.0 · 5789 in / 1426 out tokens · 39326 ms · 2026-05-21T11:23:06.175442+00:00 · methodology



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Fairness of Explanations in Artificial Intelligence (AI): A Unifying Framework, Axioms, and Future Direction toward Responsible AI

    cs.AI · 2026-05 · unverdicted · novelty 6.0

    A conditional invariance framework defines explanation fairness as explanations being statistically independent of protected attributes given task-relevant features, unifying existing metrics and enabling procedural b...
