Worst-Group Equalized Odds Regularization for Multi-Attribute Fair Medical Image Classification

Abin Shoby; Jessica Schrouff; Lauren Oakden-Rayner; Luke Whitbread; Lyle J. Palmer; Mark Jenkinson; Nikhil Cherian Kurian; Robert Vandersluis; Victor Caquilpan Parra

arxiv: 2605.19214 · v1 · pith:6J3NABWInew · submitted 2026-05-19 · 💻 cs.LG · cs.CV

Nikhil Cherian Kurian, Victor Caquilpan Parra, Abin Shoby, Luke Whitbread, Lauren Oakden-Rayner, Robert Vandersluis, Jessica Schrouff, Lyle J. Palmer, Mark Jenkinson

Pith reviewed 2026-05-20 07:56 UTC · model grok-4.3

classification 💻 cs.LG cs.CV

keywords fairness in machine learning · medical image classification · equalized odds · worst-group regularization · multi-attribute fairness · demographic disparities · multi-label classification

The pith

A worst-group equalized-odds margin regularizer reduces demographic disparities in true and false positive rates for medical image classifiers while preserving overall AUC.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Medical AI often produces different true-positive and false-positive behaviors across demographic groups even when aggregate accuracy metrics appear similar. The paper proposes a regularizer that, at each training step, identifies the subgroups defined by single attributes such as age, sex, or race that show the largest deviations from equalized odds and applies a single penalty to close those gaps. This avoids the need to enumerate every combination of attributes. Experiments across two multi-label medical imaging datasets show consistent drops in equalized odds and opportunity disparities with only minimal change to overall AUC.

Core claim

The central claim is that a worst-group equalized-odds margin regularizer, which locates the demographic subgroups with the most extreme true-positive and false-positive margin deviations and applies a unified penalty, enables fairness optimization across multiple attributes at once and yields reduced equalized odds and opportunity gaps with negligible effect on aggregate AUC in realistic multi-label medical settings.

What carries the argument

Worst-group equalized-odds margin regularizer: at each update it selects the subgroups defined by explicit single demographic attributes that exhibit the largest margin deviations on both the true-positive and false-positive sides and applies one penalty to them.
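
The selection step described here can be sketched in a few lines of plain Python. This is a rough illustration that follows the wording of the Figure 1 caption, not the paper's Eq. (1): the function names, the `target_margin` parameter, and the hinge form of the penalty are all assumptions made for this example, and it handles a single label and a single mini-batch rather than the full multi-label setting.

```python
from collections import defaultdict
from statistics import mean


def worst_group_margins(scores, labels, attributes):
    """Find the worst single-attribute subgroups and their margins.

    scores:     float classifier scores (e.g. logits) for one label
    labels:     0/1 ground-truth values for that label
    attributes: one dict per sample, e.g. {"sex": "F", "age": "old"}
    """
    pos_by_group = defaultdict(list)
    neg_by_group = defaultdict(list)
    for score, label, attrs in zip(scores, labels, attributes):
        target = pos_by_group if label == 1 else neg_by_group
        # Each sample joins one subgroup per attribute; no intersections.
        for name, value in attrs.items():
            target[(name, value)].append(score)

    # Positive subgroup with the lowest mean score.
    g_pos_min = min(pos_by_group, key=lambda g: mean(pos_by_group[g]))
    # Negative subgroup with the highest mean score.
    g_neg_max = max(neg_by_group, key=lambda g: mean(neg_by_group[g]))

    all_pos = [s for s, y in zip(scores, labels) if y == 1]
    all_neg = [s for s, y in zip(scores, labels) if y == 0]
    # Worst positive in g_pos_min versus the worst negative overall.
    margin_pos = min(pos_by_group[g_pos_min]) - max(all_neg)
    # Worst negative in g_neg_max versus the worst positive overall.
    margin_neg = min(all_pos) - max(neg_by_group[g_neg_max])
    return g_pos_min, g_neg_max, margin_pos, margin_neg


def hinge_penalty(margin_pos, margin_neg, target_margin=1.0):
    # Assumed penalty form: zero once both margins reach target_margin.
    return (max(0.0, target_margin - margin_pos)
            + max(0.0, target_margin - margin_neg))


# Hypothetical mini-batch: four positives followed by four negatives.
scores = [2.0, 1.5, 0.2, 0.4, -1.0, -0.5, 0.3, -2.0]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
attributes = [
    {"sex": "M", "age": "young"}, {"sex": "M", "age": "old"},
    {"sex": "F", "age": "young"}, {"sex": "F", "age": "old"},
    {"sex": "M", "age": "young"}, {"sex": "F", "age": "young"},
    {"sex": "M", "age": "old"}, {"sex": "F", "age": "old"},
]
g_pos, g_neg, m_pos, m_neg = worst_group_margins(scores, labels, attributes)
penalty = hinge_penalty(m_pos, m_neg)
```

In this toy batch the weakest positive subgroup is defined by sex and so is the strongest negative subgroup, but nothing in the selection prevents the two sides from landing on different attributes at different steps, which is how a single penalty can move across several demographic axes during training.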

If this is right

Diagnostic performance measured by AUC stays nearly unchanged across the tested medical imaging datasets.
Disparities in equalized odds and equalized opportunity decrease consistently.
The approach handles multiple demographic attributes using only single-attribute subgroup definitions.
The method applies directly to multi-label classification tasks common in medical imaging.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same worst-group penalty structure could be tested in non-medical domains where multiple attributes affect decision thresholds.
Single-attribute worst-group selection may capture most of the needed fairness gains even when attributes interact in practice.
Extending the regularizer to continuous or high-cardinality attributes would test whether the current single-attribute grouping remains sufficient.

Load-bearing premise

That subgroups defined by single demographic attributes and a unified penalty on the worst ones can optimize fairness across multiple axes without requiring explicit intersectional subgroup definitions or constraints.

What would settle it

Applying the regularizer to a comparable medical imaging dataset and observing either no reduction in equalized odds disparities or a substantial drop in AUC would show the method does not deliver the stated benefits.

Figures

Figures reproduced from arXiv: 2605.19214 by Abin Shoby, Jessica Schrouff, Lauren Oakden-Rayner, Luke Whitbread, Lyle J. Palmer, Mark Jenkinson, Nikhil Cherian Kurian, Robert Vandersluis, Victor Caquilpan Parra.

**Figure 1.** Defining EO margins: (1) Aggregate samples by label: • (positive), • (negative). (2) Identify $g^{+}_{\min}$ as the positive subgroup with the lowest $\mu^{+}_{g_i}$. (3) Define $\mathrm{margin}_{EO+}$ as the separation between the worst sample in $g^{+}_{\min}$ and the worst negative sample; $\mathrm{margin}_{EO-}$ is defined analogously for the worst negative sample in $g^{-}_{\max}$. In Eq. (1), m and n index individual samples within the current mini-ba…

**Figure 2.** Class-wise Joint EOdds, EOM, and ∆AUC (shown as a box at the bottom) on MIMIC-CXR. Error bars show standard error; grey shading denotes the baseline standard error.

Original abstract

Diagnostic performance in medical AI varies systematically across demographic groups, yet subgroup AUC can mask clinically important disparities. At a fixed inference-time operating point, some groups may exhibit over-diagnostic behaviour, characterized by elevated true and false positive rates, while others show under-diagnostic patterns with reduced true and false positive rates. These opposing tendencies can cancel in aggregate AUCs while producing meaningful inequities in clinical decision-making. Motivated by the need to assess and mitigate such disparities at the operating point and across multiple demographic attributes simultaneously, we propose a worst-group equalized-odds margin regularizer. The proposed regularizer explicitly targets subgroup-level deviations on both the true positive and false positive sides at inference. At each update, the method identifies subgroups defined by explicit demographic attributes (e.g., age, sex, and race) that exhibit the most extreme margin deviations and applies a unified penalty, enabling fairness optimization across multiple demographic axes without requiring explicit intersectional constraints. Across two medical imaging datasets in realistic multi-label settings, our method consistently reduces disparities in Equalized Odds and Equalized Opportunity with minimal impact on AUC, preserving diagnostic performance while improving fairness.
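
The abstract's point that subgroup AUC can hide operating-point disparities is easy to reproduce on toy numbers. The sketch below uses made-up scores for two hypothetical groups whose scores have the same ranking structure, with one group's scores shifted upward; the helper names `rates` and `auc` and the 0.5 threshold are assumptions for illustration only.

```python
def rates(scores, labels, threshold):
    """Return (TPR, FPR) at a fixed decision threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, fp / n_neg


def auc(scores, labels):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical group A: scores sit high, so more cases cross the threshold.
scores_a = [0.9, 0.8, 0.7, 0.6, 0.65, 0.55, 0.4, 0.3]
# Hypothetical group B: same ordering of cases, scores shifted down.
scores_b = [0.6, 0.5, 0.4, 0.3, 0.35, 0.25, 0.1, 0.0]

threshold = 0.5
auc_a, auc_b = auc(scores_a, labels), auc(scores_b, labels)
tpr_a, fpr_a = rates(scores_a, labels, threshold)
tpr_b, fpr_b = rates(scores_b, labels, threshold)
tpr_gap = abs(tpr_a - tpr_b)
fpr_gap = abs(fpr_a - fpr_b)
```

Both groups reach the same AUC, yet at the shared threshold group A has the higher true-positive and false-positive rates (the over-diagnostic pattern) and group B the lower ones (the under-diagnostic pattern), which is the kind of gap that Equalized Odds measures and AUC does not.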

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces a worst-group equalized odds margin regularizer for multi-attribute fairness in medical imaging but provides thin evidence and may miss intersectional disparities.

The main thing to know about this paper is that it introduces a worst-group equalized odds margin regularizer designed for medical image classification with multiple demographic attributes. The goal is to reduce gaps in true positive and false positive rates at the inference operating point while keeping overall AUC stable. The new element is the unified penalty that picks the worst subgroups on the fly for both sides of the equalized odds metric and applies it across attributes without explicit intersectional groups. This builds on prior fairness work but tailors it to the multi-label medical setting.

The paper does well in motivating the problem with the idea that opposing over- and under-diagnosis can cancel out in aggregate metrics but still cause inequities in practice. The approach is practical because it can be added to standard training without major changes to the model.

The soft spots are in the validation. The abstract says the method consistently reduces disparities in Equalized Odds and Equalized Opportunity with minimal AUC impact, but there are no specific numbers, dataset sizes, or comparisons to baselines provided here. Without those, it's difficult to see how large the fairness gains are or if they hold under different conditions. The method's reliance on single-attribute subgroups is another point. As noted in the stress-test, this could leave intersectional disparities unaddressed if the worst margins don't capture joint effects. If the full paper has experiments showing that single attributes suffice or monitoring intersectional performance, that would strengthen it.

This paper is for people developing fair AI systems for healthcare imaging who need to balance performance across several patient groups. A reader focused on regularization techniques for fairness metrics would find the formulation interesting. It deserves a serious referee. The core idea is sound and the application area is important. I recommend sending it to peer review, asking the authors to provide detailed results and address potential intersectional limitations.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a worst-group equalized-odds margin regularizer for multi-attribute fairness in medical image classification. Subgroups are defined by single explicit demographic attributes (age, sex, race); at each step the method identifies those with the largest TPR/FPR margin deviations and applies a unified penalty. This is claimed to reduce Equalized Odds and Equalized Opportunity disparities across multiple axes without explicit intersectional constraints, while preserving AUC, with supporting results reported on two medical imaging datasets in realistic multi-label settings.

Significance. If the empirical claims hold under proper intersectional scrutiny, the approach would supply a practical, low-overhead regularizer for multi-attribute fairness in medical imaging where sample sizes preclude reliable intersectional cells. The emphasis on operating-point TPR/FPR gaps rather than AUC alone addresses a clinically relevant form of disparity.

major comments (2)

[Method] Method section (description of subgroup construction): the regularizer penalizes only the current worst single-attribute margins. The central claim that this suffices for multi-attribute EO/EOp fairness therefore requires that worst single-attribute deviations are reliable proxies for all relevant joint distributions. No argument or diagnostic is supplied showing that large intersectional gaps (e.g., older Black females) cannot persist while single-attribute margins remain moderate; this assumption is load-bearing for the “without requiring explicit intersectional constraints” claim.
[Experiments] Experiments / Results section: the abstract asserts “consistent reductions” in EO/EOp disparities, yet the manuscript provides neither quantitative deltas, confidence intervals, nor ablation tables on regularization strength. Without these, the reader cannot assess whether the observed fairness gains are statistically reliable or merely post-hoc.
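
The concern in the first major comment can be made concrete with a small counting example. The cell counts below are entirely hypothetical and use two attributes instead of three for brevity; `marginal_tpr` and `gap` are illustrative helper names. The example shows that every single-attribute TPR can match while intersectional cells still differ.

```python
# (true positives, total positives) per intersectional cell; made-up counts.
cells = {
    ("F", "young"): (9, 10),
    ("F", "old"): (5, 10),
    ("M", "young"): (5, 10),
    ("M", "old"): (9, 10),
}


def marginal_tpr(cells, axis):
    """Pool cells along one attribute (axis 0 = sex, axis 1 = age)."""
    totals = {}
    for key, (tp, n_pos) in cells.items():
        t, n = totals.get(key[axis], (0, 0))
        totals[key[axis]] = (t + tp, n + n_pos)
    return {value: t / n for value, (t, n) in totals.items()}


def gap(tprs):
    """Largest difference in TPR between any two groups."""
    return max(tprs.values()) - min(tprs.values())


sex_gap = gap(marginal_tpr(cells, 0))
age_gap = gap(marginal_tpr(cells, 1))
cell_gap = gap({key: tp / n for key, (tp, n) in cells.items()})
```

Here the sex-only and age-only gaps are both zero while the gap across the four cells is large, so a penalty driven only by single-attribute statistics would see nothing to correct. Whether real datasets behave like this is an empirical question that the requested diagnostic would answer.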

minor comments (1)

[Abstract] Abstract: the two datasets are not named and no basic subgroup sample sizes or label prevalences are stated, making it impossible to judge whether the reported fairness improvements are driven by well-powered cells.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to incorporate additional analysis and quantitative reporting where the concerns are valid.

Referee: [Method] Method section (description of subgroup construction): the regularizer penalizes only the current worst single-attribute margins. The central claim that this suffices for multi-attribute EO/EOp fairness therefore requires that worst single-attribute deviations are reliable proxies for all relevant joint distributions. No argument or diagnostic is supplied showing that large intersectional gaps (e.g., older Black females) cannot persist while single-attribute margins remain moderate; this assumption is load-bearing for the “without requiring explicit intersectional constraints” claim.

Authors: We acknowledge that the manuscript does not include an explicit diagnostic comparing single-attribute and intersectional disparities. The regularizer is motivated by practical constraints in medical imaging, where intersectional cells often have too few samples for reliable estimation, as noted in the referee summary. By iteratively penalizing the worst single-attribute TPR/FPR margins, the approach targets the most extreme observed deviations across the defined attributes. To address the load-bearing assumption, the revised manuscript will add a new subsection with a post-hoc diagnostic on intersectional subgroups (e.g., age-sex-race combinations) from the available datasets, along with a discussion of when single-attribute proxies may or may not capture joint effects. revision: yes
Referee: [Experiments] Experiments / Results section: the abstract asserts “consistent reductions” in EO/EOp disparities, yet the manuscript provides neither quantitative deltas, confidence intervals, nor ablation tables on regularization strength. Without these, the reader cannot assess whether the observed fairness gains are statistically reliable or merely post-hoc.

Authors: We agree that the results section would benefit from more precise quantitative support. The current text describes consistent reductions based on the reported trends across the two datasets, but does not tabulate exact deltas or include statistical measures. In the revision we will add (i) a table of pre- and post-regularization EO/EOp values with mean deltas and 95% confidence intervals computed over multiple runs, and (ii) an ablation study varying the regularization coefficient to show the trade-off with AUC and fairness metrics. revision: yes
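
A table of mean deltas with 95% confidence intervals over repeated runs, as promised in the second response, could be computed along the following lines. This is a minimal sketch: the per-run values are placeholders invented for the example, not results from the paper, and the interval uses a normal approximation from the standard library, whereas a t-quantile would be more appropriate with only a handful of runs.

```python
from statistics import NormalDist, mean, stdev


def mean_delta_ci(baseline, regularized, level=0.95):
    """Mean paired difference (regularized - baseline) with a normal-approx CI."""
    deltas = [r - b for b, r in zip(baseline, regularized)]
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    half_width = z * stdev(deltas) / len(deltas) ** 0.5
    centre = mean(deltas)
    return centre, centre - half_width, centre + half_width


# Placeholder EO-gap values for five paired runs (same seeds in each arm).
baseline_gap = [0.20, 0.22, 0.18, 0.21, 0.19]
regularized_gap = [0.12, 0.15, 0.11, 0.13, 0.14]

delta, ci_low, ci_high = mean_delta_ci(baseline_gap, regularized_gap)
```

A negative delta whose interval excludes zero would support a claimed reduction in the gap; repeating the same computation per regularization coefficient would give the ablation table its error bars.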

Circularity Check

0 steps flagged

No circularity: empirical regularizer evaluated on external datasets

The paper introduces a worst-group equalized-odds margin regularizer that identifies subgroups (defined by single demographic attributes) with the largest TPR/FPR deviations at each update and applies a unified penalty term. This is framed as a training-time optimization objective rather than a derived prediction. The central claims of reduced Equalized Odds and Opportunity disparities (with preserved AUC) rest on empirical results across two medical imaging datasets in multi-label settings. No equations, self-citations, or fitted inputs are presented that would make the reported fairness gains equivalent to the method's own inputs by construction. The derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on standard supervised learning assumptions plus domain-specific fairness definitions; a regularization strength hyperparameter is implied but not quantified in the abstract.

free parameters (1)

regularization strength
Hyperparameter controlling the penalty applied to worst-group margin deviations; value not reported in abstract.
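
For orientation, a regularization strength of this kind usually enters a training objective in the generic form below. The symbols are assumptions introduced here, since the abstract does not give the paper's actual objective: $\mathcal{L}_{\text{task}}$ stands for the ordinary multi-label classification loss, $\mathcal{R}_{\text{EO}}$ for the worst-group margin penalty, and $\lambda$ for the free parameter listed above.

```latex
\mathcal{L}_{\text{total}}(\theta)
  = \mathcal{L}_{\text{task}}(\theta)
  + \lambda \, \mathcal{R}_{\text{EO}}(\theta),
  \qquad \lambda \ge 0
```

Setting $\lambda = 0$ recovers the unregularized baseline, which is why an ablation over $\lambda$ is the natural way to expose the trade-off between AUC and the fairness metrics.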

axioms (1)

domain assumption: Subgroup disparities in TPR and FPR at a fixed operating point are clinically meaningful and can be mitigated by a unified penalty without intersectional constraints.
Invoked in the motivation and method description to justify targeting worst groups across attributes.

pith-pipeline@v0.9.0 · 5767 in / 1303 out tokens · 40595 ms · 2026-05-20T07:56:24.819201+00:00 · methodology

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 1 internal anchor

[1] Deng, W., Zhong, Y., Dou, Q., Li, X.: On fairness of medical image classification with multiple sensitive attributes via learning orthogonal representations. In: International Conference on Information Processing in Medical Imaging. pp. 158–169. Springer (2023)

[2] Du, S., Hers, B., Bayasi, N., Hamarneh, G., Garbi, R.: Fairdisco: Fairer ai in dermatology via disentanglement contrastive learning. In: European Conference on Computer Vision. pp. 185–202. Springer (2022)

[3] Du, S., Hers, B., Bayasi, N., Hamarneh, G., Garbi, R.: Fairdisco: Fairer ai in dermatology via disentanglement contrastive learning. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds.) Computer Vision – ECCV 2022 Workshops. pp. 185–202. Springer Nature Switzerland, Cham (2023)

[4] Gao, Y., Hao, J., Zhou, B.: Fairread: Re-fusing demographic attributes after disentanglement for fair medical image classification. Medical Image Analysis p. 103858 (2025)

[5] Ghadiri, A., Pagnucco, M., Song, Y.: XTranPrune: eXplainability-aware Transformer Pruning for Bias Mitigation in Dermatological Disease Classification. In: proceedings of Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. vol. LNCS 15010. Springer Nature Switzerland (October 2024)

[6] Glocker, B., Jones, C., Bernhardt, M., Winzeck, S.: Algorithmic encoding of protected characteristics in chest x-ray disease detection models. EBioMedicine 89 (2023)

[7] Hardt, M., Price, E., Srebro, N.: Equality of opportunity in supervised learning. Advances in neural information processing systems 29 (2016)

[8] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)

[9] Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 4700–4708 (2017)

[10] Johnson, A.E.W., Pollard, T.J., Berkowitz, S.J., Greenbaum, N.R., Lungren, M.P., Deng, C.y., Mark, R.G., Horng, S.: Mimic-cxr, a de-identified publicly available database of chest radiographs with free-text reports. Scientific Data 6(1), 317 (2019). https://doi.org/10.1038/s41597-019-0322-0

[11] Kearns, M., Neel, S., Roth, A., Wu, Z.S.: Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In: International conference on machine learning. pp. 2564–2572. PMLR (2018)

[12] Liu, E.Z., Haghgoo, B., Chen, A.S., Raghunathan, A., Koh, P.W., Sagawa, S., Liang, P., Finn, C.: Just train twice: Improving group robustness without training group information. In: Meila, M., Zhang, T. (eds.) Proceedings of the 38th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 139, pp. 6781–6792. PMLR (...

[13] Luo, Y., Tian, Y., Shi, M., Pasquale, L.R., Shen, L.Q., Zebardast, N., Elze, T., Wang, M.: Harvard glaucoma fairness: A retinal nerve disease dataset for fairness learning and fair identity normalization. IEEE Transactions on Medical Imaging 43(7), 2623–2633 (2024). https://doi.org/10.1109/TMI.2024.3377552

[14] Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., Galstyan, A.: A survey on bias and fairness in machine learning. ACM computing surveys (CSUR) 54(6), 1–35 (2021)

[15] Menon, A.K., Williamson, R.C.: The cost of fairness in binary classification. In: Conference on Fairness, accountability and transparency. pp. 107–118. PMLR (2018)

[16] Oakden-Rayner, L., Dunnmon, J., Carneiro, G., Ré, C.: Hidden stratification causes clinically meaningful failures in machine learning for medical imaging. In: Proceedings of the ACM conference on health, inference, and learning. pp. 151–159 (2020)

[17] Roh, Y., Lee, K., Whang, S.E., Suh, C.: Fairbatch: Batch selection for model fairness. 9th International Conference on Learning Representations (2021), https://par.nsf.gov/biblio/10279881

[18] Sadri, A.R., DeSilvio, T., Viswanath, S.E.: Mutual Information Regularization for Fairness-aware Deep Imaging Representations. In: proceedings of Medical Image Computing and Computer Assisted Intervention – MICCAI 2025. vol. LNCS 15973. Springer Nature Switzerland (September 2025)

[19] Sagawa*, S., Koh*, P.W., Hashimoto, T.B., Liang, P.: Distributionally robust neural networks. In: International Conference on Learning Representations (2020), https://openreview.net/forum?id=ryxGuJrFvS

[20] Seyyed-Kalantari, L., Liu, G., McDermott, M., Chen, I.Y., Ghassemi, M.: Chexclusion: Fairness gaps in deep chest x-ray classifiers. In: BIOCOMPUTING 2021: proceedings of the Pacific symposium. pp. 232–243. World Scientific (2020)

[21] Wadsworth, C., Vera, F., Piech, C.: Achieving fairness through adversarial learning: an application to recidivism prediction. ArXiv abs/1807.00199 (2018), https://api.semanticscholar.org/CorpusID:49558315

[22] Xu, G., Duan, Y., Liu, Z., Li, X., Jiang, M., Lemmon, M., Jin, W., Shi, Y.: Incorporating Rather Than Eliminating: Achieving Fairness for Skin Disease Diagnosis Through Group-Specific Experts. In: proceedings of Medical Image Computing and Computer Assisted Intervention – MICCAI 2025. vol. LNCS 15973. Springer Nature Switzerland (September 2025)

[23] Xu, Z., Li, J., Yao, Q., Li, H., Zhao, M., Zhou, S.K.: Addressing fairness issues in deep learning-based medical image analysis: a systematic review. npj Digital Medicine 7(1), 286 (2024)
