Neuron Incidence Redistribution for Fairness in Medical Image Classification

Abin Shoby; Lyle John Palmer; Nikhil Cherian Kurian

arxiv: 2605.19393 · v1 · pith:2JIJ7FK6new · submitted 2026-05-19 · 💻 cs.CV · cs.LG

Pith reviewed 2026-05-20 05:59 UTC · model grok-4.3

keywords medical image classification · demographic fairness · penultimate layer · activation variance · regularization · subgroup disparities · transfer learning · skin lesion diagnosis

The pith

Penalizing variance in penultimate-layer neuron activations reduces demographic disparities in medical image classification without needing group labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies that transfer-learned medical image models develop biased predictions because dominant neurons in the penultimate layer activate together for both the target disease and privileged demographic groups like older or male patients. This produces over-diagnosis for those groups and under-diagnosis for others. Neuron Incidence Redistribution counters the pattern by adding a regularization term that penalizes variance among the probability-weighted mean activations across all penultimate neurons. The approach spreads latent disease evidence more evenly through the layer. On skin lesion and retinal scan datasets this yields large drops in true-positive and false-positive rate gaps across age, gender, and race while AUC stays the same or improves slightly.

Core claim

In transfer-learned models the dominant penultimate-layer activation channel under positive predictions is co-activated by both disease-positive samples and privileged demographic groups, producing over-diagnosis, while the channel under negative predictions is co-activated by disadvantaged groups, producing under-diagnosis. Neuron Incidence Redistribution penalizes the variance of predicted-probability-weighted mean activations across all penultimate-layer neurons, forcing disease evidence to be distributed more uniformly without any demographic labels at training time.
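The penalty described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `nir_penalty` and `total_loss`, the use of the positive-class probability as the weight, the normalisation by the sum of probabilities, and the placeholder coefficient `lam=0.1` are all hypothetical choices that the text here does not specify.

```python
from statistics import pvariance


def nir_penalty(activations, probs, eps=1e-8):
    """Variance across neurons of probability-weighted mean activations.

    activations: list of N samples, each a list of D penultimate activations.
    probs: list of N predicted positive-class probabilities (assumed weights).
    """
    if not activations or len(activations) != len(probs):
        raise ValueError("need one predicted probability per sample")
    total_weight = sum(probs) + eps
    num_neurons = len(activations[0])
    # Probability-weighted mean activation of each neuron over the batch.
    weighted_means = [
        sum(p * sample[j] for p, sample in zip(probs, activations)) / total_weight
        for j in range(num_neurons)
    ]
    # Population variance across neurons; it is zero when every neuron
    # carries the same weighted mean activation.
    return pvariance(weighted_means)


def total_loss(task_loss, activations, probs, lam=0.1):
    # lam is the regularization coefficient; 0.1 is a placeholder value.
    return task_loss + lam * nir_penalty(activations, probs)


# Toy batches: evidence concentrated in one neuron vs. spread over all four.
concentrated = [[4.0, 0.0, 0.0, 0.0], [4.0, 0.0, 0.0, 0.0]]
spread = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
probs = [0.9, 0.8]

print("concentrated:", nir_penalty(concentrated, probs))
print("spread:", nir_penalty(spread, probs))
```

In this toy setting the concentrated batch is penalized and the evenly spread batch is not, which is the behaviour the core claim relies on. In a real training loop the same quantity would be computed with a differentiable tensor library so that gradients reach the penultimate layer.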

What carries the argument

Neuron Incidence Redistribution (NIR), a regularization loss that penalizes the variance of predicted-probability-weighted mean activations across penultimate-layer neurons to redistribute latent disease evidence.

Load-bearing premise

The observed demographic disparities are mainly produced by concentrated co-activation in one or two dominant penultimate neurons rather than by biases elsewhere in the network or data.

What would settle it

Applying NIR to the same HAM10000 training setup and finding that age or gender TPR disparity remains above 5 percent while the activation-variance term is active would show the mechanism does not correct the identified root cause.
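To make the 5 percent criterion concrete, the sketch below computes a TPR disparity from binary labels, binary predictions, and group membership. It assumes the disparity is the largest per-group TPR minus the smallest, expressed in percentage points; the paper's exact definition may differ. The function names and the toy data are hypothetical.

```python
def true_positive_rate(y_true, y_pred):
    """Fraction of disease-positive samples predicted positive, or None if none exist."""
    hits = [pred for truth, pred in zip(y_true, y_pred) if truth == 1]
    if not hits:
        return None
    return sum(hits) / len(hits)


def tpr_disparity(y_true, y_pred, groups):
    """Return (max TPR - min TPR in percentage points, per-group TPR dict)."""
    rates = {}
    for group in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == group]
        rate = true_positive_rate(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
        if rate is not None:  # skip groups with no positive samples
            rates[group] = rate
    gap = 100.0 * (max(rates.values()) - min(rates.values()))
    return gap, rates


# Toy data only; the group names are illustrative.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 1, 0, 1]
groups = ["older"] * 4 + ["younger"] * 4 + ["older", "younger"]

gap, rates = tpr_disparity(y_true, y_pred, groups)
print(rates, gap, "exceeds 5 points" if gap > 5.0 else "within 5 points")
```

Demographic labels are needed here only at evaluation time, which is consistent with the claim that NIR itself trains without them.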

Figures

Figures reproduced from arXiv: 2605.19393 by Abin Shoby, Lyle John Palmer, Nikhil Cherian Kurian.

Original abstract

Deep learning models for medical image classification are susceptible to subgroup performance disparities across demographic attributes such as age, gender, and race. We identify a latent representational mechanism underlying these disparities: in transfer-learned models, the dominant penultimate-layer activation channel under positive predictions is co-activated by both disease-positive samples and privileged demographic groups (male, older patients), producing over-diagnosis; conversely, the dominant channel under negative predictions is co-activated by disadvantaged groups (female, younger patients), producing systematic under-diagnosis. To address this, we propose Neuron Incidence Redistribution (NIR), a lightweight regularization method that penalizes the variance of predicted-probability-weighted mean activations across penultimate-layer neurons, requiring no demographic labels at training time. On HAM10000, TPR disparity drops from 10.81% to 0.93% across age groups and from 12.04% to 0.74% across gender, with a marginal AUC improvement of 0.51 points. On Harvard OCT-RNFL, NIR reduces FPR disparity for race (from 15.68% to 10.66%) and age (from 12.69% to 1.80%), demonstrating that distributing latent disease evidence across the full penultimate layer is a principled and effective strategy for improving demographic fairness in medical AI.
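The disparity figures quoted in the abstract are absolute gaps in percentage points. As a reading aid, the short sketch below converts each reported before/after pair into a relative reduction. The helper name `relative_reduction` is hypothetical; the numbers are copied from the abstract and nothing else is assumed.

```python
def relative_reduction(before, after):
    """Percentage by which a gap shrinks relative to its starting value."""
    return 100.0 * (before - after) / before


# (before, after) disparity pairs in percentage points, as quoted in the abstract.
reported = {
    "HAM10000 TPR gap, age": (10.81, 0.93),
    "HAM10000 TPR gap, gender": (12.04, 0.74),
    "Harvard OCT-RNFL FPR gap, race": (15.68, 10.66),
    "Harvard OCT-RNFL FPR gap, age": (12.69, 1.80),
}

for name, (before, after) in reported.items():
    print(f"{name}: {before - after:.2f} points, "
          f"{relative_reduction(before, after):.1f}% relative")
```

Read this way, the race result on Harvard OCT-RNFL is a much smaller relative improvement than the other three, a difference the headline numbers do not make obvious.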

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper links demographic bias to co-activation in one dominant penultimate channel and offers a simple variance penalty on probability-weighted activations to spread disease evidence more evenly, with solid reported drops in TPR and FPR gaps on two medical datasets.

The central point is that transfer-learned medical classifiers show a specific pattern: the strongest penultimate neuron fires for both true positives and privileged demographics like older males, while another channel picks up disadvantaged groups on negatives. NIR adds a regularization term that penalizes variance across the weighted mean activations of all those neurons, pushing the model to distribute the signal instead of concentrating it. No demographic labels are needed at training time, which is the practical hook.

Referee Report

2 major / 2 minor

Summary. The paper identifies a latent representational mechanism in transfer-learned models for medical image classification: the dominant penultimate-layer activation channel under positive predictions is co-activated by both disease-positive samples and privileged demographic groups (e.g., male, older), causing over-diagnosis, while the negative-prediction channel is co-activated by disadvantaged groups, causing under-diagnosis. To address this, the authors propose Neuron Incidence Redistribution (NIR), a lightweight regularization that penalizes the variance of predicted-probability-weighted mean activations across all penultimate-layer neurons without requiring demographic labels at training time. Experiments on HAM10000 report TPR disparity reductions from 10.81% to 0.93% across age and 12.04% to 0.74% across gender, with marginal AUC gains; similar FPR disparity reductions are shown on Harvard OCT-RNFL for race and age.

Significance. If the identified mechanism is causal and NIR specifically corrects it rather than providing generic regularization benefits, this offers a meaningful advance for demographic fairness in medical AI. The method is label-free and computationally light, with reported disparity reductions that are large in magnitude while preserving discriminative performance. Such an approach could be practically useful in clinical settings where demographic annotations are unavailable or restricted.

major comments (2)

[Mechanism Identification and NIR Formulation] The central claim attributes fairness gains to redistribution of disease evidence by severing specific co-activations in the dominant penultimate channel. However, the manuscript provides only observational identification of these activation patterns; no causal tests (e.g., targeted ablation of the dominant channel or pre/post measurement of activation-demographic correlations) are described to confirm that variance penalization directly addresses the root cause rather than acting as an implicit regularizer.
[Experiments and Results] The reported disparity reductions (e.g., TPR gap 10.81% to 0.93% on HAM10000 age) are given as point estimates without error bars, standard deviations across multiple runs, or details of the full experimental protocol including hyperparameter selection for the regularization coefficient and statistical significance testing. This weakens assessment of robustness and reproducibility of the claimed mechanism-specific improvements.

minor comments (2)

[Method] The exact mathematical definition of the NIR regularization term (variance over predicted-probability-weighted mean activations) would benefit from an explicit equation to aid reproducibility.
[Figures] Figure captions and axis labels for activation visualizations could be expanded to clarify how the dominant channel is identified and how pre/post-NIR distributions differ.
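One plausible form of the explicit equation requested in the first minor comment is given below. It is a reconstruction from the verbal description only, not the authors' stated formula: the symbols $a_{ij}$ (activation of penultimate neuron $j$ on sample $i$), $\hat{p}_i$ (predicted probability for sample $i$), $N$ (batch size), $D$ (number of penultimate neurons), and $\lambda$ (regularization coefficient) are notation introduced here, and the normalisation by $\sum_i \hat{p}_i$ is an assumption.

```latex
\begin{align}
  m_j &= \frac{\sum_{i=1}^{N} \hat{p}_i \, a_{ij}}{\sum_{i=1}^{N} \hat{p}_i},
  \qquad j = 1, \dots, D, \\
  \bar{m} &= \frac{1}{D} \sum_{j=1}^{D} m_j, \\
  \mathcal{L}_{\mathrm{NIR}} &= \frac{1}{D} \sum_{j=1}^{D} \left( m_j - \bar{m} \right)^2, \\
  \mathcal{L} &= \mathcal{L}_{\mathrm{task}} + \lambda \, \mathcal{L}_{\mathrm{NIR}}.
\end{align}
```

Under this form the penalty vanishes only when every neuron has the same weighted mean activation. An explicit equation in the paper would need to settle whether the weights are per-class probabilities and whether the variance is computed per batch or over the whole dataset.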

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments on our manuscript. We have carefully reviewed the concerns and provide point-by-point responses below. Where the comments identify areas for strengthening, we commit to revisions that will be incorporated in the next version of the paper.

Referee: [Mechanism Identification and NIR Formulation] The central claim attributes fairness gains to redistribution of disease evidence by severing specific co-activations in the dominant penultimate channel. However, the manuscript provides only observational identification of these activation patterns; no causal tests (e.g., targeted ablation of the dominant channel or pre/post measurement of activation-demographic correlations) are described to confirm that variance penalization directly addresses the root cause rather than acting as an implicit regularizer.

Authors: We appreciate the referee's emphasis on establishing a stronger causal link between the observed activation patterns and the fairness improvements from NIR. The manuscript's identification of co-activations in the dominant channels is indeed observational, derived from analyzing activation statistics conditioned on predictions and demographics. This analysis directly informed the design of the variance-penalization objective in NIR. To address the concern, we will add targeted ablation experiments in the revised manuscript: we will zero out the dominant penultimate-layer channel post-training and measure resulting changes in both overall performance and subgroup disparities. We will also report pre- and post-NIR Pearson correlations between neuron activations and demographic attributes across the dataset. These additions will help demonstrate that NIR specifically mitigates the identified co-activation mechanism rather than functioning as generic regularization. Revision: yes.
Referee: [Experiments and Results] The reported disparity reductions (e.g., TPR gap 10.81% to 0.93% on HAM10000 age) are given as point estimates without error bars, standard deviations across multiple runs, or details of the full experimental protocol including hyperparameter selection for the regularization coefficient and statistical significance testing. This weakens assessment of robustness and reproducibility of the claimed mechanism-specific improvements.

Authors: We agree that reporting variability and experimental details is essential for evaluating robustness. The current results are presented as single-run point estimates. In the revised manuscript, we will rerun all experiments across five random seeds and report means with standard deviations and error bars. We will also expand the experimental protocol section to detail the hyperparameter selection process for the regularization coefficient (including the grid search range and validation-based selection criterion). Finally, we will include statistical significance testing (e.g., paired t-tests or Wilcoxon tests) comparing disparity reductions under NIR versus baselines. These updates will improve reproducibility and allow readers to better assess the reliability of the reported gains. Revision: yes.
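The checks promised in the two responses above can be sketched as follows. This is an illustration only: the helper names, the choice of the neuron with the highest mean activation as the "dominant" channel, the 0/1 encoding of the demographic attribute, and the toy activations are all assumptions, and no values from the paper are used.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def neuron_attribute_correlations(activations, attribute):
    """Correlation of each penultimate neuron with a 0/1 demographic attribute."""
    num_neurons = len(activations[0])
    return [
        pearson([sample[j] for sample in activations], attribute)
        for j in range(num_neurons)
    ]


def dominant_channel(activations):
    """Index of the neuron with the highest mean activation (assumed criterion)."""
    num_neurons = len(activations[0])
    means = [mean(sample[j] for sample in activations) for j in range(num_neurons)]
    return means.index(max(means))


def ablate_channel(activations, channel):
    """Copy of the activations with one channel zeroed out."""
    return [
        [0.0 if j == channel else a for j, a in enumerate(sample)]
        for sample in activations
    ]


def summarize_seeds(values):
    """Mean and sample standard deviation of a metric across random seeds."""
    return mean(values), stdev(values)


# Toy activations for two neurons and a hypothetical binary attribute.
activations = [[2.0, 0.5], [4.0, 0.5], [1.0, 0.4], [3.0, 0.6]]
attribute = [0, 1, 0, 1]

channel = dominant_channel(activations)
print("before ablation:", neuron_attribute_correlations(activations, attribute))
ablated = ablate_channel(activations, channel)
print("after ablation:", neuron_attribute_correlations(ablated, attribute))
```

Comparing the per-neuron correlations before and after NIR training, and the subgroup disparities with and without the dominant channel, would address the causal question more directly than the observational analysis alone. `summarize_seeds` covers the mean and standard deviation reporting across seeds.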

Circularity Check

0 steps flagged

No significant circularity; regularization is independent of demographic metrics

The paper's core derivation identifies an observational co-activation pattern in penultimate-layer channels, then defines NIR as a variance penalty on predicted-probability-weighted mean activations that requires no demographic labels or disparity targets. Fairness gains (e.g., TPR disparity reductions) are measured post-hoc on held-out test sets and are not forced by construction, as the loss term operates solely on internal activations and model outputs. No self-citations, fitted inputs renamed as predictions, or ansatzes imported via prior work appear in the provided text; the method remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on standard deep-learning assumptions plus one likely hyperparameter for regularization strength and the domain assumption that penultimate-layer activations encode redistributable disease evidence.

free parameters (1)

Regularization coefficient
The weight on the variance penalty term must be chosen or tuned and directly affects the strength of redistribution.

axioms (1)

Domain assumption: Penultimate-layer activations contain separable disease evidence that can be redistributed across neurons without loss of discriminative power.
This premise underpins the design of the variance penalty and is invoked to justify why redistribution improves fairness.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

NIR, a lightweight regularization method that penalizes the variance of predicted-probability-weighted mean activations across penultimate-layer neurons

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
