People-Centred Medical Image Analysis
Pith reviewed 2026-05-07 16:30 UTC · model grok-4.3
The pith
PecMan uses a dynamic gating mechanism to jointly optimize fairness, accuracy, and clinician workload in medical image analysis.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a people-centred approach to medical image analysis outperforms prior solutions that address accuracy, fairness, and workflow integration separately. PecMan implements this approach through dynamic gating that routes each case to AI, a human clinician, or joint review while respecting workload limits, and it is reported to achieve better combined accuracy, fairness across diverse populations, and workflow integration.
What carries the argument
The dynamic gating mechanism within PecMan, which assigns each medical image case to AI alone, clinician alone, or both, subject to overall clinician availability constraints, while pursuing joint optimization of diagnostic accuracy and fairness.
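The text here does not say how the gate is computed, so the following is only a toy sketch of routing under a workload cap, not PecMan's learned gating. It assumes a hypothetical per-case estimate of the probability of a correct decision for each route (`p_correct`), assumes that clinician-only and joint review each cost one unit of clinician time, and uses a simple greedy rule: every case starts on the AI route, and the cases with the largest estimated gain are upgraded until the budget runs out. All names (`Case`, `route_cases`, `clinician_budget`) are illustrative.

```python
from dataclasses import dataclass

# Route labels (illustrative names, not taken from the paper).
AI, CLINICIAN, BOTH = "ai", "clinician", "both"


@dataclass
class Case:
    case_id: str
    # Hypothetical estimated probability of a correct decision per route.
    p_correct: dict


def route_cases(cases, clinician_budget):
    """Greedy routing sketch under a clinician workload cap.

    Every case starts on the AI route. Cases are then upgraded to their
    best clinician-involving route, in order of estimated gain over AI,
    until `clinician_budget` units of clinician time have been used.
    Assumption: clinician-only and joint review each cost one unit.
    """
    routes = {c.case_id: AI for c in cases}

    upgrades = []
    for c in cases:
        best = max((CLINICIAN, BOTH), key=lambda r: c.p_correct[r])
        gain = c.p_correct[best] - c.p_correct[AI]
        if gain > 0:  # only consider upgrades expected to help
            upgrades.append((gain, c.case_id, best))

    # Largest estimated gain first; at most `clinician_budget` upgrades.
    upgrades.sort(key=lambda t: t[0], reverse=True)
    for _gain, case_id, best in upgrades[:clinician_budget]:
        routes[case_id] = best
    return routes
```

With a budget of zero the sketch sends everything to the AI; raising the budget moves cases to a clinician or to joint review in order of estimated benefit. A learned gate that also optimizes fairness, as the paper describes, would replace the greedy rule.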
If this is right
- Performance biases that hinder regulatory approval can be mitigated by explicit fairness optimization.
- Clinician adoption increases when AI does not disrupt established workflows or overload staff.
- Trade-offs between the three goals can be quantified and managed using the FairHAI benchmark.
- The framework demonstrates consistent gains over methods optimizing only subsets of these objectives.
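To make the three-way trade-off concrete, here is a minimal sketch of how accuracy, a fairness gap, and clinician workload might be summarized for one routing policy. The metric definitions are assumptions chosen for illustration (overall accuracy, the difference between best and worst per-group accuracy, and the fraction of cases involving a clinician); the text here does not state which metrics FairHAI actually uses. The record fields `correct`, `group`, and `route` are hypothetical.

```python
from collections import defaultdict


def summarize(records):
    """Summarize one routing policy on three axes.

    `records` is a non-empty list of dicts with hypothetical keys:
      'correct' (bool): whether the final decision was right
      'group'   (str):  patient subgroup label
      'route'   (str):  'ai', 'clinician', or 'both'
    """
    n = len(records)
    accuracy = sum(r["correct"] for r in records) / n

    # Per-group accuracy, used for an assumed fairness measure:
    # the gap between the best and worst served subgroup.
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["group"]] += 1
        hits[r["group"]] += int(r["correct"])
    group_acc = {g: hits[g] / totals[g] for g in totals}
    fairness_gap = max(group_acc.values()) - min(group_acc.values())

    # Workload: share of cases that needed any clinician time.
    workload = sum(r["route"] != "ai" for r in records) / n

    return {"accuracy": accuracy, "fairness_gap": fairness_gap, "workload": workload}
```

Comparing these three numbers across policies at a fixed workload level is one simple way to see whether a gain in accuracy is bought with a larger subgroup gap or more clinician time.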
Where Pith is reading between the lines
- Applying similar gating logic could help in other high-stakes domains with scarce expert time.
- Real-world deployment might require adapting the workload model to specific hospital schedules and team structures.
- The benchmark provides a template for testing other human-AI systems on fairness and integration metrics simultaneously.
Load-bearing premise
The assumption that clinician availability can be modeled as a simple dynamic constraint that captures real clinical settings without overlooking workflow disruptions or introducing new barriers.
What would settle it
If a study in an actual clinic finds that using PecMan results in lower overall diagnostic quality or higher clinician burnout than using separate fairness and deferral tools, the joint optimization benefit would be falsified.
Original abstract
Recent advances in data-centric medical AI have produced highly accurate diagnostic systems, but the emphasis on data curation and performance metrics has not translated into widespread clinical adoption. We conjecture that this limited uptake stems from insufficient attention dedicated to the optimisation of fair performance across diverse patient populations and to workflow integration: performance biases can create regulatory barriers, and poorly integrated automation can disrupt clinical routines, degrade the quality of human-AI collaboration, and reduce clinicians' willingness to adopt AI tools. Prior work on workflow integration (e.g., Learning to Defer (L2D) and Learning to Complement (L2C)) and AI fairness has typically examined these challenges in isolation, overlooking their natural interdependence and the practical constraints of clinical environments, such as restricted clinician availability. We propose People-Centred Medical Image Analysis (PecMan), a human-AI framework that jointly optimises fairness, diagnostic accuracy, and workflow effectiveness through a dynamic gating mechanism that assigns cases to AI, clinicians, or both under clinician workload constraints. We also introduce the Fairness and Human-Centred AI (FairHAI) benchmark for evaluating trade-offs between accuracy, fairness, and clinician workload. Experiments using this benchmark show that PecMan consistently outperforms existing methods, paving the way for more trustworthy and clinically viable AI systems. Code will be available upon paper acceptance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes People-Centred Medical Image Analysis (PecMan), a human-AI framework that jointly optimizes fairness, diagnostic accuracy, and workflow effectiveness through a dynamic gating mechanism that assigns cases to AI, clinicians, or both under clinician workload constraints. It introduces the FairHAI benchmark for evaluating trade-offs between accuracy, fairness, and clinician workload, and reports that experiments show PecMan consistently outperforms existing methods.
Significance. If the results hold and the modeled constraints align with real clinical environments, this work would be significant in advancing clinically viable medical AI by addressing the interdependence of fairness and workflow integration, areas previously studied in isolation. The FairHAI benchmark could serve as a useful tool for future research in human-centred AI.
Major comments (1)
- The central claim that PecMan outperforms baselines on FairHAI depends on the dynamic gating jointly optimizing under a modeled clinician availability constraint. However, this treats availability as a clean resource allocation problem, while real clinical settings introduce unmodeled factors including communication costs, decision latency, EHR integration friction, and variable case complexity that could invert the trade-offs. Without validation that the synthetic constraint matches observed clinical logs, the outperformance does not establish clinical viability.
Simulated Author's Rebuttal
We thank the referee for their insightful comments on our manuscript. We address the major concern point by point below, acknowledging the limitations of our modeled constraints while clarifying the scope of our claims.
- Referee: The central claim that PecMan outperforms baselines on FairHAI depends on the dynamic gating jointly optimizing under a modeled clinician availability constraint. However, this treats availability as a clean resource allocation problem, while real clinical settings introduce unmodeled factors including communication costs, decision latency, EHR integration friction, and variable case complexity that could invert the trade-offs. Without validation that the synthetic constraint matches observed clinical logs, the outperformance does not establish clinical viability.
  Authors: We agree that the clinician availability constraint in PecMan and FairHAI is modeled as a simplified resource allocation problem and does not incorporate additional real-world factors such as communication costs, decision latency, EHR integration friction, and variable case complexity. The FairHAI benchmark is a controlled, synthetic environment intended to isolate and evaluate the effects of joint optimization of fairness, accuracy, and workflow under workload constraints. Our central claim is limited to outperformance within this benchmark; we do not assert that the results establish clinical viability. In the revised manuscript, we will expand the limitations and discussion sections to explicitly address these unmodeled factors, analyze how they could alter the observed trade-offs, and propose directions for empirical validation against clinical logs and real workflow data.
  Revision: yes
Circularity Check
No circularity in derivation chain; framework and benchmark are independently proposed
The paper introduces PecMan as a joint optimization framework via dynamic gating under workload constraints, together with the FairHAI benchmark. Its performance claims rest on experimental comparisons rather than on any closed-form derivation, fitted parameter renamed as a prediction, or self-referential definition. No equations, ansatzes, or uniqueness theorems in the provided text reduce to their inputs by construction. Prior work on L2D and L2C is cited from external sources, and no self-citation carries the central claim. The argument is therefore self-contained: a proposal validated by new experiments.