When To Adapt? Adapting the Model or Data in Federated Medical Imaging
Pith reviewed 2026-05-09 20:29 UTC · model grok-4.3
The pith
Choosing between model personalization and data harmonization in federated medical imaging depends on the type of domain heterogeneity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that the relative effectiveness of data harmonization versus model personalization in addressing domain heterogeneity in federated medical imaging is conditional on the nature of the shifts. Harmonization outperforms when variations are mainly appearance-based, such as in chest X-ray classification for tuberculosis, whereas personalization is superior for structural differences, as in colon polyp segmentation. In cases of limited inter-client variation, both yield similar results. This conclusion comes from a unified evaluation of state-of-the-art methods across six diverse medical imaging tasks covering segmentation and classification.
What carries the argument
A unified evaluation framework that applies multiple harmonization and personalization methods to six medical imaging datasets representing appearance-based and structural domain shifts.
If this is right
- Harmonization methods should be selected for tasks dominated by appearance variations, such as tuberculosis classification from chest X-rays.
- Personalization approaches are better suited to structural variations in segmentation tasks.
- Both strategies are viable when domain differences between clients are small.
- Guidelines for method selection can be based on diagnosing whether shifts are appearance or structural in nature.
- Hybrid methods combining elements of both may address cases with mixed heterogeneity types.
Where Pith is reading between the lines
- Clinics could analyze their data distributions to choose an adaptation strategy before federated training begins.
- The findings may generalize to non-medical federated learning if similar types of heterogeneity are present.
- Developing adaptive systems that switch or combine harmonization and personalization based on detected shift types could be a next step.
- Further validation on larger-scale real federated networks with actual institutional data would strengthen the practical guidelines.
Load-bearing premise
The selected six medical imaging settings adequately represent the variety of domain shifts that arise in real federated medical imaging applications.
What would settle it
A counterexample in which a new dataset with primarily appearance-based shifts shows personalization clearly outperforming harmonization would falsify the conditional trade-off claim.
Original abstract
Federated learning enables collaborative model training across medical institutions without sharing raw data, but its performance is often limited by domain heterogeneity across clients. Existing approaches to address this challenge fall into two main paradigms: model-side personalization, which adapts model parameters to each client, and data-side harmonization, which reduces inter-client variation at the input level. Despite their widespread use, these strategies have not been systematically compared. In this work, we conduct a comprehensive study across six medical imaging settings (colon polyp, skin lesion, and breast tumor segmentation; tuberculosis CXR, brain tumor, and breast tumor classification) covering diverse types of domain shift. We evaluate a broad set of state-of-the-art harmonization and personalization methods under a unified framework. Our results reveal a conditional trade-off driven by the nature of heterogeneity: harmonization is more effective when variation is primarily appearance-based (e.g., CXR classification), while personalization performs better when differences are structural (e.g., colon polyp segmentation). When inter-client variation is limited, both strategies perform similarly. These findings demonstrate that the effectiveness of adaptation in federated medical imaging depends on the type and magnitude of domain shift rather than the strategy alone. We provide practical guidelines for selecting between harmonization and personalization and highlight directions for future hybrid approaches that combine both paradigms. Code is available at https://github.com/ChamaniS/WhenToAdapt.
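To make the two paradigms in the abstract concrete, the following is a minimal sketch and not the paper's implementation. Both functions are hypothetical stand-ins chosen for brevity: `harmonize_to_reference` illustrates data-side adaptation by matching each image's mean and standard deviation to shared reference statistics, and `aggregate_with_personal_keys` illustrates model-side adaptation by averaging only the shared parameters while each client keeps its own copy of the keys listed in `personal_keys`. Model states are assumed to be plain dictionaries of NumPy arrays.

```python
import numpy as np

def harmonize_to_reference(image, ref_mean, ref_std, eps=1e-8):
    """Data-side adaptation (illustrative): rescale intensities so the image
    has the reference mean and standard deviation agreed on by all clients."""
    standardized = (image - image.mean()) / (image.std() + eps)
    return standardized * ref_std + ref_mean

def aggregate_with_personal_keys(client_states, personal_keys):
    """Model-side adaptation (illustrative): average shared parameters across
    clients, but leave parameters named in `personal_keys` untouched."""
    shared = {
        key: np.mean([state[key] for state in client_states], axis=0)
        for key in client_states[0]
        if key not in personal_keys
    }
    # Each client receives the averaged shared weights and keeps its own
    # personal weights.
    return [{**state, **shared} for state in client_states]

# Hypothetical two-client example with one shared and one personal tensor.
states = [
    {"encoder": np.array([0.0, 2.0]), "head": np.array([1.0])},
    {"encoder": np.array([2.0, 4.0]), "head": np.array([5.0])},
]
personalized = aggregate_with_personal_keys(states, personal_keys={"head"})
harmonized = harmonize_to_reference(np.arange(16.0).reshape(4, 4), 0.5, 0.2)
```

The contrast is where the correction happens: the first function changes the inputs so that one shared model can serve every client, while the second leaves the inputs alone and lets part of the model differ per client.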
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper performs an empirical comparison of data-side harmonization methods versus model-side personalization methods in federated learning for medical imaging. Across six tasks (colon polyp, skin lesion, and breast tumor segmentation; tuberculosis CXR, brain tumor, and breast tumor classification), it reports a conditional trade-off: harmonization outperforms when domain shift is primarily appearance-based (e.g., CXR), personalization is better for structural differences (e.g., polyp segmentation), and the two are comparable when inter-client variation is low. The work supplies practical guidelines and points to hybrid approaches, with code released.
Significance. If the reported ordering holds under a clearer definition of heterogeneity type, the study supplies actionable guidance for federated medical imaging deployments and highlights the value of matching adaptation strategy to shift characteristics rather than applying one paradigm universally. The public code release is a positive contribution that supports reproducibility.
major comments (2)
- [Abstract and §4 (Results)] The central claim that the performance ordering is 'driven by the nature of heterogeneity' (appearance-based vs. structural) rests on post-hoc qualitative labels assigned to the six tasks. No section defines or computes a quantitative proxy (e.g., intensity histogram divergence, low-level feature statistics, or shape/semantic variance) that would allow the observed trade-off to be tested on new data or shown to track the claimed driver rather than dataset-specific artifacts or partitioning choices.
- [§3 (Experimental Setup) and §4] The manuscript does not report a measurable criterion or ablation that isolates appearance-based from structural shift, so the conditional guideline cannot be shown to generalize beyond the chosen six settings; the weakest assumption identified in the review (that these tasks sufficiently span real-world domain shifts) therefore remains unaddressed.
minor comments (2)
- [§4 and Tables/Figures] Add error bars, statistical significance tests, and explicit data-split / hyperparameter details to all result tables and figures so that the reported performance differences can be independently verified.
- [§3] Clarify the exact client partitioning and domain-shift simulation protocol for each of the six datasets so readers can reproduce the heterogeneity levels.
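As an illustration of the error bars and significance checks requested in the minor comments, the sketch below reports a mean with sample standard deviation and a paired percentile-bootstrap confidence interval for the difference between two methods scored on the same test cases. The function names and the choice of bootstrap are assumptions made for this example; the paper does not specify a test.

```python
import numpy as np

def mean_and_std(scores):
    """Mean and sample standard deviation (ddof=1) of per-case scores."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean(), scores.std(ddof=1)

def paired_bootstrap_ci(scores_a, scores_b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(scores_a - scores_b).

    Both score arrays must refer to the same test cases in the same order.
    """
    diff = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample test cases with replacement, n_boot times.
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_means = diff[idx].mean(axis=1)
    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return diff.mean(), lower, upper

# Hypothetical per-case Dice scores for two methods on the same 50 cases.
rng = np.random.default_rng(1)
method_b = rng.uniform(0.6, 0.8, size=50)
method_a = method_b + 0.1
mean_diff, ci_low, ci_high = paired_bootstrap_ci(method_a, method_b)
```

If the interval excludes zero, the observed difference is unlikely to be explained by test-set sampling alone; it says nothing about variation across training seeds, which would need repeated runs.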
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important ways to strengthen the presentation of our empirical findings. We address each major comment below, indicating the revisions we will incorporate in the next version of the manuscript.
- Referee: [Abstract and §4 (Results)] The central claim that the performance ordering is 'driven by the nature of heterogeneity' (appearance-based vs. structural) rests on post-hoc qualitative labels assigned to the six tasks. No section defines or computes a quantitative proxy (e.g., intensity histogram divergence, low-level feature statistics, or shape/semantic variance) that would allow the observed trade-off to be tested on new data or shown to track the claimed driver rather than dataset-specific artifacts or partitioning choices.
  Authors: We agree that the absence of quantitative proxies leaves the central claim more vulnerable to the interpretation that it reflects dataset idiosyncrasies rather than a general principle. Our qualitative labels were assigned on the basis of established domain knowledge about each task (scanner/protocol-induced intensity variation for CXR versus morphological and lesion-shape differences for polyp segmentation). In the revised manuscript we will add a dedicated paragraph in §4 that reports two simple quantitative proxies computed on the client-wise data distributions: (1) average pairwise Jensen-Shannon divergence of intensity histograms and (2) client-wise variance of low-level texture features (e.g., local binary pattern histograms). These numbers will be tabulated alongside the qualitative labels to provide supporting evidence. We will also add a short discussion of how such metrics could be used prospectively to select an adaptation strategy for a new deployment. This revision directly addresses the post-hoc labeling concern while remaining within the scope of the existing experimental design.
  revision: yes
- Referee: [§3 (Experimental Setup) and §4] The manuscript does not report a measurable criterion or ablation that isolates appearance-based from structural shift, so the conditional guideline cannot be shown to generalize beyond the chosen six settings; the weakest assumption identified in the review (that these tasks sufficiently span real-world domain shifts) therefore remains unaddressed.
  Authors: We concur that controlled ablations isolating appearance-based from structural shifts would provide stronger causal evidence and improve generalizability claims. Designing such ablations without introducing unrealistic artifacts is non-trivial for medical images; therefore our study deliberately used naturally occurring shifts from public datasets. In the revision we will (i) expand the task-selection rationale in §3 with explicit references to the primary shift characteristics documented in the source dataset papers, and (ii) add a limitations subsection that acknowledges the finite coverage of the six tasks and lists concrete directions for future isolating experiments (e.g., style-transfer augmentations for appearance shifts and elastic deformations for structural shifts). These additions clarify the scope of the current guidelines without overstating their universality.
  revision: partial
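The first proxy named in the rebuttal, average pairwise Jensen-Shannon divergence of intensity histograms, could be computed along the following lines. This is a hedged sketch, not the authors' code: the function names, the 64-bin histogram, the assumption that images are scaled to [0, 1], and the synthetic gamma-shifted client are all choices made for this example.

```python
import numpy as np

def intensity_histogram(images, bins=64):
    """Normalized intensity histogram over one client's images.
    Assumes pixel values are already scaled to [0, 1]."""
    pixels = np.concatenate([image.ravel() for image in images])
    counts, _ = np.histogram(pixels, bins=bins, range=(0.0, 1.0))
    return counts / counts.sum()

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence with base-2 logs, so values lie in [0, 1]."""
    p = np.asarray(p, dtype=float) + eps  # eps avoids log(0) on empty bins
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log2(p / m))
    kl_qm = np.sum(q * np.log2(q / m))
    return 0.5 * kl_pm + 0.5 * kl_qm

def mean_pairwise_jsd(client_images, bins=64):
    """Average JS divergence over all client pairs (needs >= 2 clients)."""
    hists = [intensity_histogram(images, bins) for images in client_images]
    values = [
        js_divergence(hists[i], hists[j])
        for i in range(len(hists))
        for j in range(i + 1, len(hists))
    ]
    return float(np.mean(values))

# Synthetic illustration: client B is client A with a gamma (appearance) shift.
rng = np.random.default_rng(0)
client_a = [rng.random((32, 32)) for _ in range(8)]
client_b = [image ** 2.0 for image in client_a]
jsd_same = mean_pairwise_jsd([client_a, client_a])
jsd_shifted = mean_pairwise_jsd([client_a, client_b])
```

A score like this responds to intensity changes but is blind to shape, which is why the rebuttal pairs it with a second, texture-based proxy; any threshold for choosing a strategy from these numbers would still have to be validated empirically.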
Circularity Check
Empirical evaluation of existing methods with no derivation or self-referential reduction.
The paper performs a comparative study of harmonization and personalization methods across six fixed medical imaging datasets. All claims (conditional trade-off by heterogeneity type) are presented as direct observations from reported performance metrics rather than derived from equations, fitted parameters, or self-citations that reduce to the inputs. No load-bearing step matches any enumerated circularity pattern; the work is self-contained against external benchmarks and does not rename or smuggle prior results as new derivations.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Domain heterogeneity in federated medical imaging can be usefully classified as primarily appearance-based or structural.