arxiv: 2604.26324 · v1 · submitted 2026-04-29 · 💻 cs.CV

Recognition: unknown

Federated Medical Image Classification under Class and Domain Imbalance exploiting Synthetic Sample Generation

Martina Pavan , Matteo Caligiuri , Francesco Barbato , Pietro Zanuttigh


Pith reviewed 2026-05-07 13:47 UTC · model grok-4.3

classification: 💻 cs.CV

keywords: federated learning · medical image classification · synthetic data generation · class imbalance · domain shift · privacy preservation · heterogeneous data


The pith

Generating and distributing synthetic samples in federated learning improves medical image classification under class and domain imbalance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FedSSG, a federated learning framework that generates synthetic samples to cover rare pathologies and varied imaging devices, then distributes them to clients. This setup tackles the privacy barriers that prevent pooling medical data across hospitals while addressing the resulting imbalances in class frequency and scanner characteristics. A sympathetic reader would care because standard federated models often fail on underrepresented conditions or new equipment, which undermines the reliability of the resulting diagnostic tools. The method claims to deliver better accuracy and generalization with only small added computation at each participating site.

Core claim

By creating synthetic samples that fill gaps in pathology representation and imaging-domain coverage, and sharing those samples across clients during federated training, FedSSG lets the global model learn more balanced and robust features while each client's real data stays on site.

What carries the argument

The synthetic sample generation and distribution strategy inside the FedSSG federated framework, which augments local training sets to reduce class and domain imbalance.
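The page does not say how FedSSG decides which synthetic samples each client receives. As a minimal sketch, assuming a simple "top up every (pathology, device) cell to a fixed target" policy, the allocation step could look like the Python below; `allocate_synthetic`, `target_per_cell`, and the tuple layout are hypothetical names chosen for illustration, not the authors' method.

```python
from collections import Counter


def allocate_synthetic(client_labels, target_per_cell):
    """Hypothetical allocation policy: for each client, count how many
    synthetic samples of each (pathology, device) cell are needed so that
    every cell seen anywhere in the federation reaches target_per_cell.

    client_labels: dict mapping client id -> list of (pathology, device)
    tuples, one per real image held by that client.
    """
    # Every cell that exists at any client is a coverage target for all clients.
    all_cells = {cell for labels in client_labels.values() for cell in labels}
    plan = {}
    for client, labels in client_labels.items():
        counts = Counter(labels)  # missing cells count as 0
        plan[client] = {
            cell: target_per_cell - counts[cell]
            for cell in sorted(all_cells)
            if counts[cell] < target_per_cell
        }
    return plan


clients = {
    "hospital_A": [("nevus", "dermoscope_1")] * 3 + [("melanoma", "dermoscope_1")],
    "hospital_B": [("nevus", "dermoscope_2")] * 2,
}
print(allocate_synthetic(clients, target_per_cell=2))
```

In this toy run, hospital_A receives one extra melanoma sample for dermoscope_1 and two nevus samples for dermoscope_2, a device it has never seen, while hospital_B receives samples for both dermoscope_1 cells. Clients that already hold enough of a cell receive nothing for it, so the synthetic budget flows toward rare pathologies and unseen devices.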

If this is right

Accuracy on rare pathologies rises because their coverage is artificially increased during training.
Models generalize better to images from unseen imaging devices or protocols.
Privacy constraints remain satisfied since only synthetic data crosses institutional boundaries.
Client-side training cost stays low because synthetic generation occurs centrally or with limited local effort.
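The privacy point above assumes that, apart from the shared synthetic images, only model parameters leave each site. The page does not name FedSSG's aggregation rule. The sketch below assumes a FedAvg-style weighted mean (McMahan et al., 2017), with each client's weight set to its local training-set size including any synthetic samples it received; that weighting choice is an assumption.

```python
def fedavg(client_updates):
    """FedAvg-style aggregation: sample-size-weighted mean of client parameters.

    client_updates: list of (params, n_samples) pairs, where params is a flat
    list of floats and n_samples is that client's local training-set size
    (assumed here to include received synthetic samples). Only these numbers
    reach the server; real images never do.
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(params[i] * n for params, n in client_updates) / total
        for i in range(dim)
    ]


# Two clients: the second trained on three times as many samples.
global_params = fedavg([([1.0, 2.0], 100), ([3.0, 4.0], 300)])
print(global_params)  # [2.5, 3.5]
```

Because the server only sees parameter vectors and counts, the per-client cost of this step is one upload and one download per round, independent of how the synthetic samples were produced.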

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same synthetic-distribution idea could be tested in other privacy-constrained domains such as financial fraud detection where class imbalance is common.
If synthetic quality scales with model size, the approach might reduce reliance on collecting ever-larger real medical datasets.
A direct follow-up experiment would measure how performance changes when the proportion of synthetic samples is varied while holding real data fixed.
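That follow-up could be set up with a helper that keeps every real sample and adds a controlled fraction of synthetic ones. The sketch below is hypothetical (`mix_training_set` and its `ratio` parameter are illustrative names); `ratio` is the synthetic share of the final training set.

```python
import random


def mix_training_set(real, synthetic, ratio, seed=0):
    """Keep all real samples and add synthetic ones so that they make up
    `ratio` of the final set. Hypothetical helper for a ratio sweep."""
    if not 0 <= ratio < 1:
        raise ValueError("ratio must be in [0, 1)")
    # Solve n_syn / (n_real + n_syn) = ratio for n_syn.
    n_syn = round(len(real) * ratio / (1 - ratio))
    if n_syn > len(synthetic):
        raise ValueError("not enough synthetic samples for this ratio")
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    return list(real) + rng.sample(list(synthetic), n_syn)
```

Sweeping `ratio` over values such as 0, 0.25, and 0.5 keeps the real portion identical across runs, so any change in held-out accuracy can be attributed to the synthetic share rather than to a different real subset.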

Load-bearing premise

Synthetic samples can be produced and shared so that they accurately represent missing pathologies and device variations without adding artifacts that lower performance on real images.

What would settle it

If a controlled experiment shows that models trained with the distributed synthetics achieve equal or lower accuracy on held-out real images from diverse institutions and rare classes compared to plain federated learning, the benefit would be refuted.
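Because plain accuracy is dominated by the most frequent classes, a test of this kind needs per-class numbers on the real held-out set. A minimal sketch, assuming balanced accuracy (mean per-class recall) as the summary metric, which the page does not specify:

```python
from collections import defaultdict


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return {c: hits[c] / totals[c] for c in totals}


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so a rare class weighs as much as a common one."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


y_true = ["nevus"] * 4 + ["melanoma"] * 2
y_pred = ["nevus"] * 5 + ["melanoma"]
print(per_class_recall(y_true, y_pred))   # {'nevus': 1.0, 'melanoma': 0.5}
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

Here plain accuracy would be 5/6, hiding the fact that half of the rare class is missed. Running both functions on the same real test images for the plain federated baseline and for FedSSG gives the comparison described above; the rare-class entries of `per_class_recall` are the ones that would confirm or refute the claim.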

Figures

Figures reproduced from arXiv: 2604.26324 by Francesco Barbato, Martina Pavan, Matteo Caligiuri, Pietro Zanuttigh.

**Figure 1.** Architecture of our federated multi-domain classification approach.

**Figure 2.** Samples from ISIC (top) and their generated counterparts (bottom).

**Figure 3.** Examples of the nevus class acquired using different imaging devices. Since the same lesion may appear multiple times in the dataset under varying conditions, such as different zoom levels, lighting, or acquisition settings, we retained only one representative image per unique lesion. When multiple images of the same lesion were available, a single image was randomly selected to avoid redundancy and redu…
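The de-duplication described in the Figure 3 caption can be sketched as follows, assuming each image record carries a lesion identifier. `one_image_per_lesion` and the `(image_id, lesion_id)` layout are hypothetical, and the fixed seed is added only to make the random choice reproducible.

```python
import random
from collections import defaultdict


def one_image_per_lesion(records, seed=0):
    """Keep one randomly chosen image per unique lesion.

    records: iterable of (image_id, lesion_id) pairs.
    Returns a dict mapping lesion_id -> the single retained image_id.
    """
    by_lesion = defaultdict(list)
    for image_id, lesion_id in records:
        by_lesion[lesion_id].append(image_id)
    rng = random.Random(seed)
    return {lesion: rng.choice(images) for lesion, images in by_lesion.items()}


records = [("img1", "L1"), ("img2", "L1"), ("img3", "L2")]
print(one_image_per_lesion(records))
```

Lesions photographed only once keep their single image; repeated acquisitions of the same lesion collapse to one, so near-duplicate views cannot leak between training and test splits.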

Original abstract

Exploiting deep learning in medical imaging faces critical challenges, including strict privacy constraints, heterogeneous imaging devices with varying acquisition properties, and class imbalance due to the uneven prevalence of pathologies. In this work, we propose FedSSG, a novel Federated Learning framework that addresses domain shifts caused by diverse imaging devices while mitigating the under-representation of rare pathologies. The key contribution is a strategy for generating synthetic samples and distributing them across clients to improve coverage of both underrepresented pathologies and imaging devices. Experimental results demonstrate that our approach significantly enhances model performance and generalization across heterogeneous institutions, with minimal computational overhead at the client side.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

FedSSG generates and distributes synthetic samples inside federated learning to cover rare classes and different imaging domains, but the gains are not clearly linked to accurate distribution matching.


The paper's central move is FedSSG, a federated setup that creates synthetic samples and shares them across clients to handle both underrepresented pathologies and scanner-specific domain shifts in medical image classification. The abstract positions this as a single mechanism that improves performance and generalization while keeping client compute low. That combination is the main new element: prior federated medical work often tackles privacy or heterogeneity separately, and class imbalance is usually handled with standard reweighting or oversampling rather than targeted synthetic distribution inside the FL round. The framing is practical and directly names the barriers that stop models from working across hospitals. If the synthetics actually fill the gaps without adding noise, the low-overhead claim would be useful for real deployments. The paper earns credit for trying to solve two problems at once without moving real data.

The soft spot is the missing link between the synthetics and the claimed coverage. The stress-test concern lands: the abstract asserts better results from improved representation of rare classes and domains, yet supplies no distribution-matching checks, no pathology-specific fidelity metrics, and no ablation where poor synthetics are shown to hurt. Without those, the accuracy lift could come from plain data augmentation rather than faithful balancing. Dataset details, exact baselines, and external test-set results are also absent at this level, so the causal story stays untested.

This is for groups already working on federated medical imaging or multi-center imbalance problems. Someone building FL pipelines for pathology detection would find the idea worth reading, even if they end up re-implementing the synthetic step with stronger validation. It deserves peer review because the problem is real and the framework is concrete enough to critique and improve; referees can ask for the missing fidelity experiments and full protocol.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces FedSSG, a federated learning framework for medical image classification that generates and distributes synthetic samples across clients to address class imbalance (rare pathologies) and domain shifts (heterogeneous imaging devices), claiming significant gains in performance and generalization with minimal client-side overhead.

Significance. If the synthetic samples faithfully cover underrepresented distributions without artifacts or shifts, the approach could advance privacy-preserving FL for medical imaging by improving robustness to real-world imbalances; the low client overhead is a practical strength if reproducible.

major comments (2)

Abstract and Experimental Results: the claim of 'significantly enhances model performance' is unsupported by any reported metrics, baselines, datasets, or protocol details, so the central performance claim cannot be evaluated.
Synthetic Sample Generation and Experiments: no distribution-matching metrics (e.g., FID, MMD, or pathology-specific statistics), ablation on synthetic quality, or external real-test-set results are described to confirm that synthetics improve coverage rather than acting as generic augmentation; this is load-bearing for attributing gains to the proposed mechanism.
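To make the referee's MMD request concrete: MMD compares two samples through a kernel and is zero in expectation when they come from the same distribution. The sketch below is the biased estimate of squared MMD with a Gaussian (RBF) kernel, computed on feature vectors such as embeddings of real versus synthetic images; the feature extractor and the bandwidth `sigma` are assumptions left open here.

```python
import math


def rbf(a, b, sigma):
    """Gaussian kernel between two equal-length feature vectors."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2 * sigma ** 2))


def mmd2_biased(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between samples xs and ys:
    mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    kxx = sum(rbf(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy


real_feats = [[0.0], [0.1]]
synthetic_feats = [[3.0], [3.1]]
print(mmd2_biased(real_feats, real_feats))       # 0.0: identical samples
print(mmd2_biased(real_feats, synthetic_feats))  # large: shifted samples
```

Computing this separately for each (pathology, device) cell would be one way to obtain the pathology-specific statistic the comment asks for, rather than a single pooled number that a well-matched majority class could dominate.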

minor comments (2)

Method section: provide the exact generative architecture, training procedure for synthetics, and how they are distributed in the federated rounds to allow reproduction of the 'minimal computational overhead' claim.
Notation and terminology: define 'FedSSG' and all acronyms at first use; ensure consistent reference to 'synthetic samples' versus 'real samples' throughout.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We will address the points raised by providing more detailed experimental information and validation metrics in the revised version.


Referee: Abstract and Experimental Results: the claim of 'significantly enhances model performance' is unsupported by any reported metrics, baselines, datasets, or protocol details, so the central performance claim cannot be evaluated.

Authors: We agree that the abstract lacks specific details to support the performance claim. Although the full manuscript describes the experiments, we will revise the abstract to include key metrics (e.g., accuracy and F1-score improvements), mention the datasets and baselines, and outline the evaluation protocol. This will allow readers to evaluate the claims more readily. (Revision: yes.)
Referee: Synthetic Sample Generation and Experiments: no distribution-matching metrics (e.g., FID, MMD, or pathology-specific statistics), ablation on synthetic quality, or external real-test-set results are described to confirm that synthetics improve coverage rather than acting as generic augmentation; this is load-bearing for attributing gains to the proposed mechanism.

Authors: We understand the need for rigorous validation of the synthetic samples. We will incorporate distribution-matching metrics such as FID and MMD in the revised manuscript to quantify how well the synthetics match the real data distributions. Additionally, we will include ablations on synthetic sample quality and report results on external real test sets to demonstrate that the performance gains are attributable to improved coverage of rare pathologies and domain variations rather than generic data augmentation. (Revision: yes.)
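For reference, FID (Heusel et al., 2017) is the Fréchet distance between Gaussians fitted to Inception-network features of two image sets. With means $\mu_r, \mu_s$ and covariances $\Sigma_r, \Sigma_s$ of the features of the real and synthetic images respectively:

$$
\mathrm{FID} = \lVert \mu_r - \mu_s \rVert_2^2 + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_s - 2\left(\Sigma_r \Sigma_s\right)^{1/2}\right)
$$

Lower is better. Evaluating it per pathology class, rather than over the pooled set, would be one way to address the rare-class fidelity concern, though very small rare-class subsets may give unstable covariance estimates and therefore noisy FID values.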

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper proposes the FedSSG framework for federated learning with synthetic sample generation to handle class/domain imbalance in medical imaging. The abstract and description outline a strategy for generating and distributing synthetic samples, supported by experimental results on performance gains. No equations, fitted parameters called predictions, self-definitional steps, or load-bearing self-citations are present in the provided text. The central claims rest on empirical validation rather than any derivation that reduces to its own inputs by construction, making the approach self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the unverified premise that synthetic samples can stand in for missing real data distributions. No free parameters or invented physical entities are mentioned.

axioms (1)

Domain assumption: Synthetic samples can be generated that usefully represent both rare pathologies and scanner-specific domain characteristics.
This is the load-bearing premise stated in the abstract for the entire FedSSG strategy.

invented entities (1)

FedSSG framework (no independent evidence)
purpose: Mechanism to generate and distribute synthetic samples across federated clients.
Newly introduced method name and procedure.

pith-pipeline@v0.9.0 · 5401 in / 1143 out tokens · 69792 ms · 2026-05-07T13:47:06.052103+00:00 · methodology

