Recognition: 2 theorem links
Overcoming data scarcity through multi-center federated learning for organs-at-risk segmentation in pediatric upper abdominal radiotherapy
Pith reviewed 2026-05-11 00:58 UTC · model grok-4.3
The pith
Federated learning across two centers yields OAR segmentation models for pediatric upper abdominal tumors that match local accuracy while improving cross-center robustness.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using 310 postoperative CT scans from 272 patients at Utrecht and Heidelberg, the authors show that a federated nnU-Net trained with secure weight exchange produces a single model whose cross-center Dice scores exceed those of either local model by 0.003 to 0.007 while preserving in-center accuracy for at least seven of nine organs at risk and reducing false-positive kidney labels.
What carries the argument
nnU-Net framework adapted for federated learning via secure weight exchange on cloud storage across institutional firewalls
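The summary does not say how the exchanged weights are combined (the referee's minor comment asks for the aggregation method), and the ledger lists federated averaging only as an assumption. The following is a minimal sketch of one aggregation step under that assumption (FedAvg-style weighting by local case count). The function `fedavg`, the parameter name `conv1`, and the case counts are hypothetical, and the secure cloud-storage transport is not modeled.

```python
import numpy as np


def fedavg(site_weights, site_sizes):
    """FedAvg-style aggregation (assumed, not confirmed by the paper).

    site_weights: list of dicts mapping parameter name -> np.ndarray
    site_sizes:   list of local training-case counts used as weights
    """
    total = float(sum(site_sizes))
    names = site_weights[0].keys()
    # Each global parameter is the case-count-weighted mean of the site copies.
    return {
        name: sum(w[name] * (n / total) for w, n in zip(site_weights, site_sizes))
        for name in names
    }


# Hypothetical example: two sites, one small parameter tensor each.
utrecht = {"conv1": np.array([1.0, 2.0])}
heidelberg = {"conv1": np.array([3.0, 4.0])}
global_w = fedavg([utrecht, heidelberg], [3, 1])
print(global_w["conv1"])  # [1.5 2.5]
```

Weighting by case count is the usual FedAvg choice; equal weighting is a common alternative. In each communication round, both sites would resume local training from the averaged weights before the next exchange.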
If this is right
- The federated model maintains stable performance when patient orientation varies.
- It reduces false-positive segmentations of kidneys that have been surgically removed.
- It delivers the best cross-center results across Dice, 95th-percentile Hausdorff, and mean surface distance for the nine evaluated OARs.
- It matches local-model performance for at least seven of the nine OARs on each center's own data.
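The false-positive kidney check in the second bullet above concerns organs that are absent after nephrectomy, so the reference contains no label for them. Below is a minimal sketch that assumes integer label maps and a hypothetical `min_voxels` threshold for ignoring isolated stray voxels. The paper's own criterion is not given in this summary, and using label `2` for a kidney is an arbitrary illustration.

```python
import numpy as np


def absent_organ_false_positive(pred_labels, ref_labels, label, min_voxels=10):
    """True if the model predicts `label` where the reference contains none.

    min_voxels is a hypothetical noise threshold, not taken from the paper.
    """
    ref_absent = not np.any(ref_labels == label)
    pred_count = int(np.count_nonzero(pred_labels == label))
    return ref_absent and pred_count >= min_voxels


def false_positive_rate(cases, label, min_voxels=10):
    """Fraction of scans lacking `label` in the reference that get a spurious prediction."""
    absent = [(p, r) for p, r in cases if not np.any(r == label)]
    if not absent:
        return 0.0
    hits = sum(absent_organ_false_positive(p, r, label, min_voxels) for p, r in absent)
    return hits / len(absent)


# Hypothetical example: label 2 stands in for a surgically removed kidney.
ref = np.zeros((4, 4, 4), dtype=np.uint8)  # reference: organ absent
pred = ref.copy()
pred[:2, :2, :] = 2  # 16 spurious voxels predicted
print(absent_organ_false_positive(pred, ref, label=2))  # True
```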
Where Pith is reading between the lines
- The approach could scale to additional centers to further reduce domain shift.
- Similar federated pipelines may help other pediatric imaging tasks limited by small per-site cohorts.
- The modest Dice gains may grow in clinical value once models are deployed on larger multi-center test sets.
Load-bearing premise
The two-center dataset and the specific nnU-Net federated implementation are representative enough that the small observed gains will appear at other pediatric centers and scanner protocols.
What would settle it
Evaluation of the same federated model on CT data from a third independent pediatric center with different scanners and protocols, checking whether it still matches or beats a newly trained local model there.
Original abstract
Deep learning-based organs/structures-at-risk (OARs) auto-contouring models can improve radiotherapy workflows, but models trained on adult data often underperform in pediatric patients. Developing robust pediatric-specific models is hindered by data scarcity and fragmentation across centers. Federated learning (FL) enables privacy-preserving collaborative training without the need for data sharing. We evaluated the feasibility and performance of FL for developing pediatric-specific OAR segmentation models across two European medical centers. Computed tomography (CT) images of pediatric patients with a renal tumor or abdominal neuroblastoma from Utrecht and Heidelberg were retrospectively collected and locally processed. An nnU-Net-based framework segmented 19 OARs using local and FL schemes. FL was implemented with secure weight exchange via cloud storage across institutional firewalls. Performance was assessed using the Dice similarity coefficient (DSC), 95th percentile Hausdorff distance, and mean surface distance. Robustness to patient orientation and false-positive segmentation of surgically removed kidneys were evaluated, and failure cases were identified. A total of 310 postoperative CTs from 272 patients (105 renal tumors, 167 neuroblastomas) were included. Local models performed well on their respective center's data but showed significantly reduced cross-center performance for four to seven of the nine evaluated OARs (DSC). In contrast, the FL model matched local performance for at least seven of nine OARs and achieved the best cross-center results across all three metrics, with DSC gains of 0.003-0.007 over local models. FL also maintained stable performance across patient orientations and reduced false-positive kidney segmentations. Real-world FL improves cross-center robustness of CT-based OAR segmentation models in pediatric upper abdominal tumors.
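The three metrics named in the abstract can be sketched for binary 3D masks as follows. This is a minimal NumPy illustration, not the paper's evaluation code. Surfaces are taken as mask voxels with at least one 6-connected background neighbour, both masks are assumed non-empty, and HD95 and mean surface distance are computed from distances pooled over both directions. Other toolkits define these surface metrics slightly differently (for example, by taking the maximum of the two directed 95th percentiles), so values may not match the authors'. The `spacing` argument gives the voxel size per axis.

```python
import numpy as np


def dice(pred, ref):
    """Dice similarity coefficient of two boolean masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, ref).sum() / denom


def surface_points(mask, spacing):
    """Physical coordinates of boundary voxels (6-connectivity) of a 3D mask."""
    m = np.pad(mask.astype(bool), 1)  # pad with background so edges count as boundary
    core = m[1:-1, 1:-1, 1:-1]
    interior = core.copy()
    for axis in range(3):
        for shift in (-1, 1):
            # A voxel stays interior only if every face neighbour is foreground.
            interior &= np.roll(m, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    return np.argwhere(core & ~interior) * np.asarray(spacing, dtype=float)


def surface_distances(pred, ref, spacing=(1.0, 1.0, 1.0)):
    """Nearest-surface distances in both directions, pooled (brute force)."""
    a, b = surface_points(pred, spacing), surface_points(ref, spacing)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.concatenate([d.min(axis=1), d.min(axis=0)])


def hd95(pred, ref, spacing=(1.0, 1.0, 1.0)):
    """95th percentile Hausdorff distance (pooled-distance convention)."""
    return float(np.percentile(surface_distances(pred, ref, spacing), 95))


def msd(pred, ref, spacing=(1.0, 1.0, 1.0)):
    """Mean surface distance (pooled-distance convention)."""
    return float(surface_distances(pred, ref, spacing).mean())


# Toy example: a 4x4x4 cube and a copy shifted by one voxel along the first axis.
ref = np.zeros((10, 10, 10), bool)
ref[2:6, 2:6, 2:6] = True
pred = np.zeros_like(ref)
pred[3:7, 2:6, 2:6] = True
print(round(dice(pred, ref), 3))  # 0.75
```

The brute-force distance matrix is adequate for toy inputs but grows with the product of the two surface sizes; full CT volumes usually call for a distance-transform approach.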
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates federated learning (FL) with an nnU-Net framework for segmenting 19 organs-at-risk (OARs) on postoperative CT scans from 272 pediatric patients (310 scans total) with renal tumors or abdominal neuroblastomas across two centers (Utrecht and Heidelberg). Local models are shown to degrade on cross-center data for 4-7 of 9 evaluated OARs, while the FL model (implemented via secure weight exchange over cloud storage) matches local performance on at least 7/9 OARs and yields small DSC gains of 0.003-0.007 on cross-center tests, with additional checks for orientation robustness and false-positive kidney segmentations.
Significance. If the small observed gains prove robust, the work demonstrates practical feasibility of real-world FL for improving cross-center generalization in a privacy-sensitive, data-scarce pediatric radiotherapy setting without requiring data sharing, which could support multi-center model development.
major comments (2)
- [Abstract and Results] The central claim of improved cross-center robustness rests on DSC gains of only 0.003-0.007, with no statistical tests, confidence intervals, or patient-level variability reported, leaving unclear whether these differences are significant or clinically meaningful.
- [Methods and Results] The evaluation uses data from only two centers (Utrecht and Heidelberg); the generalization claim for FL robustness would require at least one additional independent center or external validation set to rule out site-specific similarities as the source of the observed gains.
minor comments (2)
- [Abstract] The abstract states that 19 OARs are segmented, but only 9 are evaluated in cross-center tests; clarify the selection criteria and list the specific OARs.
- [Results] The Results section lacks details of the exact nnU-Net FL implementation (e.g., aggregation method, number of communication rounds) and of any hyperparameter differences between local and FL runs.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review of our manuscript. We address each major comment point by point below, indicating planned revisions where appropriate.
- Referee: [Abstract and Results] The central claim of improved cross-center robustness rests on DSC gains of only 0.003-0.007, with no statistical tests, confidence intervals, or patient-level variability reported, leaving unclear whether these differences are significant or clinically meaningful.
Authors: We acknowledge that the reported DSC gains are modest and that the absence of statistical testing and variability measures limits interpretation of their significance. In the revised manuscript we will add 95% confidence intervals for all DSC, HD95 and MSD values and differences; report per-patient standard deviations and ranges to illustrate variability; and include results of paired non-parametric statistical tests (Wilcoxon signed-rank test with Bonferroni correction) comparing the federated model against each local model on cross-center test sets. We will also expand the discussion to address the potential clinical relevance of these small but consistent gains in a pediatric radiotherapy context, where even minor reductions in manual contouring effort are valuable.
Revision: yes
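Two pieces of the proposed statistics can be sketched with the Python standard library alone: a percentile bootstrap confidence interval for the mean per-patient DSC difference (FL minus local), and Bonferroni adjustment of a family of p-values. This is an illustrative sketch, not the authors' planned analysis. The per-patient scores below are hypothetical, and the Wilcoxon p-values themselves would come from a routine such as `scipy.stats.wilcoxon`, which is not reproduced here.

```python
import random
import statistics


def paired_bootstrap_ci(fl_scores, local_scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (FL - local)."""
    diffs = [f - l for f, l in zip(fl_scores, local_scores)]
    rng = random.Random(seed)  # fixed seed for reproducibility
    # Resample patients with replacement and record the mean difference each time.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(diffs), (lo, hi)


def bonferroni(p_values):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical per-patient DSC values for one OAR on a cross-center test set.
fl = [0.91, 0.88, 0.93, 0.86, 0.90]
local = [0.90, 0.88, 0.92, 0.85, 0.90]
mean_diff, (lo, hi) = paired_bootstrap_ci(fl, local)
print(f"mean diff {mean_diff:.4f}, 95% CI [{lo:.4f}, {hi:.4f}]")
```

The size of the Bonferroni family (for example, one test per evaluated OAR and per local baseline) is a design choice that the revision would need to state explicitly.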
- Referee: [Methods and Results] The evaluation uses data from only two centers (Utrecht and Heidelberg); the generalization claim for FL robustness would require at least one additional independent center or external validation set to rule out site-specific similarities as the source of the observed gains.
Authors: We agree that claims of broad generalization are not supported by a two-center design. The two participating sites differ in scanner vendors, acquisition protocols, and patient demographics, providing a non-trivial test of cross-center performance; however, we cannot exclude the possibility that unobserved site-specific factors contribute to the observed results. In the revision we will (i) rephrase the abstract, results and conclusions to state that improved robustness is demonstrated between these two specific centers, (ii) add an explicit limitations section discussing the two-center scope and the risk of site-specific similarities, and (iii) include a supplementary analysis of inter-center imaging differences. We do not have access to data from additional centers under current approvals.
Revision: partial
- Unresolved: the requirement for at least one additional independent center or external validation set. The study is restricted to the two centers from which retrospective data were available, and obtaining further multi-center data would require new ethical approvals and collaborations outside the present work.
Circularity Check
No circularity: purely empirical comparison of local vs federated training on held-out data
The manuscript presents an empirical evaluation of nnU-Net models trained locally versus via federated learning on a two-center dataset of 310 CT scans. Performance is measured directly via DSC, Hausdorff distance, and surface distance on cross-center held-out cases, with no equations, fitted parameters, or derivations invoked. The reported DSC gains of 0.003-0.007 are computed outputs from the experiments rather than predictions forced by any self-referential construction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes appear in the abstract or described methods; the central claim rests on observable metric differences, not on renaming or re-deriving inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: the nnU-Net framework is an appropriate base model for multi-organ CT segmentation.
- Domain assumption: federated averaging of model weights produces a model that generalizes across institutional data distributions.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
Tag: unclear (relation between the paper passage and the cited Recognition theorem).
An nnU-Net-based framework segmented 19 OARs using local and FL schemes. FL was implemented with secure weight exchange... DSC gains of 0.003-0.007 over local models.
-
IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
Tag: unclear (relation between the paper passage and the cited Recognition theorem).
Federated learning (FL) enables multi-center collaboration by retaining patient data locally and sharing only model parameters
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Overcoming data scarcity through multi-center federated learning for organs-at-risk segmentation in pediatric upper abdominal radiotherapy
Mianyong Ding, MSc (a,b); Maximilian Knoll, MD (d); Semi Harrabi, MD (d); Martine van Grotel, MD (a); Annemieke S. Littooij, MD (a,b); Prof. Max van Noesel, MD (a,e); Prof. Jens-Peter Schenk, MD (f); Prof. Marry M. van den Heuvel-Eibrink...
2016
-
[2]
The HEI-cohort comprised 82 CTs from 57 similar patients acquired between 2017 and
2017
-
[3]
2 van den Heuvel-Eibrink MM, Hol JA, Pritchard-Jones K, van Tinteren H, Furtwängler R, Verschuur AC et al
CA Cancer J Clin 2019; 69: 7–34. 2 van den Heuvel-Eibrink MM, Hol JA, Pritchard-Jones K, van Tinteren H, Furtwängler R, Verschuur AC et al. Rationale for the treatment of Wilms tumour in the UMBRELLA SIOP–RTSG 2016 protocol. Nature Reviews Urology, 2017.
2019
-
[4]
3 Janssens GO, Melchior P, Mul J, Saunders D, Bolle S, Cameron AL et al
doi:10.1038/nrurol.2017.163. 3 Janssens GO, Melchior P, Mul J, Saunders D, Bolle S, Cameron AL et al. The SIOP-Renal Tumour Study Group consensus statement on flank target volume delineation for highly conformal radiotherapy. Lancet Child Adolesc Health 2020; 4: 846–852. 4 Ding M, Maspero M, Harrabi S, Jouglar E, Vennarini S, Spencer T et al. Impact of de...
-
[5]
Multicentre evaluation of deep learning CT autosegmentation of the head and neck region for radiotherapy
6 Pang EPP, Tan HQ, Wang F, Niemelä J, Bolard G, Ramadan S et al. Multicentre evaluation of deep learning CT autosegmentation of the head and neck region for radiotherapy. NPJ Digit Med 2025; 8: 1–11. 7 Choi MS, Chang JS, Kim K, Kim JH, Kim TH, Kim S et al. Assessment of deep learning-based auto-contouring on interobserver consistency in target volume and...
2025
-
[6]
Communication-efficient learning of deep networks from decentralized data,
12 Ding M, Maspero M, Littooij AS, van Grotel M, Fajardo RD, van Noesel MM et al. Deep learning-based auto-contouring of organs/structures-at-risk for pediatric upper abdominal radiotherapy. Radiotherapy and Oncology 2025; 208: 110914. 13 Janssens GO, Timmermann B, Laprie A, Mandeville H, Padovani L, Chargari C et al. The organization of care in pediatric...
-
[7]
18 Lee EH, Han M, Wright J, Kuwabara M, Mevorach J, Fu G et al
doi:10.1148/RYAI.240485. 18 Lee EH, Han M, Wright J, Kuwabara M, Mevorach J, Fu G et al. An international study presenting a federated learning AI platform for pediatric brain tumors. Nat Commun 2024; 15:
-
[8]
Federated brain tumor segmentation: An extensive benchmark
19 Manthe M, Duffner S, Lartizien C. Federated brain tumor segmentation: An extensive benchmark. Med Image Anal 2024; 97: 103270. 20 Cao K, Zou Y, Zhang C, Zhang W, Zhang J, Wang G et al. A multicenter bladder cancer MRI dataset and baseline evaluation of federated learning in clinical application. Scientific Data 2024; 11: 1–10. 21 Teo ZL, Jin L, Li ...
2024
-
[9]
26 Somasundaram E, Taylor Z, Alves VV, Qiu L, Fortson BL, Mahalingam N et al
doi:10.48550/arXiv.2210.13291. 26 Somasundaram E, Taylor Z, Alves VV, Qiu L, Fortson BL, Mahalingam N et al. Deep Learning Models for Abdominal CT Organ Segmentation in Children: Development and Validation in Internal and Heterogeneous Public Datasets. https://www.ajronline.org/
-
[10]
27 Thibodeau-Antonacci A, Popovic M, Ates O, Hua CH, Schneider J, Skamene S et al
doi:10.2214/AJR.24.30931. 27 Thibodeau-Antonacci A, Popovic M, Ates O, Hua CH, Schneider J, Skamene S et al. Trade-off of different deep learning-based auto-segmentation approaches for treatment planning of pediatric craniospinal irradiation autocontouring of OARs for pediatric CSI. Med Phys 2025; 52: 3541–3556. 28 Xu X, Deng HH, Gateno J, Yan P. Federate...
-
[11]
Federated learning with knowledge distillation for multi-organ segmentation with partially labeled datasets
29 Kim S, Park H, Kang M, Jin KH, Adeli E, Pohl KM et al. Federated learning with knowledge distillation for multi-organ segmentation with partially labeled datasets. Med Image Anal 2024; 95: 103156. 30 Schoenpflug LA, Benavides RB, Nowak M, Sheikhzadeh F, Moayyedi A, Wasag K et al. Navigating real-world challenges: A case study on federated learning in c...
2024
-
[12]
Efficiency Optimization Techniques in Privacy-Preserving Federated Learning With Homomorphic Encryption: A Brief Survey
33 Xie Q, Jiang S, Jiang L, Huang Y, Zhao Z, Khan S et al. Efficiency Optimization Techniques in Privacy-Preserving Federated Learning With Homomorphic Encryption: A Brief Survey. IEEE Internet Things J 2024; 11: 24569–24580. 34 Jere MS, Farnan T, Koushanfar F. A Taxonomy of Attacks on Federated Learning. IEEE Secur Priv 2021; 19: 20–28
2024