
arxiv: 2605.08161 · v1 · submitted 2026-05-04 · 💻 cs.CV

Advanced Tumor Segmentation in PET/CT Imaging: A Training Strategy Study with nnU-Net for AutoPET III

Pith reviewed 2026-05-12 01:13 UTC · model grok-4.3

classification 💻 cs.CV
keywords: tumor segmentation · PET/CT imaging · nnU-Net · AutoPET challenge · data augmentation · intensity normalization · deep learning · medical imaging

The pith

Tuned nnU-Net training strategies reach a Dice score of up to 0.80 for PET/CT tumor segmentation and third place in AutoPET III.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests how intensity normalization, batch Dice optimization, and CarveMix augmentation affect nnU-Net performance on whole-body tumor segmentation in PET/CT scans. These choices are evaluated on the AutoPET III challenge data to see whether they reduce false positives and cope with variation in lesion size, contrast, and location. The best combination reaches a Dice score of up to 0.80 on the preliminary test set. If the strategies hold up, automated segmentation becomes more dependable for disease assessment and treatment planning across varied scanners and tracers. The work also reports a third-place ranking in the challenge.
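The headline metric is the Dice similarity coefficient, DSC = 2|P ∩ R| / (|P| + |R|) for a predicted mask P and a reference mask R, so a score of 0.80 means the overlap equals 80% of the average of the two mask sizes. A minimal NumPy sketch on binary masks follows; treating two empty masks as a perfect match is an assumption of this sketch, since challenges handle lesion-free cases in different ways.

```python
import numpy as np


def dice_score(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice similarity coefficient 2|P ∩ R| / (|P| + |R|) for binary masks."""
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    denom = int(pred.sum()) + int(ref.sum())
    if denom == 0:
        return 1.0  # assumed convention: both masks empty counts as agreement
    return 2.0 * int(np.logical_and(pred, ref).sum()) / denom


# Toy example: the prediction covers half of an 8-voxel reference lesion.
ref = np.zeros((4, 4, 4), dtype=bool)
ref[1:3, 1:3, 1:3] = True  # 8 reference voxels
pred = np.zeros_like(ref)
pred[1:3, 1:3, 1] = True  # 4 predicted voxels, all inside the reference
print(round(dice_score(pred, ref), 3))  # 0.667
```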

Core claim

Using the nnU-Net framework with a ResNet-based encoder as the baseline, the authors systematically vary intensity normalization, batch Dice optimization, and CarveMix data augmentation. They report that these adjustments reduce false positives and increase robustness to lesion variability in multi-center, multi-tracer settings. The strongest configuration reaches a Dice score of up to 0.80 in the preliminary test phase and places third in the AutoPET III challenge.
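The abstract does not specify which intensity normalization was used. For orientation only: nnU-Net's default for CT clips intensities to the 0.5th and 99.5th percentiles of foreground voxels collected across the training set, then z-scores with the foreground mean and standard deviation, while other channels such as PET are by default z-scored per image. The NumPy sketch below mimics those two defaults; the function names and the toy inputs are hypothetical, and this is not the paper's configuration.

```python
import numpy as np


def fit_ct_stats(foreground_voxels):
    """Dataset-level CT statistics from voxels labelled as foreground."""
    lower, upper = np.percentile(foreground_voxels, [0.5, 99.5])
    return float(lower), float(upper), float(np.mean(foreground_voxels)), float(np.std(foreground_voxels))


def ct_normalize(ct, lower, upper, mean, std):
    """Clip to the dataset percentiles, then z-score with the dataset statistics."""
    return (np.clip(ct, lower, upper) - mean) / max(std, 1e-8)


def zscore_per_image(img):
    """Per-image z-score, the default-style treatment for a non-CT channel such as PET."""
    return (img - img.mean()) / max(float(img.std()), 1e-8)


rng = np.random.default_rng(0)
fg_hu = rng.normal(40.0, 20.0, size=10_000)  # toy foreground HU values
lower, upper, mean, std = fit_ct_stats(fg_hu)
ct_norm = ct_normalize(np.array([-1000.0, 40.0, 3000.0]), lower, upper, mean, std)
pet_norm = zscore_per_image(rng.gamma(2.0, 1.5, size=(8, 8, 8)))  # toy SUV-like volume
```

The design point is that CT values are in calibrated Hounsfield units, so shared dataset statistics make sense, whereas PET uptake varies between scanners and tracers, which is why a per-image scheme is the usual starting point.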

What carries the argument

The nnU-Net framework with a ResNet encoder, combined with intensity normalization, batch Dice loss optimization, and CarveMix augmentation, which together steer training toward fewer false positives and greater robustness across lesion types.
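Batch Dice refers to where the Dice statistics are aggregated. nnU-Net's plans carry a `batch_dice` flag: when it is on, true positives, false positives, and false negatives are summed over the whole mini-batch as if it were one volume; when it is off, Dice is computed per sample and then averaged. The NumPy sketch below is loosely modeled on a soft Dice (not copied from nnU-Net) and shows how differently the two treat a batch that contains a lesion-free case with a few false-positive voxels.

```python
import numpy as np


def soft_dice(probs, target, batch_dice, smooth=1e-5):
    """Soft Dice for one foreground channel.

    probs, target: arrays of shape (batch, *spatial) with values in [0, 1].
    batch_dice=True pools TP/FP/FN over the whole batch (one pseudo-volume);
    batch_dice=False computes Dice per sample and averages the results.
    """
    spatial_axes = tuple(range(1, probs.ndim))
    axes = (0,) + spatial_axes if batch_dice else spatial_axes
    tp = (probs * target).sum(axis=axes)
    fp = (probs * (1 - target)).sum(axis=axes)
    fn = ((1 - probs) * target).sum(axis=axes)
    dice = (2 * tp + smooth) / (2 * tp + fp + fn + smooth)
    return float(np.mean(dice))


# Sample 0: a 100-voxel lesion predicted perfectly.
# Sample 1: no lesion, but 2 false-positive voxels.
target = np.zeros((2, 200))
target[0, :100] = 1.0
probs = target.copy()
probs[1, :2] = 1.0
print(round(soft_dice(probs, target, batch_dice=False), 3))  # 0.5
print(round(soft_dice(probs, target, batch_dice=True), 3))   # 0.99
```

Per-sample Dice lets a near-empty case dominate the average, while batch Dice weighs each voxel equally across the batch; which behaviour helps more on AutoPET data is exactly what an ablation would need to show.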

If this is right

  • Lower false positive rate in the output segmentations
  • Greater stability when lesions vary in size, contrast, and body location
  • Improved generalization across PET tracers and imaging centers
  • More consistent automated support for disease evaluation and treatment planning
  • Third-place ranking among AutoPET III submissions
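The false-positive claim can be made measurable. Alongside Dice, the autoPET challenges report a false-positive volume (predicted connected components that overlap no reference lesion) and a false-negative volume (reference components that no prediction touches). The sketch below computes both with a plain breadth-first component labelling; 26-connectivity and the 2 mm isotropic voxel size are assumptions here, and the official evaluation code may differ in details.

```python
from collections import deque
from itertools import product

import numpy as np

OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]  # 26-neighbourhood


def label_components(mask):
    """Label 26-connected components of a 3D boolean mask with breadth-first search."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    count = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        count += 1
        labels[start] = count
        queue = deque([start])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in OFFSETS:
                n = (z + dz, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(n, mask.shape))
                if inside and mask[n] and not labels[n]:
                    labels[n] = count
                    queue.append(n)
    return labels, count


def unmatched_volume(source, other, voxel_ml):
    """Total volume (ml) of components of `source` that share no voxel with `other`."""
    labels, count = label_components(source)
    voxels = sum(int((labels == k).sum()) for k in range(1, count + 1)
                 if not np.logical_and(labels == k, other).any())
    return voxels * voxel_ml


def fp_fn_volumes(pred, ref, voxel_ml):
    """(false-positive volume, false-negative volume) in ml."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    return unmatched_volume(pred, ref, voxel_ml), unmatched_volume(ref, pred, voxel_ml)


ref = np.zeros((6, 6, 6), dtype=bool)
ref[0:2, 0:2, 0:2] = True  # lesion that is detected
ref[5, 0, 0] = True        # single-voxel lesion that is missed
pred = np.zeros_like(ref)
pred[0, 0, 0] = True       # touches the first lesion
pred[4, 4, 3:6] = True     # 3-voxel blob with no reference lesion
fpv, fnv = fp_fn_volumes(pred, ref, voxel_ml=0.008)  # 2 mm isotropic voxel = 0.008 ml
```

Here the stray 3-voxel blob contributes 0.024 ml of false-positive volume and the missed lesion 0.008 ml of false-negative volume, even though the detected lesion is only partly covered.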

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same strategy combinations could be tested on other whole-body imaging tasks such as lymphoma or metastasis detection
  • Clinical deployment would require prospective validation on streaming hospital data to confirm time savings over manual contours
  • The reported ranking invites direct comparison with the top two entries to isolate which training choices drove the gap
  • If the augmentation and normalization steps prove stable, they could serve as a reusable recipe for future nnU-Net challenges in hybrid imaging

Load-bearing premise

The chosen training strategies will generalize to unseen multi-center data and different PET tracers without substantial performance drop or overfitting to the training distribution.

What would settle it

A clear drop in Dice score below 0.70 when the same model is evaluated on a new external dataset collected at different centers with a different PET tracer.

Original abstract

Tumor segmentation in whole-body PET/CT imaging is crucial for precise disease evaluation and treatment planning. However, it remains challenging due to variability in lesion size, contrast, and anatomical distribution. Relying on manual segmentation makes the process time-consuming and prone to intra- and inter-observer variability. This work presents a whole-body tumor segmentation method developed for the AutoPET III challenge, where the goal is to build models that generalize across tracers and multi-center data. We employ the nnU-Net framework with a ResNet-based encoder as our baseline and systematically investigate the impact of training strategies, including intensity normalization, batch Dice optimization, and data augmentation using CarveMix. Our experiments show that these strategies significantly influence model performance, particularly in reducing false positives and improving robustness to lesion variability. The best-performing configuration achieves a Dice score of up to 0.80 on the preliminary test phase, and our method ranked third in the AutoPET III challenge. The code is publicly available.
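CarveMix (Zhang et al., NeuroImage 2023) creates a new training case by carving a lesion-centred region of interest out of one labelled image and pasting both the image content and the labels into a second image. The published method sizes that region with a randomized, distance-based margin; the sketch below substitutes a simpler cubic dilation by a chosen number of voxels, and it assumes both cases are already resampled to the same shape with CT and PET stacked as channels. It illustrates the idea and is not the authors' implementation; `carvemix_like` and the margin range are hypothetical.

```python
import numpy as np


def dilate(mask, iterations):
    """Binary dilation by a 3x3x3 cube, repeated; voxels outside the volume count as empty."""
    out = mask.astype(bool)
    for _ in range(iterations):
        padded = np.pad(out, 1)  # constant padding with False
        grown = np.zeros_like(out)
        for dz in range(3):
            for dy in range(3):
                for dx in range(3):
                    grown |= padded[dz:dz + out.shape[0], dy:dy + out.shape[1], dx:dx + out.shape[2]]
        out = grown
    return out


def carvemix_like(src_img, src_lbl, dst_img, dst_lbl, margin):
    """Paste a lesion-centred region (image and labels) from a source case into a target case.

    src_img, dst_img: (channels, Z, Y, X), e.g. CT and PET stacked as channels.
    src_lbl, dst_lbl: (Z, Y, X) lesion masks; margin: dilation steps in voxels.
    """
    roi = dilate(src_lbl > 0, margin)
    mixed_img = np.where(roi[None], src_img, dst_img)  # broadcast the ROI over channels
    mixed_lbl = np.where(roi, src_lbl, dst_lbl)  # labels inside the ROI come from the source
    return mixed_img, mixed_lbl


rng = np.random.default_rng(0)
src_img = np.ones((2, 7, 7, 7))
dst_img = np.zeros((2, 7, 7, 7))
src_lbl = np.zeros((7, 7, 7), dtype=np.uint8)
src_lbl[3, 3, 3] = 1
dst_lbl = np.zeros_like(src_lbl)
margin = int(rng.integers(1, 3))  # random margin of 1 or 2 voxels (hypothetical range)
mixed_img, mixed_lbl = carvemix_like(src_img, src_lbl, dst_img, dst_lbl, margin)
```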

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a whole-body tumor segmentation method for the AutoPET III challenge based on the nnU-Net framework with a ResNet encoder. It systematically examines the effects of three training strategies (intensity normalization, batch Dice optimization, and CarveMix augmentation), claiming that these choices significantly improve performance by reducing false positives and increasing robustness to lesion variability. The best configuration reaches a Dice score of up to 0.80 on the preliminary test phase and places third in the challenge; the code is released publicly.

Significance. If the performance gains can be rigorously attributed to the listed strategies through controlled experiments, the work would supply useful empirical guidance for nnU-Net users facing multi-tracer, multi-center PET/CT data. The public code release is a clear positive that aids reproducibility.

major comments (2)
  1. Abstract: the central claim that intensity normalization, batch Dice optimization, and CarveMix 'significantly influence model performance, particularly in reducing false positives' is unsupported by any ablation results. The manuscript reports only the final Dice score of 0.80 and the challenge ranking; it contains no tables or figures that compare baseline nnU-Net against each strategy (or their combinations) on the same validation split, nor any precision, false-positive counts, or per-component deltas.
  2. The manuscript provides no quantitative details on hyperparameter values, data-split statistics, statistical significance tests, or error bars, preventing verification that the reported ranking is attributable to the proposed strategies rather than the base nnU-Net pipeline or dataset characteristics.
minor comments (2)
  1. The CarveMix augmentation is mentioned without a reference or implementation details, which reduces clarity for readers unfamiliar with the method.
  2. Exact training hyperparameters (learning rate, batch size, normalization parameters, etc.) and the precise composition of training/validation/test splits are not stated.
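The ablation the referee asks for is a small factorial design: three on/off strategies give 2³ = 8 configurations, each trained and scored on the same validation split. The sketch below enumerates that grid and summarizes main effects (mean metric with a factor on minus mean with it off); the flag names are hypothetical labels, not configuration keys of any particular tool, and the metric values would come from actual training runs.

```python
from itertools import product
from statistics import mean

FACTORS = ("intensity_norm", "batch_dice", "carvemix")  # hypothetical flag names


def ablation_grid(factors=FACTORS):
    """All on/off combinations of the factors: 2**len(factors) configurations."""
    return [dict(zip(factors, flags)) for flags in product((False, True), repeat=len(factors))]


def main_effects(results, factors=FACTORS):
    """Average change in the metric when each factor is switched on.

    results: list of (config_dict, metric) pairs, one per configuration,
    all evaluated on the same validation split.
    """
    effects = {}
    for f in factors:
        on = [m for cfg, m in results if cfg[f]]
        off = [m for cfg, m in results if not cfg[f]]
        effects[f] = mean(on) - mean(off)
    return effects


grid = ablation_grid()
print(len(grid))  # 8
print(grid[0])    # {'intensity_norm': False, 'batch_dice': False, 'carvemix': False}
```

A full grid also exposes interactions (for example, CarveMix helping only when batch Dice is on), which per-component deltas from a single baseline would miss.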

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation for major revision. We agree that the current manuscript lacks the necessary ablation studies and quantitative details to fully support the claims regarding the impact of the training strategies. We will revise the manuscript to include these elements for improved rigor and reproducibility.

Point-by-point responses
  1. Referee: Abstract: the central claim that intensity normalization, batch Dice optimization, and CarveMix 'significantly influence model performance, particularly in reducing false positives' is unsupported by any ablation results. The manuscript reports only the final Dice score of 0.80 and the challenge ranking; it contains no tables or figures that compare baseline nnU-Net against each strategy (or their combinations) on the same validation split, nor any precision, false-positive counts, or per-component deltas.

    Authors: We acknowledge that the abstract's claim is not supported by explicit ablation results in the current manuscript, which reports only the final performance and ranking. In the revised version, we will add a dedicated ablation study section with tables and figures comparing the baseline nnU-Net (with ResNet encoder) against configurations using intensity normalization, batch Dice loss, and CarveMix augmentation individually and in combination. These will be evaluated on the same validation split and will include additional metrics such as precision, false-positive counts, and qualitative examples demonstrating reduced false positives. revision: yes

  2. Referee: The manuscript provides no quantitative details on hyperparameter values, data-split statistics, statistical significance tests, or error bars, preventing verification that the reported ranking is attributable to the proposed strategies rather than the base nnU-Net pipeline or dataset characteristics.

    Authors: We agree that the absence of these details limits the ability to verify the contributions of the proposed strategies. In the revision, we will expand the methods and experimental sections to include: specific hyperparameter values for the nnU-Net training (e.g., learning rate, batch size, patch size), data-split statistics (number of cases per split, tracer and center distributions), any statistical significance tests performed, and error bars or standard deviations from repeated experiments or cross-validation where available. revision: yes
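For the promised error bars, one simple option when two configurations are scored on the same cases is a paired bootstrap over cases: resample case indices with replacement and recompute the mean per-case Dice difference. The sketch below returns the observed mean difference and a percentile confidence interval; it captures case-sampling variability only, not variability across training seeds or folds, which would need repeated training runs. The function name and toy scores are hypothetical.

```python
import numpy as np


def paired_bootstrap_ci(dice_a, dice_b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(dice_b - dice_a) over the same cases."""
    a = np.asarray(dice_a, dtype=float)
    b = np.asarray(dice_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both configurations must be scored on the same cases")
    diff = b - a  # pairing: one difference per case
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(diff), size=(n_boot, len(diff)))  # resample cases with replacement
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(diff.mean()), float(lo), float(hi)


# Toy per-case scores for a baseline and a variant (not results from the paper).
baseline = [0.71, 0.64, 0.80, 0.55, 0.77, 0.69]
variant = [0.74, 0.66, 0.79, 0.61, 0.80, 0.70]
delta, lo, hi = paired_bootstrap_ci(baseline, variant)
```

If the interval excludes zero, the gain is unlikely to be an artefact of which cases happened to be in the validation split.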

Circularity Check

0 steps flagged

No significant circularity

full rationale

The manuscript is an empirical study applying the nnU-Net framework to PET/CT tumor segmentation for the AutoPET III challenge. It reports experimental outcomes (Dice scores up to 0.80 and third-place ranking) obtained by training on provided data and evaluating on an external preliminary test phase. No equations, derivations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear in the text. Claims about the influence of intensity normalization, batch Dice loss, and CarveMix augmentation rest on reported configurations rather than any closed-form reduction to the same inputs by construction. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work rests on the standard assumptions of the nnU-Net framework and supervised deep learning for segmentation; no new free parameters, axioms, or invented entities are introduced beyond routine training choices.
