Weakly-Supervised Lung Nodule Segmentation via Training-Free Guidance of 3D Rectified Flow
Pith reviewed 2026-05-10 18:17 UTC · model grok-4.3
The pith
A frozen 3D rectified flow model produces accurate lung nodule segmentations when guided by a predictor fine-tuned only on image-level labels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By pairing a pretrained 3D rectified flow model with a predictor fine-tuned solely on image-level labels and applying training-free guidance, the method achieves improved weakly-supervised segmentation of lung nodules without any retraining of the generative model. The approach produces better segmentations than baseline methods on LUNA16 and detects nodules consistently across varying sizes and shapes when tested with two different predictors.
What carries the argument
Training-free guidance of a frozen pretrained 3D rectified flow model directed by signals from a predictor fine-tuned only on image-level labels.
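For orientation, one generic way to write training-free guidance of a rectified flow is sketched below. The paper's own formulation is not reproduced on this page, so every symbol here is an assumption: v_\theta is the frozen velocity field, p_\phi the image-level predictor, c the target label, \hat{x}_1 a one-step estimate of the clean volume, and \lambda a guidance strength.

```latex
% Assumed notation for illustration; not taken from the paper.
\begin{align}
x_t &= (1-t)\,x_0 + t\,x_1, \qquad x_0 \sim \mathcal{N}(0, I), \\
\frac{\mathrm{d}x_t}{\mathrm{d}t} &= v_\theta(x_t, t), \\
\hat{x}_1 &= x_t + (1-t)\,v_\theta(x_t, t), \\
x_{t+\Delta t} &= x_t + \Delta t\,\bigl[\,v_\theta(x_t, t) + \lambda\,\nabla_{\hat{x}_1} \log p_\phi(c \mid \hat{x}_1)\,\bigr].
\end{align}
```

Taking the gradient with respect to \hat{x}_1 rather than x_t avoids backpropagating through the flow model. That is one possible design choice, not necessarily the one the authors make.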
If this is right
- The plug-and-play combination improves segmentation quality over attribution-based baselines for lung nodules.
- The same guidance strategy works with at least two distinct predictor models without retraining the flow component.
- Nodules of varying sizes and shapes are detected more reliably than with prior weakly-supervised techniques.
- Generative foundation models can be reused across different weakly-supervised 3D medical segmentation tasks.
Where Pith is reading between the lines
- Annotation costs in clinical imaging pipelines could drop if only image-level labels are needed to adapt existing generative models.
- The training-free guidance pattern might extend to other 3D structures such as organs or lesions in CT or MRI volumes.
- Pretrained generative models could become standard reusable components for multiple downstream medical vision tasks that currently require dense labels.
Load-bearing premise
A predictor fine-tuned solely on image-level labels supplies sufficiently precise guidance signals to steer the frozen rectified flow model toward accurate voxel-level boundaries for small and variable lung nodules.
What would settle it
The claim would fail if the full pipeline, evaluated on the LUNA16 test set, scored below standard weakly-supervised baselines on segmentation metrics such as Dice, especially on small nodules.
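The metrics involved in such a test are simple to compute once binary masks exist. The sketch below is illustrative only. The function names, the flattened-list mask representation, the 6 mm cutoff for "small" nodules, and the convention of scoring an empty denominator as 1.0 are all assumptions, not details from the paper.

```python
from typing import Dict, Iterable, Sequence, Tuple

Mask = Sequence[int]  # flattened binary voxel mask (assumed representation)


def segmentation_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Dice, IoU and sensitivity for one pair of binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)
    # Assumed convention: an empty denominator scores 1.0.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 1.0
    return {"dice": dice, "iou": iou, "sensitivity": sensitivity}


def mean_dice_by_size(
    cases: Iterable[Tuple[float, Mask, Mask]], small_max_mm: float = 6.0
) -> Dict[str, float]:
    """Mean Dice per size bin; each case is (diameter_mm, pred, truth).

    The 6 mm cutoff is a hypothetical threshold chosen for illustration.
    """
    bins: Dict[str, list] = {"small": [], "large": []}
    for diameter_mm, pred, truth in cases:
        key = "small" if diameter_mm < small_max_mm else "large"
        bins[key].append(segmentation_metrics(pred, truth)["dice"])
    return {k: sum(v) / len(v) for k, v in bins.items() if v}


# Toy masks, not real data: 2 true positives, 1 false positive, 1 false negative.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
scores = segmentation_metrics(pred, truth)
by_size = mean_dice_by_size([(4.0, pred, truth), (12.0, truth, truth)])
```

Reporting the per-bin means next to the overall mean would show directly whether small nodules behave differently from large ones.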
Original abstract
Dense annotations, such as segmentation masks, are expensive and time-consuming to obtain, especially for 3D medical images where expert voxel-wise labeling is required. Weakly supervised approaches aim to address this limitation, but often rely on attribution-based methods that struggle to accurately capture small structures such as lung nodules. In this paper, we propose a weakly-supervised segmentation method for lung nodules by combining pretrained state-of-the-art rectified flow and predictor models in a plug-and-play manner. Our approach uses training-free guidance of a 3D rectified flow model, requiring only fine-tuning of the predictor using image-level labels and no retraining of the generative model. The proposed method produces improved-quality segmentations for two separate predictors, consistently detecting lung nodules of varying size and shapes. Experiments on LUNA16 demonstrate improvements over baseline methods, highlighting the potential of generative foundation models as tools for weakly supervised 3D medical image segmentation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a weakly-supervised 3D lung nodule segmentation method that combines a pretrained rectified flow generative model with a predictor fine-tuned solely on image-level labels. It uses training-free guidance of the frozen 3D flow model in a plug-and-play manner, without retraining the generative component, and claims that this produces improved segmentations on the LUNA16 dataset for two separate predictors while consistently detecting nodules of varying sizes and shapes.
Significance. If the central claim holds, the work is of moderate significance: it shows how pretrained generative foundation models can support weakly-supervised medical image segmentation with reduced annotation and retraining costs. The training-free guidance approach is a potential strength for practical deployment. However, the lack of quantitative metrics, ablations, or detailed baseline comparisons in the experiments makes the practical impact difficult to assess at present.
major comments (2)
- [§4] §4 (Experiments): The results on LUNA16 are described only qualitatively as 'improvements over baseline methods' and 'improved-quality segmentations' without reporting standard metrics (e.g., Dice score, IoU, sensitivity), error bars, statistical significance, or specific baseline implementations. This is load-bearing for the central claim of demonstrated improvements and consistent detection across nodule sizes/shapes.
- [§3] §3 (Method): The training-free guidance mechanism is not specified in sufficient detail to show how an image-level predictor (fine-tuned only on presence/absence labels) produces reliable voxel-level guidance signals for the frozen 3D rectified flow model, especially for small, variable lung nodules where attribution methods are noted to struggle. This transfer is the core assumption and requires explicit formulation or pseudocode.
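For the error bars and statistical significance requested in the first major comment, one common option is a paired bootstrap over per-case scores. The sketch below assumes that approach. The function name, its parameters, and the example numbers are hypothetical, and nothing here reflects what the paper actually reports.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def paired_bootstrap_ci(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Mean paired difference (a - b) with a percentile bootstrap interval.

    Both sequences hold one score per case (for example, per-scan Dice) for
    the same cases in the same order.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Resample cases with replacement and record the mean difference each time.
    boot = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lower = boot[int((alpha / 2) * n_resamples)]
    upper = boot[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return mean(diffs), lower, upper


# Made-up per-case Dice values, for illustration only.
method_dice = [0.75, 0.5, 1.0]
baseline_dice = [0.5, 0.25, 0.75]
mean_diff, ci_low, ci_high = paired_bootstrap_ci(method_dice, baseline_dice)
```

An interval for the difference that excludes zero would support a claimed improvement. Pairing by case matters because per-nodule difficulty varies widely.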
minor comments (2)
- The abstract and introduction would benefit from a brief qualitative figure or example showing the guidance process and resulting segmentations to illustrate the plug-and-play claim.
- [§3] Notation for the rectified flow model and predictor components should be introduced consistently with equation numbers or definitions in §3 to improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below and will revise the paper to incorporate additional details and quantitative results where needed.
- Referee: [§4] §4 (Experiments): The results on LUNA16 are described only qualitatively as 'improvements over baseline methods' and 'improved-quality segmentations' without reporting standard metrics (e.g., Dice score, IoU, sensitivity), error bars, statistical significance, or specific baseline implementations. This is load-bearing for the central claim of demonstrated improvements and consistent detection across nodule sizes/shapes.
Authors: We agree that quantitative evaluation is necessary to fully support the claims of improvement. The current version focuses on qualitative visualization to highlight consistency across nodule sizes and shapes, but we will revise Section 4 to report Dice scores, IoU, and sensitivity with error bars, together with statistical significance tests and explicit descriptions of the baseline methods and their implementations on LUNA16. These metrics will be added from our existing experiments to demonstrate the gains.
Revision: yes
- Referee: [§3] §3 (Method): The training-free guidance mechanism is not specified in sufficient detail to show how an image-level predictor (fine-tuned only on presence/absence labels) produces reliable voxel-level guidance signals for the frozen 3D rectified flow model, especially for small, variable lung nodules where attribution methods are noted to struggle. This transfer is the core assumption and requires explicit formulation or pseudocode.
Authors: We will expand Section 3 with an explicit formulation of the training-free guidance. The image-level predictor outputs a scalar probability that is converted into a guidance gradient applied directly in the voxel space of the frozen 3D rectified flow at each sampling step; this steers the flow trajectory toward label-consistent regions without retraining the generative model. The 3D nature of the flow provides the voxel-level resolution, bypassing direct attribution limitations for small nodules by leveraging the pretrained generative prior. We will include the mathematical equations and pseudocode for the full guidance procedure in the revision.
Revision: yes
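The mechanism described in this response (a scalar probability turned into a gradient and applied at every sampling step) can be illustrated on a toy problem. The sketch below is not the authors' procedure. It replaces the 3D flow model with an exact closed-form velocity for a Gaussian target and the predictor with a logistic regression over eight "voxels". All names, the guidance scaling, and the parameter values are assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def velocity(x: Sequence[float], t: float, mean: Sequence[float], data_std: float) -> Vector:
    """Closed-form rectified-flow velocity E[x1 - x0 | x_t] per voxel,
    for x0 ~ N(0, 1) and x1 ~ N(mean, data_std^2). Stands in for the frozen model."""
    cov = t * data_std ** 2 - (1.0 - t)
    var = (1.0 - t) ** 2 + (t * data_std) ** 2
    return [m + cov / var * (xi - t * m) for xi, m in zip(x, mean)]


def predict_prob(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Toy image-level predictor: one scalar probability for the whole volume."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def grad_log_prob(x: Sequence[float], target: int, weights: Sequence[float], bias: float) -> Vector:
    """Analytic gradient of log p(target | x) with respect to the voxels."""
    p = predict_prob(x, weights, bias)
    return [(target - p) * w for w in weights]


def sample(x0, target, guidance, mean, weights, bias, data_std=0.5, n_steps=50) -> Vector:
    """Euler sampling in which the velocity is shifted by a predictor gradient."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = velocity(x, t, mean, data_std)
        # One-step estimate of the clean sample; the predictor is evaluated there.
        x1_hat = [xi + (1.0 - t) * vi for xi, vi in zip(x, v)]
        g = grad_log_prob(x1_hat, target, weights, bias)
        x = [xi + dt * (vi + guidance * gi) for xi, vi, gi in zip(x, v, g)]
    return x


rng = random.Random(0)
n_voxels = 8
mean = [0.0] * n_voxels
weights = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]  # predictor only "sees" two voxels
bias = -1.0
x0 = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]

unguided = sample(x0, target=1, guidance=0.0, mean=mean, weights=weights, bias=bias)
guided = sample(x0, target=1, guidance=4.0, mean=mean, weights=weights, bias=bias)
```

In this toy, guidance raises the predictor's probability for the target label and changes only the voxels the predictor is sensitive to. Whether a real image-level predictor yields comparably localized gradients for small nodules is exactly the open question the referee raises.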
Circularity Check
No circularity: method uses external pretrained models and standard image-level fine-tuning
The paper proposes a plug-and-play combination of a frozen pretrained 3D rectified flow model with a predictor that is fine-tuned only on image-level labels. No equation or derivation step reduces the claimed voxel-level segmentation output to a quantity defined by the target segmentation itself, nor does any central claim rest on a self-citation chain that is unverified or tautological. The reported improvements on LUNA16 are presented as empirical outcomes of this external-model guidance procedure rather than as a mathematical identity or a fitted prediction by construction. The approach is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Pretrained state-of-the-art rectified flow models can be effectively steered for segmentation tasks using only a fine-tuned predictor without retraining the generative model.