arxiv: 2604.08313 · v2 · submitted 2026-04-09 · 💻 cs.CV

Weakly-Supervised Lung Nodule Segmentation via Training-Free Guidance of 3D Rectified Flow

Pith reviewed 2026-05-10 18:17 UTC · model grok-4.3

classification 💻 cs.CV
keywords weakly-supervised segmentation · lung nodule · 3D rectified flow · training-free guidance · medical image segmentation · image-level labels · LUNA16

The pith

A frozen 3D rectified flow model produces accurate lung nodule segmentations when guided by a predictor fine-tuned only on image-level labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Creating detailed 3D segmentation masks for medical images requires expensive expert voxel labeling. The paper demonstrates that a pretrained rectified flow model can be kept frozen and steered via training-free guidance from a separate predictor that needs only image-level labels for its fine-tuning. This plug-and-play combination yields higher-quality nodule segmentations than standard weakly-supervised baselines. The method works for multiple predictor architectures and reliably locates nodules of different sizes and shapes. Results on the LUNA16 dataset support the claim that generative models can serve as reusable backbones for weakly-supervised 3D medical segmentation.

Core claim

By pairing a pretrained 3D rectified flow model with a predictor fine-tuned solely on image-level labels and applying training-free guidance, the method achieves improved weakly-supervised segmentation of lung nodules without any retraining of the generative model. The approach produces better segmentations than baseline methods on LUNA16 and detects nodules consistently across varying sizes and shapes when tested with two different predictors.

What carries the argument

Training-free guidance of a frozen pretrained 3D rectified flow model directed by signals from a predictor fine-tuned only on image-level labels.

If this is right

  • The plug-and-play combination improves segmentation quality over attribution-based baselines for lung nodules.
  • The same guidance strategy works with at least two distinct predictor models without retraining the flow component.
  • Nodules of varying sizes and shapes are detected more reliably than with prior weakly-supervised techniques.
  • Generative foundation models can be reused across different weakly-supervised 3D medical segmentation tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Annotation costs in clinical imaging pipelines could drop if only image-level labels are needed to adapt existing generative models.
  • The training-free guidance pattern might extend to other 3D structures such as organs or lesions in CT or MRI volumes.
  • Pretrained generative models could become standard reusable components for multiple downstream medical vision tasks that currently require dense labels.

Load-bearing premise

A predictor fine-tuned solely on image-level labels supplies sufficiently precise guidance signals to steer the frozen rectified flow model toward accurate voxel-level boundaries for small and variable lung nodules.

What would settle it

Evaluating the full pipeline on the LUNA16 test set and observing that segmentation metrics such as Dice scores fall below those of standard weakly-supervised baselines, especially on small nodules.
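
For reference, the overlap metrics such a test would rely on can be computed as in the minimal, dependency-free Python sketch below. The function name `dice_and_iou`, the representation of a mask as a set of (z, y, x) voxel coordinates, the empty-mask convention, and the toy masks are all assumptions made for illustration; this is not the paper's evaluation code.

```python
def dice_and_iou(pred_voxels, true_voxels):
    """Return (Dice, IoU) for two masks given as collections of (z, y, x) voxels."""
    pred, truth = set(pred_voxels), set(true_voxels)
    overlap = len(pred & truth)
    total = len(pred) + len(truth)
    union = total - overlap
    # Convention (an assumption): two empty masks count as a perfect match.
    dice = 2 * overlap / total if total else 1.0
    iou = overlap / union if union else 1.0
    return dice, iou


# Hypothetical toy masks: three voxels each, two in common.
predicted = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}
ground_truth = {(0, 0, 1), (0, 1, 1), (1, 1, 1)}
dice, iou = dice_and_iou(predicted, ground_truth)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

In this toy case two of the three voxels in each mask overlap, so Dice is 2/3 and IoU is 1/2. Small nodules occupy few voxels, so a one-voxel boundary error moves both scores sharply, which is why the small-nodule subset is the sensitive part of the test.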

Figures

Figures reproduced from arXiv: 2604.08313 by Fredrik Kahl, Jennifer Alvén, Richard Petersen.

Figure 1
Figure 1: Overview of the weakly-supervised segmentation (WSS) framework. A predictor-guided rectified flow model generates a counterfactual reconstruction, and the residual image with respect to the input yields the segmentation mask.
Following text (2 Method): Our method leverages pretrained foundation models in a plug-and-play manner to extract weakly-supervised segmentations of lung nodules. Specifically, we combine MAISI-v2, a …
Figure 2
Figure 2: Overview of the proposed training-free guidance (TFG) framework for predictor-guided rectified flow in latent space, performed at inference. The symbol indicates that the models are frozen.
Following text (Training-free guidance): In order to avoid costly retraining of the generative model, we leverage the TFG framework [2,26,27], which enables guiding an arbitrary generative model using a predictor model, rather than tra…
Figure 3
Figure 3: Visual comparisons of WSS on LUNA16 for the MedSAM TinyVit predictor. Success and failure cases are shown in green and red frames, respectively. The proposed method suppresses the lung nodules in the guided reconstruction (Columns 1-2), resulting in WSS that closely match the shape and size of the ground-truth masks (Columns 3-4). The CAM-based methods generally over-segment nodules, producing masks that extend bey…
Figure 4
Figure 4: Visual comparisons of the WSS on LUNA16 for the RadImgNet ResNet50 predictor. Similar trends can be observed with a CNN-based predictor, where the proposed method produces masks that more closely follow the ground-truth nodule boundaries compared to the baseline methods.
Following text (Experimental results): We evaluate the proposed WSS method in a plug-and-play setting where the predictor is pretrained and kept fixed. F…
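
The residual step described in the Figure 1 caption (input compared against its counterfactual reconstruction) can be sketched as follows. This is a dependency-free illustration under stated assumptions: the captions do not say how the residual is turned into a binary mask, so the absolute difference and the fixed `threshold` are guesses, and `residual_mask` and the toy 2x2x2 volumes are hypothetical.

```python
def residual_mask(volume, counterfactual, threshold):
    """Mark voxels where |input - counterfactual| exceeds `threshold`.

    Both volumes are nested lists indexed [z][y][x] with equal shapes.
    The thresholding rule is an assumption, not taken from the paper.
    """
    return [
        [
            [abs(v - c) > threshold for v, c in zip(row_v, row_c)]
            for row_v, row_c in zip(slice_v, slice_c)
        ]
        for slice_v, slice_c in zip(volume, counterfactual)
    ]


# Hypothetical 2x2x2 intensities: one bright "nodule" voxel at [1][0][1].
scan = [[[0.1, 0.2], [0.1, 0.1]], [[0.2, 0.9], [0.1, 0.2]]]
# Hypothetical guided reconstruction with the bright voxel suppressed.
recon = [[[0.1, 0.2], [0.1, 0.1]], [[0.2, 0.2], [0.1, 0.2]]]

mask = residual_mask(scan, recon, threshold=0.3)
nodule_voxels = [
    (z, y, x)
    for z, plane in enumerate(mask)
    for y, row in enumerate(plane)
    for x, flagged in enumerate(row)
    if flagged
]
print(nodule_voxels)
```

Only the voxel that the reconstruction changed is flagged, which is the sense in which the residual localizes the structure the guidance removed.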
Original abstract

Dense annotations, such as segmentation masks, are expensive and time-consuming to obtain, especially for 3D medical images where expert voxel-wise labeling is required. Weakly supervised approaches aim to address this limitation, but often rely on attribution-based methods that struggle to accurately capture small structures such as lung nodules. In this paper, we propose a weakly-supervised segmentation method for lung nodules by combining pretrained state-of-the-art rectified flow and predictor models in a plug-and-play manner. Our approach uses training-free guidance of a 3D rectified flow model, requiring only fine-tuning of the predictor using image-level labels and no retraining of the generative model. The proposed method produces improved-quality segmentations for two separate predictors, consistently detecting lung nodules of varying size and shapes. Experiments on LUNA16 demonstrate improvements over baseline methods, highlighting the potential of generative foundation models as tools for weakly supervised 3D medical image segmentation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a weakly-supervised 3D lung nodule segmentation method that combines a pretrained rectified flow generative model with a predictor fine-tuned solely on image-level labels. It uses training-free guidance of the frozen 3D flow model in a plug-and-play manner, without retraining the generative component, and claims that this produces improved segmentations on the LUNA16 dataset for two separate predictors while consistently detecting nodules of varying sizes and shapes.

Significance. If the central claim holds, the work has moderate significance by showing how pretrained generative foundation models can support weakly-supervised medical image segmentation with reduced annotation and retraining costs. The training-free guidance approach is a potential strength for practical deployment. However, the lack of quantitative metrics, ablations, or detailed baseline comparisons in the experiments makes the practical impact difficult to assess at present.

major comments (2)
  1. [§4] §4 (Experiments): The results on LUNA16 are described only qualitatively as 'improvements over baseline methods' and 'improved-quality segmentations' without reporting standard metrics (e.g., Dice score, IoU, sensitivity), error bars, statistical significance, or specific baseline implementations. This is load-bearing for the central claim of demonstrated improvements and consistent detection across nodule sizes/shapes.
  2. [§3] §3 (Method): The training-free guidance mechanism is not specified in sufficient detail to show how an image-level predictor (fine-tuned only on presence/absence labels) produces reliable voxel-level guidance signals for the frozen 3D rectified flow model, especially for small, variable lung nodules where attribution methods are noted to struggle. This transfer is the core assumption and requires explicit formulation or pseudocode.
minor comments (2)
  1. The abstract and introduction would benefit from a brief qualitative figure or example showing the guidance process and resulting segmentations to illustrate the plug-and-play claim.
  2. [§3] Notation for the rectified flow model and predictor components should be introduced consistently with equation numbers or definitions in §3 to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below and will revise the paper to incorporate additional details and quantitative results where needed.

Point-by-point responses
  1. Referee: [§4] §4 (Experiments): The results on LUNA16 are described only qualitatively as 'improvements over baseline methods' and 'improved-quality segmentations' without reporting standard metrics (e.g., Dice score, IoU, sensitivity), error bars, statistical significance, or specific baseline implementations. This is load-bearing for the central claim of demonstrated improvements and consistent detection across nodule sizes/shapes.

    Authors: We agree that quantitative evaluation is necessary to fully support the claims of improvement. The current version focuses on qualitative visualization to highlight consistency across nodule sizes and shapes, but we will revise Section 4 to report Dice scores, IoU, and sensitivity, with error bars and statistical significance tests, along with explicit descriptions of the baseline methods and their implementations on LUNA16. These metrics will be added from our existing experiments to demonstrate the gains.
    revision: yes

  2. Referee: [§3] §3 (Method): The training-free guidance mechanism is not specified in sufficient detail to show how an image-level predictor (fine-tuned only on presence/absence labels) produces reliable voxel-level guidance signals for the frozen 3D rectified flow model, especially for small, variable lung nodules where attribution methods are noted to struggle. This transfer is the core assumption and requires explicit formulation or pseudocode.

    Authors: We will expand Section 3 with an explicit formulation of the training-free guidance. The image-level predictor outputs a scalar probability that is converted into a guidance gradient applied directly in the voxel space of the frozen 3D rectified flow at each sampling step; this steers the flow trajectory toward label-consistent regions without retraining the generative model. The 3D nature of the flow provides the voxel-level resolution, bypassing direct attribution limitations for small nodules by leveraging the pretrained generative prior. We will include the mathematical equations and pseudocode for the full guidance procedure in the revision.
    revision: yes
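
One possible reading of the mechanism described in this response is sketched below as a toy, dependency-free Python example. It is not the authors' formulation. Everything concrete here is an assumption: the linear logistic predictor, the constant straight-line velocity field standing in for the frozen flow model, the Euler sampler, the choice to steer toward the "nodule absent" class (motivated only by the Figure 3 caption, which says nodules are suppressed in the guided reconstruction), and all names (`nodule_prob`, `guided_sample`, `scale`).

```python
import math


def nodule_prob(x, w, b):
    """Toy image-level predictor: P(nodule present | x) from a linear logit."""
    logit = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-logit))


def grad_log_prob_absent(x, w, b):
    """Gradient of log(1 - P(nodule | x)) w.r.t. x; equals -P(nodule | x) * w."""
    p = nodule_prob(x, w, b)
    return [-p * wi for wi in w]


def guided_sample(x0, velocity, w, b, steps=50, scale=0.5):
    """Euler integration from t=0 to t=1 with a predictor-gradient correction."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity(x, t)  # frozen flow: queried, never updated
        # One-step estimate of the final sample; the predictor is evaluated
        # there. The Jacobian of the velocity is ignored, which is exact for
        # the constant toy field used below.
        x_end = [xi + (1.0 - t) * vi for xi, vi in zip(x, v)]
        g = grad_log_prob_absent(x_end, w, b)
        x = [xi + dt * (vi + scale * gi) for xi, vi, gi in zip(x, v, g)]
    return x


# Hypothetical stand-in for the frozen model: a constant straight-line field.
start, target = [0.0, 0.0, 0.0], [2.0, 1.0, -1.0]
drift = [ti - si for ti, si in zip(target, start)]


def frozen_velocity(x, t):
    return drift


w, b = [1.5, 0.0, 0.0], -1.0  # toy predictor weights (assumed)
unguided = guided_sample(start, frozen_velocity, w, b, scale=0.0)
guided = guided_sample(start, frozen_velocity, w, b, scale=0.5)
print(nodule_prob(unguided, w, b), nodule_prob(guided, w, b))
```

With `scale=0.0` the sampler lands on the unguided endpoint; with a positive `scale` only the coordinate the predictor is sensitive to is pulled back, and the predicted nodule probability drops. The flow parameters never change, which is the "training-free" part; whether a scalar image-level probability gives gradients sharp enough for small nodules is the open question the referee raises.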

Circularity Check

0 steps flagged

No circularity: method uses external pretrained models and standard image-level fine-tuning

full rationale

The paper proposes a plug-and-play combination of a frozen pretrained 3D rectified flow model with a predictor that is fine-tuned only on image-level labels. No equation or derivation step reduces the claimed voxel-level segmentation output to a quantity defined by the target segmentation itself, nor does any central claim rest on a self-citation chain that is unverified or tautological. The reported improvements on LUNA16 are presented as empirical outcomes of this external-model guidance procedure rather than as a mathematical identity or a fitted prediction by construction. The approach is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that pretrained rectified flow models are sufficiently general to be guided by a lightly fine-tuned predictor for accurate small-structure segmentation; no free parameters or new entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: Pretrained state-of-the-art rectified flow models can be effectively steered for segmentation tasks using only a fine-tuned predictor without retraining the generative model.
    Invoked when the method is described as plug-and-play with no retraining of the flow model.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · 1 internal anchor

  1. [1] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. Advances in neural information processing systems, 34:8780–8794, 2021
  2. [2] Bansal et al. Universal guidance for diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 843–852, 2023
  3. [3] Di Mattia et al. A survey on gans for anomaly detection. arXiv preprint arXiv:1906.11632, 2019
  4. [4] Dutande et al. Deep residual separable convolutional neural network for lung tumor segmentation. Computers in biology and medicine, 141:105161, 2022
  5. [5] Esser et al. Scaling rectified flow transformers for high-resolution image synthesis. In Forty-first international conference on machine learning, 2024
  6. [6] Guo et al. Maisi: Medical ai for synthetic imaging. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 4430–4441. IEEE, 2025
  7. [7] Kumar et al. A flexible 2.5 d medical image segmentation approach with in-slice and cross-slice attention. Computers in Biology and Medicine, 182:109173, 2024
  8. [8] Liu et al. Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003, 2022
  9. [9] Ma et al. Segment anything in medical images. Nature Communications, 15(1):654, 2024
  10. [10] Mei et al. Radimagenet: an open radiologic deep learning research dataset for effective transfer learning. Radiology: Artificial Intelligence, 4(5):e210315, 2022
  11. [11] Mousakhan et al. Anomaly detection with conditioned denoising diffusion models. In DAGM German Conference on Pattern Recognition, pages 181–195. Springer, 2024
  12. [12] Patel et al. Flowchef: Steering of rectified flow models for controlled generations. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15308–15318, 2025
  13. [13] Sanchez et al. What is healthy? generative counterfactual diffusion for lesion localization. In MICCAI workshop on deep generative models, pages 34–44. Springer, 2022
  14. [14] Schlegl et al. f-anogan: Fast unsupervised anomaly detection with generative adversarial networks. Medical image analysis, 54:30–44, 2019
  15. [15] Selvaraju et al. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017
  16. [16] Setio et al. Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the luna16 challenge. Medical image analysis, 42:1–13, 2017
  17. [17] Sundararajan et al. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017
  18. [18] Wang et al. Score-cam: Score-weighted visual explanations for convolutional neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pages 24–25, 2020
  19. [19] Wang et al. 3d meddiffusion: A 3d medical latent diffusion model for controllable and high-quality medical image generation. IEEE Transactions on Medical Imaging, 2025
  20. [20] Wang et al. Weakmedsam: Weakly-supervised medical image segmentation via sam with sub-class exploration and prompt affinity mining. IEEE Transactions on Medical Imaging, 2025
  21. [21] Wolleb et al. Descargan: Disease-specific anomaly detection with weak supervision. In International conference on medical image computing and computer-assisted intervention, pages 14–24. Springer, 2020
  22. [22] Wolleb et al. Diffusion models for medical anomaly detection. In International Conference on Medical image computing and computer-assisted intervention, pages 35–45. Springer, 2022
  23. [23] Wyatt et al. Anoddpm: Anomaly detection with denoising diffusion probabilistic models using simplex noise. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 650–656, 2022
  24. [24] Xing et al. Diff-unet: A diffusion embedded network for volumetric segmentation. arXiv preprint arXiv:2303.10326, 2023
  25. [25] Xu et al. Medsyn: text-guided anatomy-aware synthesis of high-fidelity 3-d ct images. IEEE Transactions on Medical Imaging, 43(10):3648–3660, 2024
  26. [26] Ye et al. Tfg: Unified training-free guidance for diffusion models. Advances in Neural Information Processing Systems, 37:22370–22417, 2024
  27. [27] Yu et al. Freedom: Training-free energy-guided conditional diffusion model. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 23174–23184, 2023
  28. [28] Zhang et al. Multiple sclerosis lesion segmentation with tiramisu and 2.5 d stacked slices. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 338–346. Springer, 2019
  29. [29] Zhang et al. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3836–3847, 2023
  30. [30] Zhao et al. Maisi-v2: Accelerated 3d high-resolution medical image synthesis with rectified flow and region-specific contrastive loss. arXiv preprint arXiv:2508.05772, 2025
  31. [31] Zhou et al. Learning deep features for discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2921–2929, 2016
  32. [32] American Cancer Society. Cancer facts and figures 2016, 2016
  33. [33] National Lung Screening Trial Research Team. Reduced lung-cancer mortality with low-dose computed tomographic screening. New England Journal of Medicine, 365(5):395–409, 2011