
arXiv: 2606.00109 · v1 · pith:P3VKDDARnew · submitted 2026-05-27 · 💻 cs.CV · cs.AI · cs.LG

VDSB-GWSyn: Diffusion Schrödinger Bridge for Controllable and Anatomically Feasible Guidewire Synthesis in Coronary Angiography

Pith reviewed 2026-06-29 13:59 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords guidewire synthesis · diffusion Schrödinger bridge · coronary angiography · endpoint localization · synthetic pre-training · anatomical feasibility · vessel segmentation constraints · SPADE conditioning

The pith

A diffusion Schrödinger bridge model generates controllable guidewire images under vessel constraints that improve real-data endpoint localization when used for pre-training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces VDSB-GWSyn to synthesize realistic guidewire samples in coronary angiography images, addressing the scarcity of annotated data for robot-assisted percutaneous coronary intervention (PCI). It first learns basic guidewire geometry with a shape prior algorithm, then generates masks constrained by vessel segmentation, and finally produces images via a diffusion Schrödinger bridge conditioned with SPADE. The resulting samples achieve favorable image quality metrics and, when used for synthetic pre-training followed by real fine-tuning, cut the mean pixel error (MPE) of guidewire endpoint localization from 16.01 px to 7.71 px while raising the percentage of correct keypoints within 3 px (PCK@3 px) from 52.63% to 86.27%. This suggests that strictly constrained synthetic data can supply useful training signal for downstream perception tasks where real annotations are limited. The approach emphasizes background preservation and anatomical feasibility throughout synthesis.
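The abstract names the stages (shape prior, vessel-constrained mask generation, endpoint output) but not how they are implemented. A minimal sketch of the constraint idea, assuming a binary vessel mask and replacing the learned shape prior with a turn-bounded random walk (the function `sample_guidewire_mask` and all of its parameters are hypothetical, not the authors' algorithm):

```python
import numpy as np


def sample_guidewire_mask(vessel_mask, start, n_steps=200, step=1.0,
                          max_turn=0.3, n_tries=10, seed=0):
    """Hypothetical sketch: grow a smooth path that never leaves the vessel mask.

    The heading follows a bounded random walk (a crude stand-in for a learned
    shape prior); candidate steps that land outside the vessel are rejected.
    Returns the rasterized wire mask and the (row, col) tip used as endpoint label.
    """
    rng = np.random.default_rng(seed)
    h, w = vessel_mask.shape
    pos = np.asarray(start, dtype=float)
    heading = rng.uniform(0.0, 2.0 * np.pi)
    wire = np.zeros((h, w), dtype=bool)
    wire[int(round(pos[0])), int(round(pos[1]))] = True
    for _step in range(n_steps):
        for _try in range(n_tries):
            cand_heading = heading + rng.uniform(-max_turn, max_turn)
            cand = pos + step * np.array([np.sin(cand_heading), np.cos(cand_heading)])
            r, c = int(round(cand[0])), int(round(cand[1]))
            if 0 <= r < h and 0 <= c < w and vessel_mask[r, c]:
                pos, heading = cand, cand_heading
                wire[r, c] = True
                break
        else:
            break  # no valid step found: stop growing, current position is the tip
    tip = (int(round(pos[0])), int(round(pos[1])))
    return wire, tip
```

Because the tip is read off the same path that is rasterized into the mask, the endpoint label agrees with the mask by construction; whether the paper's pipeline has that property is one of the points the referee report raises.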

Core claim

VDSB-GWSyn learns guidewire geometry via a shape prior, produces masks under vessel segmentation constraints, and synthesizes realistic samples on real coronary angiography (CAG) backgrounds using a diffusion Schrödinger bridge conditioned with SPADE. The generated samples yield favorable ROI-FID and ROI-KID and high IPR scores. Pre-training on these samples and then fine-tuning on real data substantially improves guidewire endpoint localization.
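ROI-FID and ROI-KID are presumably FID [8] and KID [2] computed on crops around the guidewire region; the crop size and feature extractor are not given here. Assuming features have already been extracted from such crops by some pretrained network, both distances can be sketched in NumPy (function names are illustrative):

```python
import numpy as np


def _sqrtm_psd(a):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def frechet_distance(feat_real, feat_fake):
    """FID formula: ||mu_r - mu_f||^2 + Tr(S_r + S_f - 2 (S_r S_f)^(1/2))."""
    mu_r, mu_f = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    s_r = np.cov(feat_real, rowvar=False)
    s_f = np.cov(feat_fake, rowvar=False)
    root_r = _sqrtm_psd(s_r)
    # (S_r S_f)^(1/2) has the same trace as the symmetric (S_r^(1/2) S_f S_r^(1/2))^(1/2).
    middle = root_r @ s_f @ root_r
    cross = _sqrtm_psd((middle + middle.T) / 2.0)
    return float(np.sum((mu_r - mu_f) ** 2) + np.trace(s_r + s_f - 2.0 * cross))


def kernel_inception_distance(feat_real, feat_fake):
    """Unbiased MMD^2 with the cubic polynomial kernel k(x, y) = (x.y / d + 1)^3."""
    d = feat_real.shape[1]

    def kernel(a, b):
        return (a @ b.T / d + 1.0) ** 3

    m, n = len(feat_real), len(feat_fake)
    k_rr = kernel(feat_real, feat_real)
    k_ff = kernel(feat_fake, feat_fake)
    k_rf = kernel(feat_real, feat_fake)
    # Exclude the diagonal (self-similarity) terms for the unbiased estimate.
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    return float(term_rr + term_ff - 2.0 * k_rf.mean())
```

This KID sketch uses the full sets at once; common implementations instead average the estimate over random subsets.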

What carries the argument

Diffusion Schrödinger Bridge (DSB) conditioned with SPADE, combined with a shape prior algorithm and vessel segmentation mask constraints for controllable, anatomically feasible synthesis.
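SPADE [20] replaces the learned affine parameters of a normalization layer with per-pixel scale and shift maps predicted from a segmentation map, which is how a guidewire or vessel mask can steer the generator locally. A simplified NumPy sketch, assuming a one-hot conditioning map already resized to the activation resolution and using a hypothetical 1x1 linear projection where SPADE itself uses small convolutional layers:

```python
import numpy as np


def spade(x, seg, w_gamma, b_gamma, w_beta, b_beta, eps=1e-5):
    """Simplified spatially-adaptive normalization.

    x:   activations, shape (N, C, H, W)
    seg: conditioning map, shape (N, K, H, W), e.g. one-hot wire/vessel/background
    w_*: (C, K) weights of a hypothetical 1x1 projection; b_*: (C,) biases
    """
    # Parameter-free batch norm: per-channel statistics over batch and space.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Per-pixel scale and shift predicted from the segmentation map.
    gamma = np.einsum("ck,nkhw->nchw", w_gamma, seg) + b_gamma[None, :, None, None]
    beta = np.einsum("ck,nkhw->nchw", w_beta, seg) + b_beta[None, :, None, None]
    return gamma * x_hat + beta
```

Because `gamma` and `beta` vary with `seg`, pixels inside the guidewire region can be modulated differently from the background, which is the property that makes SPADE a natural fit for mask-conditioned synthesis.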

If this is right

  • Synthetic pre-training on the generated samples followed by real fine-tuning reduces mean pixel error in endpoint localization from 16.01 px to 7.71 px.
  • The same pipeline raises PCK at 3 px from 52.63% to 86.27% on the downstream task.
  • The synthesis method produces samples with favorable ROI-FID, ROI-KID, and high IPR scores while preserving real anatomical backgrounds.
  • The core design of controllable device synthesis with strict background preservation and anatomical feasibility could transfer to other interventional device perception tasks.
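The two downstream numbers can be made concrete. Assuming MPE is the mean Euclidean distance between predicted and ground-truth endpoints and PCK@3 px is the percentage of images whose error is at most 3 px (exact conventions, such as `<` versus `<=`, are not stated in the abstract), a sketch with a hypothetical function name:

```python
import numpy as np


def endpoint_metrics(pred, gt, threshold_px=3.0):
    """Mean pixel error (MPE) and PCK at a pixel threshold for one endpoint per image.

    pred, gt: arrays of shape (N, 2) with (x, y) endpoint coordinates in pixels.
    """
    err = np.linalg.norm(np.asarray(pred, dtype=float) - np.asarray(gt, dtype=float), axis=1)
    mpe = float(err.mean())
    pck = float((err <= threshold_px).mean() * 100.0)  # reported as a percentage
    return mpe, pck
```

For example, `endpoint_metrics([[0, 0], [3, 4], [1, 0]], [[0, 0]] * 3)` sees errors of 0, 5, and 1 px, giving an MPE of 2.0 px and a PCK@3 px of about 66.7%.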

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The constrained synthesis approach may reduce the volume of manual annotations needed for training interventional imaging models.
  • Similar vessel-constrained diffusion bridges could be tested on synthesis of other thin devices such as catheters or stents in the same imaging modality.
  • If the localization gains hold under varied imaging protocols, the method could support safer deployment of robot-assisted guidewire systems by improving endpoint accuracy.

Load-bearing premise

The guidewire samples generated under vessel segmentation constraints and shape prior are high-fidelity and anatomically feasible enough to supply net positive training signal on real images.

What would settle it

A model trained solely on real annotated images outperforms any model that first pre-trains on the synthesized data then fine-tunes on the same real images.
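One way to operationalize this test is to run both models on the same real test images and bootstrap the difference in per-image error. The sketch below (hypothetical function name; a percentile bootstrap is one reasonable paired test among several) returns the observed MPE difference, real-only minus pretrained, and a 95% interval. An interval entirely below zero would favor the real-only model; one entirely above zero would favor synthetic pre-training.

```python
import numpy as np


def paired_bootstrap_mpe(err_real_only, err_pretrained, n_boot=10_000, seed=0):
    """Observed MPE difference (real-only minus pretrained) and a 95% percentile
    bootstrap interval, resampling test images jointly for both models."""
    a = np.asarray(err_real_only, dtype=float)
    b = np.asarray(err_pretrained, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both models must be evaluated on the same images")
    rng = np.random.default_rng(seed)
    # Each row is one bootstrap resample of image indices, shared by both models.
    idx = rng.integers(0, len(a), size=(n_boot, len(a)))
    diffs = a[idx].mean(axis=1) - b[idx].mean(axis=1)
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return float(a.mean() - b.mean()), float(lo), float(hi)
```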

Figures

Figures reproduced from arXiv: 2606.00109 by Haoyuan Tang, Jiachen Yang, Jialin Li, Shuai Xiao, Zhuo Zhang.

Figure 1. Overview of the proposed VDSB-GWSyn framework for controllable guidewire synthesis.
1 Introduction. Reliable guidewire perception is a prerequisite for computer-aided navigation in coronary angiography [14]. In fluoroscopy, the distal tip is often weakly visible and easily confounded by complex anatomical backgrounds [19], making endpoint localization difficult and error-prone [13]. Deep learning based end…
Figure 2. Qualitative visualization of controllable guidewire synthesis in coro…
Figure 3. Qualitative comparison of guidewire synthesis.
Original abstract

Coronary guidewire endpoint localization is a fundamental capability for computer-assisted PCI, and its importance increases as robot-assisted PCI is progressively adopted to reduce operator radiation exposure. However, the scarcity of annotated CAG images with guidewires and the limited adaptability of existing guidewire synthesis models remain key bottlenecks for guidewire endpoint localization. To address this issue, we propose VDSB-GWSyn, a Diffusion Schrödinger Bridge (DSB) model-based framework, enabling synthesis of controllable, high-fidelity guidewire samples under complex anatomical backgrounds. VDSB-GWSyn first uses our shape prior algorithm to learn the basic guidewire geometry. It then generates guidewire masks under constraints imposed by the vessel segmentation masks and outputs the corresponding endpoint coordinates. Finally, it synthesizes realistic guidewire samples on real CAG images using DSB conditioned with SPADE. Experimental results show that the guidewire samples synthesized by VDSB-GWSyn achieve favorable ROI-FID and ROI-KID, as well as high IPR scores. In addition, incorporating our synthesized data for synthetic pre-training followed by real fine-tuning substantially improves downstream guidewire endpoint localization, reducing MPE from 16.01 px to 7.71 px and increasing PCK at 3 px from 52.63% to 86.27%, leading to more clinically reliable deployment of robot-assisted guidewire delivery systems. Moreover, the core design philosophy of controllable device synthesis with strict background preservation and anatomical feasibility constraints has the potential to transfer to other interventional device perception tasks where annotated data are scarce.
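The abstract reports "high IPR scores" without defining IPR (minor comment 1 in the referee report makes the same point). Assuming it refers to the improved precision and recall metric of Kynkäänniemi et al. [11], which appears in the reference list, the k-nearest-neighbour manifold estimate behind it can be sketched as follows. k = 3 follows that paper; feature extraction is again assumed to happen upstream, and the function names are illustrative.

```python
import numpy as np


def _knn_radii(feats, k):
    """Distance from each point to its k-th nearest neighbour within the same set."""
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    # Index 0 of each sorted row is the point itself (distance 0), so take index k.
    return np.sort(d, axis=1)[:, k]


def manifold_coverage(ref, query, k=3):
    """Fraction of query points inside at least one k-NN hypersphere around ref."""
    radii = _knn_radii(ref, k)
    d = np.linalg.norm(query[:, None, :] - ref[None, :, :], axis=-1)
    return float(np.mean(np.any(d <= radii[None, :], axis=1)))


def improved_precision_recall(real, fake, k=3):
    precision = manifold_coverage(real, fake, k)  # generated samples on the real manifold
    recall = manifold_coverage(fake, real, k)     # real samples on the generated manifold
    return precision, recall
```

The pairwise distance matrices make this quadratic in memory, which is acceptable for a sketch but not for large feature sets.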

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces VDSB-GWSyn, a Diffusion Schrödinger Bridge (DSB) framework for synthesizing controllable, high-fidelity guidewire samples in coronary angiography (CAG) images. It employs a shape-prior algorithm to learn guidewire geometry, generates masks constrained by vessel segmentations, outputs endpoint coordinates, and renders realistic samples on real CAG backgrounds via DSB conditioned on SPADE. The paper reports favorable ROI-FID, ROI-KID, and high IPR scores for the synthesized samples, and claims that synthetic pre-training on VDSB-GWSyn data followed by real-data fine-tuning substantially improves downstream guidewire endpoint localization (MPE reduced from 16.01 px to 7.71 px; PCK@3 px increased from 52.63% to 86.27%).
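For intuition about the bridge component: tractable paired formulations such as I2SB [16] train by sampling an intermediate state from a Gaussian bridge pinned at two endpoint images, and with a constant diffusion coefficient that posterior reduces to the Brownian bridge below. Which endpoint corresponds to the guidewire image and which to the conditioning input in VDSB-GWSyn is an assumption here, and the iterative unpaired DSB procedure of De Bortoli et al. [4] is not reproduced by this snippet.

```python
import numpy as np


def brownian_bridge_sample(x0, x1, t, sigma=1.0, rng=None):
    """Sample x_t from a Brownian bridge pinned at x0 (t = 0) and x1 (t = 1).

    Mean (1 - t) x0 + t x1 and variance sigma^2 t (1 - t): no noise at either end.
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = (1.0 - t) * x0 + t * x1
    std = sigma * np.sqrt(t * (1.0 - t))
    return mean + std * rng.standard_normal(np.shape(x0))
```

A network trained to recover one endpoint from such `x_t` samples can then be applied step by step to move from the conditioning image to a synthesized one.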

Significance. If the central empirical claims hold after proper validation, the work addresses a practical data-scarcity bottleneck in interventional imaging for robot-assisted PCI. The core design of enforcing anatomical feasibility via vessel and shape constraints while preserving background realism could transfer to other device-perception tasks. The reported downstream gains, if shown to be robust, would constitute a concrete, falsifiable demonstration of utility for synthetic data in this domain.

major comments (2)
  1. [Results / downstream evaluation] Results section (performance table reporting MPE/PCK): the headline claim that synthetic pre-training yields the observed gains (16.01 px → 7.71 px MPE; 52.63% → 86.27% PCK@3 px) is load-bearing for the paper’s contribution, yet the text supplies no experimental protocol, baseline list, ablation removing the vessel/shape constraints, statistical tests, or verification that generated endpoints lie on the rendered guidewires. Without these controls it is impossible to rule out that the improvement arises from data volume or training schedule rather than anatomical fidelity.
  2. [Method / synthesis pipeline] Method section (DSB + SPADE synthesis and endpoint generation): the description of how endpoint coordinates are produced and guaranteed to be consistent with the vessel-constrained guidewire mask is insufficient to assess label-noise risk. If alignment error is non-negligible, the pre-training stage trains on noisy labels, directly undermining the assumption that the synthetic pairs supply net positive signal.
minor comments (2)
  1. [Abstract] Abstract and results: “high IPR scores” is stated without definition of the IPR metric or the numerical values obtained.
  2. [Method] Notation: the relationship between the shape-prior algorithm output and the subsequent DSB conditioning variables is not made explicit (e.g., whether the prior is injected as an additional input channel or as a loss term).
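The label-noise concern in major comment 2 can be checked mechanically. Assuming one-pixel-wide (skeletonized) guidewire masks and (row, col) endpoint labels, both hypothetical conventions, a label is consistent if it lies on the mask and has exactly one foreground 8-neighbour, that is, it sits at an end of the curve:

```python
import numpy as np


def endpoint_to_mask_distance(mask, endpoint):
    """Euclidean distance in pixels from a (row, col) label to the nearest wire pixel."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return float("inf")
    return float(np.min(np.hypot(rows - endpoint[0], cols - endpoint[1])))


def wire_neighbor_count(mask, endpoint):
    """Number of 8-connected foreground neighbours of the labelled pixel."""
    r, c = endpoint
    patch = mask[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
    return int(patch.sum()) - int(mask[r, c])


def is_consistent_endpoint(mask, endpoint):
    """True if the label lies on the wire and is an end of the curve."""
    r, c = endpoint
    return bool(mask[r, c]) and wire_neighbor_count(mask, endpoint) == 1
```

Reporting the distribution of `endpoint_to_mask_distance` over all synthetic pairs, plus the fraction passing `is_consistent_endpoint`, would be one concrete form of the alignment metrics the rebuttal promises.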

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed report. The comments highlight important areas where additional rigor and clarity are needed to support the central claims. We address each major comment below and will revise the manuscript to incorporate the requested details and controls.

Point-by-point responses
  1. Referee: [Results / downstream evaluation] Results section (performance table reporting MPE/PCK): the headline claim that synthetic pre-training yields the observed gains (16.01 px → 7.71 px MPE; 52.63% → 86.27% PCK@3 px) is load-bearing for the paper’s contribution, yet the text supplies no experimental protocol, baseline list, ablation removing the vessel/shape constraints, statistical tests, or verification that generated endpoints lie on the rendered guidewires. Without these controls it is impossible to rule out that the improvement arises from data volume or training schedule rather than anatomical fidelity.

    Authors: We agree that the current presentation of the downstream results lacks sufficient supporting controls and documentation. In the revised manuscript we will expand the Results section to include a complete experimental protocol (training schedules, data splits, and hyper-parameters), an exhaustive baseline list with reported metrics, ablation studies that systematically remove the vessel and shape constraints, appropriate statistical significance tests on the MPE and PCK improvements, and explicit verification (both quantitative and visual) that the generated endpoints lie on the rendered guidewires. These additions will allow readers to assess whether the reported gains derive from anatomical fidelity rather than data volume or schedule effects. revision: yes

  2. Referee: [Method / synthesis pipeline] Method section (DSB + SPADE synthesis and endpoint generation): the description of how endpoint coordinates are produced and guaranteed to be consistent with the vessel-constrained guidewire mask is insufficient to assess label-noise risk. If alignment error is non-negligible, the pre-training stage trains on noisy labels, directly undermining the assumption that the synthetic pairs supply net positive signal.

    Authors: We acknowledge that the Method section does not currently provide enough detail on endpoint coordinate generation and its consistency with the vessel-constrained mask. We will revise the relevant subsections to describe the full pipeline: how the shape-prior algorithm produces candidate geometry, how vessel segmentation masks enforce anatomical boundaries, the exact procedure for extracting endpoint coordinates from the resulting mask, and any post-processing steps that enforce alignment. We will also add quantitative alignment metrics or failure-case analysis to demonstrate that label noise remains negligible and does not undermine the pre-training benefit. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical synthesis method with independent downstream validation

Full rationale

The paper describes an empirical pipeline (shape prior → vessel-constrained mask generation → DSB+SPADE synthesis) and reports measured improvements in ROI-FID/KID/IPR plus downstream localization metrics (MPE, PCK). No equations, fitted parameters, or self-citations are presented as load-bearing derivations that reduce the claimed gains to quantities defined by the inputs themselves. The central performance claims are framed as experimental observations rather than algebraic identities or self-referential predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; typical diffusion-model hyperparameters and the assumption that vessel masks provide valid anatomical constraints are implicit but unstated.

pith-pipeline@v0.9.1-grok · 5851 in / 1004 out tokens · 33724 ms · 2026-06-29T13:59:42.838230+00:00 · methodology


Reference graph

Works this paper leans on

29 extracted references · 5 canonical work pages · 3 internal anchors

  [1] Ambrosini, P., Ruijters, D., Niessen, W.J., Moelker, A., van Walsum, T.: Fully automatic and real-time catheter segmentation in X-ray fluoroscopy. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 577–585. Springer (2017)

  [2] Bińkowski, M., Sutherland, D.J., Arbel, M., Gretton, A.: Demystifying MMD GANs. arXiv preprint arXiv:1801.01401 (2018)

  [3] Cao, Y., Zhang, Z., Xiao, S., Li, J., Lan, G., Wen, J., Yang, J.: Automatic translational correction of multi-view coronary angiography based on auto-annotation data generation. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 40, pp. 2652–2660 (2026)

  [4] De Bortoli, V., Thornton, J., Heng, J., Doucet, A.: Diffusion Schrödinger bridge with applications to score-based generative modeling. Advances in Neural Information Processing Systems 34, 17695–17709 (2021)

  [5] Dorjsembe, Z., Pao, H.K., Odonchimed, S., Xiao, F.: Conditional diffusion models for semantic 3D brain MRI synthesis. IEEE Journal of Biomedical and Health Informatics 28(7), 4084–4093 (2024)

  [6] Du, Y., Jiang, Y., Tan, S., Wu, X., Dou, Q., Li, Z., Li, G., Wan, X.: ArSDM: Colonoscopy images synthesis with adaptive refinement semantic diffusion models. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 339–349. Springer (2023)

  [7] Gherardini, M., Mazomenos, E., Menciassi, A., Stoyanov, D.: Catheter segmentation in X-ray fluoroscopy using synthetic data and transfer learning with light U-Nets. Computer Methods and Programs in Biomedicine 192, 105420 (2020)

  [8] Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems 30 (2017)

  [9] Jiang, T., Lu, P., Zhang, L., Ma, N., Han, R., Lyu, C., Li, Y., Chen, K.: RTMPose: Real-time multi-person pose estimation based on MMPose. arXiv preprint arXiv:2303.07399 (2023)

  [10] Konz, N., Chen, Y., Dong, H., Mazurowski, M.A.: Anatomically-controllable medical image generation with segmentation-guided diffusion models. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 88–98. Springer (2024)

  [11] Kynkäänniemi, T., Karras, T., Laine, S., Lehtinen, J., Aila, T.: Improved precision and recall metric for assessing generative models. Advances in Neural Information Processing Systems 32 (2019)

  [12] Li, J., Zhang, Z., Cao, Y., Lan, G., Wen, J., Xiao, S., Yang, J.: Geometrically constrained stenosis editing in coronary angiography via entropic optimal transport. arXiv preprint arXiv:2605.08851 (2026)

  [13] Li, R.Q., Xie, X.L., Zhou, X.H., Liu, S.Q., Ni, Z.L., Zhou, Y.J., Bian, G.B., Hou, Z.G.: Real-time multi-guidewire endpoint localization in fluoroscopy images. IEEE Transactions on Medical Imaging 40(8), 2002–2014 (2021)

  [14] Li, R.Q., Xie, X.L., Zhou, X.H., Liu, S.Q., Ni, Z.L., Zhou, Y.J., Bian, G.B., Hou, Z.G.: A unified framework for multi-guidewire endpoint localization in fluoroscopy images. IEEE Transactions on Biomedical Engineering 69(4), 1406–1416 (2021)

  [15] Li, Y., Yang, S., Liu, P., Zhang, S., Wang, Y., Wang, Z., Yang, W., Xia, S.T.: SimCC: A simple coordinate classification perspective for human pose estimation. In: European Conference on Computer Vision. pp. 89–106. Springer (2022)

  [16] Liu, G.H., Vahdat, A., Huang, D.A., Theodorou, E.A., Nie, W., Anandkumar, A.: I2SB: Image-to-image Schrödinger bridge. arXiv preprint arXiv:2302.05872 (2023)

  [17] Lugmayr, A., Danelljan, M., Romero, A., Yu, F., Timofte, R., Van Gool, L.: RePaint: Inpainting using denoising diffusion probabilistic models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 11461–11471 (2022)

  [18] Ma, J., Li, F., Wang, B.: U-Mamba: Enhancing long-range dependency for biomedical image segmentation. arXiv preprint arXiv:2401.04722 (2024)

  [19] Pan, S., Liu, Y., Zhao, L., Chen, E.Z., Chen, X., Chen, T., Sun, S.: Label-efficient data augmentation with video diffusion models for guidewire segmentation in cardiac fluoroscopy. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 6308–6316 (2025)

  [20] Park, T., Liu, M.Y., Wang, T.C., Zhu, J.Y.: Semantic image synthesis with spatially-adaptive normalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 2337–2346 (2019)

  [21] Popov, M., Amanturdieva, A., Zhaksylyk, N., Alkanov, A., Saniyazbekov, A., Aimyshev, T., Ismailov, E., Bulegenov, A., Kuzhukeyev, A., Kulanbayeva, A., et al.: Dataset for automatic region-based coronary artery disease diagnostics using X-ray angiography images. Scientific Data 11(1), 20 (2024)

  [22] Ullah, I., Chikontwe, P., Choi, H., Yoon, C.H., Park, S.H.: Synthesize and segment: Towards improved catheter segmentation via adversarial augmentation. Applied Sciences 11(4), 1638 (2021)

  [23] Ullah, I., Chikontwe, P., Park, S.H.: Catheter synthesis in X-ray fluoroscopy with generative adversarial networks. In: International Workshop on PRedictive Intelligence In MEdicine. pp. 125–133. Springer (2019)

  [24] Wang, D., Zhang, Y., Bai, B., Yu, X., Shu, X., Dai, Y.: GRADE: A generalization robustness assessment via distributional evaluation for remote sensing object detection. Remote Sensing 17(22), 3771 (2025)

  [25] Wang, T.C., Liu, M.Y., Zhu, J.Y., Tao, A., Kautz, J., Catanzaro, B.: High-resolution image synthesis and semantic manipulation with conditional GANs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 8798–8807 (2018)

  [26] Xu, J., Wang, S., Chen, J., Li, Z., Jia, P., Zhao, F., Xiang, G., Hao, Z., Zhang, S., Xie, X.: Decouple distortion from perception: Region adaptive diffusion for extreme-low bitrate perception image compression. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 18051–18061 (2025)

  [27] Yang, Y., Ramanan, D.: Articulated human detection with flexible mixtures of parts. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(12), 2878–2890 (2012)

  [28] Yazdani, M., Medghalchi, Y., Ashrafian, P., Hacihaliloglu, I., Shahriari, D.: Flow matching for medical image synthesis: Bridging the gap between speed and quality. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 216–226. Springer (2025)

  [29] Zhang, L., Rao, A., Agrawala, M.: Adding conditional control to text-to-image diffusion models. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 3836–3847 (2023)