pith. machine review for the scientific record.

arxiv: 2604.10737 · v1 · submitted 2026-04-12 · 📡 eess.IV

Recognition: unknown

Generative Data-engine Foundation Model for Universal Few-shot 2D Vascular Image Segmentation

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 15:39 UTC · model grok-4.3

classification 📡 eess.IV
keywords vascular segmentation · few-shot learning · generative foundation model · image synthesis · compositional learning · cross-modality adaptation · medical imaging · data-efficient segmentation

The pith

UniVG learns to decompose and recombine vascular structures into synthetic data, enabling a foundation model that matches full supervision in few-shot segmentation across modalities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces UniVG as a generative foundation model that tackles the scarcity of annotated 2D vascular images for clinical segmentation. It decomposes vessel images into morphological components and background configurations, then recombines them to create large numbers of realistic synthetic image-label pairs for pre-training. A dataset of 58,689 such images across five modalities supports this step. With only five real labeled examples per task, the model adapts by generating additional authentic vessel images, closing the synthetic-to-real gap for downstream segmentation. Experiments across eleven tasks show performance comparable to models trained on full datasets, lowering annotation demands.

Core claim

UniVG decomposes vascular images into varying morphological features and foreground-background setups whose recombination produces diverse synthetic training pairs for large-scale generative pre-training. Few-shot fine-tuning then adapts the model by synthesizing realistic vessel images from minimal real annotations, yielding a universal segmenter that achieves accuracy comparable to fully supervised training on eleven tasks across five modalities.

What carries the argument

Compositional decomposition and recombination of vascular structures to generate synthetic image-label pairs for pre-training and few-shot generative adaptation.
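The compositional step can be pictured with a minimal sketch. Everything below is an illustrative assumption, not the paper's implementation: a toy random-walk stands in for the morphology-aware vessel generation, and a simple intensity offset stands in for the foreground-background compositing. The point is only the shape of the pipeline: generate a mask, composite it onto a background, and the (image, label) pair comes for free.

```python
import numpy as np

def synth_vessel_mask(size=64, steps=200, seed=0):
    """Toy vessel mask: a biased random walk rasterized onto a grid.
    (Stand-in for the paper's morphology-aware vessel generation.)"""
    rng = np.random.default_rng(seed)
    mask = np.zeros((size, size), dtype=np.uint8)
    y, x = size // 2, 0
    for _ in range(steps):
        mask[y, x] = 1
        y = int(np.clip(y + rng.integers(-1, 2), 0, size - 1))
        x = int(np.clip(x + rng.integers(0, 2), 0, size - 1))
    return mask

def compose_pair(mask, seed=0):
    """Composite the mask onto a noisy background so vessels appear as
    darker curvilinear structures, loosely angiography-like. The label
    is the mask itself, so annotation is free by construction."""
    rng = np.random.default_rng(seed)
    background = 0.6 + 0.1 * rng.standard_normal(mask.shape)
    image = np.clip(background - 0.4 * mask, 0.0, 1.0)
    return image, mask  # a synthetic image-label pair

image, label = compose_pair(synth_vessel_mask())
```

Swapping in different mask generators and background configurations is what produces the diversity the pre-training step depends on.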

If this is right

  • One pre-trained model can adapt to many new vascular segmentation tasks with minimal new labels.
  • Annotation and data collection costs for clinical vascular analysis fall sharply while accuracy holds.
  • A single model handles multiple imaging modalities without separate training from scratch.
  • Synthetic data at this scale provides a reusable base for other vessel-related analysis tasks.
  • The approach supports rapid deployment in settings where large annotated datasets are unavailable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Compositional recombination could extend to other medical structures with clear parts, such as airways or retinal vessels.
  • If synthetic diversity covers enough pathology variation, similar engines might cut privacy risks by reducing real patient data needs.
  • The method might scale to 3D or time-series vascular data if the decomposition rules transfer to volumes.
  • Limits would appear first on rare vessel anomalies poorly captured by recombination of standard features.

Load-bearing premise

Recombining decomposed vascular parts produces synthetic images diverse and realistic enough that few-shot adaptation can close the gap to real clinical data, without introducing artifacts or biases that hurt segmentation accuracy.

What would settle it

Test UniVG on a sixth imaging modality not among the original five, using only five labeled images, and check whether its Dice scores on vessel segmentation match those of a model trained on hundreds of real labels; a clear shortfall would falsify the transfer claim.
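The metric such a test turns on is standard; only the toy arrays and the notion of a pre-registered margin below are illustrative, not from the paper:

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# Illustrative check: a 5-shot model would "match" full supervision if
# its Dice on held-out cases falls within a pre-registered margin of
# the fully supervised baseline's.
a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
score = dice(a, b)  # 2*2 / (3+3) ≈ 0.667
```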

Figures

Figures reproduced from arXiv: 2604.10737 by Chengliang Liu, Chunfeng Yang, Jean-Louis Dillenseger, Jian Yang, Jiong Zhang, Pinzheng Zhang, Rongjun Ge, Xin Li, Yang Chen, Yuting He, Yuxing Liu.

Figure 1. The advantages of our generative data-engine foundation models are as [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2. The motivation in our generative data-engine foundation model is that [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3. The process of node growth in the space colonization algorithm. [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4. UniVG framework architecture overview. The framework consists of two stages: (a) Compositional learning for flexible and diverse vascular synthesis, [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5. We curated a large-scale 2D vascular dataset, UniVG-58K, for pre [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗
Figure 6. Qualitative evaluation shows the visual superiority of our UniVG compared to other benchmark methods. [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗
Figure 7. In the few-shot coronary artery segmentation task, (a) analyzes the im [PITH_FULL_IMAGE:figures/full_fig_p016_7.png] view at source ↗
Figure 8. t-SNE visualization comparing the distribution of coronary artery masks [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗
Figure 9. Comparison of coronary artery masks generated by di [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗
Figure 10. Generation results of coronary artery masks with di [PITH_FULL_IMAGE:figures/full_fig_p017_10.png] view at source ↗
Figure 11. Hyper-parameter sensitivity analysis on SBCD dataset for coronary artery vessel segmentation. [PITH_FULL_IMAGE:figures/full_fig_p019_11.png] view at source ↗
Figure 12. Parameter sensitivity analysis of SCA parameters on generated vascular structure quality. Da represents attraction distance (pixels), Dk represents kill distance (pixels), Ls represents segment length (pixels) [PITH_FULL_IMAGE:figures/full_fig_p019_12.png] view at source ↗
Figure 13. Performance of synthetic mask diversity and downstream segmentation [PITH_FULL_IMAGE:figures/full_fig_p020_13.png] view at source ↗
read the original abstract

The segmentation of 2D vascular structures via deep learning holds significant clinical value but is hindered by the scarcity of annotated data, severely limiting its widespread application. Developing a universal few-shot vascular segmentation model is highly desirable, yet remains challenging due to the need for extensive training and the inherent complexities of vascular imaging. In this work, we propose UniVG (Generative Data-engine Foundation Model for Universal Few-shot 2D Vascular Image Segmentation), a novel approach that learns the compositionality of vascular images and constructing a generative foundation model for robust vascular segmentation. UniVG enables the synthesis and learning of diverse and realistic vascular images through two key innovations: 1) Compositional learning for flexible and diverse vascular synthesis: It decomposes and recombines vascular structures with varying morphological features and diverse foreground-background configurations to generate richly diverse synthetic image-label pairs. 2) Few-shot generative adaptation for transferable segmentation: It fine-tunes pre-trained models with minimal annotated data to bridge the gap between synthetic and real vascular domains, synthesizing authentic and diverse vessel images for downstream few-shot vascular segmentation learning. To support our approach, we develop UniVG-58K, a large dataset comprising 58,689 vascular images across five imaging modalities, facilitating robust large-scale generative pre-training. Extensive experiments on 11 vessel segmentation tasks cross 5 modalties (only with 5 labeled images on each task) demonstrate that UniVG achieves performance comparable to fully supervised models, significantly reducing data collection and annotation costs. All code and datasets will be made publicly available at https://github.com/XinAloha/UniVG.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes UniVG, a generative foundation model for universal few-shot 2D vascular image segmentation. It learns the compositionality of vascular images by decomposing and recombining vascular structures with varying morphological features and foreground-background configurations to generate synthetic image-label pairs. A few-shot generative adaptation step then fine-tunes for real domains. The work introduces the UniVG-58K dataset with 58,689 images across five modalities for pre-training. Extensive experiments on 11 vessel segmentation tasks across 5 modalities, using only 5 labeled images per task, are reported to show that UniVG achieves performance comparable to fully supervised models.

Significance. If the empirical results hold, this work has high significance for medical imaging as it addresses the critical issue of limited annotated data in vascular segmentation, which has clinical value. By using generative synthesis and few-shot learning, it could substantially reduce annotation costs. The public availability of code and dataset is a positive aspect for the field.

major comments (2)
  1. [Experimental Results] The central claim that UniVG matches fully supervised performance on 11 tasks with 5-shot learning is load-bearing but the abstract provides no quantitative results, tables, ablation studies, or statistical tests. The full manuscript must include specific metrics (e.g., Dice coefficients) and comparisons to verify this.
  2. [Method (Compositional Learning)] The assumption that compositional decomposition and recombination produces sufficiently diverse and domain-realistic synthetic images is not supported by any quantitative checks such as FID scores, vessel topology statistics, or perceptual metrics. Without these, it is unclear if artifacts or biases are introduced that could affect downstream segmentation accuracy.
minor comments (2)
  1. [Abstract] Typo: 'modalties' should be 'modalities'.
  2. [Abstract] The sentence describing decomposition and recombination could be clarified for grammar and flow.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive evaluation of the work's significance for medical imaging. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.

read point-by-point responses
  1. Referee: [Experimental Results] The central claim that UniVG matches fully supervised performance on 11 tasks with 5-shot learning is load-bearing but the abstract provides no quantitative results, tables, ablation studies, or statistical tests. The full manuscript must include specific metrics (e.g., Dice coefficients) and comparisons to verify this.

    Authors: We agree that the abstract would benefit from including key quantitative results to immediately support the central claim. The full manuscript already presents detailed experimental results, including tables with Dice coefficients, Hausdorff distances, and direct comparisons to fully supervised models as well as other few-shot baselines across all 11 tasks in Section 4. Ablation studies on the compositional components and few-shot adaptation are also included. We will revise the abstract to report representative metrics (e.g., average Dice scores showing comparability to fully supervised performance) and ensure statistical significance is explicitly highlighted. This addresses the presentation concern without altering the underlying results. revision: yes

  2. Referee: [Method (Compositional Learning)] The assumption that compositional decomposition and recombination produces sufficiently diverse and domain-realistic synthetic images is not supported by any quantitative checks such as FID scores, vessel topology statistics, or perceptual metrics. Without these, it is unclear if artifacts or biases are introduced that could affect downstream segmentation accuracy.

    Authors: We acknowledge that explicit quantitative validation of the synthetic images would strengthen the methodological claims. The current manuscript primarily validates the compositional learning through its impact on downstream few-shot segmentation performance on real data, which serves as an indirect but task-relevant check. However, we agree this leaves room for improvement. In the revision, we will add FID scores between the generated synthetic images and real images from the UniVG-58K dataset, along with vessel topology statistics (e.g., total length, branching points, and curvature distributions) and perceptual metrics to quantify diversity and realism. These additions will help rule out potential artifacts or biases. revision: yes
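The topology statistics the rebuttal promises are straightforward to audit. A minimal sketch, assuming a binary, one-pixel-wide skeleton is already available (a real audit would first apply a proper skeletonization; the function name and the thresholds here are illustrative, not the paper's):

```python
import numpy as np

def topology_stats(skeleton):
    """Summary of a binary, one-pixel-wide vessel skeleton: total
    length (pixel count) and branch-point count, where a branch point
    is a skeleton pixel with 3+ skeleton neighbors (8-neighborhood)."""
    sk = skeleton.astype(bool)
    padded = np.pad(sk, 1)
    # Count 8-neighbors at every pixel by summing shifted copies.
    neighbors = sum(
        padded[1 + dy : padded.shape[0] - 1 + dy,
               1 + dx : padded.shape[1] - 1 + dx]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    branch_points = int(np.logical_and(sk, neighbors >= 3).sum())
    return {"length": int(sk.sum()), "branch_points": branch_points}

# A tiny Y-shaped skeleton: one junction, three tips.
y_shape = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [0, 1, 0],
])
stats = topology_stats(y_shape)  # {'length': 4, 'branch_points': 1}
```

Comparing distributions of such statistics between synthetic and real cohorts would make the "no artifacts or biases" claim checkable rather than asserted.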

Circularity Check

0 steps flagged

No circularity; purely empirical method with external validation

full rationale

The manuscript describes a generative foundation model (UniVG) that decomposes/recombines vascular structures for synthetic data generation, performs few-shot adaptation, and evaluates on 11 segmentation tasks across 5 modalities using 5 labeled images per task. No equations, derivations, fitted parameters presented as predictions, or self-referential definitions appear in the abstract or described approach. Central claims rest on experimental comparisons to fully-supervised baselines, which constitute independent empirical checks rather than reductions to the method's own inputs by construction. No load-bearing self-citations or uniqueness theorems are invoked in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The approach rests on standard deep-learning assumptions about generative model capacity and domain adaptation rather than new axioms or fitted parameters that define the central claim.

pith-pipeline@v0.9.0 · 5624 in / 1095 out tokens · 34522 ms · 2026-05-10T15:39:13.326207+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

3 extracted references · 3 canonical work pages · 2 internal anchors

  1. [1] Fsg-net: A deep learning model for semantic robot grasping through few-shot learning, in: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp. 1793–1799.

  2. [2] Contrastive semi-supervised learning for domain adaptive segmentation across similar anatomical structures. IEEE Transactions on Medical Imaging 42, 245–256.

  3. [3] iBOT: Image BERT Pre-Training with Online Tokenizer. arXiv preprint arXiv:2111.07832.