HTC-SGA Former: A Hybrid Transformer-CNN Network with Self-Guided Attention and a New Boundary-Weighted Adaptive Loss for Coronary DSA Vessel Segmentation
Pith reviewed 2026-06-30 06:39 UTC · model grok-4.3
The pith
HTC-SGA Former segments coronary DSA vessels more accurately than 14 prior methods on two private datasets while using only 0.81 million parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HTC-SGA Former employs a CNN encoder for local vessel morphology extraction and a Transformer decoder for contextual feature modeling. A Multi-Scale Global-Local Window Attention block performs efficient global-local contextual modeling, a Self-Guided Feature Attention module enhances weak-vessel responses, and a Boundary-Weighted Adaptive Compound Loss emphasizes thin-vessel boundaries while adaptively balancing recovery and refinement. On private right and left coronary artery DSA subsets, this architecture outperforms 14 state-of-the-art segmentation methods while using only 0.81M parameters; the loss also improves results when substituted into four different encoder-decoder backbones.
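The MS-GLWA internals are not given in this summary, so what follows is only a minimal NumPy sketch of the general global-local window attention idea, under stated assumptions: single-head attention with identity query/key/value projections (a real block would learn them), non-overlapping windows at several sizes for the local branch, an average-pooled summary of the whole map for the global branch, and residual summation to fuse the two. The function names and default sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # Scaled dot-product attention: q is (n, c), k and v are (m, c).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def window_attention(x, win):
    # Local branch: tokens attend only to tokens inside the same win x win window.
    h, w, c = x.shape
    out = np.empty_like(x)
    for i in range(0, h, win):
        for j in range(0, w, win):
            block = x[i:i + win, j:j + win]
            tokens = block.reshape(-1, c)
            out[i:i + win, j:j + win] = attend(tokens, tokens, tokens).reshape(block.shape)
    return out

def global_attention(x, pool):
    # Global branch: every pixel attends to a pool x pool average-pooled summary of the map.
    h, w, c = x.shape
    hp, wp = h // pool, w // pool
    pooled = x[:hp * pool, :wp * pool].reshape(hp, pool, wp, pool, c).mean(axis=(1, 3))
    summary = pooled.reshape(-1, c)
    return attend(x.reshape(-1, c), summary, summary).reshape(h, w, c)

def ms_glwa_sketch(x, window_sizes=(4, 8), pool=4):
    # Multi-scale local context (mean over window sizes) plus global context, fused residually.
    local = np.mean([window_attention(x, s) for s in window_sizes], axis=0)
    return x + local + global_attention(x, pool)
```

The efficiency argument is visible in the cost: the local branch computes on the order of H*W*win**2 attention scores instead of (H*W)**2 for full self-attention, and the pooled global branch computes H*W*(H*W/pool**2), so whole-image context is kept at a fraction of the full cost.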
What carries the argument
Multi-Scale Global-Local Window Attention (MS-GLWA) block combined with Self-Guided Feature Attention (SGFA) module and Boundary-Weighted Adaptive Compound Loss (BWACL), which together supply global-local context, weak-vessel emphasis, and boundary-focused optimization inside the hybrid encoder-decoder.
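SGFA's exact design is likewise not described here. As a hedged illustration of "self-guided" weighting, the sketch below derives a spatial gate from the feature map itself, loosely in the style of CBAM's spatial attention (channel-wise mean and max), and applies it residually so responses are re-weighted rather than suppressed. `sgfa_sketch` is a hypothetical name, and the fixed standardise-then-sigmoid gate is an assumption standing in for learned layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgfa_sketch(x, eps=1e-6):
    """Re-weight an (H, W, C) feature map with a spatial gate computed from the map itself."""
    descriptor = x.mean(axis=-1) + x.max(axis=-1)                      # (H, W) channel mean + max
    z = (descriptor - descriptor.mean()) / (descriptor.std() + eps)    # standardise over the map
    gate = sigmoid(z)[..., None]                                       # (H, W, 1), values in (0, 1)
    return x * (1.0 + gate)                                            # residual gating, gain in (1, 2)
```

Because the gate lies in (0, 1) and enters as x * (1 + gate), every non-negative activation is kept and the most salient (vessel-like) locations are amplified by at most a factor of two relative to background.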
If this is right
- Thin distal branches and vessel boundaries become recoverable at higher fidelity than with prior encoder-decoder or pure Transformer models.
- Vessel continuity improves, reducing fragmentation that currently limits downstream CAD analysis.
- BWACL can be dropped into other segmentation networks to raise performance without changing their architecture.
- The 0.81M-parameter footprint could make real-time or edge deployment feasible for clinical DSA workflows, although parameter count alone does not establish inference speed.
- Complementary global-local modeling plus adaptive boundary weighting supports more reliable computer-assisted cardiovascular interventions.
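One simple way to quantify the fragmentation mentioned in this list is to count connected vessel components in the predicted mask and compare with the reference: a continuous vessel that is broken by missed pixels shows up as extra components. The sketch below uses 8-connectivity and a breadth-first search in plain NumPy; the `fragmentation` score is an illustrative proxy, not a metric reported by the paper.

```python
from collections import deque

import numpy as np

def count_components(mask):
    """Number of 8-connected foreground components in a 2-D binary mask."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    h, w = mask.shape
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                count += 1                      # new component found; flood-fill it
                seen[i, j] = True
                queue = deque([(i, j)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                                seen[ny, nx] = True
                                queue.append((ny, nx))
    return count

def fragmentation(pred, gt):
    # Extra components in the prediction relative to the reference (0 = no extra breaks).
    return count_components(pred) - count_components(gt)
```

For example, a one-pixel-wide vertical vessel with a single missed pixel in the middle counts as two components instead of one, so its fragmentation is 1 even though pixel-wise Dice barely changes.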
Where Pith is reading between the lines
- The same lightweight hybrid pattern could be tried on other low-contrast tubular structures such as retinal or hepatic vessels where class imbalance is also severe.
- Public benchmark datasets for coronary segmentation would allow direct comparison and test whether gains persist outside the private subsets used here.
- If BWACL generalizes across backbones, it could serve as a drop-in replacement for standard losses in any medical segmentation pipeline that must preserve fine boundaries.
Load-bearing premise
The observed performance gains are produced by the MS-GLWA, SGFA, and BWACL components rather than by dataset properties, preprocessing steps, or hyperparameter choices.
What would settle it
An ablation study on the same private datasets that replaces MS-GLWA with standard window attention, removes SGFA, and swaps BWACL for binary cross-entropy plus Dice loss, yet still matches or exceeds the reported scores, would falsify the claim that these three elements drive the improvement.
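The falsification test described here compares the full model against a variant with all three components replaced. A full factorial design over the same three switches (2 x 2 x 2 = 8 runs) additionally isolates each component's contribution. The switch names and labels below are hypothetical, and every configuration would need the same splits, seeds, augmentation, and training schedule for the comparison to be meaningful.

```python
from itertools import product

# Hypothetical switch values for the three components under test.
ATTENTION = ("MS-GLWA", "standard-window")
USE_SGFA = (True, False)
LOSS = ("BWACL", "BCE+Dice")

def ablation_grid():
    """Return every combination of the three switches as a list of run configs."""
    return [
        {"attention": attn, "sgfa": sgfa, "loss": loss}
        for attn, sgfa, loss in product(ATTENTION, USE_SGFA, LOSS)
    ]

grid = ablation_grid()
full_model = grid[0]   # MS-GLWA + SGFA + BWACL
baseline = grid[-1]    # standard window attention, no SGFA, BCE+Dice: the falsifying variant
```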
Original abstract
Accurate coronary Digital Subtraction Angiography (DSA) vessel segmentation is essential for computer-aided diagnosis and treatment planning of coronary artery disease (CAD). However, thin low-contrast vessels, background interference, and severe vessel-background class imbalance make reliable segmentation of weak distal branches and vessel boundaries challenging. Existing methods struggle to balance global contextual reasoning with preservation of weak vessels, vessel continuity, and fine boundaries. To address these limitations, we propose HTC-SGA Former, a lightweight hybrid Transformer-CNN framework for coronary DSA vessel segmentation. It employs a CNN encoder for local vessel morphology extraction and a Transformer decoder for contextual feature modeling. A Multi-Scale Global-Local Window Attention (MS-GLWA) block performs efficient global-local contextual modeling, while a Self-Guided Feature Attention (SGFA) module enhances weak-vessel responses. In addition, a Boundary-Weighted Adaptive Compound Loss (BWACL) emphasizes thin-vessel boundaries and adaptively balances vessel recovery and boundary refinement. Experiments on private right and left coronary artery DSA subsets show that HTC-SGA Former outperforms 14 state-of-the-art segmentation methods while maintaining a compact architecture with only 0.81M parameters. BWACL also improves performance over binary cross-entropy and Dice losses across four encoder-decoder architectures, demonstrating strong cross-backbone applicability. HTC-SGA Former improves thin-vessel recovery, vessel continuity, and boundary localization through complementary global-local contextual modeling, vessel-focused refinement, and adaptive optimization, supporting reliable and computationally efficient coronary vessel analysis for future computer-assisted cardiovascular interventions.
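The abstract does not specify how BWACL builds its boundary weights or adapts its balance. The following is a minimal NumPy sketch under stated assumptions: boundary pixels come from the morphological gradient of the ground-truth mask (3 x 3 dilation minus erosion), the boundary-refinement term is a pixel-wise binary cross-entropy with weight 1 + beta on those pixels, the vessel-recovery term is soft Dice, and the "adaptive" balance is replaced by a hypothetical linear warm-up of the boundary term's weight. `bwacl_sketch`, `beta`, `lam_max`, and `warmup` are illustrative names and values, not the paper's.

```python
import numpy as np

def boundary_map(mask):
    # Morphological gradient with a 3x3 square: dilation AND NOT erosion.
    m = np.asarray(mask, dtype=bool)
    h, w = m.shape
    p = np.pad(m, 1, mode="edge")
    shifted = np.stack([p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1)])
    return shifted.any(axis=0) & ~shifted.all(axis=0)

def bwacl_sketch(prob, target, epoch, beta=4.0, lam_max=1.0, warmup=50, eps=1e-7):
    """Soft Dice (region term) + boundary-weighted BCE with a warm-up weight (assumed)."""
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    t = np.asarray(target, dtype=float)
    weights = 1.0 + beta * boundary_map(target)            # boundary pixels weigh 1 + beta
    bce = -(t * np.log(prob) + (1.0 - t) * np.log(1.0 - prob))
    boundary_term = float((weights * bce).sum() / weights.sum())
    dice_term = 1.0 - (2.0 * (prob * t).sum() + eps) / (prob.sum() + t.sum() + eps)
    lam = lam_max * min(1.0, epoch / warmup)               # hypothetical schedule, not the paper's rule
    return float(dice_term + lam * boundary_term)
```

At `epoch=0` the boundary term is switched off and the loss reduces to soft Dice; after `warmup` epochs the boundary-weighted BCE contributes at full weight `lam_max`. Normalising by the weight sum keeps the boundary term on the scale of a per-pixel average, so `beta` shifts emphasis toward boundary pixels rather than inflating the loss.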
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes HTC-SGA Former, a lightweight hybrid Transformer-CNN for coronary DSA vessel segmentation. It uses a CNN encoder and Transformer decoder augmented by a Multi-Scale Global-Local Window Attention (MS-GLWA) block, a Self-Guided Feature Attention (SGFA) module, and a Boundary-Weighted Adaptive Compound Loss (BWACL). The central empirical claims are that the model outperforms 14 state-of-the-art segmentation methods on private right- and left-coronary DSA subsets while using only 0.81 M parameters, and that BWACL improves results over binary cross-entropy and Dice losses across four encoder-decoder backbones.
Significance. If the reported gains prove reproducible and attributable to the proposed modules rather than dataset-specific tuning, the compact architecture and boundary-aware loss could be useful for clinical vessel analysis. The absence of any public-dataset validation or statistical rigor, however, prevents assessment of generalizability, so the practical significance remains limited.
major comments (2)
- [Experiments] Experiments section: all quantitative comparisons (Tables reporting Dice, sensitivity, etc., versus 14 SOTA methods) and all ablation results for MS-GLWA, SGFA, and BWACL are confined to two private right/left coronary DSA subsets. No acquisition parameters, annotation protocol, train/val/test splits, or preprocessing details are supplied that would permit reproduction, and no experiments on any public coronary or vessel dataset appear. This directly blocks verification of the outperformance and component-attribution claims that constitute the paper's central contribution.
- [Results] Results and ablation subsections: the abstract asserts that HTC-SGA Former and BWACL outperform baselines, yet no statistical significance tests, standard deviations, or error bars are reported, nor is any protocol described that guarantees identical hyper-parameters, data augmentation, and optimization settings across all compared methods and backbones. Without these, the attribution of gains to the proposed modules cannot be isolated from confounding factors.
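The first major comment asks for split details. If several DSA frames or acquisitions come from the same patient, an image-level random split can leak patient-specific appearance between training and test sets, so stating that the split is made at patient level matters for reproduction. A hedged, standard-library sketch of a seeded patient-level split follows; the `(image_id, patient_id)` record format, the 70/10/20 fractions, and the seed are assumptions for illustration, not the paper's protocol.

```python
import random
from collections import defaultdict

def patient_level_split(records, fractions=(0.7, 0.1, 0.2), seed=0):
    """Split (image_id, patient_id) records so that no patient appears in two subsets."""
    by_patient = defaultdict(list)
    for image_id, patient_id in records:
        by_patient[patient_id].append(image_id)
    patients = sorted(by_patient)             # fixed order before the seeded shuffle
    random.Random(seed).shuffle(patients)
    n = len(patients)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],    # remainder goes to test
    }
    return {name: [img for p in ids for img in by_patient[p]] for name, ids in groups.items()}

# Hypothetical records: 10 patients with 3 images each.
records = [(f"p{p}_img{k}", f"p{p}") for p in range(10) for k in range(3)]
splits = patient_level_split(records)
```

Reporting the seed, the fractions, and the resulting per-split patient and image counts is enough for another group to reproduce the partition on the same data.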
minor comments (1)
- [Abstract] Abstract: the 0.81 M parameter count is stated without any corresponding parameter counts for the 14 reference methods, making the compactness claim difficult to evaluate.
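To make the compactness comparison checkable, parameter counts need to be computed the same way for every method. The arithmetic for common layers is simple; the sketch below counts weights plus biases for a standard 2-D convolution, a depthwise-separable convolution, and a fully connected layer. The layer shapes in the usage lines are hypothetical and not taken from HTC-SGA Former; depthwise separability is shown only as one common way lightweight models cut parameters, not as the paper's method.

```python
def conv2d_params(c_in, c_out, k=3, bias=True):
    # k x k kernel for every (input, output) channel pair, plus one bias per output channel
    return k * k * c_in * c_out + (c_out if bias else 0)

def separable_conv_params(c_in, c_out, k=3, bias=True):
    # depthwise k x k conv (one filter per input channel) followed by a 1 x 1 pointwise conv
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

def linear_params(d_in, d_out, bias=True):
    return d_in * d_out + (d_out if bias else 0)

def in_millions(n):
    return f"{n / 1e6:.2f}M"

standard = conv2d_params(64, 64)            # 36,928 parameters
separable = separable_conv_params(64, 64)   # 4,800 parameters
```

Reporting every baseline with the same convention (all trainable weights and biases, expressed in millions) is what makes a figure such as 0.81M comparable; in PyTorch the usual count is `sum(p.numel() for p in model.parameters())`.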
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting the need for improved reproducibility and statistical rigor. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [Experiments] Experiments section: all quantitative comparisons (Tables reporting Dice, sensitivity, etc., versus 14 SOTA methods) and all ablation results for MS-GLWA, SGFA, and BWACL are confined to two private right/left coronary DSA subsets. No acquisition parameters, annotation protocol, train/val/test splits, or preprocessing details are supplied that would permit reproduction, and no experiments on any public coronary or vessel dataset appear. This directly blocks verification of the outperformance and component-attribution claims that constitute the paper's central contribution.
  Authors: We agree that additional experimental details are required. In the revised manuscript we will add a dedicated subsection with full acquisition parameters (e.g., imaging system, contrast protocol, resolution), annotation protocol (number of experts, inter-observer agreement), exact train/val/test split sizes and selection criteria, and all preprocessing steps. Because the data are private clinical DSA images governed by patient privacy regulations, public release is not possible; we will state this limitation explicitly. We will also explore inclusion of at least one public vessel segmentation dataset for supplementary validation. Revision: partial.
- Referee: [Results] Results and ablation subsections: the abstract asserts that HTC-SGA Former and BWACL outperform baselines, yet no statistical significance tests, standard deviations, or error bars are reported, nor is any protocol described that guarantees identical hyper-parameters, data augmentation, and optimization settings across all compared methods and backbones. Without these, the attribution of gains to the proposed modules cannot be isolated from confounding factors.
  Authors: We concur that statistical controls are essential. The revised manuscript will report standard deviations from repeated experiments (different random seeds), include paired statistical significance tests (e.g., Wilcoxon signed-rank) between HTC-SGA Former and each baseline, and add error bars to all quantitative tables and figures. We will also document the hyper-parameter search procedure and confirm that every compared method and backbone was trained and evaluated under identical data-augmentation and optimization protocols. Revision: yes.
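A minimal standard-library sketch of the two statistics promised in these responses: mean and sample standard deviation over seeds, and an exact two-sided Wilcoxon signed-rank test on paired scores (for example, per-image Dice of two methods on the same test images). Zero differences are dropped and tied absolute differences receive average ranks. The exact enumeration over 2**n sign patterns is only practical for small n (roughly 20 or fewer); for larger samples a normal approximation or `scipy.stats.wilcoxon` would be used instead. The scores in the usage lines are made up for illustration.

```python
import itertools
import statistics

def mean_std(scores):
    """Mean and sample standard deviation (n - 1 denominator) of per-seed scores."""
    return statistics.mean(scores), statistics.stdev(scores)

def wilcoxon_exact(a, b):
    """Exact two-sided Wilcoxon signed-rank p-value for paired scores a and b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]      # drop zero differences
    n = len(diffs)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                                          # assign average ranks to ties
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2             # mean of 1-based ranks i+1 .. j+1
        i = j + 1
    centre = sum(ranks) / 2                               # E[W+] under the null hypothesis
    observed = abs(sum(r for r, d in zip(ranks, diffs) if d > 0) - centre)
    extreme = 0
    for signs in itertools.product((False, True), repeat=n):   # all 2**n sign patterns
        w = sum(r for r, positive in zip(ranks, signs) if positive)
        if abs(w - centre) >= observed - 1e-12:
            extreme += 1
    return extreme / 2 ** n

# Made-up per-image Dice scores for two methods on the same six test images.
ours = [0.80, 0.82, 0.81, 0.83, 0.85, 0.84]
baseline = [0.78, 0.79, 0.80, 0.80, 0.81, 0.82]
p_value = wilcoxon_exact(ours, baseline)
```

Here all six differences favour the first method, which yields the smallest attainable two-sided p-value for n = 6, namely 2/64 = 0.03125; with only a handful of paired values, very small p-values are impossible no matter how large the gap.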
Circularity Check
No circularity: empirical architecture proposal with no derivation chain
full rationale
The paper introduces HTC-SGA Former (CNN encoder + Transformer decoder with MS-GLWA and SGFA modules) plus BWACL loss, then reports experimental metrics on two private DSA datasets. No equations derive predictions from inputs, no fitted parameters are relabeled as predictions, and no self-citation chain supports a uniqueness claim. All performance assertions rest on direct comparisons to 14 baselines and cross-backbone ablations; the derivation is therefore self-contained and contains no reduction of outputs to inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: supervised learning on pixel-wise annotated DSA images is a valid way to train vessel segmentation models.
invented entities (3)
- MS-GLWA block: no independent evidence
- SGFA module: no independent evidence
- BWACL loss: no independent evidence
Rayan Merghani Ahmed et al.: Preprint submitted to Elsevier.