Anatomy-Aware Text-Visual Fusion with Dual-Perspective Prompts for Fine-Grained Lumbar Spine Segmentation
Pith reviewed 2026-05-22 20:58 UTC · model grok-4.3
The pith
ATM-Net fuses anatomy-aware text prompts with images to improve fine-grained lumbar spine segmentation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ATM-Net is an anatomy-aware, text-guided, multi-modal fusion framework with three components. The Anatomy-aware Text Prompt Generator (ATPG) turns image annotations into prompts across views. The Holistic Anatomy-aware Semantic Fusion (HASF) module combines those prompts with image features to build a comprehensive anatomical context. The Channel-wise Contrastive Anatomy-Aware Enhancement (CCAE) module sharpens class discrimination through multi-modal contrastive learning. Together, these are claimed to yield finer segmentation of vertebrae, intervertebral discs, and the spinal canal.
What carries the argument
The anatomy-aware text-visual fusion mechanism that converts annotations into prompts and integrates them with image features through dedicated fusion and contrastive modules.
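To make the annotation-to-prompt step concrete, here is a minimal sketch of how a label mask might be turned into text prompts. The source does not give the prompt templates, the label encoding, or what the two perspectives are, so the class ids, the wording of the prompts, and the function name `prompts_from_mask` are all hypothetical.

```python
# Hypothetical label ids; the source does not specify an encoding.
CLASS_NAMES = {1: "vertebrae", 2: "intervertebral discs", 3: "spinal canal"}

def prompts_from_mask(mask):
    """Build two illustrative prompt views from a 2D integer label mask."""
    present_ids = set()
    for row in mask:
        for label in row:
            if label in CLASS_NAMES:
                present_ids.add(label)
    present = [CLASS_NAMES[k] for k in sorted(present_ids)]

    # View A (assumed): one prompt summarising every structure in the slice.
    if present:
        global_prompt = "An MR image of the lumbar spine showing " + ", ".join(present) + "."
    else:
        global_prompt = "An MR image of the lumbar spine with no labeled structures."

    # View B (assumed): one short prompt per class that appears in the mask.
    class_prompts = [f"A region of {name}." for name in present]
    return global_prompt, class_prompts
```

Calling `prompts_from_mask([[0, 1, 1], [2, 0, 3]])` returns one slice-level sentence naming all three structures and a list of three class-level sentences. In a real pipeline these strings would be passed to a text encoder; that stage is omitted here.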
If this is right
- Higher Dice scores and lower boundary errors on datasets like SPIDER and MRSpineSeg.
- Fewer misclassifications among vertebrae, discs, and spinal canal categories.
- More accurate capture of fine segmentation details needed for spinal disorder diagnosis.
- Consistent gains in class discrimination through channel-level contrastive learning.
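The two metrics behind these expectations, Dice and the 95th-percentile Hausdorff distance (HD95), can be sketched in plain Python for small 2D binary masks. This is an illustration and not the paper's evaluation code: the boundary rule (4-neighbourhood), the nearest-rank percentile, and taking the maximum of the two directed values are assumptions, and other toolkits define HD95 slightly differently.

```python
import math

def dice(pred, gt):
    """Dice overlap between two binary masks given as lists of 0/1 rows."""
    inter = sum(p & g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    total = sum(map(sum, pred)) + sum(map(sum, gt))
    return 1.0 if total == 0 else 2.0 * inter / total

def boundary(mask):
    """Foreground pixels with at least one 4-neighbour outside the foreground."""
    h, w = len(mask), len(mask[0])
    pts = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    pts.append((y, x))
                    break
    return pts

def directed_p95(src, dst):
    """95th percentile (nearest rank) of distances from src points to dst."""
    d = sorted(min(math.dist(p, q) for q in dst) for p in src)
    return d[math.ceil(0.95 * len(d)) - 1]

def hd95(pred, gt):
    """Symmetric HD95 in pixels; assumes both masks are non-empty."""
    a, b = boundary(pred), boundary(gt)
    return max(directed_p95(a, b), directed_p95(b, a))
```

For a 2x2 block shifted one pixel to the right against its ground truth, `dice` gives 0.5 and `hd95` gives 1.0; identical masks give 1.0 and 0.0. Higher Dice and lower HD95 are better, which matters when reading the comparisons elsewhere on this page.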
Where Pith is reading between the lines
- The prompt-generation step could be reused on other bony or soft-tissue structures where annotations exist but semantic context is weak.
- The multi-view prompt strategy might reduce reliance on massive labeled sets by injecting prior anatomical knowledge.
- Extending the fusion to CT or ultrasound data could test whether the same text-visual pairing improves segmentation in clinical settings that mix imaging modalities.
Load-bearing premise
The method assumes that turning image annotations into anatomy-aware text prompts and fusing them with visual features will add useful context and sharpen discrimination without introducing offsetting errors or biases.
What would settle it
A direct comparison on a held-out MRI dataset in which ATM-Net shows neither a higher Dice score nor a lower HD95 than the strongest visual-only baseline, such as SpineParseNet, would falsify the central performance claim.
Original abstract
Accurate lumbar spine segmentation is crucial for diagnosing spinal disorders. Existing methods typically use coarse-grained segmentation strategies that lack the fine detail needed for precise diagnosis. Additionally, their reliance on visual-only models hinders the capture of anatomical semantics, leading to misclassified categories and poor segmentation details. To address these limitations, we present ATM-Net, an innovative framework that employs an anatomy-aware, text-guided, multi-modal fusion mechanism for fine-grained segmentation of lumbar substructures, i.e., vertebrae (VBs), intervertebral discs (IDs), and spinal canal (SC). ATM-Net adopts the Anatomy-aware Text Prompt Generator (ATPG) to adaptively convert image annotations into anatomy-aware prompts in different views. These insights are further integrated with image features via the Holistic Anatomy-aware Semantic Fusion (HASF) module, building a comprehensive anatomical context. The Channel-wise Contrastive Anatomy-Aware Enhancement (CCAE) module further enhances class discrimination and refines segmentation through class-wise channel-level multi-modal contrastive learning. Extensive experiments on the MRSpineSeg and SPIDER datasets demonstrate that ATM-Net significantly outperforms state-of-the-art methods, with consistent improvements regarding class discrimination and segmentation details. For example, ATM-Net achieves Dice of 79.39% and HD95 of 9.91 pixels on SPIDER, outperforming the competitive SpineParseNet by 8.31% and 4.14 pixels, respectively.
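The SPIDER comparison in the abstract can be unpacked with a little arithmetic. The sketch below assumes the stated margins are absolute differences (percentage points for Dice, pixels for HD95); the abstract does not say whether "8.31%" is absolute or relative, so the implied baseline values are an inference and not figures quoted from the paper.

```python
# Values quoted in the abstract for the SPIDER dataset.
atm_dice, atm_hd95 = 79.39, 9.91      # ATM-Net: Dice in %, HD95 in pixels
dice_margin, hd95_margin = 8.31, 4.14  # reported margins over SpineParseNet

# Assumption: margins are absolute differences.
baseline_dice = round(atm_dice - dice_margin, 2)   # 71.08
baseline_hd95 = round(atm_hd95 + hd95_margin, 2)   # 14.05

# Under the same assumption, the relative Dice gain is larger than 8.31%.
relative_dice_gain = 100.0 * dice_margin / baseline_dice

print(baseline_dice, baseline_hd95, round(relative_dice_gain, 2))
```

If the margin were instead a relative improvement, the implied baseline Dice would be higher than 71.08%, so the two readings are worth distinguishing when comparing against other reports.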
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ATM-Net, a multi-modal architecture for fine-grained lumbar spine segmentation (vertebrae, intervertebral discs, spinal canal) that uses the Anatomy-aware Text Prompt Generator (ATPG) to convert image annotations into dual-perspective anatomy-aware prompts, fuses them with visual features via the Holistic Anatomy-aware Semantic Fusion (HASF) module, and refines class discrimination with the Channel-wise Contrastive Anatomy-Aware Enhancement (CCAE) module. It reports consistent outperformance over prior methods on the MRSpineSeg and SPIDER datasets, including Dice of 79.39% and HD95 of 9.91 pixels on SPIDER (8.31% and 4.14 pixels better than SpineParseNet).
Significance. If the gains are shown to arise from the fusion modules rather than privileged label information, the work would provide evidence that text-visual integration can improve anatomical context and class separation in medical segmentation tasks. The empirical results on two datasets indicate potential clinical utility for more precise spinal disorder diagnosis, though the absence of open code or parameter details limits immediate reproducibility.
major comments (2)
- [Abstract] Abstract: the central claim that ATM-Net 'builds a comprehensive anatomical context' and 'enhances class discrimination' via ATPG, HASF, and CCAE rests on the assumption that prompts are generated without ground-truth segmentation masks. The description of ATPG as converting 'image annotations' into prompts leaves open whether these are training-time labels; if ground-truth masks are used, the 8.31% Dice and 4.14-pixel HD95 gains on SPIDER become non-comparable to visual-only baselines such as SpineParseNet and cannot be attributed to the proposed fusion mechanism.
- [Methods] Methods (ATPG, HASF, CCAE descriptions): no explicit statement clarifies whether anatomy-aware prompts are available at inference time or only during training, nor whether the contrastive learning in CCAE uses paired text-image features derived from labels. This detail is load-bearing for the claim of 'consistent improvements regarding class discrimination' and must be resolved to evaluate the architecture's contribution.
minor comments (2)
- [Abstract] Abstract and Experiments: the manuscript reports aggregate Dice/HD95 but provides no per-class breakdown, statistical significance tests, or ablation isolating ATPG vs. HASF vs. CCAE contributions.
- [Experiments] The paper would benefit from a clear statement on training/inference protocol for the text prompts and release of code or model weights to support verification of the reported metrics.
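One way to supply the significance testing requested in the minor comments is a paired sign-flip permutation test on per-case scores. The sketch below is a generic test and is not taken from the paper; the function name and its defaults are hypothetical, and any scores passed to it would have to come from an actual per-case evaluation.

```python
import random

def paired_permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test on paired per-case scores.

    a and b hold one score per test case (for example Dice for two methods
    on the same cases). Returns an estimated p-value for the null hypothesis
    that the mean paired difference is zero.
    """
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_perm):
        # Under the null, each difference is equally likely to change sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Running the test once per class (vertebrae, intervertebral discs, spinal canal) would also give the per-class breakdown the report asks for. A permutation test is used here because it makes no normality assumption about Dice differences, which are bounded and often skewed.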
Simulated Author's Rebuttal
We thank the referee for the careful review and for identifying ambiguities in our description of the prompt generation process. We address each major comment below and will revise the manuscript to improve clarity on training versus inference procedures.
- Referee: [Abstract] Abstract: the central claim that ATM-Net 'builds a comprehensive anatomical context' and 'enhances class discrimination' via ATPG, HASF, and CCAE rests on the assumption that prompts are generated without ground-truth segmentation masks. The description of ATPG as converting 'image annotations' into prompts leaves open whether these are training-time labels; if ground-truth masks are used, the 8.31% Dice and 4.14-pixel HD95 gains on SPIDER become non-comparable to visual-only baselines such as SpineParseNet and cannot be attributed to the proposed fusion mechanism.
  Authors: The referee correctly notes an ambiguity. The ATPG module converts ground-truth segmentation masks (image annotations) into dual-perspective anatomy-aware prompts during training. This enables the HASF and CCAE modules to learn multi-modal fusion that transfers anatomical context into the visual features. At inference the model operates on visual input alone. We will revise the abstract to explicitly state that prompts are generated from ground-truth annotations exclusively at training time. The reported gains are therefore attributable to the improved visual representations learned via the proposed fusion mechanism, preserving comparability with visual-only baselines evaluated under identical inference conditions.
  Revision: yes
- Referee: [Methods] Methods (ATPG, HASF, CCAE descriptions): no explicit statement clarifies whether anatomy-aware prompts are available at inference time or only during training, nor whether the contrastive learning in CCAE uses paired text-image features derived from labels. This detail is load-bearing for the claim of 'consistent improvements regarding class discrimination' and must be resolved to evaluate the architecture's contribution.
  Authors: We agree that the manuscript lacks an explicit statement on this point. Anatomy-aware prompts are generated from ground-truth labels only during training; they are not required at inference. The CCAE module performs class-wise channel-level contrastive learning on paired text-image features derived from labels exclusively during training. We will add a dedicated paragraph in the Methods section (and a corresponding note in the implementation details) that clearly separates the training pipeline (which includes ATPG, HASF, and CCAE) from the inference pipeline (visual input only). This revision will allow readers to evaluate the architecture's contribution without ambiguity.
  Revision: yes
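The training-only role of the text branch described in these responses can be illustrated with a generic contrastive term. The sketch below uses a standard InfoNCE loss over cosine similarities as a stand-in; the source does not give the CCAE loss, so the loss form, the temperature, the weighting, and the names `info_nce` and `training_loss` are assumptions. Embeddings are plain lists of floats, one per class, and are assumed to be non-zero.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(visual, text, temperature=0.1):
    """Mean InfoNCE loss where visual[i] and text[i] describe the same class."""
    loss = 0.0
    for i, v in enumerate(visual):
        logits = [cosine(v, t) / temperature for t in text]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(visual)

def training_loss(seg_loss, visual, text=None, weight=0.1):
    """Add the contrastive term only when text embeddings are supplied.

    Passing text=None mimics the inference-style path with visual input only.
    """
    if text is None:
        return seg_loss
    return seg_loss + weight * info_nce(visual, text)
```

With matching per-class embeddings the contrastive term is close to zero, and it grows when the text embeddings are paired with the wrong classes, which is the pressure toward class separation the responses describe. Because the term is dropped when no text is given, the same visual pathway can be evaluated without prompts.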
Circularity Check
No circularity: empirical architecture with no derivations or self-referential reductions
The paper proposes an empirical neural architecture (ATPG, HASF, CCAE modules) for multi-modal segmentation and reports performance gains on standard datasets. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The central claims rest on experimental comparisons rather than any reduction of outputs to inputs by construction. This matches the default expectation of no significant circularity for non-derivational ML method papers.