arxiv: 2510.08052 · v2 · submitted 2025-10-09 · 💻 cs.CV

RASALoRE: Region Aware Spatial Attention with Location-based Random Embeddings for Weakly Supervised Anomaly Detection in Brain MRI Scans

Pith reviewed 2026-05-18 09:03 UTC · model grok-4.3

classification 💻 cs.CV
keywords weakly supervised anomaly detection · brain MRI · spatial attention · pseudo masks · region-aware attention · location-based embeddings · anomaly detection · medical imaging

The pith

A two-stage framework generates pseudo weak masks via dual prompt tuning then applies region-aware spatial attention with fixed location-based random embeddings to detect anomalies in brain MRI scans using only slice-level labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a new weakly supervised anomaly detection method for brain MRI that operates without pixel-level annotations. It first creates coarse localization cues through a Discriminative Dual Prompt Tuning process that turns slice-level labels into pseudo weak masks. The second stage feeds these cues into a segmentation network whose region-aware spatial attention mechanism incorporates fixed location-based random embeddings to highlight anomalous areas. This combination delivers better detection accuracy than prior approaches while keeping the total parameter count under eight million. A reader would care because full annotations for medical images are expensive and slow to produce, so reliable weak-supervision techniques could speed up clinical screening.

Core claim

The central claim is that RASALoRE, a two-stage WSAD framework, produces state-of-the-art anomaly detection results on the BraTS20, BraTS21, BraTS23, and MSD datasets with a model size below 8 million parameters. It first uses Discriminative Dual Prompt Tuning to create pseudo weak masks from slice-level labels, then trains a segmentation network whose region-aware spatial attention relies on fixed location-based random embeddings to focus computation on likely anomalous regions.

What carries the argument

The region-aware spatial attention mechanism that uses fixed location-based random embeddings to localize anomalous regions by injecting spatial position information without learning additional embedding parameters.
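
To make the idea concrete, here is a minimal sketch of fixed location-based random embeddings in plain Python. It is not the authors' implementation: the grid spacing, the Gaussian sampling, the seed, the scaled dot-product scoring, and the function names `make_grid_points`, `make_fixed_embeddings`, and `attention_over_points` are all assumptions made for illustration. The only details taken from the paper's figure captions are that candidate prompt points (CPPs) sit on a grid over the image and that their embeddings form a fixed k×d matrix shared by all images.

```python
import math
import random


def make_grid_points(height, width, step):
    """Return (row, col) candidate prompt points on a regular grid (assumed layout)."""
    return [(r, c)
            for r in range(step // 2, height, step)
            for c in range(step // 2, width, step)]


def make_fixed_embeddings(num_points, dim, seed=0):
    """Sample one Gaussian vector per point, once, from a seeded generator.

    The vectors are never updated afterwards, so they add no trainable
    parameters. Gaussian sampling is an assumption, not the paper's recipe.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_points)]


def attention_over_points(feature, embeddings):
    """Softmax over scaled dot products between one feature vector and each embedding.

    Hypothetical scoring rule, used here only to show how fixed embeddings
    can act as keys in an attention step.
    """
    dim = len(feature)
    scores = [sum(f * e for f, e in zip(feature, emb)) / math.sqrt(dim)
              for emb in embeddings]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


points = make_grid_points(64, 64, 16)                      # 4 x 4 grid, k = 16
E_cpp = make_fixed_embeddings(len(points), dim=8, seed=0)  # k x d, fixed
weights = attention_over_points([0.5] * 8, E_cpp)          # one weight per CPP
```

Because the embeddings come from a seeded generator, calling `make_fixed_embeddings` again with the same arguments reproduces the same matrix, which is what allows it to be shared across training and test images.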

If this is right

  • The approach significantly outperforms existing weakly supervised anomaly detection methods on BraTS20, BraTS21, BraTS23, and MSD datasets.
  • Performance gains are achieved with a model containing fewer than 8 million parameters and lower computational complexity than prior work.
  • Fixed location-based random embeddings enable the attention module to focus on anomalous regions without trainable spatial embeddings.
  • The two-stage pipeline converts slice-level labels into usable coarse masks that support effective segmentation training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Fixed random embeddings may reduce sensitivity to scanner-specific variations compared with fully learned position encodings.
  • The method could be tested on other weakly labeled medical imaging tasks such as liver lesion detection in CT to check transferability.
  • If the pseudo-mask quality holds across different weak-label granularities, the framework might lower annotation costs in new clinical datasets.

Load-bearing premise

The pseudo weak masks produced by Discriminative Dual Prompt Tuning supply sufficiently accurate coarse localization information to train the region-aware spatial attention network effectively.

What would settle it

Replacing the DDPT-generated pseudo masks with random or uniform masks and observing no drop or an increase in anomaly detection performance on the BraTS datasets would falsify the claim that those masks provide useful training cues.
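
The control described above could be set up along the following lines. This is a sketch under assumptions: masks are taken to be binary 2-D lists, and `shuffled_mask` and `uniform_mask` are hypothetical helpers, not part of the released code. The shuffled variant keeps the foreground area of each pseudo mask but destroys its location, which separates the contribution of localization from that of mask size.

```python
import random


def shuffled_mask(mask, seed=0):
    """Return a mask with the same shape and foreground count, at random positions."""
    rng = random.Random(seed)
    width = len(mask[0])
    flat = [v for row in mask for v in row]
    rng.shuffle(flat)
    return [flat[i:i + width] for i in range(0, len(flat), width)]


def uniform_mask(mask, value=1):
    """Return a mask of the same shape filled with a single value."""
    return [[value] * len(row) for row in mask]


# Toy 4 x 4 pseudo mask with a 2 x 2 foreground block (made-up data).
pseudo = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
random_control = shuffled_mask(pseudo, seed=42)
uniform_control = uniform_mask(pseudo)
```

Training the second stage once with `pseudo`-style masks and once with each control, then comparing detection metrics, is the comparison the paragraph describes.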

Figures

Figures reproduced from arXiv: 2510.08052 by Balamurugan Palaniappan, Bheeshm Sharma, Karthikeyan Jaganathan.

Figure 1: Overview of Discriminative Dual Prompt Tuning (DDPT).
Figure 2: Overview of RASALoRE Architecture. Unlike existing prompt encoders (e.g. MedSAM [20]) where the location embeddings are learnable, our LoRE provide fixed, non-learnable encodings that are independent of dataset-specific biases. The CPPs and their LoRE, denoted by $E_{cpp} \in \mathbb{R}^{k \times d}$, remain fixed throughout the training as well as testing process, and are shared by all train/test set images. Since our methodology …
Figure 3: (a) Left: Candidate prompt point locations (in blue) overlaid as grid on input image,
Figure 4: Qualitative Comparison of Predicted Anomaly Mask from Different Methods.
Original abstract

Weakly Supervised Anomaly detection (WSAD) in brain MRI scans is an important challenge useful to obtain quick and accurate detection of brain anomalies when precise pixel-level anomaly annotations are unavailable and only weak labels (e.g., slice-level) are available. In this work, we propose RASALoRE: Region Aware Spatial Attention with Location-based Random Embeddings, a novel two-stage WSAD framework. In the first stage, we introduce a Discriminative Dual Prompt Tuning (DDPT) mechanism that generates high-quality pseudo weak masks based on slice-level labels, serving as coarse localization cues. In the second stage, we propose a segmentation network with a region-aware spatial attention mechanism that relies on fixed location-based random embeddings. This design enables the model to effectively focus on anomalous regions. Our approach achieves state-of-the-art anomaly detection performance, significantly outperforming existing WSAD methods while utilizing less than 8 million parameters. Extensive evaluations on the BraTS20, BraTS21, BraTS23, and MSD datasets demonstrate a substantial performance improvement coupled with a significant reduction in computational complexity. Code is available at: https://github.com/BheeshmSharma/RASALoRE-BMVC-2025/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes RASALoRE, a two-stage weakly supervised anomaly detection framework for brain MRI. Stage 1 uses Discriminative Dual Prompt Tuning (DDPT) to produce pseudo weak masks from slice-level labels as coarse localization cues. Stage 2 trains a segmentation network equipped with region-aware spatial attention driven by fixed location-based random embeddings. The authors claim state-of-the-art results on BraTS20/21/23 and MSD datasets while using fewer than 8 million parameters, with code released.

Significance. If the reported gains are robust, the work would be a useful addition to medical-image WSAD by demonstrating competitive performance at low parameter count. The public code link aids reproducibility. The two-stage design directly targets the practical constraint of slice-level labels only.

major comments (2)
  1. [§3.2] DDPT pseudo-mask generation: the central claim that these masks supply sufficiently accurate coarse localization cues for the region-aware attention network is load-bearing, yet the manuscript reports no Dice or IoU overlap of the generated masks against the pixel-level expert segmentations available on BraTS. Without this metric, it remains unclear whether the second-stage improvements arise from reliable spatial signals or from dataset-specific noise patterns.
  2. [§4] Experimental evaluation: the SOTA claim and the assertion of 'substantial performance improvement' are presented without statistical significance tests (e.g., paired t-tests or Wilcoxon tests) or confidence intervals on the reported metrics across the four datasets. This weakens the strength of the cross-method comparison.
minor comments (2)
  1. [Abstract] The abstract and §1 could explicitly list the primary evaluation metrics (AUC, Dice, etc.) used to declare SOTA rather than using the generic phrase 'anomaly detection performance'.
  2. [§3.3] Notation for the location-based random embeddings (e.g., how the fixed embeddings are sampled and injected) is introduced without a compact equation; adding one would improve clarity for readers reproducing the attention module.
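
The overlap check requested in major comment 1 is simple to compute wherever expert segmentations exist. The following is a minimal sketch, assuming binary masks stored as 2-D lists; `dice_and_iou` is a hypothetical helper name, and returning 1.0 when both masks are empty is an assumed convention rather than anything stated in the paper.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) between two binary masks of identical shape."""
    pred_flat = [bool(v) for row in pred for v in row]
    target_flat = [bool(v) for row in target for v in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(p and t for p, t in zip(pred_flat, target_flat))
    pred_area = sum(pred_flat)
    target_area = sum(target_flat)
    union = pred_area + target_area - intersection

    if union == 0:
        # Both masks are empty: treated as perfect agreement (assumed convention).
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy example (made-up masks): 3 predicted pixels, 3 true pixels, 2 shared.
pseudo_mask = [[0, 1, 1],
               [0, 1, 0]]
expert_mask = [[0, 1, 0],
               [0, 1, 1]]
dice, iou = dice_and_iou(pseudo_mask, expert_mask)
# Dice = 2*2 / (3+3) ≈ 0.667, IoU = 2 / 4 = 0.5
```

Reporting both numbers per dataset for the stage-one masks would show directly how coarse the localization cues are before the segmentation network refines them.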

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful review and constructive feedback on our manuscript. We address each of the major comments below, indicating the revisions we plan to make to improve the clarity and rigor of the work.

Point-by-point responses
  1. Referee: [§3.2] DDPT pseudo-mask generation: the central claim that these masks supply sufficiently accurate coarse localization cues for the region-aware attention network is load-bearing, yet the manuscript reports no Dice or IoU overlap of the generated masks against the pixel-level expert segmentations available on BraTS. Without this metric, it remains unclear whether the second-stage improvements arise from reliable spatial signals or from dataset-specific noise patterns.

    Authors: We agree that providing quantitative measures of the pseudo-mask quality would help substantiate the role of the DDPT-generated cues. Although the pseudo masks are intended as coarse localization signals derived from slice-level labels rather than precise annotations, evaluating their overlap with expert segmentations can clarify their contribution. In the revised manuscript, we will report Dice and IoU scores for the pseudo masks on the BraTS datasets to address this point. revision: yes

  2. Referee: [§4] Experimental evaluation: the SOTA claim and the assertion of 'substantial performance improvement' are presented without statistical significance tests (e.g., paired t-tests or Wilcoxon tests) or confidence intervals on the reported metrics across the four datasets. This weakens the strength of the cross-method comparison.

    Authors: We acknowledge that including statistical analysis would strengthen the experimental claims. To provide a more rigorous comparison, we will incorporate statistical significance tests such as paired t-tests or Wilcoxon signed-rank tests, along with confidence intervals, for the key metrics across the BraTS20, BraTS21, BraTS23, and MSD datasets in the revised version of the paper. revision: yes
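
The analysis promised in the second response could look roughly like the following. The sketch uses only the Python standard library, so it substitutes a paired sign-flip permutation test and a percentile bootstrap interval for the paired t-test and Wilcoxon signed-rank test the referee names (SciPy offers those as `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon`). The per-case scores are invented for illustration and are not results from the paper; the function names are hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(diffs, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean paired difference."""
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def sign_flip_p_value(diffs, n_permutations=2000, seed=0):
    """Two-sided Monte Carlo p-value under the null that paired differences are symmetric about 0."""
    rng = random.Random(seed)
    observed = abs(mean(diffs))
    hits = 0
    for _ in range(n_permutations):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Made-up per-case Dice scores for two methods on the same eight cases.
method_a = [0.71, 0.68, 0.75, 0.80, 0.66, 0.73, 0.78, 0.70]
method_b = [0.65, 0.66, 0.70, 0.74, 0.63, 0.69, 0.71, 0.68]
diffs = [a - b for a, b in zip(method_a, method_b)]

ci_low, ci_high = bootstrap_ci(diffs)
p_value = sign_flip_p_value(diffs)
```

Pairing by case matters here: every method is evaluated on the same scans, so testing the per-case differences removes between-case variability that an unpaired comparison would leave in.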

Circularity Check

0 steps flagged

No significant circularity; empirical two-stage pipeline is self-contained

Full rationale

The manuscript describes an empirical construction: a first-stage Discriminative Dual Prompt Tuning (DDPT) module that produces pseudo weak masks from slice-level labels, followed by a second-stage segmentation network whose region-aware spatial attention is driven by fixed location-based random embeddings. All performance claims rest on standard supervised training and evaluation against external benchmarks (BraTS20/21/23 and MSD datasets) rather than any closed mathematical derivation. No equation reduces a claimed prediction to a fitted parameter drawn from the same data, no uniqueness theorem is imported from prior self-work, and no ansatz is smuggled via self-citation. The central result is therefore an observable empirical outcome, not a quantity defined by construction from its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The central claim rests on standard supervised learning assumptions plus two custom mechanisms introduced in the paper; no explicit free parameters beyond typical training hyperparameters are described in the abstract.

invented entities (1)
  • Region Aware Spatial Attention with Location-based Random Embeddings · no independent evidence
    purpose: To enable the segmentation network to focus on anomalous regions using fixed random location cues
    New architectural component proposed to replace learned positional encodings while keeping parameter count low

pith-pipeline@v0.9.0 · 5770 in / 1159 out tokens · 53900 ms · 2026-05-18T09:03:57.083272+00:00 · methodology


Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 4 internal anchors

  1. [1]

    Michela Antonelli, Annika Reinke, Spyridon Bakas, Keyvan Farahani, Annette Kopp-Schneider, Bennett A. Landman, Geert Litjens, Bjoern Menze, Olaf Ronneberger, Ronald M. Summers, Bram van Ginneken, Michel Bilello, Patrick Bilic, Patrick F. Christ, Richard K. G. Do, Marc J. Gollub, Stephan H. Heckers, Henkjan Huisman, William R. Jarnagin, Maureen K. McHugo, ...

  2. [2]

    Weakly supervised object localization via transformer with implicit spatial calibration

    Haotian Bai, Ruimao Zhang, Jiong Wang, and Xiang Wan. Weakly supervised object localization via transformer with implicit spatial calibration. In European Conference on Computer Vision, pages 612–628. Springer, 2022

  3. [3]

    The RSNA-ASNR-MICCAI BraTS 2021 Benchmark on Brain Tumor Segmentation and Radiogenomic Classification

    Ujjwal Baid, Satyam Ghodasara, Suyash Mohan, Michel Bilello, Evan Calabrese, Errol Colak, Keyvan Farahani, Jayashree Kalpathy-Cramer, Felipe C Kitamura, Sarthak Pati, et al. The rsna-asnr-miccai brats 2021 benchmark on brain tumor segmentation and radiogenomic classification. arXiv preprint arXiv:2107.02314, 2021

  4. [4]

    Advancing the cancer genome atlas glioma MRI collections with expert segmentation labels and radiomic features

    Spyridon Bakas, Hamed Akbari, Aristeidis Sotiras, Michel Bilello, Martin Rozycki, Justin S Kirby, John B Freymann, Keyvan Farahani, and Christos Davatzikos. Advancing the cancer genome atlas glioma MRI collections with expert segmentation labels and radiomic features. Scientific data, 4(1):1–13, 2017

  5. [5]

    Spyridon Bakas, Mauricio Reyes, Andras Jakab, Stefan Bauer, Markus Rempfler, Alessandro Crimi, Russell Takeshi Shinohara, Christoph Berger, Sung Min Ha, Martin Rozycki, et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv preprint arXiv:18...

  6. [6]

    Autoencoders for Unsupervised Anomaly Segmentation in Brain MR Images: A Comparative Study

    Christoph Baur, Stefan Denner, Benedikt Wiestler, Shadi Albarqouni, and Nassir Navab. Autoencoders for Unsupervised Anomaly Segmentation in Brain MR Images: A Comparative Study, 2020. URL https://arxiv.org/abs/2004.03271

  7. [7]

    Guided Reconstruction with Conditioned Diffusion Models for Unsupervised Anomaly Detection in Brain MRIs

    Finn Behrendt, Debayan Bhattacharya, Robin Mieling, Lennart Maack, Julia Krüger, Roland Opfer, and Alexander Schlaefer. Guided Reconstruction with Conditioned Diffusion Models for Unsupervised Anomaly Detection in Brain MRIs. arXiv preprint arXiv:2312.04215, 2023.

  8. [8]

    Patched diffusion models for unsupervised anomaly detection in brain MRI

    Finn Behrendt, Debayan Bhattacharya, Julia Krüger, Roland Opfer, and Alexander Schlaefer. Patched diffusion models for unsupervised anomaly detection in brain MRI. In Medical Imaging with Deep Learning, pages 1019–1032. PMLR, 2024

  9. [9]

    AnoFPDM: Anomaly Detection with Forward Process of Diffusion Models for Brain MRI

    Yiming Che, Fazle Rafsani, Jay Shah, Md Mahfuzur Rahman Siddiquee, and Teresa Wu. AnoFPDM: Anomaly Detection with Forward Process of Diffusion Models for Brain MRI. In Proceedings of the Winter Conference on Applications of Computer Vision, pages 1113–1122, 2025

  10. [10]

    Ame-cam: Attentive multiple-exit cam for weakly supervised segmentation on mri brain tumor

    Yu-Jen Chen, Xinrong Hu, Yiyu Shi, and Tsung-Yi Ho. Ame-cam: Attentive multiple-exit cam for weakly supervised segmentation on mri brain tumor. In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pages 173–182. Springer, 2023

  11. [11]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020

  12. [12]

    Ts-cam: Token semantic coupled attention map for weakly supervised object localization

    Wei Gao, Fang Wan, Xingjia Pan, Zhiliang Peng, Qi Tian, Zhenjun Han, Bolei Zhou, and Qixiang Ye. Ts-cam: Token semantic coupled attention map for weakly supervised object localization. In Proceedings of the IEEE/CVF international conference on computer vision, pages 2886–2895, 2021

  13. [13]

    Denoising Diffusion Probabilistic Models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising Diffusion Probabilistic Models. NeurIPS, 2020

  14. [14]

    Unsupervised anomaly detection in medical images using masked diffusion model

    Hasan Iqbal, Umar Khalid, Chen Chen, and Jing Hua. Unsupervised anomaly detection in medical images using masked diffusion model. In International Workshop on Machine Learning in Medical Imaging, pages 372–381. Springer, 2023

  15. [15]

    Visual prompt tuning

    Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. Visual prompt tuning. In European Conference on Computer Vision, pages 709–727. Springer, 2022

  16. [16]

    Anomaly Detection in Brain Imaging

    Antanas Kascenas. Anomaly Detection in Brain Imaging. PhD thesis, University of Glasgow, 2023

  17. [17]

    Denoising autoencoders for unsupervised anomaly detection in brain MRI

    Antanas Kascenas, Nicolas Pugeault, and Alison Q O’Neil. Denoising autoencoders for unsupervised anomaly detection in brain MRI. In Medical Imaging with Deep Learning,

  18. [18]

    URL https://openreview.net/forum?id=Bm8-t_ggzPD

  19. [19]

    Anahita Fathi Kazerooni, Nastaran Khalili, Xinyang Liu, Debanjan Haldar, Zhifan Jiang, Syed Muhammed Anwar, Jake Albrecht, Maruf Adewole, Udunna Anazodo, Hannah Anderson, Sina Bagheri, Ujjwal Baid, Timothy Bergquist, Austin J. Borja, Evan Calabrese, Verena Chung, Gian-Marco Conte, Farouk Dako, James Eddy, Ivan Ezhov, Ariana Familiar, Keyvan Farahani, Shuv...

  20. [20]

    URL https://arxiv.org/abs/2305.17033

  21. [21]

    Bridging the gap between classification and localization for weakly supervised object localization

    Eunji Kim, Siwon Kim, Jungbeom Lee, Hyunwoo Kim, and Sungroh Yoon. Bridging the gap between classification and localization for weakly supervised object localization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14258–14267, 2022

  22. [22]

    Segment anything in medical images

    Jun Ma, Yuting He, Feifei Li, Lin Han, Chenyu You, and Bo Wang. Segment anything in medical images. Nature Communications, 15(1), January 2024. ISSN 2041-1723. doi: 10.1038/s41467-024-44824-z. URL http://dx.doi.org/10.1038/s41467-024-44824-z

  23. [23]

    Anomaly detection through latent space restoration using vector-quantized variational autoencoders

    Sergio Naval Marimont and Giacomo Tarroni. Anomaly detection through latent space restoration using vector-quantized variational autoencoders, 2020. URL https://arxiv.org/abs/2012.06765

  24. [24]

    The multimodal brain tumor image segmentation benchmark (BRATS)

    Bjoern H Menze, Andras Jakab, Stefan Bauer, Jayashree Kalpathy-Cramer, Keyvan Farahani, Justin Kirby, Yuliya Burren, Nicole Porz, Johannes Slotboom, Roland Wiest, et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE transactions on medical imaging, 34(10):1993–2024, 2014

  25. [25]

    Unsupervised Brain Anomaly Detection and Segmentation with Transformers

    Walter Hugo Lopez Pinaya, Petru-Daniel Tudosiu, Robert Gray, Geraint Rees, Parashkev Nachev, Sebastien Ourselin, and M. Jorge Cardoso. Unsupervised Brain Anomaly Detection and Segmentation with Transformers, 2021. URL https://arxiv.org/abs/2102.11650

  26. [26]

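    Unsupervised anomaly detection in neuroimaging: Contributions to representation learning and density support estimation in the latent space

    Nicolas Pinon. Unsupervised anomaly detection in neuroimaging: Contributions to representation learning and density support estimation in the latent space. PhD Thesis, INSA Lyon, 2024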

  27. [27]

    Learning transferable visual models from natural language supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In ICML, pages 8748–8763. PMLR, 2021

  28. [28]

    Ano-swinMAE: Unsupervised Anomaly Detection in Brain MRI using swin Transformer based Masked Auto Encoder

    Kumari Rashmi, Ayantika Das, NagaGayathri Matcha, Keerthi Ram, and Mohanasankar Sivaprakasam. Ano-swinMAE: Unsupervised Anomaly Detection in Brain MRI using swin Transformer based Masked Auto Encoder. In Medical Imaging with Deep Learning,

  29. [29]

    URL https://openreview.net/forum?id=4uqpqIoQVA

  30. [30]

    U-Net: Convolutional Networks for Biomedical Image Segmentation

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. CoRR, abs/1505.04597, 2015. URL http://arxiv.org/abs/1505.04597

  31. [31]

    Lagan: lesion-aware generative adversarial networks for edema area segmentation in sd-oct images

    Yuhui Tao, Xiao Ma, Yizhe Zhang, Kun Huang, Zexuan Ji, Wen Fan, Songtao Yuan, and Qiang Chen. Lagan: lesion-aware generative adversarial networks for edema area segmentation in sd-oct images. IEEE Journal of Biomedical and Health Informatics, 27(5):2432–2443, 2023

  32. [32]

    Attention is all you need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017

  33. [33]

    Ken C. L. Wong, Mehdi Moradi, Hui Tang, and Tanveer Syeda-Mahmood. 3d segmentation with exponential logarithmic loss for highly unbalanced object sizes. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2018: 21st International Conference, Granada, Spain, September 16-20, 2018, Proceedings, Part III, pages 612–619, 2018

  34. [34]

    Unsupervised feature learning via non-parametric instance discrimination

    Zhirong Wu, Yuanjun Xiong, Stella X Yu, and Dahua Lin. Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (CVPR), pages 3733–3742, 2018

  35. [35]

    A weakly supervised and globally explainable learning framework for brain tumor segmentation

    Ruitao Xie, Limai Jiang, Xiaoxi He, Yi Pan, and Yunpeng Cai. A weakly supervised and globally explainable learning framework for brain tumor segmentation. In 2024 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6. IEEE, 2024

  36. [36]

    Dual modality prompt tuning for vision-language pre-trained model

    Yinghui Xing, Qirui Wu, De Cheng, Shizhou Zhang, Guoqiang Liang, Peng Wang, and Yanning Zhang. Dual modality prompt tuning for vision-language pre-trained model. IEEE Transactions on Multimedia, 2023

  37. [37]

    Generative ai for weakly supervised segmentation and downstream classification of brain tumors on mr images

    Jay J. Yoo, Khashayar Namdar, Matthias W. Wagner, Kristen W. Yeom, Liana F. Nobre, Uri Tabori, Cynthia Hawkins, Birgit B. Ertl-Wagner, Farzad Khalvati, et al. Generative ai for weakly supervised segmentation and downstream classification of brain tumors on mr images. Scientific Reports, 15, 2025

  38. [38]

    Learning deep features for discriminative localization

    Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016

  39. [39]

    Learning to prompt for vision-language models

    Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. Learning to prompt for vision-language models. International Journal of Computer Vision, 130(9):2337–2348, 2022

  40. [40]

    Medical SAM 2: Segment medical images as video via Segment Anything Model 2

    Jiayuan Zhu, Abdullah Hamdi, Yunli Qi, Yueming Jin, and Junde Wu. Medical SAM 2: Segment medical images as video via Segment Anything Model 2, 2024. URL https://arxiv.org/abs/2408.00874