
arxiv: 2410.04960 · v5 · submitted 2024-10-07 · 💻 cs.CV

On Efficient Variants of Segment Anything Model: A Survey

Pith reviewed 2026-05-23 19:40 UTC · model grok-4.3

classification 💻 cs.CV
keywords Segment Anything Model · efficient variants · image segmentation · model acceleration · survey · edge deployment · benchmark evaluation · computational efficiency

The pith

This survey reviews acceleration strategies for the Segment Anything Model and benchmarks their efficiency-accuracy trade-offs on multiple hardware platforms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents what it describes as the first comprehensive review of efficient variants of the Segment Anything Model (SAM), a foundational image segmentation model whose original version is computationally heavy. It covers the motivations for efficiency work and the core techniques behind SAM and model acceleration, then organizes acceleration approaches into categories and outlines future directions. The survey concludes with a unified evaluation of the variants on representative benchmarks across varied hardware, directly comparing their speed, resource use, and accuracy.

Core claim

The survey claims that categorizing SAM acceleration methods by approach, combined with a standardized cross-hardware evaluation, reveals clear performance differences among variants and identifies viable paths for deploying accurate segmentation on resource-limited devices.

What carries the argument

Categorization of acceleration strategies by approach, paired with unified benchmark evaluation across hardware.

If this is right

  • Developers gain a direct comparison to select variants suited to edge or mobile hardware.
  • Research can prioritize the future directions the survey identifies for further gains.
  • Benchmark results establish baseline numbers for new efficiency proposals to beat.
  • Hardware-specific performance data guides deployment choices in constrained environments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The survey's structure could serve as a template for efficiency reviews of other large vision models beyond SAM.
  • If acceleration categories prove stable, they may generalize to future foundation models with similar architectures.
  • Unified evaluations reduce the need for each new paper to re-run all prior variants from scratch.

Load-bearing premise

The review assumes the authors captured all major efficient SAM variants without selection bias and that the chosen benchmarks and hardware are representative of real deployment.

What would settle it

Publication of a new SAM variant that exceeds all reviewed methods in both accuracy and efficiency on the same benchmarks and hardware would indicate the survey missed key approaches or used non-representative tests.
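The settling condition above amounts to a dominance test over (accuracy, efficiency) pairs. The following is a minimal sketch of that test, assuming each variant is summarized by one mIoU value and one FPS value measured on the same benchmark and hardware. The `Result` type, the helper names, and all numbers are hypothetical placeholders for illustration, not figures from the survey.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    """One variant's score on a fixed benchmark and hardware (hypothetical schema)."""
    name: str
    miou: float  # accuracy, higher is better
    fps: float   # throughput, higher is better


def dominates(a: Result, b: Result) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.miou >= b.miou and a.fps >= b.fps
    strictly_better = a.miou > b.miou or a.fps > b.fps
    return no_worse and strictly_better


def pareto_front(results: List[Result]) -> List[Result]:
    """Keep the variants that no other variant dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


def dominates_all(candidate: Result, results: List[Result]) -> bool:
    """True if the candidate beats every reviewed variant in the dominance sense."""
    return all(dominates(candidate, r) for r in results)


# Placeholder numbers, invented purely for illustration.
reviewed = [
    Result("variant_a", miou=0.78, fps=5.0),
    Result("variant_b", miou=0.72, fps=40.0),
    Result("variant_c", miou=0.70, fps=30.0),
]
front = pareto_front(reviewed)
newcomer = Result("variant_new", miou=0.80, fps=45.0)
settles_it = dominates_all(newcomer, reviewed)
```

With these placeholder values, `variant_c` drops off the front because `variant_b` is both more accurate and faster, and the newcomer dominates every reviewed variant. A real comparison would need one such table per benchmark and hardware pair, since dominance on one platform does not imply dominance on another.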

Original abstract

The Segment Anything Model (SAM) is a foundational model for image segmentation tasks, known for its strong generalization across diverse applications. However, its impressive performance comes with significant computational and resource demands, making it challenging to deploy in resource-limited environments such as edge devices. To address this, a variety of SAM variants have been proposed to enhance efficiency while keeping accuracy. This survey provides the first comprehensive review of these efficient SAM variants. We begin by exploring the motivations driving this research. We then present core techniques used in SAM and model acceleration. This is followed by a detailed exploration of SAM acceleration strategies, categorized by approach, and a discussion of several future research directions. Finally, we offer a unified and extensive evaluation of these methods across various hardware, assessing their efficiency and accuracy on representative benchmarks, and providing a clear comparison of their overall performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper surveys efficient variants of the Segment Anything Model (SAM), claiming to be the first comprehensive review. It covers motivations for efficiency research, core SAM and acceleration techniques, a categorization of acceleration strategies by approach, future research directions, and a unified evaluation of methods across hardware platforms assessing efficiency and accuracy on representative benchmarks.

Significance. If the coverage is systematic and the evaluation is truly standardized rather than aggregated from inconsistent reports, the survey would provide a useful reference for comparing efficiency-accuracy trade-offs in SAM variants and guiding deployment on edge devices.

major comments (1)
  1. [Abstract, §1] Abstract and §1 (Introduction): The central claims of providing the 'first comprehensive review' and a 'unified and extensive evaluation' across hardware are load-bearing but rest on undocumented processes. No explicit literature search criteria, databases, date ranges, or inclusion/exclusion rules are stated, nor is the protocol for re-implementation or metric standardization described. This leaves both the completeness of variant coverage and the fairness of cross-method comparisons unverifiable.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for highlighting the need for greater transparency in our methodology. We agree that explicitly documenting the literature search process and evaluation protocol will make the claims of comprehensive coverage and unified benchmarking more verifiable. We will revise the manuscript to include these details.

Point-by-point responses
  1. Referee: [Abstract, §1] Abstract and §1 (Introduction): The central claims of providing the 'first comprehensive review' and a 'unified and extensive evaluation' across hardware are load-bearing but rest on undocumented processes. No explicit literature search criteria, databases, date ranges, or inclusion/exclusion rules are stated, nor is the protocol for re-implementation or metric standardization described. This leaves both the completeness of variant coverage and the fairness of cross-method comparisons unverifiable.

    Authors: We acknowledge that the current manuscript describes neither the literature search protocol nor the re-implementation details. To address this, we will add a dedicated 'Survey Methodology' subsection to §1 that specifies: (1) databases searched (Google Scholar, arXiv, IEEE Xplore, ACM Digital Library); (2) search keywords and Boolean strings (e.g., 'Segment Anything Model' AND (efficient OR acceleration OR lightweight OR edge)); (3) date range (April 2023, when SAM was released, to October 2024); (4) inclusion criteria (papers proposing SAM variants with efficiency improvements, including preprints with code); (5) exclusion criteria (non-English works, surveys without new variants, works not focused on SAM).

    For the unified evaluation, we will expand §4 and add an appendix describing: the re-implementation protocol (use of official repositories where available, otherwise faithful re-coding per paper descriptions with author confirmation where possible), hardware configurations (e.g., NVIDIA A100, RTX 3090, Jetson Orin, CPU-only), input standardization (1024×1024 resolution, batch size 1), and metric reporting (consistent FPS, parameter counts, and mIoU on COCO val and ADE20K).

    These additions will allow readers to assess completeness and fairness. We maintain that the survey is the first to provide both a categorized taxonomy and cross-hardware benchmarks, but agree that the documentation strengthens this position.

    revision: yes
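The metric reporting proposed in the rebuttal (consistent FPS at batch size 1, plus mIoU) can be made concrete with a small sketch. It assumes mIoU means the mean of per-mask intersection-over-union between binary masks, which is one common definition. It also assumes the model is wrapped in a zero-argument callable. The function names and the toy masks are hypothetical and are not taken from the paper.

```python
import time
from typing import Callable, Sequence

Mask = Sequence[Sequence[int]]  # binary mask as rows of 0/1 values


def iou(pred: Mask, target: Mask) -> float:
    """Intersection-over-union of two binary masks of equal shape."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly; treat that case as IoU 1.0 (a convention).
    return inter / union if union else 1.0


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Mean of per-mask IoU over a set of predictions."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


def measure_fps(run_once: Callable[[], object], warmup: int = 3, iters: int = 20) -> float:
    """Throughput of single-input calls, with warm-up runs excluded from timing.

    On a GPU, the device would need to be synchronized before each clock read;
    this CPU-only sketch omits that step.
    """
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start
    return iters / elapsed if elapsed > 0 else float("inf")


# Toy 2x2 masks, invented for illustration.
preds = [[[1, 1], [0, 0]], [[0, 1], [0, 1]]]
targets = [[[1, 0], [0, 0]], [[0, 1], [0, 1]]]
score = mean_iou(preds, targets)  # (0.5 + 1.0) / 2

# Stand-in workload in place of a real model forward pass.
fps = measure_fps(lambda: sum(range(1000)))
```

Fixing the warm-up count, iteration count, input resolution, and batch size across all variants is what makes the resulting FPS numbers comparable. Reusing figures reported under each paper's own settings would not give that guarantee.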

Circularity Check

0 steps flagged

No circularity: survey paper contains no derivations or predictions

Full rationale

This is a literature survey whose central claims concern coverage of prior work, categorization of acceleration strategies, and presentation of a unified evaluation. No equations, fitted parameters, predictions, or first-principles derivations appear in the provided abstract or description, so there is no derivational chain that could reduce to its own inputs. The skeptic's concerns about selection bias and benchmark standardization are questions of methodological transparency and potential incompleteness, not circularity under the enumerated patterns (self-definitional, fitted-input-called-prediction, self-citation load-bearing, etc.). Because the paper makes no load-bearing mathematical claims that collapse by construction, the circularity score is 0 and the steps array is empty.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper. The central claim rests on the assumed completeness and lack of bias in the literature selection and evaluation design rather than on any mathematical axioms, free parameters, or invented entities.

pith-pipeline@v0.9.0 · 5675 in / 1008 out tokens · 27288 ms · 2026-05-23T19:40:39.357352+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation

    cs.CV 2026-04 unverdicted novelty 3.0

    This review organizes literature on large multimodal models and object-centric vision into four themes—understanding, referring segmentation, editing, and generation—while summarizing paradigms, strategies, and challe...
