On Efficient Variants of Segment Anything Model: A Survey

Heng Tao Shen; Jun Liu; Ping Hu; Xiaofeng Zhu; Xiaorui Sun

arxiv: 2410.04960 · v5 · submitted 2024-10-07 · 💻 cs.CV

Xiaorui Sun, Jun Liu, Heng Tao Shen, Xiaofeng Zhu, Ping Hu

Pith reviewed 2026-05-23 19:40 UTC · model grok-4.3

classification 💻 cs.CV

keywords Segment Anything Model · efficient variants · image segmentation · model acceleration · survey · edge deployment · benchmark evaluation · computational efficiency

The pith

This survey reviews acceleration strategies for the Segment Anything Model and benchmarks their efficiency-accuracy trade-offs on multiple hardware platforms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents itself as the first comprehensive review of efficient variants of the Segment Anything Model (SAM), a foundational image segmentation model whose original version is computationally heavy. It covers the motivations for efficiency work and the core techniques behind SAM and model acceleration, then organizes acceleration approaches into categories and outlines future directions. The survey concludes with a unified evaluation of the variants on representative benchmarks across varied hardware, directly comparing their speed, resource use, and accuracy.

Core claim

The survey claims that categorizing SAM acceleration methods by approach, combined with a standardized cross-hardware evaluation, reveals clear performance differences among variants and identifies viable paths for deploying accurate segmentation on resource-limited devices.

What carries the argument

Categorization of acceleration strategies by approach, paired with unified benchmark evaluation across hardware.

If this is right

Developers gain a direct comparison to select variants suited to edge or mobile hardware.
Research can prioritize the future directions the survey identifies for further gains.
Benchmark results establish baseline numbers for new efficiency proposals to beat.
Hardware-specific performance data guides deployment choices in constrained environments.
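
One way a developer might act on such a side-by-side comparison is to keep only the variants that are not beaten on both speed and accuracy at once (a Pareto front). The sketch below is a minimal illustration under stated assumptions: the `Variant` type, the `pareto_front` helper, the variant names, and every number are hypothetical placeholders, not results from the survey.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str
    latency_ms: float  # per-image latency on the target device; lower is better
    miou: float        # segmentation accuracy; higher is better


def pareto_front(variants):
    """Return the variants that no other variant dominates on (latency, mIoU)."""
    front = []
    for v in variants:
        dominated = any(
            o.latency_ms <= v.latency_ms
            and o.miou >= v.miou
            and (o.latency_ms < v.latency_ms or o.miou > v.miou)
            for o in variants
        )
        if not dominated:
            front.append(v)
    return sorted(front, key=lambda v: v.latency_ms)


# Placeholder numbers for illustration only; NOT taken from the survey.
candidates = [
    Variant("variant_a", 10.0, 0.60),
    Variant("variant_b", 25.0, 0.70),
    Variant("variant_c", 30.0, 0.65),  # slower and less accurate than variant_b
    Variant("variant_d", 80.0, 0.75),
]

front = pareto_front(candidates)
print([v.name for v in front])
```

With measured numbers for a specific device substituted in, the remaining candidates are the only ones worth weighing against a latency budget.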

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The survey's structure could serve as a template for efficiency reviews of other large vision models beyond SAM.
If acceleration categories prove stable, they may generalize to future foundation models with similar architectures.
Unified evaluations reduce the need for each new paper to re-run all prior variants from scratch.

Load-bearing premise

The review assumes the authors captured all major efficient SAM variants without selection bias and that the chosen benchmarks and hardware are representative of real deployment.

What would settle it

Publication of a new SAM variant that exceeds all reviewed methods in both accuracy and efficiency on the same benchmarks and hardware would indicate the survey missed key approaches or used non-representative tests.

Original abstract

The Segment Anything Model (SAM) is a foundational model for image segmentation tasks, known for its strong generalization across diverse applications. However, its impressive performance comes with significant computational and resource demands, making it challenging to deploy in resource-limited environments such as edge devices. To address this, a variety of SAM variants have been proposed to enhance efficiency while keeping accuracy. This survey provides the first comprehensive review of these efficient SAM variants. We begin by exploring the motivations driving this research. We then present core techniques used in SAM and model acceleration. This is followed by a detailed exploration of SAM acceleration strategies, categorized by approach, and a discussion of several future research directions. Finally, we offer a unified and extensive evaluation of these methods across various hardware, assessing their efficiency and accuracy on representative benchmarks, and providing a clear comparison of their overall performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

A survey that organizes existing efficient SAM variants and attempts a unified hardware evaluation, but its claims rest on selection and standardization steps that the abstract leaves opaque.

This survey gathers papers on making the Segment Anything Model smaller and faster for edge use. It covers motivations, basic SAM components, acceleration categories, future directions, and runs what it calls a unified evaluation across hardware and benchmarks. That last part is the piece that could save someone time when picking a model for deployment. The categorization of techniques is straightforward and the attempt to put numbers side by side on the same tasks is the main practical contribution. The paper does not introduce new methods or theory; it compiles and compares what already exists.

The soft spots sit in the two central claims. The abstract states it is the first comprehensive review but supplies no search protocol, inclusion rules, or date cutoff, so it is impossible to tell whether coverage is systematic or whether some variants were omitted. The unified evaluation is described only at a high level; without details on whether models were re-run under controlled conditions or whether reported numbers were simply collected, the performance tables are hard to interpret. Minor gaps in citation of very recent preprints would be easy to fix, but the lack of transparency on selection and standardization is more material.

This paper is mainly for engineers or students who need an overview of the efficient-SAM space rather than for researchers seeking new technical results. A referee could check the methodology sections and the experimental protocol, so the work deserves peer review.

Referee Report

1 major / 0 minor

Summary. The paper surveys efficient variants of the Segment Anything Model (SAM), claiming to be the first comprehensive review. It covers motivations for efficiency research, core SAM and acceleration techniques, a categorization of acceleration strategies by approach, future research directions, and a unified evaluation of methods across hardware platforms assessing efficiency and accuracy on representative benchmarks.

Significance. If the coverage is systematic and the evaluation is truly standardized rather than aggregated from inconsistent reports, the survey would provide a useful reference for comparing efficiency-accuracy trade-offs in SAM variants and guiding deployment on edge devices.

major comments (1)

[Abstract, §1] Abstract and §1 (Introduction): The central claims of providing the 'first comprehensive review' and a 'unified and extensive evaluation' across hardware are load-bearing but rest on undocumented processes. No explicit literature search criteria, databases, date ranges, or inclusion/exclusion rules are stated, nor is the protocol for re-implementation or metric standardization described. This leaves both the completeness of variant coverage and the fairness of cross-method comparisons unverifiable.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for highlighting the need for greater transparency in our methodology. We agree that explicitly documenting the literature search process and evaluation protocol will make the claims of comprehensive coverage and unified benchmarking more verifiable. We will revise the manuscript to include these details.

Referee: [Abstract, §1] Abstract and §1 (Introduction): The central claims of providing the 'first comprehensive review' and a 'unified and extensive evaluation' across hardware are load-bearing but rest on undocumented processes. No explicit literature search criteria, databases, date ranges, or inclusion/exclusion rules are stated, nor is the protocol for re-implementation or metric standardization described. This leaves both the completeness of variant coverage and the fairness of cross-method comparisons unverifiable.

Authors: We acknowledge that the current manuscript does not describe the literature search protocol or re-implementation details. To address this, we will add a dedicated subsection 'Survey Methodology' in §1 that specifies:
(1) databases searched (Google Scholar, arXiv, IEEE Xplore, ACM Digital Library);
(2) search keywords and Boolean strings (e.g., 'Segment Anything Model' AND (efficient OR acceleration OR lightweight OR edge));
(3) date range (April 2023 to October 2024, aligned with SAM release);
(4) inclusion criteria (papers proposing SAM variants with efficiency improvements, including preprints with code);
(5) exclusion criteria (non-English works, surveys without new variants, works not focused on SAM).
For the unified evaluation, we will expand §4 and add an appendix describing: re-implementation protocol (use of official repositories where available, otherwise faithful re-coding per paper descriptions with author confirmation where possible), hardware configurations (e.g., NVIDIA A100, RTX 3090, Jetson Orin, CPU-only), input standardization (1024×1024 resolution, batch size 1), and metric reporting (consistent FPS, parameters, mIoU on COCO val, ADE20K).
These additions will allow readers to assess completeness and fairness. We maintain that the survey is the first to provide both a categorized taxonomy and cross-hardware benchmarks, but agree the documentation strengthens this position.
revision: yes
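
The simulated rebuttal names FPS and mIoU as the metrics to be reported consistently. As a rough illustration of what those two numbers mean, the sketch below computes mask IoU averaged over prediction/ground-truth pairs and times a callable at batch size 1. It is a minimal sketch under stated assumptions, not the authors' protocol: the helper names `mask_iou`, `mean_iou`, and `measure_fps` are hypothetical, masks are plain nested lists of 0/1, and the timed workload is a stand-in for a real model call.

```python
import time


def mask_iou(pred, target):
    """IoU of two binary masks given as equal-sized nested lists of 0/1."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Average mask IoU over (prediction, ground truth) pairs."""
    ious = [mask_iou(pred, target) for pred, target in pairs]
    return sum(ious) / len(ious)


def measure_fps(run_once, warmup=3, iters=20):
    """Calls per second of run_once, excluding warm-up calls.

    For GPU models the device must be synchronized before reading the
    clock; that step is omitted here because the workload is CPU-only.
    """
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start
    return iters / elapsed


# Toy masks: the first pair overlaps on 1 of 2 foreground pixels (IoU 0.5),
# the second pair is identical (IoU 1.0).
pairs = [
    ([[1, 1], [0, 0]], [[1, 0], [0, 0]]),
    ([[1, 0], [0, 1]], [[1, 0], [0, 1]]),
]
print(mean_iou(pairs))  # 0.75

# Stand-in workload; a real benchmark would run one model forward pass here.
print(measure_fps(lambda: sum(range(1000))) > 0)
```

Fixing the input size, batch size, and warm-up count across all variants is what makes throughput numbers comparable from one model to the next.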

Circularity Check

0 steps flagged

No circularity: survey paper contains no derivations or predictions

This is a literature survey paper whose central claims concern coverage of prior work, categorization of acceleration strategies, and presentation of a unified evaluation. No equations, fitted parameters, predictions, or first-principles derivations appear in the provided abstract or description. The reader's assessment correctly identifies the absence of any derivational chain that could reduce to its own inputs. The skeptic concerns about selection bias and standardization of benchmarks are questions of methodological transparency and potential incompleteness, not circularity under the enumerated patterns (self-definitional, fitted-input-called-prediction, self-citation load-bearing, etc.). Because the paper makes no load-bearing mathematical claims that collapse by construction, the circularity score is 0 and the steps array is empty.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper. The central claim rests on the assumed completeness and lack of bias in the literature selection and evaluation design rather than on any mathematical axioms, free parameters, or invented entities.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation
cs.CV 2026-04 unverdicted novelty 3.0

This review organizes literature on large multimodal models and object-centric vision into four themes—understanding, referring segmentation, editing, and generation—while summarizing paradigms, strategies, and challe...


work page 2023
[78]

In: Advances in Neural Information Processing Systems (2023)

Wang, D., Zhang, J., Du, B., Xu, M., Liu, L., Tao, D., Zhang, L.: Samrs: Scaling- up remote sensing segmentation dataset with segment anything model. In: Advances in Neural Information Processing Systems (2023)

work page 2023
[79]

In: Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) (2025)

Shan, Z., Liu, Y., Zhou, L., Yan, C., Wang, H., Xie, X.: Ros-sam: High-quality interac- tive segmentation for remote sensing moving object. In: Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) (2025)

work page 2025
[80]

arXiv preprint arXiv:2304.06790 (2023)

Yu, T., Feng, R., Feng, R., Liu, J., Jin, X., Zeng, W., Chen, Z.: Inpaint anything: Segment anything meets image inpainting. arXiv preprint arXiv:2304.06790 (2023)

work page arXiv 2023

Showing first 80 references.

[1] Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E., et al.: On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021)

[2] Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., et al.: A survey of large language models. arXiv preprint arXiv:2303.18223 (2023)

[3] Wan, Z., Wang, X., Liu, C., Alam, S., Zheng, Y., Liu, J., Qu, Z., Yan, S., Zhu, ... (2024)

Table 6: Quantitative results of the accuracy of SegAny task (mIoU) on COCO and LVIS with points and boxes as prompts. For evaluation with points prompts, we select the center point of the ground truth bounding box (pt1), and one or three randomly sampled points from grou...

[4] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al.: Language models are few-shot learners. In: Advances in Neural Information Processing Systems (2020)

Table 8: Quantitative results of instance segmentation on COCO with YOLOv8 [169] or GroundingDINO [224] as object...

[5] Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al.: Gpt-4 technical report. arXiv preprint arXiv:2303.08774 (2023)

[6] Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H.W., Sutton, C., Gehrmann, S., et al.: Palm: Scaling language modeling with pathways. Journal of Machine Learning Research 24(240), 1–113 (2023)

[7] Anil, R., Dai, A.M., Firat, O., Johnson, M., Lepikhin, D., Passos, A., Shakeri, S., Taropa, E., Bailey, P., Chen, Z., et al.: Palm 2 technical report. arXiv preprint arXiv:2305.10403 (2023)

[8] Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, ... arXiv preprint (2023)

Table 10: Quantitative results of zero-shot instance segmentation on SGinW benchmark with GroundingDINO as the object detector. We report variants’ Average Precision (AP) on each dataset and mean AP over all 25 datasets. Data...

[9] Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., et al.: Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023)

[10] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)

[11] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L.u., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems (2017)

[12] Radford, A., Kim, J.W., Hallacy, C., ... (2021)

Table 11: Quantitative results of zero-shot instance segmentation on UVO benchmark with GroundingDINO as the object detector.

Model AP APS APM APL
SAM-H 29.9 10 20.8 44.9
SAMfast-H 29.7 9.9 20.7 44.6
SAM2-B+ 30.9 9.4 21.3 47
FastSAM 20.8 7 14.7 30.1
MobileSAM 25.2 8.2 17.4 38
EdgeSAM 24.9 8.6 17.9 36.4
EfficientSA...

[13] Liu, H., Li, C., Wu, Q., Lee, Y.J.: Visual instruction tuning. In: Advances in Neural Information Processing Systems (2023)

[14] Maaz, M., Rasheed, H., Khan, S., Khan, F.: Video-ChatGPT: Towards detailed video understanding via large vision and language models. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2024)

[15] Li, C., Gan, Z., Yang, Z., Yang, J., Li, L., Wang, L., Gao, J., et al.: Multimodal foundation models: From specialists to general-purpose assistants. Foundations and Trends® in Computer Graphics and Vision 16(1–2), 1–214 (2024)

[16] Gan, Z., Li, L., Li, C., Wang, L., Liu, Z., Gao, J., et al.: Vision-language pre-training: Basics, recent advances, and future trends. Foundations and Trends® in Computer Graphics and Vision 14(3–4), 163–352 (2022)

[17] Zou, X., Yang, J., Zhang, H., Li, F., Li, L., Wang, J., Wang, L., Gao, J., Lee, Y.J.: Segment everything everywhere all at once. In: Advances in Neural Information Processing Systems (2024)

[18] Tang, Y., Bi, J., Xu, S., Song, L., Liang, S., Wang, T., Zhang, D., An, J., Lin, J., Zhu, R., et al.: Video understanding with large language models: A survey. arXiv preprint arXiv:2312.17432 (2023)

[19] Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.-Y., et al.: Segment anything. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2023)

[20] Ma, J., He, Y., Li, F., Han, L., You, C., Wang, B.: Segment anything in medical images. Nature Communications 15(1), 654 (2024)

[21] Chen, T., Zhu, L., Deng, C., Cao, R., Wang, Y., Zhang, S., Li, Z., Sun, L., Zang, Y., Mao, P.: Sam-adapter: Adapting segment anything in underperformed scenes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2023)

[22] Chen, T., Zhu, L., Ding, C., Cao, R., Wang, Y., Li, Z., Sun, L., Mao, P., Zang, Y.: Sam fails to segment anything?–sam-adapter: Adapting sam in underperformed scenes: Camouflage, shadow, medical image segmentation, and more. arXiv preprint arXiv:2304.09148 (2023)

[23] Wald, T., Roy, S., Koehler, G., Disch, N., Rokuss, M.R., Holzschuh, J., Zimmerer, D., Maier-Hein, K.: SAM.MD: Zero-shot medical image segmentation capabilities of the segment anything model. In: Medical Imaging with Deep Learning, Short Paper Track (2023)

[24] Le, B.-H., Nguyen-Vu, D.-K., Nguyen-Mau, T.-H., Nguyen, H.-D., Tran, M.-T.: Medficientsam: a robust medical segmentation model with optimized inference pipeline for limited clinical settings. In: Medical Image Segmentation Challenge (2024)

[25] Ahmadi, M., Lonbar, A.G., Sharifi, A., Beris, A.T., Nouri, M., Javidi, A.S.: Application of segment anything model for civil infrastructure defect assessment. arXiv preprint arXiv:2304.12600 (2023)

[26] Xie, D., Wang, R., Ma, J., Chen, C., Lu, H., Yang, D., Shi, F., Lin, X.: Edit everything: A text-guided generative system for images editing. arXiv preprint arXiv:2304.14006 (2023)

[27] Zhang, R., Jiang, Z., Guo, Z., Yan, S., Pan, J., Dong, H., Qiao, Y., Gao, P., Li, H.: Personalize segment anything model with one shot. In: The Twelfth International Conference on Learning Representations (2024)

[28] Yang, J., Gao, M., Li, Z., Gao, S., Wang, F., Zheng, F.: Track anything: Segment anything meets videos. arXiv preprint arXiv:2304.11968 (2023)

[29] Lu, Z., Xiao, Z., Bai, J., Xiong, Z., Wang, X.: Can sam boost video super-resolution? arXiv preprint arXiv:2305.06524 (2023)

[30] He, H., Zhang, J., Xu, M., Liu, J., Du, B., Tao, D.: Scalable mask annotation for video text spotting. arXiv preprint arXiv:2305.01443 (2023)

[31] Shen, Q., Yang, X., Wang, X.: Anything-3d: Towards single-view anything reconstruction in the wild. arXiv preprint arXiv:2304.10261 (2023)

[32] Dong, S., Liu, F., Lin, G.: Leveraging large-scale pretrained vision foundation models for label-efficient 3d point cloud segmentation. arXiv preprint arXiv:2311.01989 (2023)

[33] Xu, X., Chen, H., Zhao, L., Wang, Z., Zhou, J., Lu, J.: EmbodiedSAM: Online segment any 3d thing in real time. In: The Thirteenth International Conference on Learning Representations (2025)

[34] Ravi, N., Gabeur, V., Hu, Y.-T., Hu, R., Ryali, C., Ma, T., Khedr, H., Rädle, R., Rolland, C., Gustafson, L., et al.: Sam 2: Segment anything in images and videos. arXiv preprint arXiv:2408.00714 (2024)

[35] Shen, C., Li, W., Shi, Y., Wang, X.: Interactive 3d medical image segmentation with sam 2. arXiv preprint arXiv:2408.02635 (2024)

[36] Yamagishi, Y., Hanaoka, S., Kikuchi, T., Nakao, T., Nakamura, Y., Nomura, Y., Miki, S., Yoshikawa, T., Abe, O.: Zero-shot 3d segmentation of abdominal organs in ct scans using segment anything model 2: Adapting video tracking capabilities for 3d medical imaging. arXiv preprint arXiv:2408.06170 (2024)

[37] Wang, Y., Xu, H., Liu, Y., Li, J., Tang, Y.: Sam2-love: Segment anything model 2 in language-aided audio-visual scenes. In: Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) (2025)

[38] Yu, J., Wang, A., Dong, W., Xu, M., Islam, M., Wang, J., Bai, L., Ren, H.: Sam 2 in robotic surgery: An empirical evaluation for robustness and generalization in surgical video segmentation. arXiv preprint arXiv:2408.04593 (2024)

[39] Tran, T.: The 2nd solution for lsvos challenge rvos track: Spatial-temporal refinement for consistent semantic segmentation. arXiv preprint arXiv:2408.12447 (2024)

[40] Liu, X., Zhang, J., Zhang, K., Liu, X., Li, L.: Lsvos challenge 3rd place report: Sam2 and cutie based vos. arXiv preprint arXiv:2408.10469 (2024)

[41] Zhao, X., Ding, W., An, Y., Du, Y., Yu, T., Li, M., Tang, M., Wang, J.: Fast segment anything. arXiv preprint arXiv:2306.12156 (2023)

[42] Varadarajan, B., Soran, B., Iandola, F., Xiang, X., Xiong, Y., Zhu, C., Krishnamoorthi, R., Chandra, V.: Squeezesam: User friendly mobile interactive segmentation. arXiv preprint arXiv:2312.06736 (2023)

[43] Zhang, C., Han, D., Qiao, Y., Kim, J.U., Bae, S.-H., Lee, S., Hong, C.S.: Faster segment anything: Towards lightweight sam for mobile applications. arXiv preprint arXiv:2306.14289 (2023)

[44] Shu, H., Li, W., Tang, Y., Zhang, Y., Chen, Y., Li, H., Wang, Y., Chen, X.: Tinysam: Pushing the envelope for efficient segment anything model. In: Proceedings of the AAAI Conference on Artificial Intelligence (2025)

[45] Wang, A., Chen, H., Lin, Z., Han, J., Ding, G.: Repvit: Revisiting mobile cnn from vit perspective. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2024)

[46] Zhou, C., Li, X., Loy, C.C., Dai, B.: Edgesam: Prompt-in-the-loop distillation for on-device deployment of sam. arXiv preprint arXiv:2312.06660 (2023)

[47] Lv, C., Chen, H., Guo, J., Ding, Y., Liu, X.: Ptq4sam: Post-training quantization for segment anything. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2024)

[48] Chen, Z., Fang, G., Ma, X., Wang, X.: Slimsam: 0.1% data makes segment anything slim. In: Advances in Neural Information Processing Systems (2024)

[49] Zhang, C., Puspitasari, F.D., Zheng, S., Li, C., Qiao, Y., Kang, T., Shan, X., Zhang, C., Qin, C., Rameau, F., et al.: A survey on segment anything model (sam): Vision foundation model meets prompt engineering. arXiv preprint arXiv:2306.06211 (2023)

[50] Zhang, C., Liu, L., Cui, Y., Huang, G., Lin, W., Yang, Y., Hu, Y.: A comprehensive survey on segment anything model for vision and beyond. arXiv preprint arXiv:2305.08196 (2023)

[51] Zhang, L., Deng, X., Lu, Y.: Segment anything model (sam) for medical image segmentation: A preliminary review. In: 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2023)

[52] Zhang, Y., Shen, Z., Jiao, R.: Segment anything model for medical image segmentation: Current applications and future directions. Computers in Biology and Medicine 171, 108238 (2024)

[53] Zhang, C., Cui, Y., Lin, W., Huang, G., Rong, Y., Liu, L., Shan, S.: Segment anything for videos: A systematic survey. arXiv preprint arXiv:2408.08315 (2024)

[54] Zhang, Y., Shen, Z.: Unleashing the potential of sam2 for biomedical images and videos: A survey. arXiv preprint arXiv:2408.12889 (2024)

[55] Papa, L., Russo, P., Amerini, I., Zhou, L.: A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking. IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)

[56] Xu, C., McAuley, J.: A survey on model compression and acceleration for pretrained language models. In: Proceedings of the AAAI Conference on Artificial Intelligence (2023)

[57] Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., Neubig, G.: Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys 55(9), 1–35 (2023)

[58] He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

[59] Ryali, C., Hu, Y.-T., Bolya, D., Wei, C., Fan, H., Huang, P.-Y., Aggarwal, V., Chowdhury, A., Poursaeed, O., Hoffman, J., et al.: Hiera: A hierarchical vision transformer without the bells-and-whistles. In: International Conference on Machine Learning (2023)

[60] Hu, C., Xia, T., Ju, S., Li, X.: When sam meets medical images: An investigation of segment anything model (sam) on multi-phase liver tumor segmentation. arXiv preprint arXiv:2304.08506 (2023)

[61] Mohapatra, S., Gosai, A., Schlaug, G.: Sam vs bet: A comparative study for brain extraction and segmentation of magnetic resonance images using deep learning. arXiv preprint arXiv:2304.04738 (2023)

[62] Deng, R., Cui, C., Liu, Q., Yao, T., Remedios, L.W., Bao, S., Landman, B.A., Wheless, L.E., Coburn, L.A., Wilson, K.T., et al.: Segment anything model (sam) for digital pathology: Assess zero-shot segmentation on whole slide imaging. In: IS&T International Symposium on Electronic Imaging (2025)

[63] Li, Y., Hu, M., Yang, X.: Polyp-sam: Transfer sam for polyp segmentation. In: Medical Imaging 2024: Computer-Aided Diagnosis (2024)

[64] Hu, M., Li, Y., Yang, X.: Skinsam: Empowering skin cancer segmentation with segment anything model. arXiv preprint arXiv:2304.13973 (2023)

[65] Shaharabany, T., Dahan, A., Giryes, R., Wolf, L.: Autosam: Adapting sam to medical images by overloading the prompt encoder. arXiv preprint arXiv:2306.06370 (2023)

[66] Konwer, A., Yang, Z., Bas, E., Xiao, C., Prasanna, P., Bhatia, P., Kass-Hout, T.: Enhancing sam with efficient prompting and preference optimization for semi-supervised medical image segmentation. In: Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) (2025)

[67] Zhang, Y., Cheng, Y., Qi, Y.: Semisam: Exploring sam for enhancing semi-supervised medical image segmentation with extremely limited annotations. arXiv preprint arXiv:2312.06316 (2023)

[68] Ma, J., Li, F., Kim, S., Asakereh, R., Le, B.-H., Nguyen-Vu, D.-K., Pfefferle, A., Wei, M., Gao, R., Lyu, D., et al.: Efficient medsams: Segment anything in medical images on laptop. arXiv preprint arXiv:2412.16085 (2024)

[69] Pfefferle, A., Purucker, L., Hutter, F.: Daft: data-aware fine-tuning of foundation models for efficient and effective medical image segmentation. In: Medical Image Segmentation Challenge (2024)

[70] Wei, M., Chen, S., Wu, S., Xu, D.: Rep-medsam: Towards real-time and universal medical image segmentation. In: Medical Image Segmentation Challenge (2024)

[71] Gao, R., Lyu, D., Staring, M.: Swin-litemedsam: A lightweight box-based segment anything model for large-scale medical image datasets. In: Medical Image Segmentation Challenge (2024)

[72] Hu, M., Yang, X.: Breastlightsam: a lightweight pipeline for fast and accurate breast cancer diagnosis and tumor segmentation. In: Medical Imaging 2025: Ultrasonic Imaging and Tomography (2025)

[73] Ji, W., Li, J., Bi, Q., Liu, T., Li, W., Cheng, L.: Segment anything is not always perfect: An investigation of sam on different real-world applications. Machine Intelligence Research 21, 617–630 (2024)

[74] Giannakis, I., Bhardwaj, A., Sam, L., Leontidis, G.: Deep learning universal crater detection using segment anything model (sam). arXiv preprint arXiv:2304.07764 (2023)

[75] Li, Y., Wang, D., Yuan, C., Li, H., Hu, J.: Enhancing agricultural image segmentation with an agricultural segment anything model adapter. Sensors 23(18), 7884 (2023)

[76] Cao, Y., Xu, X., Sun, C., Cheng, Y., Du, Z., Gao, L., Shen, W.: Segment any anomaly without training via hybrid prompt regularization. arXiv preprint arXiv:2305.10724 (2023)

[77] Osco, L.P., Wu, Q., Lemos, E.L., Gonçalves, W.N., Ramos, A.P.M., Li, J., Junior, J.M.: The segment anything model (sam) for remote sensing applications: From zero to one shot. International Journal of Applied Earth Observation and Geoinformation 124, 103540 (2023)

[78] Wang, D., Zhang, J., Du, B., Xu, M., Liu, L., Tao, D., Zhang, L.: Samrs: Scaling-up remote sensing segmentation dataset with segment anything model. In: Advances in Neural Information Processing Systems (2023)

[79] Shan, Z., Liu, Y., Zhou, L., Yan, C., Wang, H., Xie, X.: Ros-sam: High-quality interactive segmentation for remote sensing moving object. In: Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) (2025)

[80] Yu, T., Feng, R., Feng, R., Liu, J., Jin, X., Zeng, W., Chen, Z.: Inpaint anything: Segment anything meets image inpainting. arXiv preprint arXiv:2304.06790 (2023)