On the Effectiveness of Textual Prompting with Lightweight Fine-Tuning for SAM3 Remote Sensing Segmentation

Amir Boger; David Ioffe; Karen Sandberg Esquenazi; Osher Rafaeli; Roni Blushtein-Livnon; Tal Svoray

arxiv: 2512.15564 · v2 · submitted 2025-12-17 · 💻 cs.CV


Pith reviewed 2026-05-16 21:37 UTC · model grok-4.3

classification 💻 cs.CV

keywords remote sensing · image segmentation · SAM3 · textual prompting · geometric prompting · lightweight fine-tuning · hybrid prompting · overhead imagery


The pith

Hybrid semantic and geometric prompting with light fine-tuning outperforms text-only approaches for adapting SAM3 to remote sensing segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests SAM3 on overhead imagery by comparing textual prompts alone, geometric cues alone, and their combination, using zero-shot inference and lightweight fine-tuning at increasing supervision levels across four target types. It establishes that hybrid cues deliver the highest scores on all metrics while text-only prompting lags, especially on irregular shapes where semantic alignment is weak. The work shows that modest geometric annotation effort yields most of the gains, with diminishing returns beyond that point. This matters for remote sensing because labeled data is scarce, so identifying low-effort adaptation routes could let foundation models handle more overhead tasks without heavy retraining.

Core claim

SAM3's concept-driven framework generates masks directly from prompts without task-specific changes, and on remote sensing images the combination of textual semantic cues and geometric prompts produces the best masks across targets and metrics. Text-only prompting records the lowest performance, with especially large gaps for irregular targets that reflect poor alignment between the model's textual representations and overhead appearances. Lightweight fine-tuning raises results from the zero-shot baseline, after which further supervision brings only small improvements, and a consistent Precision-IoU gap reveals ongoing under-segmentation and boundary errors.
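One way to see why a Precision-IoU gap points to under-segmentation is to write both metrics in terms of pixel-level counts. The sketch below assumes the standard definitions over true positives (TP), false positives (FP), and false negatives (FN); the symbols are introduced here for illustration and are not taken from the paper. It needs the amsmath package.

```latex
\begin{align}
\text{Precision} &= \frac{TP}{TP + FP}, \qquad
\text{IoU} = \frac{TP}{TP + FP + FN}, \\
\text{Precision} - \text{IoU}
  &= \frac{TP \cdot FN}{(TP + FP)(TP + FP + FN)}
   = \text{Precision} \cdot \frac{FN}{TP + FP + FN} \;\ge\; 0 .
\end{align}
```

Under these definitions, and for nonzero Precision, the gap is zero only when FN = 0 and grows with the share of the union made up of missed target pixels. A mask can therefore be precise (few false positives) and still score a low IoU if it covers only part of the target.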

What carries the argument

SAM3 concept-driven prompting framework with textual, geometric, and hybrid strategies applied under zero-shot and increasing scales of lightweight fine-tuning.

If this is right

Hybrid prompting yields the highest scores across all tested targets and evaluation metrics.
Text-only prompting remains weakest, with the largest shortfalls on irregularly shaped targets.
A modest amount of geometric annotation produces most of the adaptation benefit, after which extra supervision adds little.
Under-segmentation and imprecise boundaries persist as dominant error modes, especially for irregular targets.
For geometrically regular and visually salient targets, textual prompting plus light fine-tuning offers a usable performance-effort balance.
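The plateau claim can be checked mechanically once per-level scores are available. The following is a minimal sketch, assuming scores are recorded at increasing annotation budgets; the function names, the threshold, and every number in the example are illustrative placeholders, not values from the paper.

```python
from typing import Optional, Sequence


def marginal_gains(levels: Sequence[float], scores: Sequence[float]) -> list:
    """Score gain per extra annotated unit between consecutive supervision levels."""
    if len(levels) != len(scores) or len(levels) < 2:
        raise ValueError("need at least two (level, score) pairs of equal length")
    gains = []
    for i in range(len(levels) - 1):
        step = levels[i + 1] - levels[i]
        if step <= 0:
            raise ValueError("levels must be strictly increasing")
        gains.append((scores[i + 1] - scores[i]) / step)
    return gains


def plateau_level(levels: Sequence[float], scores: Sequence[float],
                  min_gain: float) -> Optional[float]:
    """Return the first level after which the per-unit gain drops below min_gain.

    Returns None if every step still gains at least min_gain.
    """
    for level, gain in zip(levels, marginal_gains(levels, scores)):
        if gain < min_gain:
            return level
    return None


# Hypothetical budgets (0 = zero-shot) and hypothetical IoU scores.
levels = [0, 50, 100, 200]
scores = [0.40, 0.70, 0.74, 0.75]
knee = plateau_level(levels, scores, min_gain=0.001)
print(knee)
```

With these placeholder numbers the step from 0 to 50 gains well above the threshold, while the step from 50 to 100 does not, so the sketch reports 50 as the point beyond which extra supervision adds little. The threshold is a judgment call and should be set relative to the metric's noise.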

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Practical remote-sensing workflows may benefit more from collecting a small number of geometric annotations than from relying on text prompts alone.
The observed Precision-IoU gap suggests that future adaptations could add explicit boundary-refinement steps to reduce under-segmentation.
Testing the same strategies on multi-spectral or SAR imagery would clarify whether the prompting trade-offs generalize beyond the current optical dataset.
Automated generation of geometric cues could further lower the annotation cost while preserving the hybrid advantage.

Load-bearing premise

The four chosen target types and supervision scales are representative enough to support general statements about practical performance-effort trade-offs in remote sensing segmentation.

What would settle it

Running the same prompting and fine-tuning comparisons on a new remote sensing dataset that includes more irregular or low-salience targets and different annotation budgets, then checking whether hybrid prompting still leads and whether gains plateau after modest supervision.

Figures

Figures reproduced from arXiv: 2512.15564 by Amir Boger, David Ioffe, Karen Sandberg Esquenazi, Osher Rafaeli, Roni Blushtein-Livnon, Tal Svoray.

Figure 2: Performance under FT200 and ZS by prompt type. A: Average metric

Figure 3: Qualitative comparison of SAM3 performance across targets and

Original abstract

Remote sensing (RS) image segmentation is constrained by the limited availability of annotated data and a gap between overhead imagery and natural images used to train foundational models. This motivates effective adaptation under limited supervision. SAM3 concept-driven framework generates masks from textual prompts without requiring task-specific modifications, which may enable this adaptation. We evaluate SAM3 for RS imagery across four target types, comparing textual, geometric, and hybrid prompting strategies, under lightweight fine-tuning scales with increasing supervision, alongside zero-shot inference. Results show that combining semantic and geometric cues yields the highest performance across targets and metrics. Text-only prompting exhibits the lowest performance, with marked score gaps for irregularly shaped targets, reflecting limited semantic alignment between SAM3 textual representations and their overhead appearances. Nevertheless, textual prompting with light fine-tuning offers a practical performance-effort trade-off for geometrically regular and visually salient targets. Across targets, performance improves between zero-shot inference and fine-tuning, followed by diminishing returns as the supervision scale increases. Namely, a modest geometric annotation effort is sufficient for effective adaptation. A persistent gap between Precision and IoU further indicates that under-segmentation and boundary inaccuracies remain prevalent error patterns in RS tasks, particularly for irregular and less prevalent targets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Hybrid prompting plus light fine-tuning beats text-only for SAM3 on RS images, but the four targets leave generalizability unclear.


The main thing here is that mixing semantic and geometric prompts with a small amount of fine-tuning lifts SAM3 performance on remote sensing segmentation more than text prompts alone, especially for irregular shapes, and most of the gain comes from the first modest supervision step before returns flatten. The work is a straightforward empirical comparison of prompting variants and supervision scales on overhead imagery, which fills a gap since SAM3 was trained on natural scenes. It does a clean job showing the semantic mismatch between text embeddings and RS appearances, plus the recurring Precision-IoU gap that signals boundary and under-segmentation problems. That part feels honest and useful for anyone trying to adapt these models with limited labels.

The soft spot is the narrow slice of targets. Four types are not enough to cover the range of RS objects people actually segment, so the practical trade-off advice may not travel to small dense items or multi-class land cover without more testing. Dataset details, exact metrics, and error breakdowns are thin in the abstract, which makes it harder to judge how reproducible the gaps really are.

This paper is for people already working on foundation-model adaptation for earth observation who need quick pointers on where annotation effort pays off. It is not a big theoretical advance but it is a solid incremental empirical check. I would send it to peer review so referees can pressure-test the target selection and ask for clearer experimental specs.

Referee Report

2 major / 1 minor

Summary. The manuscript evaluates SAM3 for remote sensing image segmentation across four target types, comparing textual, geometric, and hybrid prompting strategies under zero-shot inference and lightweight fine-tuning with increasing supervision scales. It claims hybrid prompting yields the highest performance across targets and metrics, text-only prompting performs worst with large gaps on irregular shapes due to semantic misalignment, modest geometric annotation suffices for effective adaptation with diminishing returns at higher supervision, and a persistent Precision-IoU gap indicates prevalent under-segmentation and boundary errors.

Significance. If the empirical results hold under proper controls, the work supplies practical guidance on adapting foundation models like SAM3 to remote sensing under annotation scarcity, quantifying the value of hybrid cues and the sufficiency of light supervision for regular targets while flagging persistent error modes that future RS adaptations should target.

major comments (2)

[Abstract] The headline claims (hybrid best, text-only worst with marked gaps on irregular targets, diminishing returns after modest supervision) rest on experiments with four unspecified target types and undetailed supervision scales; without enumeration of targets (e.g., their scale, shape irregularity, spectral properties) or concrete annotation budgets, the representativeness for general RS prompting trade-offs cannot be assessed and the practical-effort conclusion remains unverifiable.
[Abstract] No dataset details, exact metric definitions (e.g., how Precision and IoU are computed), statistical tests, or error analysis are supplied, leaving the comparative outcomes plausible but impossible to reproduce or stress-test from the provided information.

minor comments (1)

[Abstract] The phrase 'lightweight fine-tuning scales with increasing supervision' is used without defining the concrete supervision levels or the fine-tuning protocol, which should be clarified for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have revised the abstract and related sections to improve specificity and reproducibility while preserving the original claims.


Referee: [Abstract] The headline claims (hybrid best, text-only worst with marked gaps on irregular targets, diminishing returns after modest supervision) rest on experiments with four unspecified target types and undetailed supervision scales; without enumeration of targets (e.g., their scale, shape irregularity, spectral properties) or concrete annotation budgets, the representativeness for general RS prompting trade-offs cannot be assessed and the practical-effort conclusion remains unverifiable.

Authors: We agree that the abstract would be strengthened by explicitly enumerating the target types and supervision scales. In the revised manuscript we have added this information: the four target types are now listed with brief notes on their scale, shape irregularity, and spectral characteristics, and the supervision scales are tied to concrete annotation budgets (number of images and masks at each level). These additions directly support assessment of representativeness and the practical-effort conclusions without altering the experimental design or results.

Revision: yes

Referee: [Abstract] No dataset details, exact metric definitions (e.g., how Precision and IoU are computed), statistical tests, or error analysis are supplied, leaving the comparative outcomes plausible but impossible to reproduce or stress-test from the provided information.

Authors: The main text already contains dataset details (Section 3), exact metric definitions (Precision as TP/(TP+FP) and IoU as intersection-over-union in Section 4.1), and error analysis (Section 5.3 on under-segmentation and boundary errors). To address the referee's point about the abstract, we have inserted a concise summary of the dataset, the metric formulas, and a reference to the error patterns into the revised abstract. Statistical tests were omitted because evaluations use fixed test splits with no stochastic components; we have added a clarifying sentence on this point. These changes make the abstract self-contained while keeping it within length limits.

Revision: yes
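The metric definitions quoted in the response above can be illustrated on binary masks. This is a minimal sketch, assuming pixel-level counting over masks of equal shape; it uses plain nested lists to stay dependency-free, and the helper names and toy masks are hypothetical rather than the paper's evaluation code.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixels


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Return pixel-level (TP, FP, FN) for two binary masks of equal shape."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1      # predicted target, is target
            elif p:
                fp += 1      # predicted target, is background
            elif t:
                fn += 1      # missed target pixel
    return tp, fp, fn


def precision(pred: Mask, truth: Mask) -> float:
    """Precision = TP / (TP + FP); 0.0 when nothing is predicted."""
    tp, fp, _ = confusion_counts(pred, truth)
    return tp / (tp + fp) if (tp + fp) else 0.0


def iou(pred: Mask, truth: Mask) -> float:
    """IoU = TP / (TP + FP + FN); 0.0 when both masks are empty."""
    tp, fp, fn = confusion_counts(pred, truth)
    union = tp + fp + fn
    return tp / union if union else 0.0


# Toy under-segmentation case: the prediction covers half of the target
# and never spills onto the background.
truth = [[1, 1, 1, 1],
         [1, 1, 1, 1]]
pred = [[1, 1, 0, 0],
        [1, 1, 0, 0]]
print(precision(pred, truth), iou(pred, truth))  # 1.0 0.5
```

In the toy case Precision is perfect while IoU is halved, which is the pattern a persistent Precision-IoU gap would produce when masks are too small rather than misplaced.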

Circularity Check

0 steps flagged

No significant circularity in empirical evaluation study


The paper is a purely empirical evaluation study comparing textual, geometric, and hybrid prompting strategies for SAM3 on remote sensing segmentation, under zero-shot and increasing lightweight fine-tuning supervision scales. All claims derive from direct experimental performance measurements (Precision, IoU, etc.) across four target types, with no equations, derivations, fitted parameters presented as predictions, or first-principles results. No self-citation chains or ansatzes are used to justify any theoretical reduction; the central results are straightforward outcome comparisons. A direct performance comparison of this kind leaves no room for circular reasoning.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work is an empirical study relying on standard machine learning evaluation practices rather than new theoretical constructs or fitted parameters.

axioms (1)

domain assumption: Standard assumptions in machine learning evaluation, such as representative target sampling and i.i.d. data splits, hold for the reported metrics.
Implicit in any comparative performance study on segmentation tasks.


Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · 1 internal anchor

[1] J. Sadel, L. Tulczyjew, A. M. Wijata, M. Przeliorz, and J. Nalepa, “Monitoring forest changes with foundation models and sentinel-2 time series,” IEEE Geoscience and Remote Sensing Letters, 2025

[2] L. P. Osco, Q. Wu, E. L. De Lemos, W. N. Gonçalves, A. P. M. Ramos, J. Li, and J. M. Junior, “The segment anything model (sam) for remote sensing applications: From zero to one shot,” International Journal of Applied Earth Observation and Geoinformation, vol. 124, p. 103540, 2023

[3] I. Kotaridis and M. Lazaridou, “Remote sensing image segmentation advances: A meta-analysis,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 173, pp. 309–322, 2021

[4] N. Liu, X. Xu, Y. Su, H. Zhang, and H.-C. Li, “Pointsam: Pointly-supervised segment anything model for remote sensing images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1–15, 2025

[5] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al., “Learning transferable visual models from natural language supervision,” in International conference on machine learning, pp. 8748–8763, PMLR, 2021

[6] L. H. Li, P. Zhang, H. Zhang, J. Yang, C. Li, Y. Zhong, L. Wang, L. Yuan, L. Zhang, J.-N. Hwang, et al., “Grounded language-image pre-training,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 10965–10975, 2022

[7] S. Liu, Z. Zeng, T. Ren, F. Li, H. Zhang, J. Yang, Q. Jiang, C. Li, J. Yang, H. Su, et al., “Grounding dino: Marrying dino with grounded pre-training for open-set object detection,” in European conference on computer vision, pp. 38–55, Springer, 2024

[8] T. Lüddecke and A. Ecker, “Image segmentation using text and image prompts,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 7086–7096, 2022

[9] F. Liu, D. Chen, Z. Guan, X. Zhou, J. Zhu, Q. Ye, L. Fu, and J. Zhou, “Remoteclip: A vision language foundation model for remote sensing,” IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–16, 2024

[10] T. Sun, C. Zheng, X. Li, Y. Gao, J. Nie, L. Huang, and Z. Wei, “Strong and weak prompt engineering for remote sensing image-text cross-modal retrieval,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025

[11] X. Zi, K. Jin, X. Tao, J. Li, A. Braytee, R. R. Shah, and M. Prasad, “Visual and text prompt segmentation: A novel multi-model framework for remote sensing,” arXiv preprint arXiv:2503.07911, 2025

[12] S. Zhang, B. Zhang, Y. Wu, H. Zhou, J. Jiang, and J. Ma, “Segclip: Multimodal visual-language and prompt learning for high-resolution remote sensing semantic segmentation,” IEEE Transactions on Geoscience and Remote Sensing, 2024

[13] K. Sofiiuk, I. A. Petrov, and A. Konushin, “Reviving iterative training with mask guidance for interactive segmentation,” in 2022 IEEE international conference on image processing (ICIP), pp. 3141–3145, IEEE, 2022

[14] X. Chen, Z. Zhao, Y. Zhang, M. Duan, D. Qi, and H. Zhao, “Focalclick: Towards practical interactive image segmentation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 1300–1309, 2022

[15] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, et al., “Segment anything,” in Proceedings of the IEEE/CVF international conference on computer vision, pp. 4015–4026, 2023

[16] N. Ravi, V. Gabeur, Y.-T. Hu, R. Hu, C. Ryali, T. Ma, H. Khedr, R. Rädle, C. Rolland, L. Gustafson, et al., “Sam 2: Segment anything in images and videos,” arXiv preprint arXiv:2408.00714, 2024

[17] K. Chen, C. Liu, H. Chen, H. Zhang, W. Li, Z. Zou, and Z. Shi, “Rsprompter: Learning to prompt for remote sensing instance segmentation based on visual foundation model,” IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–17, 2024

[18] X. Zou, J. Yang, H. Zhang, F. Li, L. Li, J. Wang, L. Wang, J. Gao, and Y. J. Lee, “Segment everything everywhere all at once,” Advances in neural information processing systems, vol. 36, pp. 19769–19782, 2023

[19] N. Carion, L. Gustafson, Y.-T. Hu, S. Debnath, R. Hu, D. Suris, C. Ryali, K. V. Alwala, H. Khedr, A. Huang, J. Lei, T. Ma, B. Guo, A. Kalla, M. Marks, J. Greer, M. Wang, P. Sun, R. Rädle, T. Afouras, E. Mavroudi, K. Xu, T.-H. Wu, Y. Zhou, L. Momeni, R. Hazra, S. Ding, S. Vaze, F. Porcher, F. Li, S. Li, A. Kamath, H. K. Cheng, P. Dollár, N. Ravi, K. ..., “Sam 3: Segment anything with concepts,” 2025

[20] D. Vrandečić and M. Krötzsch, “Wikidata: a free collaborative knowledgebase,” Communications of the ACM, vol. 57, no. 10, pp. 78–85, 2014
