RSRCC: A Remote Sensing Regional Change Comprehension Benchmark Constructed via Retrieval-Augmented Best-of-N Ranking
Pith reviewed 2026-05-10 00:01 UTC · model grok-4.3
The pith
RSRCC is the first benchmark for fine-grained question-answering on localized semantic changes in remote sensing images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims to introduce RSRCC as the first remote sensing change question-answering benchmark designed explicitly for fine-grained reasoning-based supervision. It contains 126k questions (87k training, 17.1k validation, 22k test), constructed around localized change-specific questions. The construction relies on a hierarchical semi-supervised curation pipeline that uses Best-of-N ranking as the final stage to resolve ambiguities after initial extraction and screening of candidate change regions.
What carries the argument
The hierarchical semi-supervised curation pipeline with retrieval-augmented Best-of-N ranking, which extracts candidate regions from semantic segmentation masks, screens them, and validates semantically meaningful localized changes.
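The final Best-of-N stage can be sketched in a few lines. This is a minimal illustration only: `generate_candidates`, `reward`, and the threshold are hypothetical stand-ins, not the paper's actual components (the paper's generator and scorer are a retrieval-augmented vision-language model).

```python
# Illustrative sketch of Best-of-N ranking as a final ambiguity filter.
# generate_candidates, reward, and the threshold are all hypothetical
# stand-ins, not the paper's actual components.

def generate_candidates(region: str, n: int = 4) -> list[str]:
    # Stand-in for a vision-language model proposing n candidate
    # change questions for one candidate region.
    return [f"Q{i}: what changed in {region}?" for i in range(n)]

def reward(candidate: str) -> float:
    # Stand-in for a retrieval-augmented scorer (e.g. a VLM judge).
    return float(len(candidate))

def best_of_n(region: str, n: int = 4, threshold: float = 0.0):
    """Keep the best-scoring candidate; drop the region entirely
    when even the best candidate scores below the threshold."""
    scored = [(reward(c), c) for c in generate_candidates(region, n)]
    score, best = max(scored)
    return best if score >= threshold else None

print(best_of_n("region_17"))
```

Setting a non-trivial threshold is what turns ranking into filtering: ambiguous regions whose best candidate still scores poorly are discarded rather than kept with a weak question.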
If this is right
- Models can be trained to answer questions requiring reasoning about particular semantic changes in remote sensing data.
- The dataset supports supervision beyond location detection to natural language explanations of what changed.
- Scalable filtering of noisy candidates is achieved while preserving meaningful changes.
- Vision-language models for remote sensing can be evaluated on fine-grained change comprehension tasks.
Where Pith is reading between the lines
- Applications in environmental monitoring could benefit from AI systems that describe exact changes like urban expansion in a specific zone.
- The curation approach might be adapted to create benchmarks in other imaging domains requiring localized reasoning.
- Future models might use this data to improve accuracy in distinguishing subtle semantic shifts from noise in satellite imagery.
Load-bearing premise
The hierarchical semi-supervised curation pipeline using Best-of-N ranking accurately filters noisy and ambiguous candidates while preserving semantically meaningful localized changes without introducing substantial selection bias or errors.
What would settle it
An expert review of a sample of the benchmark questions would settle it: if many questions are found to be ambiguous, mismatched with visible changes, or poorly localized, the pipeline did not succeed.
Original abstract
Traditional change detection identifies where changes occur, but does not explain what changed in natural language. Existing remote sensing change captioning datasets typically describe overall image-level differences, leaving fine-grained localized semantic reasoning largely unexplored. To close this gap, we present RSRCC, a new benchmark for remote sensing change question-answering containing 126k questions, split into 87k training, 17.1k validation, and 22k test instances. Unlike prior datasets, RSRCC is built around localized, change-specific questions that require reasoning about a particular semantic change. To the best of our knowledge, this is the first remote sensing change question-answering benchmark designed explicitly for such fine-grained reasoning-based supervision. To construct RSRCC, we introduce a hierarchical semi-supervised curation pipeline that uses Best-of-N ranking as a critical final ambiguity-resolution stage. First, candidate change regions are extracted from semantic segmentation masks, then initially screened using an image-text embedding model, and finally validated through retrieval-augmented vision-language curation with Best-of-N ranking. This process enables scalable filtering of noisy and ambiguous candidates while preserving semantically meaningful changes. The dataset is available at https://huggingface.co/datasets/google/RSRCC.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces RSRCC, a benchmark dataset of 126k remote sensing change question-answering instances (87k train, 17.1k val, 22k test) focused on localized, change-specific questions that require fine-grained semantic reasoning. It is constructed via a hierarchical semi-supervised curation pipeline that extracts candidate regions from semantic segmentation masks, screens them with image-text embeddings, and applies retrieval-augmented vision-language curation with Best-of-N ranking as the final ambiguity-resolution step. The authors claim this is the first such benchmark explicitly designed for reasoning-based supervision and release the data publicly on Hugging Face.
Significance. If the pipeline produces high-quality localized questions with minimal residual noise or bias, RSRCC could enable new supervision signals for models that explain specific semantic changes in remote sensing imagery, going beyond image-level change captioning. The public release and scalable curation approach are concrete contributions that could be adopted by the community.
major comments (2)
- Construction pipeline (abstract and §3): no precision/recall, ablation, inter-annotator agreement, or human evaluation is reported for the Best-of-N ranking stage, which is described as the critical final filter. Without these metrics it is impossible to verify that the 126k questions preserve semantically meaningful localized changes rather than introducing selection bias or residual ambiguity, directly undermining the central claim that the benchmark supports effective fine-grained reasoning-based supervision.
- §1 and related-work discussion: the 'first such benchmark' claim is asserted without a quantitative comparison table against prior remote-sensing change-captioning or VQA datasets; a side-by-side analysis of question granularity and supervision type is needed to substantiate novelty.
minor comments (2)
- Abstract: the total 126k and the listed splits sum to 126.1k; clarify whether this is rounding or an off-by-one error.
- Dataset release: include a datasheet or explicit license statement in the main text in addition to the Hugging Face link.
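The split arithmetic behind the abstract's headline figure, taking the rounded split sizes at face value (reading them as exact counts is an assumption; the abstract gives only rounded figures):

```python
# Split sizes as reported in the abstract, read as exact counts
# (an assumption; the abstract only gives rounded figures).
train, val, test = 87_000, 17_100, 22_000
total = train + val + test
print(total)  # 126100, i.e. 126.1k against the headline "126k"
```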
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript introducing RSRCC. We address each major comment point by point below and outline specific revisions to strengthen the paper.
Point-by-point responses
Referee: Construction pipeline (abstract and §3): no precision/recall, ablation, inter-annotator agreement, or human evaluation is reported for the Best-of-N ranking stage, which is described as the critical final filter. Without these metrics it is impossible to verify that the 126k questions preserve semantically meaningful localized changes rather than introducing selection bias or residual ambiguity, directly undermining the central claim that the benchmark supports effective fine-grained reasoning-based supervision.
Authors: We agree that the absence of targeted metrics for the Best-of-N ranking stage limits verification of the pipeline's effectiveness in preserving high-quality, localized changes. The manuscript describes the hierarchical semi-supervised curation (region extraction from segmentation masks, image-text screening, and retrieval-augmented Best-of-N as the final ambiguity resolver) but does not report precision/recall, ablations, or human evaluation specifically for this stage. In the revised version, we will add an ablation comparing Best-of-N against baseline selection strategies, plus a human evaluation study on a sampled subset (reporting inter-annotator agreement and semantic meaningfulness scores) to quantify residual ambiguity and bias. This will directly support the claim of effective fine-grained reasoning-based supervision. Revision: yes.
Referee: §1 and related-work discussion: the 'first such benchmark' claim is asserted without a quantitative comparison table against prior remote-sensing change-captioning or VQA datasets; a side-by-side analysis of question granularity and supervision type is needed to substantiate novelty.
Authors: We thank the referee for highlighting the need for explicit substantiation. The manuscript positions RSRCC as the first remote sensing change QA benchmark explicitly designed for localized, reasoning-based supervision (distinct from image-level change captioning), based on its focus on change-specific questions requiring fine-grained semantic reasoning. To strengthen this, the revised manuscript will include a side-by-side comparison table (in §1 or related work) contrasting RSRCC against prior datasets on key dimensions: scale, question granularity (localized vs. global), supervision type (reasoning QA vs. captioning), and curation approach. Revision: yes.
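The inter-annotator agreement the rebuttal promises is typically reported as Cohen's kappa. A self-contained sketch with made-up accept/reject labels (none of this data comes from the paper):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators label alike.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: expected overlap from each annotator's marginals.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Made-up judgments over 10 sampled benchmark questions.
a = ["accept"] * 8 + ["reject"] * 2
b = ["accept"] * 7 + ["reject"] * 3
print(round(cohens_kappa(a, b), 3))  # 0.737
```

Kappa corrects raw percent agreement for the agreement two annotators would reach by chance given their label frequencies, which matters here because accept labels are expected to dominate.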
Circularity Check
No circularity: dataset construction is self-contained without reducing claims to inputs or self-citations.
Full rationale
The paper presents a benchmark dataset constructed via an explicitly described hierarchical pipeline (semantic segmentation masks, image-text embedding screening, retrieval-augmented Best-of-N ranking). No mathematical derivations, first-principles predictions, or fitted parameters are claimed whose outputs reduce by construction to the inputs. The 'first benchmark' claim rests on explicit comparison to prior remote sensing change captioning datasets rather than on self-definition or load-bearing self-citation. The lack of reported validation metrics on the ranking stage is a verification gap, not a circularity in any derivation chain.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Semantic segmentation masks can accurately identify candidate change regions from paired remote sensing images.
- domain assumption: Image-text embedding models provide useful initial screening for semantic relevance of change descriptions.
Reference graph
Works this paper leans on
- [1] Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862, 2022.
- [2] Wele Gedara Chaminda Bandara and Vishal M Patel. A transformer-based siamese network for change detection. In IGARSS 2022 IEEE International Geoscience and Remote Sensing Symposium, pages 207–210. IEEE, 2022.
- [3] Aviad Barzilai, Yotam Gigi, Amr Helmy, Vered Silverman, Yehonathan Refael, Bolous Jaber, Tomer Shekel, George Leifman, and Genady Beryozkin. A recipe for improving remote sensing VLM zero shot generalization. arXiv preprint arXiv:2503.08722, 2025.
- [4] Hao Chen and Zhenwei Shi. A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote Sensing, 12(10):1662, 2020.
- [5] Zhenyuan Chen, Chenxi Wang, Ningyu Zhang, and Feng Zhang. RSCC: A large-scale remote sensing change caption dataset for disaster events. arXiv preprint arXiv:2509.01907, 2025.
- [6] Bowen Cheng, Alexander Schwing, and Alexander Kirillov. Per-pixel classification is not all you need for semantic segmentation. In Advances in Neural Information Processing Systems, volume 34, pages 17864–17875, 2021.
- [7] Gordon Christie, Neil Fendley, James Wilson, and Ryan Miller. Functional map of the world. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6172–6182, 2018.
- [8] Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities, 2025.
- [9] Lin Gui, Cristina Gârbacea, and Victor Veitch. BoNBoN alignment for large language models and the sweetness of best-of-n sampling. In Advances in Neural Information Processing Systems (NeurIPS), volume 37, pages 2851–2885, 2024.
- [10] Ronny Hänsch, Jacob Arndt, Dalton Lunga, Matthew Gibb, Tyler Pedelose, Arnold Boedihardjo, Desiree Petrie, and Todd M. Bacastow. SpaceNet 8: The detection of flooded roads and buildings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 1472–1480, 2022.
- [11] Yuan Hu, Jianlong Yuan, Congcong Wen, Xiaonan Lu, Yu Liu, and Xiang Li. RSGPT: A remote sensing vision language model and benchmark. ISPRS Journal of Photogrammetry and Remote Sensing, 224:272–286, 2025.
- [12] Deyi Ji, Siqi Gao, Mingyuan Tao, Hongtao Lu, and Feng Zhao. ChangeNet: Multi-temporal asymmetric change detection dataset. arXiv preprint arXiv:2312.17428, pages 2725–2729, 2024.
- [13] Yuu Jinnai, Tetsuro Morimura, Kaito Ariu, and Kenshi Abe. Regularized best-of-n sampling with minimum Bayes risk objective for language model alignment. In Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), pages 9321–9347, 2025.
- [14] Roie Kazoom, Ofir Cohen, Rami Puzis, Asaf Shabtai, and Ofer Hadar. VAULT: Vigilant adversarial updates via LLM-driven retrieval-augmented generation for NLI. arXiv preprint arXiv:2508.00965, 2025.
- [15] Roie Kazoom, Raz Lapid, Moshe Sipper, and Ofer Hadar. Don't lag, RAG: Training-free adversarial detection using RAG. arXiv preprint arXiv:2504.04858, 2025.
- [16] Weicheng Kuo, AJ Piergiovanni, Dahun Kim, Xiyang Luo, Ben Caine, Wei Li, Abhijit Ogale, Luowei Zhou, Andrew Dai, Zhifeng Chen, et al. MaMMUT: A simple architecture for joint learning for multimodal tasks. arXiv preprint arXiv:2303.16839, 2023.
- [17] Seongyun Lee, Seungone Kim, Sue Park, Geewook Kim, and Minjoon Seo. Prometheus-Vision: Vision-language model as a judge for fine-grained evaluation. In Findings of the Association for Computational Linguistics: ACL 2024, pages 11286–11315, 2024.
- [18] Jiaqi Li, Feng Zhang, Zhenyuan Chen, Chenxi Wang, and Ningyu Zhang. XLRS-Bench: Could your multimodal LLMs understand extremely large ultra-high-resolution remote sensing imagery? arXiv preprint arXiv:2503.23771, 2025.
- [19] Xiang Li, Congcong Wen, Yuan Hu, and Nan Zhou. RS-CLIP: Zero shot remote sensing scene classification via contrastive vision-language supervision. International Journal of Applied Earth Observation and Geoinformation, 124:103497, 2023.
- [20] Xiang Li, Jian Ding, and Mohamed Elhoseiny. VRSBench: A versatile vision-language benchmark dataset for remote sensing image understanding. arXiv preprint arXiv:2406.12384, 2024.
- [21] Chenyang Liu, Rui Zhao, Hao Chen, Zheng Zhang, Zhengxia Zou, and Zhenwei Shi. Remote sensing image change captioning with progressive difference-aware network. IEEE Transactions on Geoscience and Remote Sensing, 60:1–14, 2022.
- [22] Chenyang Liu, Rui Zhao, Hao Chen, Zhengxia Zou, and Zhenwei Shi. Remote sensing image change captioning with dual-branch transformers: A new method and a large scale dataset. IEEE Transactions on Geoscience and Remote Sensing, 60:1–20, 2022.
- [23] Chenyang Liu, Keyan Chen, Haotian Zhang, Zipeng Qi, Zhengxia Zou, and Zhenwei Shi. Change-Agent: Toward interactive comprehensive remote sensing change interpretation and analysis. IEEE Transactions on Geoscience and Remote Sensing, 62:1–16, 2024.
- [24] Fan Liu, Delong Chen, Zhangqingyun Guan, Xiaocong Zhou, Jiale Zhu, Qiaolin Ye, Liyong Fu, and Jun Zhou. RemoteCLIP: A vision language foundation model for remote sensing. IEEE Transactions on Geoscience and Remote Sensing, 62:1–16, 2023.
- [25] Yi Liu, Chao Pang, Zongqian Zhan, Xiaomeng Zhang, and Xue Yang. Building change detection for remote sensing images using a dual-task constrained deep siamese convolutional network model. IEEE Geoscience and Remote Sensing Letters, 18(5):811–815, 2021.
- [26] Xiaoqiang Lu, Binqiang Wang, Xiangtao Zheng, and Xuelong Li. Exploring models and data for remote sensing image caption generation. IEEE Transactions on Geoscience and Remote Sensing, 56(4):2183–2195, 2018.
- [27] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. In Advances in Neural Information Processing Systems (NeurIPS), volume 35, pages 27730–27744, 2022.
- [28] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
- [29] Azriel Rosenfeld and John L Pfaltz. Sequential operations in digital picture processing. Journal of the ACM (JACM), 13(4):471–494, 1966.
- [30] Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Nino Vieillard, Alexandre Ramé, Bobak Shariari, Sarah Perrin, et al. BOND: Aligning LLMs with best-of-n distillation. arXiv preprint arXiv:2407.14622, 2024.
- [31] Li Shen, Yao Lu, Hao Chen, Hao Wei, Donghai Xie, Jiabao Yue, Rui Chen, Shouye Lv, and Bitao Jiang. S2Looking: A satellite side-looking dataset for building change detection. Remote Sensing, 13(24):5094, 2021.
- [32] Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, et al. Gemma: Open models based on Gemini research and technology. arXiv preprint arXiv:2403.08295, 2024.
- [33] Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alabdulmohsin, Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, et al. SigLIP 2: Multilingual vision-language encoders with improved semantic understanding, localization, and dense features. arXiv preprint arXiv:2502.14786, 2025.
- [34] Adam Van Etten, Daniel Hogan, Jesus Martinez Manso, Jacob Shermeyer, Nicholas Weir, and Ryan Lewis. The SpaceNet 7 multi-temporal urban development challenge dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021.
- [35] Adam Van Etten, Daniel Hogan, Jesus Martinez Manso, Jacob Shermeyer, Nicholas Weir, and Ryan Lewis. The multi-temporal urban development SpaceNet dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6398–6407, 2021.
- [36] Sagar Verma, Akash Panigrahi, and Siddharth Gupta. QFabric: Multi-task change detection dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 1052–1061, 2021.
- [37] Vicente Vivanco Cepeda, Gaurav Kumar Nayak, and Mubarak Shah. GeoCLIP: CLIP-inspired alignment between locations and images for effective worldwide geo-localization. In Advances in Neural Information Processing Systems (NeurIPS), volume 36, pages 8690–8701, 2023.
- [38] Rui Wang, Chen Sun, Xiang Li, Haoyu Yao, and Jiatong Wu. A cross-spatial differential localization network for remote sensing change captioning. Remote Sensing, 17(13):2285, 2024.
- [39] Congcong Wen, Yiting Lin, Xiaokang Qu, Nan Li, Yong Liao, Hui Lin, and Xiang Li. RS-RAG: Bridging remote sensing imagery and comprehensive knowledge with a multi-modal dataset and retrieval-augmented generation model. arXiv preprint arXiv:2504.04988, 2025.
- [40] Junshi Xia, Naoto Yokoya, Bruno Adriano, and Clifford Broni-Bediako. OpenEarthMap: A benchmark dataset for global high-resolution land cover mapping. arXiv preprint arXiv:2110.08710, pages 6254–6264, 2023.
- [41] Enze Xie, Wenhai Yu, Vignesh Kumar, Ping Li, Brian Price, and Ding Liang. SegFormer: Simple and efficient design for semantic segmentation with transformers. In Advances in Neural Information Processing Systems (NeurIPS), 2021.
- [42] Yanan You, Jingyi Cao, and Wenli Zhou. A survey of change detection methods based on remote sensing images for multi-source and multi-objective scenarios. Remote Sensing, 12(15):2460, 2020.
- [43] Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. Sigmoid loss for language image pre-training. arXiv preprint arXiv:2303.15343, pages 11975–11986, 2023.
- [44] Feng Zhang, Zhenyuan Chen, Jiaqi Li, Chenxi Wang, and Ningyu Zhang. RSSM: A benchmark for remote sensing scene monitoring and spatio-temporal change captioning. arXiv preprint arXiv:2510.11421, 2025.
- [45] Xinnan Zhang, Chenliang Li, Siliang Zeng, Jiaxiang Li, Zhongruo Wang, Songtao Lu, Alfredo Garcia, and Mingyi Hong. Reinforcement learning in inference time: A perspective from successive policy iterations. arXiv preprint arXiv:2501.04231, 2025.
- [46] Zilun Zhang, Haozhan Shen, Tiancheng Zhao, Zian Guan, Bin Chen, Yuhao Wang, Xu Jia, Yuxiang Cai, Yongheng Shang, and Jianwei Yin. ImageRAG: Enhancing ultra high resolution remote sensing imagery analysis with ImageRAG. arXiv preprint arXiv:2411.07688, 2024.
- [47] Zhuo Zheng, Ailong Ma, Liangpei Zhang, and Yanfei Zhong. Change is everywhere: Single-temporal supervised object change detection in remote sensing imagery. arXiv preprint arXiv:2108.07002, pages 15193–15202, 2021.
- [48] Xiao Xiang Zhu, Devis Tuia, Lichao Mou, Gui-Song Xia, Liangpei Zhang, Feng Xu, and Friedrich Fraundorfer. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geoscience and Remote Sensing Magazine, 5(4):8–36, 2017.
Appendix excerpts
Prompt templates and human-verification criteria from the paper's appendices.
- A. Group-Restricted Retrieval and Boundary Preservation: "We formalize the intuition that conditioning on retri..."
- Instruction: the model is prompted as an expert in satellite image interpretation to score a query patch for the presence of a specific target class {selected_class}. It receives reference examples and must assign a score from 1 to 5 according to visibility and clarity. "You are an expert in recognizing objects from satellite images. Your task is to s..."
- "You need to specify if {selected_class} appears in the image. All images are satellite images. Return only the numerical score (1, 2, 3, 4, or 5)."
- Scoring guide: the criteria used for filtering. "5: There is definitely a {selected_class} in the last image. The object's shape, shadow, and features are clearly visible from a..."
- Example format: the model is shown several labeled reference examples followed by the unscored query, enforcing consistent, interpretable visual filtering behavior. "Example (1): {start_of_image} Score = 5 Example (2): {start_of_image} Score = 3 ... Example (5): {query_image} Score = ?"
- Image examples: extends the prompt with several labeled reference and query image pairs. "Example (1): {image_1} Score = 5. Example (2): {image_2} Score = 3. Example (3): {query_image} Score = ?"
- Combined (final): combines the scoring guide and reference examples for maximum clarity and contextual learning. "You are an expert in satellite image interpretation. Rate whether the object class appears in the last image, using a score from 1 to 5. Follow the scoring guide: 5 = Definitely visible; 4 = Very likely visible; 3 = Unclear; 2 = Unlikely; ..."
- Change-present instruction (MCQ-Yes): the model generates a multiple-choice question describing a visible change between two satellite images, with four options (A–D) of which exactly one describes the actual change and the rest are plausible but incorrect alternatives. "You are an expert in generating m..."
- No-change instruction (MCQ-No): for cases where no visible change occurs, the correct answer explicitly identifies that there is no change, while the other options describe incorrect or misleading changes. "You are an expert in generating multiple-choice questions about visual comparisons in satellite im..."
- Human verification fields: an Agree/Disagree judgment indicating whether the answer correctly responded to the given question; an optional improved alternative in cases where the question or answer appeared unsatisfactory; and a difficulty score from 1 to 3 reflecting how visually difficult the example was: 1 (Very simple): the change is clearly visible; 2 (Simple): the change is visible but requires a few seconds to localize or interpret; 3 (Hard): the change is difficult to detect and may be partially obscured by shadows, occlusi...
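The few-shot scoring prompt excerpted above is mechanical to assemble. A sketch in Python, with the template wording abridged, and the helper name and example class purely illustrative (not taken from the paper):

```python
def build_scoring_prompt(selected_class, examples, query="{query_image}"):
    """Assemble the expert-scorer prompt: instruction lines, then
    labeled reference examples, ending with the unscored query.
    Wording is abridged from the appendix excerpts; the function
    itself is a hypothetical reconstruction."""
    lines = [
        "You are an expert in recognizing objects from satellite images.",
        f"You need to specify if {selected_class} appears in the image.",
        "All images are satellite images.",
        "Return only the numerical score (1, 2, 3, 4, or 5).",
    ]
    for i, (image_tag, score) in enumerate(examples, start=1):
        lines.append(f"Example ({i}): {image_tag} Score = {score}")
    lines.append(f"Example ({len(examples) + 1}): {query} Score = ?")
    return "\n".join(lines)

# "building" is an illustrative class, not necessarily one used in RSRCC.
print(build_scoring_prompt("building", [("{image_1}", 5), ("{image_2}", 3)]))
```

Keeping the query as the final, unscored example mirrors the excerpted format, where the model's only remaining completion is the score itself.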