Agentic AI in Remote Sensing: Foundations, Taxonomy, and Emerging Systems
Pith reviewed 2026-05-16 18:25 UTC · model grok-4.3
The pith
Agentic AI systems bring sequential planning and active tool orchestration to remote sensing, going beyond what current vision foundation models offer.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper presents the first comprehensive review of agentic AI in remote sensing. It introduces a unified taxonomy distinguishing single-agent copilots from multi-agent systems, analyzes core architectural elements (planning mechanisms, retrieval-augmented generation, and memory structures), and surveys emerging benchmarks that evaluate trajectory-aware reasoning correctness rather than pixel-level accuracy.
What carries the argument
Unified taxonomy separating single-agent copilots from multi-agent systems, supported by planning mechanisms, retrieval-augmented generation, and memory structures.
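The single-agent copilot pattern the taxonomy describes can be sketched as a loop over a planner, tools, and memory. Everything below is an illustrative stub with hypothetical names (`plan`, `TOOLS`, `Memory`), not any system from the survey; a real copilot would back the planner with an LLM and ground each step with retrieval.

```python
# Minimal sketch of a single-agent geospatial copilot: a planner decomposes
# the query, tools execute each step, and memory carries the trajectory.
# All components are illustrative stubs.
from dataclasses import dataclass, field

@dataclass
class Memory:
    episodes: list = field(default_factory=list)  # trajectory of (step, result)

    def recall(self):
        return self.episodes

def plan(query: str) -> list[str]:
    # A real planner would be an LLM; here we hard-code a two-step decomposition.
    return ["retrieve_tiles", "classify_landcover"]

TOOLS = {
    "retrieve_tiles": lambda ctx: {"tiles": 4},                    # stub data access
    "classify_landcover": lambda ctx: {"classes": ["urban", "water"]},
}

def run_copilot(query: str, memory: Memory) -> Memory:
    for step in plan(query):
        result = TOOLS[step](memory.recall())  # tool call conditioned on memory
        memory.episodes.append((step, result))
    return memory

mem = run_copilot("map land cover around the flooded district", Memory())
print([step for step, _ in mem.episodes])  # the executed trajectory
```

A multi-agent system would replace the single loop with several such agents coordinated by an orchestrator; the memory and tool interfaces stay structurally similar.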
If this is right
- Remote sensing evaluation must shift from pixel-level accuracy to trajectory-aware reasoning correctness.
- Architectures should incorporate planning mechanisms, retrieval-augmented generation, and memory structures.
- Limitations in grounding, safety, and orchestration require targeted solutions before deployment.
- A strategic roadmap can guide development of robust autonomous geospatial intelligence systems.
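The evaluation shift in the first bullet can be made concrete with a toy metric: instead of scoring only the final mask (pixel accuracy), a trajectory-aware score also checks that the agent's sequence of tool calls follows a reference plan. Both functions are illustrative sketches; the benchmarks the survey reviews define their own variants.

```python
# Contrast between output-only and trajectory-aware evaluation (toy versions).

def pixel_accuracy(pred: list[int], gold: list[int]) -> float:
    # Output-only: fraction of pixels whose predicted label matches the gold mask.
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

def trajectory_score(pred_steps: list[str], gold_steps: list[str]) -> float:
    # Trajectory-aware: fraction of reference steps the agent executed in order.
    matched, i = 0, 0
    for step in pred_steps:
        while i < len(gold_steps) and gold_steps[i] != step:
            i += 1
        if i < len(gold_steps):
            matched += 1
            i += 1
    return matched / len(gold_steps)

gold_plan = ["load_scene", "detect_change", "summarize"]
agent_plan = ["load_scene", "summarize"]       # skipped the reasoning step
print(round(trajectory_score(agent_plan, gold_plan), 2))  # → 0.67
```

An agent can score well on the final pixels while still skipping required reasoning steps; the trajectory score penalizes exactly that gap.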
Where Pith is reading between the lines
- The taxonomy could be tested for transfer to sequential decision domains such as autonomous navigation or medical image analysis.
- Integration with live satellite streams might enable real-time adaptive response in disaster monitoring.
- Safety constraints could lead to new verification methods for agent trajectories in Earth observation.
Load-bearing premise
Current vision foundation models and multimodal large language models inherently lack the sequential planning and active tool orchestration needed for complex geospatial workflows.
What would settle it
A demonstration that an unmodified multimodal large language model achieves comparable trajectory-aware reasoning scores on remote-sensing benchmarks without added planning or tool orchestration layers.
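The settling experiment amounts to an ablation: run the same tasks through a bare MLLM and through the same model wrapped in a planning/tool layer, and compare trajectory-aware scores. The sketch below uses stubbed model calls and a hypothetical `score` function; no real benchmark or API is invoked.

```python
# Hypothetical ablation harness for the settling experiment (all stubs).

def bare_mllm(task: str) -> list[str]:
    # A bare model answers in one shot: a length-1 "trajectory".
    return ["answer"]

def agentic_mllm(task: str) -> list[str]:
    # The wrapped model interleaves planning and tool calls.
    return ["plan", "call_tool", "answer"]

def score(trajectory: list[str], reference: list[str]) -> float:
    # Position-wise agreement with the reference trajectory (toy metric).
    return sum(s == r for s, r in zip(trajectory, reference)) / len(reference)

reference = ["plan", "call_tool", "answer"]
tasks = ["flood extent", "crop change"]
bare = sum(score(bare_mllm(t), reference) for t in tasks) / len(tasks)
agentic = sum(score(agentic_mllm(t), reference) for t in tasks) / len(tasks)
print(bare, agentic)  # comparable scores would undercut the survey's premise
```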
Original abstract
The paradigm of Earth Observation analysis is shifting from static deep learning models to autonomous agentic AI. Although recent vision foundation models and multimodal large language models advance representation learning, they often lack the sequential planning and active tool orchestration required for complex geospatial workflows. This survey presents the first comprehensive review of agentic AI in remote sensing. We introduce a unified taxonomy distinguishing between single-agent copilots and multi-agent systems while analyzing architectural foundations such as planning mechanisms, retrieval-augmented generation, and memory structures. Furthermore, we review emerging benchmarks that move the evaluation from pixel-level accuracy to trajectory-aware reasoning correctness. By critically examining limitations in grounding, safety, and orchestration, this work outlines a strategic roadmap for the development of robust, autonomous geospatial intelligence.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a survey on agentic AI in remote sensing. It argues that vision foundation models and multimodal LLMs advance representation learning but lack sequential planning and tool orchestration for complex geospatial workflows. The central contribution is a unified taxonomy distinguishing single-agent copilots from multi-agent systems, together with analysis of architectural foundations (planning mechanisms, retrieval-augmented generation, memory structures), a review of emerging benchmarks that shift evaluation from pixel-level accuracy to trajectory-aware reasoning correctness, and a critical examination of limitations in grounding, safety, and orchestration that culminates in a strategic roadmap.
Significance. If the taxonomy is coherently supported by the reviewed literature and the analysis of evaluation shifts and limitations is balanced, the work would be significant as the first organizational framework for an emerging intersection of agentic systems and Earth observation. It could help researchers navigate distinctions between system types and redirect attention toward higher-level reasoning metrics rather than isolated accuracy scores.
Major comments (2)
- [Abstract] Abstract: the motivating claim that current vision foundation models and MLLMs 'often lack the sequential planning and active tool orchestration required for complex geospatial workflows' is presented without concrete citations or failure-mode examples; because this premise justifies the entire survey, the introduction or taxonomy section must supply a short, referenced enumeration of documented shortcomings in existing models on representative remote-sensing tasks.
- [Taxonomy section] Taxonomy and architectural foundations: the distinction between single-agent copilots and multi-agent systems is introduced at a high level, but the manuscript must explicitly map at least a representative sample of the cited systems onto the taxonomy categories (with a summary table) so that the taxonomy functions as an analytical lens rather than a purely descriptive partition.
Minor comments (2)
- All acronyms (RAG, MLLM, etc.) should be defined at first use and a glossary or footnote list added for readers outside the immediate subfield.
- [Benchmarks section] The discussion of benchmark shifts would be strengthened by a comparative table listing existing benchmarks, their primary metrics, and the new trajectory-aware criteria proposed.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our survey. We agree that strengthening the motivation with concrete examples and providing an explicit mapping table will improve the clarity and analytical value of the taxonomy. Both changes will be incorporated in the revised manuscript.
Point-by-point responses
-
Referee: [Abstract] Abstract: the motivating claim that current vision foundation models and MLLMs 'often lack the sequential planning and active tool orchestration required for complex geospatial workflows' is presented without concrete citations or failure-mode examples; because this premise justifies the entire survey, the introduction or taxonomy section must supply a short, referenced enumeration of documented shortcomings in existing models on representative remote-sensing tasks.
Authors: We agree that the motivating premise benefits from concrete support. In the revised manuscript we will add a concise, referenced enumeration of documented shortcomings (e.g., failures in multi-step change detection, trajectory planning for disaster response, and tool-use errors on satellite imagery benchmarks) to the Introduction section, citing representative studies that illustrate these limitations. revision: yes
-
Referee: [Taxonomy section] Taxonomy and architectural foundations: the distinction between single-agent copilots and multi-agent systems is introduced at a high level, but the manuscript must explicitly map at least a representative sample of the cited systems onto the taxonomy categories (with a summary table) so that the taxonomy functions as an analytical lens rather than a purely descriptive partition.
Authors: We accept this recommendation. The revised version will include a summary table in the Taxonomy section that explicitly maps a representative sample of the cited systems (e.g., single-agent copilots such as GeoChat and multi-agent frameworks such as those using hierarchical planning) onto the taxonomy categories, thereby making the distinctions operational and analytically useful. revision: yes
Circularity Check
No significant circularity: survey synthesizes external literature without internal derivations or self-referential predictions
Full rationale
This is a survey paper whose core contribution is organizational: a taxonomy of single-agent vs. multi-agent systems, review of planning/RAG/memory components, and shift in evaluation benchmarks. No equations, fitted parameters, predictions, or derivations appear in the provided abstract or description. All claims rest on synthesis of prior external work rather than reduction to the paper's own inputs. Self-citations, if present, are not load-bearing for any technical result because no new technical result is derived. The motivational statement about limitations of current vision models is presented as context, not a falsifiable claim proven inside the paper. This matches the default expectation for non-circular survey work.
Forward citations
Cited by 2 Pith papers
-
Agentic AI for Remote Sensing: Technical Challenges and Research Directions
Agentic AI faces structural challenges in remote sensing due to geospatial data properties and workflow constraints, requiring EO-native agents built around structured state, tool-aware reasoning, and validity-aware e...
-
Agentic AI for Remote Sensing: Technical Challenges and Research Directions
Agentic AI for remote sensing requires new designs centered on structured geospatial state, tool-aware reasoning, verifier-guided execution, and physical validity rather than generic extensions.