Zero-Shot Satellite Image Retrieval through Joint Embeddings: Application to Crisis Response

Grace Colverd; James Walsh; Raúl Ramos-Pollán; William Fawcett

arxiv: 2605.05405 · v3 · pith:2IGJVX5Inew · submitted 2026-05-06 · 💻 cs.CV

Pith reviewed 2026-05-21 09:21 UTC · model grok-4.3

classification 💻 cs.CV

keywords zero-shot retrieval · satellite imagery · natural language queries · disaster response · proxy subset · prompt optimization · joint embeddings · crisis management

The pith

Optimizing text descriptions on a 100k proxy subset aligns language queries with frozen visual embeddings to retrieve relevant satellite images for disasters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish a practical method for querying satellite image collections with natural language at global scale. By optimizing prompts to describe a representative 100k subset of Sentinel-2 tiles, the resulting text embeddings align with distances in a pre-trained visual model called CLAY. This alignment supports a two-stage retrieval process that first narrows candidates with text similarity and then refines with visual nearest neighbors. A reader would care because full contrastive training on paired data is infeasible at this scale, yet crisis responders need intuitive ways to locate relevant imagery for floods, fires, and droughts. If the approach holds, it bridges foundation models to operational use in systems like ECHO for real-time disaster support.

Core claim

GeoQuery achieves zero-shot retrieval by optimizing a description-generation prompt on a proxy subset so that text embeddings correlate with visual embeddings from CLAY, enabling two-stage search that identifies relevant satellite images for disaster locations worldwide.

What carries the argument

Prompt optimization on language descriptions of a 100k proxy subset to align text-embedding distances with those in the frozen CLAY visual-embedding space for two-stage text-then-visual retrieval.
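
The two-stage search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `two_stage_search`, the cosine-similarity choice, the parameters `k_text` and `k_visual`, and the toy vectors are all assumptions made for illustration. It also assumes every proxy tile carries both a text embedding and a CLAY visual embedding; real embeddings would be much higher-dimensional and would normally sit behind an approximate nearest-neighbour index.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def two_stage_search(query_text_emb, proxy_text_embs, proxy_visual_embs,
                     global_visual_embs, k_text=1, k_visual=3):
    """Return indices of global tiles, best match first.

    Stage 1 ranks proxy tiles by text similarity to the query.
    Stage 2 uses the visual embeddings of the best proxy tiles as seeds
    for a nearest-neighbour search over the global visual embeddings.
    """
    # Stage 1: text-similarity search over the proxy subset.
    ranked_proxy = sorted(
        range(len(proxy_text_embs)),
        key=lambda i: cosine(query_text_emb, proxy_text_embs[i]),
        reverse=True,
    )
    seeds = ranked_proxy[:k_text]

    # Stage 2: visual nearest neighbours of each seed over the global set.
    best_score = {}
    for seed in seeds:
        for j, emb in enumerate(global_visual_embs):
            score = cosine(proxy_visual_embs[seed], emb)
            best_score[j] = max(best_score.get(j, -1.0), score)
    return sorted(best_score, key=best_score.get, reverse=True)[:k_visual]


# Toy data (hypothetical): two proxy tiles, four global tiles.
proxy_text = [[1.0, 0.0], [0.0, 1.0]]
proxy_visual = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
global_visual = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.1],
                 [0.8, 0.0, 0.2], [0.0, 0.0, 1.0]]

top = two_stage_search([0.9, 0.1], proxy_text, proxy_visual,
                       global_visual, k_text=1, k_visual=2)
```

The design point this makes visible is that the text encoder only ever touches the proxy subset; the global collection is searched purely in the visual space, so its quality depends on how well the chosen proxy seed represents the query.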

Load-bearing premise

That distances in the text-embedding space after prompt optimization on the 100k proxy subset will reliably correspond to distances in the frozen CLAY visual-embedding space for unseen global queries and disaster types.

What would settle it

A new test set of disaster-location queries from regions or disaster types outside the original UK floods, US wildfires, and US droughts evaluation showing retrieval accuracy well below 31.6 percent within 50 km.
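
A test like this reduces to a distance-thresholded hit rate. The sketch below shows one way to score it, assuming a single predicted location per query and a great-circle (haversine) distance on a spherical Earth; the paper's exact matching rule is not given here, and the function names and sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_within(results, radius_km=50.0):
    """Fraction of (predicted, truth) pairs closer than radius_km.

    Each element of results is ((pred_lat, pred_lon), (true_lat, true_lon)).
    """
    hits = sum(
        1 for pred, truth in results
        if haversine_km(*pred, *truth) <= radius_km
    )
    return hits / len(results)


# Hypothetical example: one near miss of about 11 km, one miss of about 111 km.
sample = [((51.5, -0.1), (51.6, -0.1)), ((0.0, 0.0), (0.0, 1.0))]
rate = accuracy_within(sample)
```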

Figures

Figures reproduced from arXiv: 2605.05405 by Grace Colverd, James Walsh, Raúl Ramos-Pollán, William Fawcett.

**Figure 1.** The GeoQuery interface within ECHO, showing the natural-language navigation (“show me deserts”) and the similarity search. … Floods [26] and post-disaster building damage assessment in xBD [27], but each requires task-specific labels rather than supporting open-ended retrieval. Our approach addresses these challenges through a two-stage retrieval strategy that applies expensive vision-language model inferenc…

**Figure 2.** The structure of GeoQuery’s two-level embeddings and search process for the satellite…

**Figure 3.** Images of the Bellbowrie suburb of Brisbane, Australia. Left: photograph from the…

**Figure 4.** Query processing workflow incorporating graph planning, execution, and validation with…

**Figure 5.** Example AAG for flood modelling. The boxes show the internal tools used by…

**Figure 6.** Crisis Centre flood simulation workflow - Initial disaster preparedness query for Valencia…

**Figure 7.** Crisis Centre escalation scenario - Response to elevated METEO agency alerts demon…

**Figure 8.** Crisis Centre monitoring and alerting workflow - Automated collection and assessment of…

**Figure 9.** Crisis Centre severe weather response - Updated flood risk assessment incorporating severe…

**Figure 10.** Crisis Centre quantitative flood modelling - Flash flood simulation based on specific rainfall…

**Figure 11.** First Responder vehicle safety assessment - Road network analysis for emergency vehicle…

**Figure 12.** First Responder route planning - Continuation of vehicle safety assessment showing road…

**Figure 13.** Citizens safe zone identification - Public-facing workflow for identifying emergency…

**Figure 14.** Internal alert reactivity - Satellite orbit planning and availability assessment for Valencia,…

**Figure 15.** Internal alert reactivity flood mapping - Automated flood risk map generation triggered…

read the original abstract

Semantic search of Earth observation archives remains challenging. Visual foundation models such as CLAY produce rich embeddings of satellite imagery but lack the natural-language grounding needed for intuitive query, and full contrastive training of a remote-sensing CLIP-style model requires paired data and compute that are unavailable at global scale. To allow natural language querying at global scales, we present GeoQuery, a zero-shot retrieval system that sidesteps data and compute constraints through a two-stage semantic and visual search, leveraging a natural language embedding of a subset (proxy) of global data. Rather than training a joint encoder, we generate language descriptions for a 100k proxy subset of global Sentinel-2 tiles and optimise the description-generation prompt so that distances in the resulting text-embedding space correlate with distances in the frozen CLAY visual-embedding space. Queries are resolved in two stages, with a text-similarity search over the proxy subset followed by a visual nearest-neighbour search over worldwide CLAY embeddings. On 76 disaster-location queries covering UK floods, US wildfires, and US droughts, GeoQuery achieves 31.6% accuracy within 50 km, with the strongest performance on floods (50% within 50 km) where terrain features are well captured by RGB embeddings. Deployed within a crisis response system called ECHO, GeoQuery identified vulnerable areas during Brisbane's 2025 Cyclone Alfred, with downstream flood simulations reproducing historical patterns. Prompt-aligned proxies offer a practical bridge between EO foundation models and operational retrieval when full contrastive training is out of reach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

GeoQuery gives a workable prompt-tuning shortcut to link text queries to frozen CLAY satellite embeddings for crisis retrieval, but the 31.6% accuracy claim rests on thin evaluation details that leave generalization unclear.

The main takeaway is that this paper shows how to get natural-language search over global Sentinel-2 archives without full contrastive training. The authors take a 100k proxy subset, generate descriptions, tune the prompt so that text distances line up with frozen CLAY visual distances, then run a text search on the proxy followed by a visual nearest-neighbor search on the full worldwide set. That two-stage setup plus the crisis-deployment example is the concrete contribution here. The novelty lies in this specific combination applied to Earth-observation archives rather than in a brand-new framework.

What the work does well is the practical side: the authors plugged it into the ECHO system for Brisbane's 2025 Cyclone Alfred and report that it identified vulnerable areas, with downstream flood simulations reproducing historical patterns. That gives a sense of operational value for disaster response teams who cannot afford large paired datasets or retraining runs.

The soft spots sit in the evaluation. The headline result is 31.6% accuracy within 50 km on 76 disaster queries (50% on floods), yet the abstract gives no account of how those queries were chosen or what counts as a hit, and reports no error bars. There are also no ablations on the prompt-optimization step itself. The stress-test point about possible overfitting to the characteristics of the proxy subset is reasonable to raise; without explicit checks on held-out tiles or unseen disaster types, it is hard to tell whether the alignment transfers reliably or just fits the 100k sample. The central assumption that text-embedding distances after tuning will match visual distances for new global queries therefore stays untested in the reported numbers.

This paper is aimed at applied remote-sensing groups and crisis-response practitioners who need quick text-based access to foundation-model embeddings. A reader already working with CLIP-style or CLAY embeddings would pick up the proxy-alignment trick and the deployment sketch. It deserves a serious referee because the method is straightforward to implement and the use case is timely, even though the experiments need more rigor on selection effects and generalization. I would send it to peer review with a request for expanded evaluation details and held-out tests.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces GeoQuery, a zero-shot retrieval system for satellite imagery that generates language descriptions for a 100k proxy subset of global Sentinel-2 tiles, optimizes the description-generation prompt to align text-embedding distances with frozen CLAY visual embeddings, and resolves queries via text-similarity search over the proxy followed by visual nearest-neighbor search over worldwide CLAY embeddings. It reports 31.6% accuracy within 50 km on 76 disaster-location queries covering UK floods, US wildfires, and US droughts (with 50% on floods), and demonstrates deployment in the ECHO crisis response system for Brisbane's 2025 Cyclone Alfred.

Significance. If the prompt-optimized alignment generalizes reliably to unseen global locations and disaster types, the approach offers a practical, low-resource bridge between visual foundation models and natural-language querying of EO archives without full contrastive training or global paired data. The two-stage proxy-plus-visual design and the reported crisis-response application are potentially useful, though the strength of the contribution depends on demonstrating robust transfer beyond the optimization set.

major comments (2)

[Abstract] Abstract: The headline result of 31.6% accuracy within 50 km (50% on floods) on 76 disaster queries provides no information on query selection criteria, definition of a positive match, error bars, statistical significance, or ablation of the prompt-optimization step. These omissions make the central performance claim difficult to evaluate.
[Method] Method section (prompt optimization and two-stage retrieval): The prompt is optimized on the 100k proxy subset so that text-embedding distances correlate with frozen CLAY visual distances, yet no quantitative check is reported for correlation strength, retrieval quality, or generalization on a held-out portion of the proxy or on queries involving unseen locations and disaster types. This directly affects the validity of the zero-shot transfer assumption.
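
One way to produce the quantitative check requested in the second major comment is to correlate pairwise distances in the two embedding spaces. The sketch below is illustrative only: the Euclidean metric, the helper names, and the toy embeddings are assumptions, since the manuscript's distance metric and tuning objective are not reported here. On a held-out portion of the proxy, the same functions would be applied to tiles that were excluded from prompt tuning.

```python
import math
from itertools import combinations


def pairwise_distances(embs):
    """Euclidean distance for every unordered pair of embeddings."""
    return [math.dist(embs[i], embs[j])
            for i, j in combinations(range(len(embs)), 2)]


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy embeddings (hypothetical): visual space is a scaled copy of text space,
# so both correlations should be at their maximum.
text_embs = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
visual_embs = [[0.0, 0.0], [2.0, 0.0], [6.0, 0.0]]
d_text = pairwise_distances(text_embs)
d_visual = pairwise_distances(visual_embs)
r_pearson = pearson(d_text, d_visual)
r_spearman = spearman(d_text, d_visual)
```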

minor comments (2)

[Abstract] Abstract: Clarify whether the final visual nearest-neighbor search is performed over the complete worldwide CLAY embedding collection or a filtered subset.
[Abstract] Ensure the first use of the acronym ECHO is accompanied by its full expansion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. The comments highlight important areas where additional clarity and analysis will strengthen the manuscript. We address each major comment point by point below and indicate the revisions we will make.

Referee: [Abstract] Abstract: The headline result of 31.6% accuracy within 50 km (50% on floods) on 76 disaster queries provides no information on query selection criteria, definition of a positive match, error bars, statistical significance, or ablation of the prompt-optimization step. These omissions make the central performance claim difficult to evaluate.

Authors: We agree that the abstract lacks sufficient supporting details for rigorous evaluation of the headline result. In the revised manuscript we will expand the abstract to briefly describe the query selection criteria (publicly reported disaster events for UK floods, US wildfires and US droughts), define a positive match as retrieval within 50 km of the documented ground-truth location, and reference the addition of error bars (via bootstrap resampling of the 76 queries), statistical significance testing, and an ablation of the prompt-optimization step. These elements will also be elaborated in the main text. revision: yes
Referee: [Method] Method section (prompt optimization and two-stage retrieval): The prompt is optimized on the 100k proxy subset so that text-embedding distances correlate with frozen CLAY visual distances, yet no quantitative check is reported for correlation strength, retrieval quality, or generalization on a held-out portion of the proxy or on queries involving unseen locations and disaster types. This directly affects the validity of the zero-shot transfer assumption.

Authors: The referee correctly notes the absence of direct quantitative diagnostics for the prompt-optimization procedure. While the reported end-to-end accuracy on the 76 disaster queries (which involve locations and event types outside the proxy) already provides indirect evidence of transfer, we acknowledge that explicit metrics are needed. In the revision we will add (i) correlation coefficients (Pearson and Spearman) between text-embedding and CLAY visual distances on the proxy set, (ii) retrieval-quality metrics on a held-out portion of the proxy, and (iii) explicit discussion of generalization to the unseen disaster queries. These additions will be placed in the Method section with supporting figures or tables. revision: yes
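
The bootstrap error bars promised in the first response could be computed along these lines. This is a sketch under stated assumptions, not the authors' procedure: the per-query outcomes are not reported, so the list below is reconstructed from the headline figure alone (24 is the only whole number of hits out of 76 that rounds to 31.6%), and `bootstrap_ci`, its percentile method, and its defaults are hypothetical choices.

```python
import random


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of 0/1 outcomes."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(outcomes)
    # Resample the queries with replacement and record each hit rate.
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Reconstructed outcomes (assumption): 24 hits and 52 misses out of 76 queries.
outcomes = [1] * 24 + [0] * 52
point_estimate = sum(outcomes) / len(outcomes)
ci_low, ci_high = bootstrap_ci(outcomes)
```

With only 76 queries the interval is wide, which is the practical reason the referee's request matters: per-category figures, which rest on even fewer queries, would carry wider intervals still.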

Circularity Check

0 steps flagged

No significant circularity: empirical prompt optimization validated on held-out queries

The paper presents an empirical two-stage retrieval method: language descriptions are generated for a 100k proxy subset of Sentinel-2 tiles, a prompt is optimized so that text-embedding distances correlate with frozen CLAY visual distances, and queries are handled via text search on the proxy followed by visual nearest-neighbor search globally. Performance is reported on 76 separate disaster-location queries (UK floods, US wildfires, US droughts) that are distinct from the proxy optimization set. No derivation, prediction, or result reduces to its inputs by construction, no self-citations or uniqueness theorems are invoked as load-bearing, and no ansatz or renaming is smuggled in. The central claim rests on measured accuracy rather than tautological equivalence, making the approach self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on the untested assumption that a small proxy set plus prompt tuning can produce a useful global alignment without introducing systematic bias for rare disaster features; no new physical entities or free parameters beyond the prompt itself are introduced.

free parameters (1)

prompt template for description generation
The wording instructions are tuned so that text distances correlate with visual distances; the exact template and tuning objective are not reported.

axioms (1)

domain assumption: CLAY visual embeddings capture terrain and land-cover features relevant to flood, fire, and drought location queries
Invoked when claiming that RGB-based visual nearest-neighbor search will retrieve useful imagery for the tested disaster types.

pith-pipeline@v0.9.0 · 5820 in / 1434 out tokens · 28006 ms · 2026-05-21T09:21:24.791359+00:00 · methodology

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 1 internal anchor

[1]

Clay foundation model: An open source AI model for earth

Clay Foundation. Clay foundation model: An open source AI model for earth. https://github.com/Clay-foundation/model, 2024. Version 1.5. Pretrained Vision Transformer with masked autoencoder objective on approximately 70 million globally sampled chips from Sentinel-2, Landsat, Sentinel-1 SAR, LINZ, NAIP, and MODIS

work page 2024
[2]

Learning Transferable Visual Models From Natural Language Supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. CoRR, abs/2103.00020, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[3]

Scaling up visual and vision-language representation learning with noisy text supervision

Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, and Tom Duerig. Scaling up visual and vision-language representation learning with noisy text supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML), volume 139 of Proceedings of Machine Learning Research, pages 49...

work page 2021
[4]

BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation

Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In Proceedings of the 39th International Conference on Machine Learning (ICML), volume 162 of Proceedings of Machine Learning Research, pages 12888–12900. PMLR, 2022

work page 2022
[5]

Sigmoid loss for language image pre-training

Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. Sigmoid loss for language image pre-training. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 11975–11986, 2023

work page 2023
[6]

Prithvi-eo-2.0: A versatile multi-temporal foundation model for earth observation applications

Daniela Szwarcman, Sujit Roy, Paolo Fraccaro, Þorsteinn Elí Gíslason, Benedikt Blumenstiel, Rinki Ghosal, Pedro Henrique de Oliveira, Joao Lucas de Sousa Almeida, Rocco Sedona, Yanghui Kang, Srija Chakraborty, Sizhe Wang, Carlos Gomes, Ankur Kumar, Myscon Truong, Denys Godwin, Hyunho Lee, Chia-Yu Hsu, Ata Akbari Asanjan, Besart Mujeci, Disha Shidham, Tr...

work page 2025
[7]

Satmae: Pre-training transformers for temporal and multi-spectral satellite imagery

Yezhen Cong, Samar Khanna, Chenlin Meng, Patrick Liu, Erik Rozi, Yutong He, Marshall Burke, David B. Lobell, and Stefano Ermon. Satmae: Pre-training transformers for temporal and multi-spectral satellite imagery, 2023

work page 2023
[8]

Scale-MAE: A scale-aware masked autoencoder for multiscale geospatial representation learning

Colorado J. Reed, Ritwik Gupta, Shufan Li, Sarah Brockman, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, and Trevor Darrell. Scale-MAE: A scale-aware masked autoencoder for multiscale geospatial representation learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4088–4099, 2023

work page 2023
[9]

SatlasPretrain: A large-scale dataset for remote sensing image understanding

Favyen Bastani, Piper Wolters, Ritwik Gupta, Joe Ferdinando, and Aniruddha Kembhavi. SatlasPretrain: A large-scale dataset for remote sensing image understanding. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 16772–16782, 2023

work page 2023
[10]

SpectralGPT: Spectral remote sensing foundation model

Danfeng Hong, Bing Zhang, Xuyang Li, Yuxuan Li, Chenyu Li, Jing Yao, Naoto Yokoya, Hao Li, Pedram Ghamisi, Xiuping Jia, Antonio Plaza, Paolo Gamba, Jon Atli Benediktsson, and Jocelyn Chanussot. SpectralGPT: Spectral remote sensing foundation model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8):5227–5244, 2024

work page 2024
[11]

Neural plasticity-inspired multimodal foundation model for earth observation

Zhitong Xiong, Yi Wang, Fahong Zhang, Adam J. Stewart, Joëlle Hanna, Damian Borth, Ioannis Papoutsis, Bertrand Le Saux, Gustau Camps-Valls, and Xiao Xiang Zhu. Neural plasticity-inspired multimodal foundation model for earth observation, 2024

work page 2024
[12]

Foundation models for remote sensing and earth observation: A survey

Aoran Xiao, Weihao Xuan, Junjue Wang, Jiaxing Huang, Dacheng Tao, Shijian Lu, and Naoto Yokoya. Foundation models for remote sensing and earth observation: A survey. IEEE Geoscience and Remote Sensing Magazine, 2025. In press

work page 2025
[13]

GEO-Bench: Toward foundation models for earth monitoring

Alexandre Lacoste, Nils Lehmann, Pau Rodriguez, Evan David Sherwin, Hannah Kerner, Björn Lütjens, Jeremy Andrew Irvin, David Dao, Hamed Alemohammad, Alexandre Drouin, Mehmet Gunturkun, Gabriel Huang, David Vazquez, Dava Newman, Yoshua Bengio, Stefano Ermon, and Xiao Xiang Zhu. GEO-Bench: Toward foundation models for earth monitoring. In Advances in Neural ...

work page 2023
[14]

Rs5m and georsclip: A large-scale vision-language dataset and a large vision-language model for remote sensing

Zilun Zhang, Tiancheng Zhao, Yulong Guo, and Jianwei Yin. Rs5m and georsclip: A large-scale vision-language dataset and a large vision-language model for remote sensing. IEEE Transactions on Geoscience and Remote Sensing, 62:1–23, 2024

work page 2024
[15]

Remoteclip: A vision language foundation model for remote sensing

Fan Liu, Delong Chen, Zhangqingyun Guan, Xiaocong Zhou, Jiale Zhu, Qiaolin Ye, Liyong Fu, and Jun Zhou. Remoteclip: A vision language foundation model for remote sensing, 2024

work page 2024
[16]

SkyScript: A large and semantically diverse vision-language dataset for remote sensing

Zhecheng Wang, Rajanie Prabha, Tianyuan Huang, Jiajun Wu, and Ram Rajagopal. SkyScript: A large and semantically diverse vision-language dataset for remote sensing. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 5805–5813, 2024

work page 2024
[17]

DOFA-CLIP: Multimodal vision-language foundation models for earth observation

Zhitong Xiong, Yi Wang, Weikang Yu, Adam J. Stewart, Jie Zhao, Nils Lehmann, Thomas Dujardin, Zhenghang Yuan, Pedram Ghamisi, and Xiao Xiang Zhu. DOFA-CLIP: Multimodal vision-language foundation models for earth observation, 2025

work page 2025
[18]

Toolformer: Language models can teach themselves to use tools

Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023

work page 2023
[19]

ReAct: Synergizing reasoning and acting in language models

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. In Proceedings of the 11th International Conference on Learning Representations (ICLR), 2023

work page 2023
[20]

AutoGen: Enabling next-gen LLM applications via multi-agent conversation

Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W. White, Doug Burger, and Chi Wang. AutoGen: Enabling next-gen LLM applications via multi-agent conversation. In Proceedings of the 1st Conference on Language Modeling (COLM), 2024

work page 2024
[21]

Chemcrow: Augmenting large-language models with chemistry tools

Andres M Bran, Sam Cox, Oliver Schilter, Carlo Baldassari, Andrew D White, and Philippe Schwaller. Chemcrow: Augmenting large-language models with chemistry tools, 2023

work page 2023
[22]

Autonomous chemical research with large language models

Daniil A. Boiko, Robert MacKnight, Ben Kline, and Gabe Gomes. Autonomous chemical research with large language models. Nature, 624(7992):570–578, December 2023

work page 2023
[23]

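ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning

Alireza Ghafarollahi and Markus J. Buehler. ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning. Digital Discovery, 3(7):1389–1409, 2024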

work page 2024
[24]

Geogpt: An assistant for understanding and processing geospatial tasks

Yifan Zhang, Cheng Wei, Zhengting He, and Wenhao Yu. Geogpt: An assistant for understanding and processing geospatial tasks. International Journal of Applied Earth Observation and Geoinformation, 131:103976, 2024

work page 2024
[25]

Sen1Floods11: A georeferenced dataset to train and test deep learning flood algorithms for Sentinel-1

Derrick Bonafilia, Beth Tellman, Tyler Anderson, and Erica Issenberg. Sen1Floods11: A georeferenced dataset to train and test deep learning flood algorithms for Sentinel-1. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 210–211, 2020.

work page 2020
[26]

Towards global flood mapping onboard low cost satellites with machine learning

Gonzalo Mateo-Garcia, Joshua Veitch-Michaelis, Lewis Smith, Silviu Vlad Oprea, Guy Schumann, Yarin Gal, Atılım Güneş Baydin, and Dietmar Backes. Towards global flood mapping onboard low cost satellites with machine learning. Scientific Reports, 11(1):7249, 2021

work page 2021
[27]

xBD: A dataset for assessing building damage from satellite imagery

Ritwik Gupta, Richard Hosfelt, Sandra Sajeev, Nirav Patel, Bryce Goodman, Jigar Doshi, Eric Heim, Howie Choset, and Matthew Gaston. xBD: A dataset for assessing building damage from satellite imagery, 2019

work page 2019
[28]

CesiumJS

Bentley Systems. CesiumJS

work page
[29]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Gemini Team Google. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context, 2024

work page 2024
[30]

Automatic prompt optimization with “gradient descent” and beam search

Reid Pryzant, Dan Iter, Jerry Li, Yin Lee, Chenguang Zhu, and Michael Zeng. Automatic prompt optimization with “gradient descent” and beam search. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7957–7968, Singapore, December 2023. Association for Computati...

work page 2023
[31]

Large language models as optimizers

Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, and Xinyun Chen. Large language models as optimizers. In Proceedings of the 12th International Conference on Learning Representations (ICLR), 2024

work page 2024
[32]

DSPy: Compiling declarative language model calls into state-of-the-art pipelines

Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, and Christopher Potts. DSPy: Compiling declarative language model calls into state-of-the-art pipelines. In Proceedings of the 12th International Conference on Learning...

work page 2024
[33]

Dense passage retrieval for open-domain question answering

Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781. Association for Computational Linguistics, 2020

work page 2020
[34]

ColBERT: Efficient and effective passage search via contextualized late interaction over BERT

Omar Khattab and Matei Zaharia. ColBERT: Efficient and effective passage search via contextualized late interaction over BERT. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pages 39–48, 2020

work page 2020
[35]

PlaNet - photo geolocation with convolutional neural networks

Tobias Weyand, Ilya Kostrikov, and James Philbin. PlaNet - photo geolocation with convolutional neural networks. In Computer Vision – ECCV 2016, volume 9912 of Lecture Notes in Computer Science, pages 37–55. Springer, 2016

work page 2016
[36]

Brisbane Floods January 1974: Report by Director of Meteorology

Bureau of Meteorology. Brisbane Floods January 1974: Report by Director of Meteorology. Australian Government Publishing Service, Canberra, 1974

work page 1974
[37]

GDAL/OGR Geospatial Data Abstraction software Library

GDAL/OGR contributors. GDAL/OGR Geospatial Data Abstraction software Library. Open Source Geospatial Foundation, 2025

work page 2025
[38]

geopandas/geopandas: v0.6.1

Kelsey Jordahl et al. geopandas/geopandas: v0.6.1, October 2019. A GeoQuery Ablation Study A.1 Experimental Setup We evaluated GeoQuery’s disaster location identification capability using 76 queries across three categories: 40 UK flood queries (testing 10 major 2024 flooding locations including Stratford-upon-Avon, Birmingham, and Portsmouth), 20 US wild...

work page 2019
[39]

meteorological alerts for severe rainfall)

Risk identification via external monitoring (e.g. meteorological alerts for severe rainfall)

work page
[40]

These extents define the bounds for digital twinning of infrastructure and topography, a core foundation for downstream simulation and scenario building

The risk is developed into a “project” defined spatially and temporally. These extents define the bounds for digital twinning of infrastructure and topography, a core foundation for downstream simulation and scenario building. For example, a national meteorological agency might flag a possible flood event triggered by 48 hours of intense rainfall in Australia

work page
[41]

ECHO supports requests to specify which real-time data streams must be monitored first, simulate crisis events, and finally define alerting procedures as information is ingested

Once enough information is collected on a given project, experts may begin to define the nature of the inquiry. ECHO supports requests to specify which real-time data streams must be monitored first, simulate crisis events, and finally define alerting procedures as information is ingested. For example, five-metre digital elevation maps are downloaded alon...

work page
[42]

For example, an expert might request a flood model and an evaluation of which buildings may be suitable for sheltering at-risk individuals in place

These highly granular assets are then accessible to an expert to rapidly define the line of geospatial inquiry and identify risks unknown to the automated system. For example, an expert might request a flood model and an evaluation of which buildings may be suitable for sheltering at-risk individuals in place

work page
[43]

Digital Twin

A crisis responder or member of the public may then request hyper-localised information from the contextually aware agent. For example, they might ask which roads are likely to be inaccessible to a particular vehicle, such as an ambulance or a family car, when planning a safe route. For any of the steps above to be possible, we require a means to construc...

[44]

Disaster Risk Analysis
For requests about assessing disaster risks (fire, floods, earthquakes, etc.), ensure the query includes:
- Location of interest
- Time horizon
- Type of disaster
Example 1:
Previous context: Take me to valencia
Current state variables available: {"data": bbox"}
User Input: Can you determine if this area is flood prone over the next...

[45]

Satellite Image Search
For general satellite image queries that don’t involve disaster risk (e.g., "Show me images of oceans near deserts"). These queries do not require a time horizon, nor a specific location. Feel confident to pass on such queries to the planner as long as no disasters are mentioned.
Example 1:
User input: show me forests
Output: {’stat...

[46]

Start with OSM_Geocode for location queries

[47]

Use ’after’ for dependencies

[48]

Empty ’after’ means step can start immediately

[49]

Input/output must match tool definitions exactly

[50]

Use only listed tools

[51]

OSM Points of Interest should only be used when looking for specific physical infrastructure tags
**{examples}**
Return only valid JSON matching this format using listed tools.

B.5 Planner User Prompt

Create a logical tool sequence plan for: ‘‘‘{query}‘‘‘
Here are all previous messages between the user and the planner: **{conversation_history}**
Here are ...
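The planner rules listed above (start with OSM_Geocode for location queries, declare dependencies with 'after', treat an empty 'after' as ready to start, use only listed tools) imply that a plan is a small dependency graph. The following is a minimal Python sketch of how such a plan might be checked and ordered; it is not the ECHO implementation. Only the 'after' key and the OSM_Geocode tool name come from the prompt text. The step ids, the 'tool' key, and the Flood_Model and Shelter_Check tool names are hypothetical and exist only for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical plan. Only 'after' and OSM_Geocode appear in the prompt text;
# the 'id' and 'tool' keys and the other tool names are assumptions.
plan = [
    {"id": "s1", "tool": "OSM_Geocode", "after": []},        # empty 'after': can start immediately
    {"id": "s2", "tool": "Flood_Model", "after": ["s1"]},    # waits for the geocoded location
    {"id": "s3", "tool": "Shelter_Check", "after": ["s1", "s2"]},
]

# Hypothetical registry standing in for the "listed tools".
ALLOWED_TOOLS = {"OSM_Geocode", "Flood_Model", "Shelter_Check"}


def execution_order(steps, allowed_tools):
    """Validate a plan and return step ids in an order that respects 'after'."""
    ids = {step["id"] for step in steps}
    for step in steps:
        # Rule: use only listed tools.
        if step["tool"] not in allowed_tools:
            raise ValueError(f"unlisted tool: {step['tool']}")
        # Every dependency must name a step that exists in the plan.
        unknown = set(step["after"]) - ids
        if unknown:
            raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    # Map each step to its predecessors, as TopologicalSorter expects.
    graph = {step["id"]: set(step["after"]) for step in steps}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError("plan contains a circular dependency") from exc


print(execution_order(plan, ALLOWED_TOOLS))  # ['s1', 's2', 's3']
```

Ordering the steps topologically is one way to honour the 'after' rule; a real executor could instead run every step whose dependencies are complete in parallel.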


[1] Clay Foundation. Clay foundation model: An open source AI model for earth. https://github.com/Clay-foundation/model, 2024. Version 1.5. Pretrained Vision Transformer with masked autoencoder objective on approximately 70 million globally sampled chips from Sentinel-2, Landsat, Sentinel-1 SAR, LINZ, NAIP, and MODIS.

[2] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. CoRR, abs/2103.00020, 2021.

[3] Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, and Tom Duerig. Scaling up visual and vision-language representation learning with noisy text supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML), volume 139 of Proceedings of Machine Learning Research, pages 49...

[4] Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In Proceedings of the 39th International Conference on Machine Learning (ICML), volume 162 of Proceedings of Machine Learning Research, pages 12888–12900. PMLR, 2022.

[5] Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. Sigmoid loss for language image pre-training. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 11975–11986, 2023.

[6] Daniela Szwarcman, Sujit Roy, Paolo Fraccaro, Þorsteinn Elí Gíslason, Benedikt Blumenstiel, Rinki Ghosal, Pedro Henrique de Oliveira, Joao Lucas de Sousa Almeida, Rocco Sedona, Yanghui Kang, Srija Chakraborty, Sizhe Wang, Carlos Gomes, Ankur Kumar, Myscon Truong, Denys Godwin, Hyunho Lee, Chia-Yu Hsu, Ata Akbari Asanjan, Besart Mujeci, Disha Shidham, Tr...

[7] Yezhen Cong, Samar Khanna, Chenlin Meng, Patrick Liu, Erik Rozi, Yutong He, Marshall Burke, David B. Lobell, and Stefano Ermon. Satmae: Pre-training transformers for temporal and multi-spectral satellite imagery, 2023.

[8] Colorado J. Reed, Ritwik Gupta, Shufan Li, Sarah Brockman, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, and Trevor Darrell. Scale-MAE: A scale-aware masked autoencoder for multiscale geospatial representation learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4088–4099, 2023.

[9] Favyen Bastani, Piper Wolters, Ritwik Gupta, Joe Ferdinando, and Aniruddha Kembhavi. SatlasPretrain: A large-scale dataset for remote sensing image understanding. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 16772–16782, 2023.

[10] Danfeng Hong, Bing Zhang, Xuyang Li, Yuxuan Li, Chenyu Li, Jing Yao, Naoto Yokoya, Hao Li, Pedram Ghamisi, Xiuping Jia, Antonio Plaza, Paolo Gamba, Jon Atli Benediktsson, and Jocelyn Chanussot. SpectralGPT: Spectral remote sensing foundation model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8):5227–5244, 2024.

[11] Zhitong Xiong, Yi Wang, Fahong Zhang, Adam J. Stewart, Joëlle Hanna, Damian Borth, Ioannis Papoutsis, Bertrand Le Saux, Gustau Camps-Valls, and Xiao Xiang Zhu. Neural plasticity-inspired multimodal foundation model for earth observation, 2024.

[12] Aoran Xiao, Weihao Xuan, Junjue Wang, Jiaxing Huang, Dacheng Tao, Shijian Lu, and Naoto Yokoya. Foundation models for remote sensing and earth observation: A survey. IEEE Geoscience and Remote Sensing Magazine, 2025. In press.

[13] Alexandre Lacoste, Nils Lehmann, Pau Rodriguez, Evan David Sherwin, Hannah Kerner, Björn Lütjens, Jeremy Andrew Irvin, David Dao, Hamed Alemohammad, Alexandre Drouin, Mehmet Gunturkun, Gabriel Huang, David Vazquez, Dava Newman, Yoshua Bengio, Stefano Ermon, and Xiao Xiang Zhu. GEO-Bench: Toward foundation models for earth monitoring. In Advances in Neural ...

[14] Zilun Zhang, Tiancheng Zhao, Yulong Guo, and Jianwei Yin. Rs5m and georsclip: A large-scale vision-language dataset and a large vision-language model for remote sensing. IEEE Transactions on Geoscience and Remote Sensing, 62:1–23, 2024.

[15] Fan Liu, Delong Chen, Zhangqingyun Guan, Xiaocong Zhou, Jiale Zhu, Qiaolin Ye, Liyong Fu, and Jun Zhou. Remoteclip: A vision language foundation model for remote sensing, 2024.

[16] Zhecheng Wang, Rajanie Prabha, Tianyuan Huang, Jiajun Wu, and Ram Rajagopal. SkyScript: A large and semantically diverse vision-language dataset for remote sensing. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 5805–5813, 2024.

[17] Zhitong Xiong, Yi Wang, Weikang Yu, Adam J. Stewart, Jie Zhao, Nils Lehmann, Thomas Dujardin, Zhenghang Yuan, Pedram Ghamisi, and Xiao Xiang Zhu. DOFA-CLIP: Multimodal vision-language foundation models for earth observation, 2025.

[18] Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.

[19] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. In Proceedings of the 11th International Conference on Learning Representations (ICLR), 2023.

[20] Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W. White, Doug Burger, and Chi Wang. AutoGen: Enabling next-gen LLM applications via multi-agent conversation. In Proceedings of the 1st Conference on Language Modeling (COLM), 2024.

[21] Andres M Bran, Sam Cox, Oliver Schilter, Carlo Baldassari, Andrew D White, and Philippe Schwaller. Chemcrow: Augmenting large-language models with chemistry tools, 2023.

[22] Daniil A. Boiko, Robert MacKnight, Ben Kline, and Gabe Gomes. Autonomous chemical research with large language models. Nature, 624(7992):570–578, December 2023.

[23] Alireza Ghafarollahi and Markus J. Buehler. ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning. Digital Discovery, 3(7):1389–1409, 2024.

[24] Yifan Zhang, Cheng Wei, Zhengting He, and Wenhao Yu. Geogpt: An assistant for understanding and processing geospatial tasks. International Journal of Applied Earth Observation and Geoinformation, 131:103976, 2024.

[25] Derrick Bonafilia, Beth Tellman, Tyler Anderson, and Erica Issenberg. Sen1Floods11: A georeferenced dataset to train and test deep learning flood algorithms for Sentinel-1. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 210–211, 2020.

[26] Gonzalo Mateo-Garcia, Joshua Veitch-Michaelis, Lewis Smith, Silviu Vlad Oprea, Guy Schumann, Yarin Gal, Atılım Güneş Baydin, and Dietmar Backes. Towards global flood mapping onboard low cost satellites with machine learning. Scientific Reports, 11(1):7249, 2021.

[27] Ritwik Gupta, Richard Hosfelt, Sandra Sajeev, Nirav Patel, Bryce Goodman, Jigar Doshi, Eric Heim, Howie Choset, and Matthew Gaston. xBD: A dataset for assessing building damage from satellite imagery, 2019.

[28] Bentley Systems. CesiumJS.

[29] Gemini Team Google. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context, 2024.

[30] Reid Pryzant, Dan Iter, Jerry Li, Yin Lee, Chenguang Zhu, and Michael Zeng. Automatic prompt optimization with “gradient descent” and beam search. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7957–7968, Singapore, December 2023. Association for Computati...

[31] Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, and Xinyun Chen. Large language models as optimizers. In Proceedings of the 12th International Conference on Learning Representations (ICLR), 2024.

[32] Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, and Christopher Potts. DSPy: Compiling declarative language model calls into state-of-the-art pipelines. In Proceedings of the 12th International Conference on Learning...

[33] Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781. Association for Computational Linguistics, 2020.

[34] Omar Khattab and Matei Zaharia. ColBERT: Efficient and effective passage search via contextualized late interaction over BERT. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pages 39–48, 2020.

[35] Tobias Weyand, Ilya Kostrikov, and James Philbin. PlaNet - photo geolocation with convolutional neural networks. In Computer Vision – ECCV 2016, volume 9912 of Lecture Notes in Computer Science, pages 37–55. Springer, 2016.

[36] Bureau of Meteorology. Brisbane Floods January 1974: Report by Director of Meteorology. Australian Government Publishing Service, Canberra, 1974.

[37] GDAL/OGR contributors. GDAL/OGR Geospatial Data Abstraction software Library. Open Source Geospatial Foundation, 2025.

[38] Kelsey Jordahl et al. geopandas/geopandas: v0.6.1, October 2019.

A GeoQuery Ablation Study

A.1 Experimental Setup

We evaluated GeoQuery’s disaster location identification capability using 76 queries across three categories: 40 UK flood queries (testing 10 major 2024 flooding locations including Stratford-upon-Avon, Birmingham, and Portsmouth), 20 US wild...
