
arxiv: 2605.05405 · v3 · pith:2IGJVX5Inew · submitted 2026-05-06 · 💻 cs.CV

Zero-Shot Satellite Image Retrieval through Joint Embeddings: Application to Crisis Response

Pith reviewed 2026-05-21 09:21 UTC · model grok-4.3

classification 💻 cs.CV
keywords zero-shot retrieval · satellite imagery · natural language queries · disaster response · proxy subset · prompt optimization · joint embeddings · crisis management

The pith

Optimizing the description-generation prompt on a 100k proxy subset aligns text embeddings with frozen visual embeddings, so that natural-language queries can retrieve relevant satellite images for disasters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish a practical method for querying satellite image collections with natural language at global scale. It generates text descriptions for a representative 100k subset of Sentinel-2 tiles and optimizes the description prompt so that distances between the resulting text embeddings track distances in the embedding space of CLAY, a frozen pre-trained visual model. This alignment supports a two-stage retrieval process that first finds matching tiles in the proxy subset by text similarity and then expands to the worldwide archive through visual nearest-neighbor search. A reader would care because full contrastive training on paired data is infeasible at this scale, yet crisis responders need intuitive ways to locate relevant imagery for floods, fires, and droughts. If the approach holds, it bridges foundation models to operational use in systems like ECHO for real-time disaster support.

Core claim

GeoQuery achieves zero-shot retrieval by optimizing a description-generation prompt on a proxy subset so that text embeddings correlate with visual embeddings from CLAY, enabling two-stage search that identifies relevant satellite images for disaster locations worldwide.

What carries the argument

Prompt optimization on language descriptions of a 100k proxy subset to align text-embedding distances with those in the frozen CLAY visual-embedding space for two-stage text-then-visual retrieval.
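The two-stage search can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the toy vectors, the use of cosine similarity, and the `k_proxy`/`k_global` parameters are all hypothetical, and a real system would use an approximate nearest-neighbor index rather than a full sort.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query, vectors, k):
    """Indices of the k vectors most similar to `query`, best first."""
    order = sorted(range(len(vectors)),
                   key=lambda i: cosine(query, vectors[i]),
                   reverse=True)
    return order[:k]

def two_stage_search(query_text_vec, proxy_text_vecs, proxy_visual_vecs,
                     global_visual_vecs, k_proxy=1, k_global=2):
    """Stage 1: text similarity over the proxy subset.
    Stage 2: visual nearest neighbours over the global collection,
    seeded by the visual embedding of each proxy hit."""
    seeds = top_k(query_text_vec, proxy_text_vecs, k_proxy)
    hits = []
    for s in seeds:
        for g in top_k(proxy_visual_vecs[s], global_visual_vecs, k_global):
            if g not in hits:  # keep first occurrence, preserve ranking order
                hits.append(g)
    return hits

# Toy data (hypothetical): two proxy tiles, four global tiles.
proxy_text_vecs = [[1.0, 0.0], [0.0, 1.0]]
proxy_visual_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
global_visual_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.1],
                      [1.0, 0.0, 0.1], [0.0, 0.0, 1.0]]

result = two_stage_search([0.9, 0.1], proxy_text_vecs, proxy_visual_vecs,
                          global_visual_vecs, k_proxy=1, k_global=2)
print(result)
```

The point of the design is that the text embedding space only needs to cover the proxy subset, while the much larger global collection is reached purely through visual embeddings.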

Load-bearing premise

That distances in the text-embedding space after prompt optimization on the 100k proxy subset will reliably correspond to distances in the frozen CLAY visual-embedding space for unseen global queries and disaster types.

What would settle it

A new test set of disaster-location queries from regions or disaster types outside the original UK floods, US wildfires, and US droughts evaluation showing retrieval accuracy well below 31.6 percent within 50 km.
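The "accuracy within 50 km" metric can be illustrated as follows. This is a sketch under assumptions: the great-circle (haversine) distance and the single-prediction-per-query setup are choices made here for illustration, since the page does not state how the paper measures distance, and the coordinates are approximate and purely illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def accuracy_within(predictions, truths, radius_km=50.0):
    """Fraction of queries whose predicted location lies within radius_km of the truth."""
    hits = sum(haversine_km(*p, *t) <= radius_km for p, t in zip(predictions, truths))
    return hits / len(truths)

# Hypothetical example: two queries, approximate coordinates.
truths = [(52.19, -1.71),       # roughly Stratford-upon-Avon
          (52.48, -1.90)]       # roughly Birmingham
predictions = [(52.28, -1.58),  # a few kilometres away: counts as a hit
               (51.51, -0.13)]  # roughly London: far outside 50 km

accuracy = accuracy_within(predictions, truths, radius_km=50.0)
print(accuracy)
```

A falsification test of the kind described above would run the same computation on queries from new regions or disaster types and compare the result against the reported 31.6 percent.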

Figures

Figures reproduced from arXiv: 2605.05405 by Grace Colverd, James Walsh, Raúl Ramos-Pollán, William Fawcett.

Figure 1: The GeoQuery interface within ECHO, showing the natural-language navigation (“show me deserts”) and the similarity search.
Figure 2: The structure of GeoQuery’s two-level embeddings and search process for the satellite…
Figure 3: Images of the Bellbowrie suburb of Brisbane, Australia. Left: photograph from the…
Figure 4: Query processing workflow incorporating graph planning, execution, and validation with…
Figure 5: Example AAG for flood modelling. The boxes show the internal tools used by…
Figure 6: Crisis Centre flood simulation workflow - Initial disaster preparedness query for Valencia…
Figure 7: Crisis Centre escalation scenario - Response to elevated METEO agency alerts demon…
Figure 8: Crisis Centre monitoring and alerting workflow - Automated collection and assessment of…
Figure 9: Crisis Centre severe weather response - Updated flood risk assessment incorporating severe…
Figure 10: Crisis Centre quantitative flood modelling - Flash flood simulation based on specific rainfall…
Figure 11: First Responder vehicle safety assessment - Road network analysis for emergency vehicle…
Figure 12: First Responder route planning - Continuation of vehicle safety assessment showing road…
Figure 13: Citizens safe zone identification - Public-facing workflow for identifying emergency…
Figure 14: Internal alert reactivity - Satellite orbit planning and availability assessment for Valencia,…
Figure 15: Internal alert reactivity flood mapping - Automated flood risk map generation triggered…
Original abstract

Semantic search of Earth observation archives remains challenging. Visual foundation models such as CLAY produce rich embeddings of satellite imagery but lack the natural-language grounding needed for intuitive query, and full contrastive training of a remote-sensing CLIP-style model requires paired data and compute that are unavailable at global scale. To allow natural language querying at global scales, we present GeoQuery, a zero-shot retrieval system that sidesteps data and compute constraints through a two-stage semantic and visual search, leveraging a natural language embedding of a subset (proxy) of global data. Rather than training a joint encoder, we generate language descriptions for a 100k proxy subset of global Sentinel-2 tiles and optimise the description-generation prompt so that distances in the resulting text-embedding space correlate with distances in the frozen CLAY visual-embedding space. Queries are resolved in two stages, with a text-similarity search over the proxy subset followed by a visual nearest-neighbour search over worldwide CLAY embeddings. On 76 disaster-location queries covering UK floods, US wildfires, and US droughts, GeoQuery achieves 31.6% accuracy within 50 km, with the strongest performance on floods (50% within 50 km) where terrain features are well captured by RGB embeddings. Deployed within a crisis response system called ECHO, GeoQuery identified vulnerable areas during Brisbane's 2025 Cyclone Alfred, with downstream flood simulations reproducing historical patterns. Prompt-aligned proxies offer a practical bridge between EO foundation models and operational retrieval when full contrastive training is out of reach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces GeoQuery, a zero-shot retrieval system for satellite imagery that generates language descriptions for a 100k proxy subset of global Sentinel-2 tiles, optimizes the description-generation prompt to align text-embedding distances with frozen CLAY visual embeddings, and resolves queries via text-similarity search over the proxy followed by visual nearest-neighbor search over worldwide CLAY embeddings. It reports 31.6% accuracy within 50 km on 76 disaster-location queries covering UK floods, US wildfires, and US droughts (with 50% on floods), and demonstrates deployment in the ECHO crisis response system for Brisbane's 2025 Cyclone Alfred.

Significance. If the prompt-optimized alignment generalizes reliably to unseen global locations and disaster types, the approach offers a practical, low-resource bridge between visual foundation models and natural-language querying of EO archives without full contrastive training or global paired data. The two-stage proxy-plus-visual design and the reported crisis-response application are potentially useful, though the strength of the contribution depends on demonstrating robust transfer beyond the optimization set.

major comments (2)
  1. [Abstract] Abstract: The headline result of 31.6% accuracy within 50 km (50% on floods) on 76 disaster queries provides no information on query selection criteria, definition of a positive match, error bars, statistical significance, or ablation of the prompt-optimization step. These omissions make the central performance claim difficult to evaluate.
  2. [Method] Method section (prompt optimization and two-stage retrieval): The prompt is optimized on the 100k proxy subset so that text-embedding distances correlate with frozen CLAY visual distances, yet no quantitative check is reported for correlation strength, retrieval quality, or generalization on a held-out portion of the proxy or on queries involving unseen locations and disaster types. This directly affects the validity of the zero-shot transfer assumption.
minor comments (2)
  1. [Abstract] Abstract: Clarify whether the final visual nearest-neighbor search is performed over the complete worldwide CLAY embedding collection or a filtered subset.
  2. [Abstract] Ensure the first use of the acronym ECHO is accompanied by its full expansion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. The comments highlight important areas where additional clarity and analysis will strengthen the manuscript. We address each major comment point by point below and indicate the revisions we will make.

  1. Referee: [Abstract] Abstract: The headline result of 31.6% accuracy within 50 km (50% on floods) on 76 disaster queries provides no information on query selection criteria, definition of a positive match, error bars, statistical significance, or ablation of the prompt-optimization step. These omissions make the central performance claim difficult to evaluate.

    Authors: We agree that the abstract lacks sufficient supporting details for rigorous evaluation of the headline result. In the revised manuscript we will expand the abstract to briefly describe the query selection criteria (publicly reported disaster events for UK floods, US wildfires and US droughts), define a positive match as retrieval within 50 km of the documented ground-truth location, and reference the addition of error bars (via bootstrap resampling of the 76 queries), statistical significance testing, and an ablation of the prompt-optimization step. These elements will also be elaborated in the main text. revision: yes

  2. Referee: [Method] Method section (prompt optimization and two-stage retrieval): The prompt is optimized on the 100k proxy subset so that text-embedding distances correlate with frozen CLAY visual distances, yet no quantitative check is reported for correlation strength, retrieval quality, or generalization on a held-out portion of the proxy or on queries involving unseen locations and disaster types. This directly affects the validity of the zero-shot transfer assumption.

    Authors: The referee correctly notes the absence of direct quantitative diagnostics for the prompt-optimization procedure. While the reported end-to-end accuracy on the 76 disaster queries (which involve locations and event types outside the proxy) already provides indirect evidence of transfer, we acknowledge that explicit metrics are needed. In the revision we will add (i) correlation coefficients (Pearson and Spearman) between text-embedding and CLAY visual distances on the proxy set, (ii) retrieval-quality metrics on a held-out portion of the proxy, and (iii) explicit discussion of generalization to the unseen disaster queries. These additions will be placed in the Method section with supporting figures or tables. revision: yes
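The diagnostics promised in the two responses (Pearson and Spearman correlation between text and visual distances, and bootstrap error bars over the 76 queries) could look roughly like this. It is a hedged sketch, not the authors' procedure: the distance values are made up, the percentile bootstrap is one of several possible variants, and the count of 24 hits is simply the number implied by 31.6% of 76.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    means = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples))
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical pairwise distances for the same tile pairs in each space.
text_dist = [0.1, 0.4, 0.5, 0.9]
visual_dist = [1.0, 2.0, 4.0, 8.0]
r = pearson(text_dist, visual_dist)
rho = spearman(text_dist, visual_dist)

# 24 hits out of 76 queries corresponds to the reported 31.6%.
outcomes = [1] * 24 + [0] * 52
lo, hi = bootstrap_ci(outcomes)
print(r, rho, lo, hi)
```

Spearman is the more natural fit when only the ordering of neighbors matters for retrieval, which is why it is shown alongside Pearson here.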

Circularity Check

0 steps flagged

No significant circularity: empirical prompt optimization validated on held-out queries


The paper presents an empirical two-stage retrieval method: language descriptions are generated for a 100k proxy subset of Sentinel-2 tiles, a prompt is optimized so that text-embedding distances correlate with frozen CLAY visual distances, and queries are handled via text search on the proxy followed by visual nearest-neighbor search globally. Performance is reported on 76 separate disaster-location queries (UK floods, US wildfires, US droughts) that are distinct from the proxy optimization set. No derivation, prediction, or result reduces to its inputs by construction, no self-citations or uniqueness theorems are invoked as load-bearing, and no ansatz or renaming is smuggled in. The central claim rests on measured accuracy rather than tautological equivalence, making the approach self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on the untested assumption that a small proxy set plus prompt tuning can produce a useful global alignment without introducing systematic bias for rare disaster features; no new physical entities or free parameters beyond the prompt itself are introduced.

free parameters (1)
  • prompt template for description generation
    The wording instructions are tuned so that text distances correlate with visual distances; the exact template and tuning objective are not reported.
axioms (1)
  • domain assumption: CLAY visual embeddings capture terrain and land-cover features relevant to flood, fire, and drought location queries
    Invoked when claiming that RGB-based visual nearest-neighbor search will retrieve useful imagery for the tested disaster types.

pith-pipeline@v0.9.0 · 5820 in / 1434 out tokens · 28006 ms · 2026-05-21T09:21:24.791359+00:00 · methodology


Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 1 internal anchor

  1. [1]

    Clay foundation model: An open source AI model for earth

    Clay Foundation. Clay foundation model: An open source AI model for earth. https://github.com/Clay-foundation/model, 2024. Version 1.5. Pretrained Vision Transformer with masked autoencoder objective on approximately 70 million globally sampled chips from Sentinel-2, Landsat, Sentinel-1 SAR, LINZ, NAIP, and MODIS

  2. [2]

    Learning Transferable Visual Models From Natural Language Supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. CoRR, abs/2103.00020, 2021

  3. [3]

    Scaling up visual and vision-language representation learning with noisy text supervision

    Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, and Tom Duerig. Scaling up visual and vision-language representation learning with noisy text supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML), volume 139 of Proceedings of Machine Learning Research, pages 49...

  4. [4]

    BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation

    Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In Proceedings of the 39th International Conference on Machine Learning (ICML), volume 162 of Proceedings of Machine Learning Research, pages 12888–12900. PMLR, 2022

  5. [5]

    Sigmoid loss for language image pre-training

    Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. Sigmoid loss for language image pre-training. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 11975–11986, 2023

  6. [6]

    Prithvi-eo-2.0: A versatile multi-temporal foundation model for earth observation applications

    Daniela Szwarcman, Sujit Roy, Paolo Fraccaro, Þorsteinn Elí Gíslason, Benedikt Blumenstiel, Rinki Ghosal, Pedro Henrique de Oliveira, Joao Lucas de Sousa Almeida, Rocco Sedona, Yanghui Kang, Srija Chakraborty, Sizhe Wang, Carlos Gomes, Ankur Kumar, Myscon Truong, Denys Godwin, Hyunho Lee, Chia-Yu Hsu, Ata Akbari Asanjan, Besart Mujeci, Disha Shidham, Tr...

  7. [7]

    Satmae: Pre-training transformers for temporal and multi-spectral satellite imagery

    Yezhen Cong, Samar Khanna, Chenlin Meng, Patrick Liu, Erik Rozi, Yutong He, Marshall Burke, David B. Lobell, and Stefano Ermon. Satmae: Pre-training transformers for temporal and multi-spectral satellite imagery, 2023

  8. [8]

    Scale-MAE: A scale-aware masked autoencoder for multiscale geospatial representation learning

    Colorado J. Reed, Ritwik Gupta, Shufan Li, Sarah Brockman, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, and Trevor Darrell. Scale-MAE: A scale-aware masked autoencoder for multiscale geospatial representation learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4088–4099, 2023

  9. [9]

    SatlasPretrain: A large-scale dataset for remote sensing image understanding

    Favyen Bastani, Piper Wolters, Ritwik Gupta, Joe Ferdinando, and Aniruddha Kembhavi. SatlasPretrain: A large-scale dataset for remote sensing image understanding. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 16772–16782, 2023

  10. [10]

    SpectralGPT: Spectral remote sensing foundation model

    Danfeng Hong, Bing Zhang, Xuyang Li, Yuxuan Li, Chenyu Li, Jing Yao, Naoto Yokoya, Hao Li, Pedram Ghamisi, Xiuping Jia, Antonio Plaza, Paolo Gamba, Jon Atli Benediktsson, and Jocelyn Chanussot. SpectralGPT: Spectral remote sensing foundation model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8):5227–5244, 2024

  11. [11]

    Neural plasticity-inspired multimodal foundation model for earth observation

    Zhitong Xiong, Yi Wang, Fahong Zhang, Adam J. Stewart, Joëlle Hanna, Damian Borth, Ioannis Papoutsis, Bertrand Le Saux, Gustau Camps-Valls, and Xiao Xiang Zhu. Neural plasticity-inspired multimodal foundation model for earth observation, 2024

  12. [12]

    Foundation models for remote sensing and earth observation: A survey

    Aoran Xiao, Weihao Xuan, Junjue Wang, Jiaxing Huang, Dacheng Tao, Shijian Lu, and Naoto Yokoya. Foundation models for remote sensing and earth observation: A survey. IEEE Geoscience and Remote Sensing Magazine, 2025. In press

  13. [13]

    GEO-Bench: Toward foundation models for earth monitoring

    Alexandre Lacoste, Nils Lehmann, Pau Rodriguez, Evan David Sherwin, Hannah Kerner, Björn Lütjens, Jeremy Andrew Irvin, David Dao, Hamed Alemohammad, Alexandre Drouin, Mehmet Gunturkun, Gabriel Huang, David Vazquez, Dava Newman, Yoshua Bengio, Stefano Ermon, and Xiao Xiang Zhu. GEO-Bench: Toward foundation models for earth monitoring. In Advances in Neural ...

  14. [14]

    Rs5m and georsclip: A large-scale vision-language dataset and a large vision-language model for remote sensing

    Zilun Zhang, Tiancheng Zhao, Yulong Guo, and Jianwei Yin. Rs5m and georsclip: A large-scale vision-language dataset and a large vision-language model for remote sensing. IEEE Transactions on Geoscience and Remote Sensing, 62:1–23, 2024

  15. [15]

    Remoteclip: A vision language foundation model for remote sensing

    Fan Liu, Delong Chen, Zhangqingyun Guan, Xiaocong Zhou, Jiale Zhu, Qiaolin Ye, Liyong Fu, and Jun Zhou. Remoteclip: A vision language foundation model for remote sensing, 2024

  16. [16]

    SkyScript: A large and semantically diverse vision-language dataset for remote sensing

    Zhecheng Wang, Rajanie Prabha, Tianyuan Huang, Jiajun Wu, and Ram Rajagopal. SkyScript: A large and semantically diverse vision-language dataset for remote sensing. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 5805–5813, 2024

  17. [17]

    DOFA-CLIP: Multimodal vision-language foundation models for earth observation

    Zhitong Xiong, Yi Wang, Weikang Yu, Adam J. Stewart, Jie Zhao, Nils Lehmann, Thomas Dujardin, Zhenghang Yuan, Pedram Ghamisi, and Xiao Xiang Zhu. DOFA-CLIP: Multimodal vision-language foundation models for earth observation, 2025

  18. [18]

    Toolformer: Language models can teach themselves to use tools

    Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023

  19. [19]

    ReAct: Synergizing reasoning and acting in language models

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. In Proceedings of the 11th International Conference on Learning Representations (ICLR), 2023

  20. [20]

    AutoGen: Enabling next-gen LLM applications via multi-agent conversation

    Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W. White, Doug Burger, and Chi Wang. AutoGen: Enabling next-gen LLM applications via multi-agent conversation. In Proceedings of the 1st Conference on Language Modeling (COLM), 2024

  21. [21]

    Chemcrow: Augmenting large-language models with chemistry tools

    Andres M Bran, Sam Cox, Oliver Schilter, Carlo Baldassari, Andrew D White, and Philippe Schwaller. Chemcrow: Augmenting large-language models with chemistry tools, 2023

  22. [22]

    Autonomous chemical research with large language models

    Daniil A. Boiko, Robert MacKnight, Ben Kline, and Gabe Gomes. Autonomous chemical research with large language models. Nature, 624(7992):570–578, December 2023

  23. [23]

    Alireza Ghafarollahi and Markus J. Buehler. ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning. Digital Discovery, 3(7):1389–1409, 2024

  24. [24]

    Geogpt: An assistant for understanding and processing geospatial tasks

    Yifan Zhang, Cheng Wei, Zhengting He, and Wenhao Yu. Geogpt: An assistant for understanding and processing geospatial tasks. International Journal of Applied Earth Observation and Geoinformation, 131:103976, 2024

  25. [25]

    Sen1Floods11: A georeferenced dataset to train and test deep learning flood algorithms for Sentinel-1

    Derrick Bonafilia, Beth Tellman, Tyler Anderson, and Erica Issenberg. Sen1Floods11: A georeferenced dataset to train and test deep learning flood algorithms for Sentinel-1. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 210–211, 2020

  26. [26]

    Towards global flood mapping onboard low cost satellites with machine learning

    Gonzalo Mateo-Garcia, Joshua Veitch-Michaelis, Lewis Smith, Silviu Vlad Oprea, Guy Schumann, Yarin Gal, Atılım Güneş Baydin, and Dietmar Backes. Towards global flood mapping onboard low cost satellites with machine learning. Scientific Reports, 11(1):7249, 2021

  27. [27]

    xBD: A dataset for assessing building damage from satellite imagery

    Ritwik Gupta, Richard Hosfelt, Sandra Sajeev, Nirav Patel, Bryce Goodman, Jigar Doshi, Eric Heim, Howie Choset, and Matthew Gaston. xBD: A dataset for assessing building damage from satellite imagery, 2019

  28. [28]

    CesiumJS

    Bentley Systems. CesiumJS

  29. [29]

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Gemini Team Google. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context, 2024

  30. [30]

    Automatic prompt optimization with “gradient descent” and beam search

    Reid Pryzant, Dan Iter, Jerry Li, Yin Lee, Chenguang Zhu, and Michael Zeng. Automatic prompt optimization with “gradient descent” and beam search. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7957–7968, Singapore, December 2023. Association for Computati...

  31. [31]

    Large language models as optimizers

    Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, and Xinyun Chen. Large language models as optimizers. In Proceedings of the 12th International Conference on Learning Representations (ICLR), 2024

  32. [32]

    DSPy: Compiling declarative language model calls into state-of-the-art pipelines

    Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, and Christopher Potts. DSPy: Compiling declarative language model calls into state-of-the-art pipelines. In Proceedings of the 12th International Conference on Learning...

  33. [33]

    Dense passage retrieval for open-domain question answering

    Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781. Association for Computational Linguistics, 2020

  34. [34]

    ColBERT: Efficient and effective passage search via contextualized late interaction over BERT

    Omar Khattab and Matei Zaharia. ColBERT: Efficient and effective passage search via contextualized late interaction over BERT. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pages 39–48, 2020

  35. [35]

    PlaNet - photo geolocation with convolutional neural networks

    Tobias Weyand, Ilya Kostrikov, and James Philbin. PlaNet - photo geolocation with convolutional neural networks. In Computer Vision – ECCV 2016, volume 9912 of Lecture Notes in Computer Science, pages 37–55. Springer, 2016

  36. [36]

    Brisbane Floods January 1974: Report by Director of Meteorology

    Bureau of Meteorology. Brisbane Floods January 1974: Report by Director of Meteorology. Australian Government Publishing Service, Canberra, 1974

  37. [37]

    GDAL/OGR Geospatial Data Abstraction software Library

    GDAL/OGR contributors. GDAL/OGR Geospatial Data Abstraction software Library. Open Source Geospatial Foundation, 2025

  38. [38]

    geopandas/geopandas: v0.6.1

    Kelsey Jordahl et al. geopandas/geopandas: v0.6.1, October 2019.

    A GeoQuery Ablation Study. A.1 Experimental Setup. We evaluated GeoQuery’s disaster location identification capability using 76 queries across three categories: 40 UK flood queries (testing 10 major 2024 flooding locations including Stratford-upon-Avon, Birmingham, and Portsmouth), 20 US wild...

  39. [39]

    meteorological alerts for severe rainfall)

    Risk identification via external monitoring (e.g. meteorological alerts for severe rainfall)

  40. [40]

    These extents define the bounds for digital twinning of infrastructure and topography, a core foundation for downstream simulation and scenario building

    The risk is developed into a “project” defined spatially and temporally. These extents define the bounds for digital twinning of infrastructure and topography, a core foundation for downstream simulation and scenario building. For example, a national meteorological agency might flag a possible flood event triggered by 48 hours of intense rainfall in Australia

  41. [41]

    ECHO supports requests to specify which real-time data streams must be monitored first, simulate crisis events, and finally define alerting procedures as information is ingested

    Once enough information is collected on a given project, experts may begin to define the nature of the inquiry. ECHO supports requests to specify which real-time data streams must be monitored first, simulate crisis events, and finally define alerting procedures as information is ingested. For example, five-metre digital elevation maps are downloaded alon...

  42. [42]

    For example, an expert might request a flood model and an evaluation of which buildings may be suitable for sheltering at-risk individuals in place

    These highly granular assets are then accessible to an expert to rapidly define the line of geospatial inquiry and identify risks unknown to the automated system. For example, an expert might request a flood model and an evaluation of which buildings may be suitable for sheltering at-risk individuals in place

  43. [43]

    Digital Twin

    A crisis responder or member of the public may then request hyper-localised information from the contextually aware agent. For example, they might ask which roads are likely to be inaccessible to a particular vehicle, such as an ambulance or a family car, when planning a safe route. For any of the steps above to be possible, we require a means to construc...

  44. [44]

    data": bbox

    Disaster Risk Analysis For requests about assessing disaster risks (fire, floods, earthquakes, etc.), ensure the query includes: - Location of interest - Time horizon - Type of disaster Example 1: Previous context: Take me to valencia Current state variables available: {"data": bbox"} User Input: Can you determine if this area is flood prone over the next...

  45. [45]

    Show me images of oceans near deserts

    Satellite Image Search For general satellite image queries that don’t involve disaster risk (e.g., "Show me images of oceans near deserts"). These queries do not require a time horizon, nor a specific location. Feel confident to pass on such queries to the planner as long as no disasters are mentioned. Example 1: User input: show me forests Output: {’stat...

  46. [46]

    Start with OSM_Geocode for location queries

  47. [47]

    Use ’after’ for dependencies

  48. [48]

    Empty ’after’ means step can start immediately

  49. [49]

    Input/output must match tool definitions exactly

  50. [50]

    Use only listed tools

  51. [51]

    OSM Points of Interest should only be used when looking for specific physical infrastructure tags **{examples}** Return only valid JSON matching this format using listed tools. B.5 Planner User Prompt Create a logical tool sequence plan for: ‘‘‘{query}‘‘‘ Here are all previous messages between the user and the planner: **{conversation_history}** Here are ...