arxiv: 2605.11336 · v1 · submitted 2026-05-11 · 💻 cs.IR · cs.AI· cs.CL· cs.HC

Recognition: no theorem link

Much of Geospatial Web Search Is Beyond Traditional GIS

Ilya Ilyankou , Stefano Cavazzi , James Haworth

Authors on Pith no claims yet

Pith reviewed 2026-05-13 01:16 UTC · model grok-4.3

classification 💻 cs.IR cs.AIcs.CLcs.HC

keywords geospatial web searchquery classificationMS MARCOweb query analysisGIS limitationstransactional queriesquery taxonomysentence embeddings

0 comments

The pith

Web searches about places are far more common than existing labels suggest and are dominated by practical questions like prices, hours and recommendations rather than maps or geography.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that 18 percent of a large set of real Bing queries concern place in some way, nearly three times the share marked as Location in the original annotations. Using sentence embeddings and a classifier on the complete unfiltered MS MARCO corpus of over a million queries, the authors produce a taxonomy of 88 categories. This shows that transactional lookups such as costs and prices make up 15.3 percent of the geospatial queries, almost twice the share of all physical geography questions combined. Most of these queries fall outside the scope of traditional GIS systems and knowledge graphs. The findings imply that search systems need mixed architectures capable of handling both fixed database answers and generative or real-time responses for volatile or evaluative questions.

Core claim

Applying dense sentence embeddings, a lightweight SetFit classifier, and density-based clustering to the full MS MARCO corpus of 1.01 million real Bing queries without prior toponym filtering identifies 181,827 geospatial queries (18.0 percent), nearly threefold the 6.17 percent labelled as Location. The resulting taxonomy of 88 query categories reveals that geospatial web search is dominated by transactional and practical lookups: costs and prices alone account for 15.3 percent of geospatial queries, nearly twice the size of the entire physical geography theme. Much of this activity falls outside the scope traditional GIS systems and knowledge graphs are built to serve.

What carries the argument

A taxonomy of 88 categories obtained by clustering 181,827 geospatial queries identified via a SetFit classifier on the full unfiltered MS MARCO query corpus.

If this is right

Hybrid retrieval systems are required that combine deterministic spatial database lookups with generative or real-time components for evaluative and temporally volatile queries.
Benchmarks for geographic reasoning in large language models should test transactional and practical place questions in addition to map-style queries.
Knowledge graphs and traditional GIS cover only a minority of the place-related questions people actually pose.
The released labelled dataset, classifier, and taxonomy enable further work on detecting geospatial intent at scale.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Search engines may need to move toward pipelines that blend retrieval with generation rather than relying solely on structured databases for place queries.
Current GIS tools and training may not address the dominant everyday uses of location information such as price checks and opening hours.
The observed proportions could be tested for stability across languages, other search engines, or query logs from different time periods.

Load-bearing premise

The classifier and clustering step correctly identify geospatial intent and produce a stable, meaningful set of 88 categories when run on the complete query set without any prior place-name filtering.

What would settle it

A human audit of a random sample of the 181,827 flagged queries to measure how many are truly geospatial, or repeating the entire pipeline on a fresh large query log from another search engine.

read the original abstract

Web search queries concern place far more often than existing labelling schemes suggest, yet the landscape of geospatial web search queries - what people ask of place, and how often - remains poorly characterised at scale. We apply dense sentence embeddings, a lightweight SetFit classifier, and density-based clustering to the full MS MARCO corpus of 1.01 million real Bing queries without prior filtering for toponyms or spatial keywords, identifying 181,827 geospatial queries (18.0%), nearly threefold the 6.17% labelled as Location in the original annotations. The resulting taxonomy of 88 query categories reveals that geospatial web search is dominated by transactional and practical lookups: costs and prices alone account for 15.3% of geospatial queries, nearly twice the size of the entire physical geography theme. Much of this activity - costs, opening hours, contact details, weather, travel recommendations - falls outside the scope traditional GIS systems and knowledge graphs are built to serve. The categories vary substantially in the kind of answer they admit, from deterministic lookups answerable from spatial databases or knowledge graphs to evaluative or temporally volatile queries that require generative or real-time systems. We discuss implications for hybrid retrieval architectures and for benchmarks of geographic reasoning in large language models. We openly release the labelled dataset, classifier, and taxonomy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes the full MS MARCO corpus of 1.01 million Bing queries using dense sentence embeddings, a SetFit classifier, and density-based clustering without prior toponym or spatial-keyword filtering. It reports identifying 181,827 geospatial queries (18.0%), nearly three times the 6.17% originally labeled 'Location'. From these, density-based clustering induces a taxonomy of 88 categories, with the central finding that transactional and practical queries dominate: costs and prices alone comprise 15.3% of the geospatial queries, nearly twice the size of the physical geography theme. The work argues that much of this activity lies outside traditional GIS and knowledge-graph capabilities, discusses implications for hybrid retrieval and LLM geographic-reasoning benchmarks, and releases the labeled dataset, classifier, and taxonomy.

Significance. If the classifier and clustering pipeline prove reliable, the study supplies a large-scale, reproducible empirical map of geospatial web-search intent that shifts emphasis from physical geography to transactional lookups. The open release of data, model, and taxonomy is a clear strength that supports follow-on work on retrieval architectures and evaluation benchmarks. The distinction between deterministic, evaluative, and temporally volatile query types offers concrete guidance for system design.

major comments (2)

[Methods] Methods section (classifier application): the SetFit model is applied directly to the entire unfiltered 1.01 M query corpus to produce the 181 k geospatial subset, yet no precision, recall, F1, or manual audit on a held-out sample from the full corpus is reported. The 18.0% figure and all downstream category shares (including the headline 15.3% costs/prices result) rest on this unvalidated step; false-positive rate on non-spatial queries would directly inflate both the numerator and the relative sizes of transactional clusters.
[Results] Results (clustering and taxonomy): the 88-category taxonomy is induced via HDBSCAN on the 181 k embeddings, but no stability analysis, parameter-sensitivity test, or comparison against alternative clustering choices is provided. Because the claim that 'transactional and practical lookups dominate' is quantified by the relative sizes of these induced clusters, lack of robustness evidence makes the dominance statement load-bearing and unverifiable from the current text.

minor comments (2)

[Abstract] Abstract: the multiplier 'nearly threefold' is correct but could be stated exactly (18.0 / 6.17 ≈ 2.92) for precision.
[Results] The manuscript would benefit from a small table or appendix listing the top 10–15 category names with their query counts and example queries to make the taxonomy immediately usable by readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, committing to revisions that directly strengthen the empirical claims.

read point-by-point responses

Referee: [Methods] Methods section (classifier application): the SetFit model is applied directly to the entire unfiltered 1.01 M query corpus to produce the 181 k geospatial subset, yet no precision, recall, F1, or manual audit on a held-out sample from the full corpus is reported. The 18.0% figure and all downstream category shares (including the headline 15.3% costs/prices result) rest on this unvalidated step; false-positive rate on non-spatial queries would directly inflate both the numerator and the relative sizes of transactional clusters.

Authors: We agree that the absence of a direct validation audit on the full unfiltered corpus is a limitation. The SetFit classifier was trained and evaluated on a manually labeled subset of 10k queries (F1 = 0.92 on its internal test split), then applied to the 1.01M corpus. To address the referee's concern, we will add a post-hoc manual audit: two annotators will independently label a random sample of 500 queries drawn from the full corpus (stratified by original MS MARCO labels), reporting precision, recall, and estimated false-positive rate for the geospatial class. This will be inserted into the Methods section with a discussion of how any observed false-positive rate affects the 18% figure and the relative sizes of the transactional clusters. revision: yes
Referee: [Results] Results (clustering and taxonomy): the 88-category taxonomy is induced via HDBSCAN on the 181 k embeddings, but no stability analysis, parameter-sensitivity test, or comparison against alternative clustering choices is provided. Because the claim that 'transactional and practical lookups dominate' is quantified by the relative sizes of these induced clusters, lack of robustness evidence makes the dominance statement load-bearing and unverifiable from the current text.

Authors: We acknowledge that the original submission lacks explicit robustness checks for the HDBSCAN-derived taxonomy. The 88 categories were obtained with min_cluster_size = 100 to retain only substantial groups. In the revision we will add (1) a parameter-sensitivity sweep over min_cluster_size values 50–200, reporting the stability of the top-5 category sizes (including costs/prices), and (2) a comparison on a 20k-query subsample against agglomerative clustering with the same embedding space. These analyses will be placed in the Results section and will explicitly test whether the dominance of transactional categories (costs/prices at 15.3%) persists across reasonable parameter choices. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical pipeline on external corpus

full rationale

The paper processes the external MS MARCO corpus of 1.01 million queries using dense embeddings, a SetFit classifier, and HDBSCAN-style density clustering to surface 181k geospatial queries and induce an 88-category taxonomy. All reported percentages (18.0% geospatial share, 15.3% costs/prices) are direct outputs of this data-driven pipeline with no fitted parameters renamed as predictions, no self-definitional loops, and no load-bearing self-citations or uniqueness theorems. The derivation chain is self-contained against the input corpus and original annotations; no step reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The work depends on standard assumptions in natural language processing and unsupervised clustering applied to search log data.

free parameters (1)

SetFit classifier training parameters
The classifier is trained on data, introducing fitted parameters for query classification.

axioms (2)

domain assumption Dense sentence embeddings capture semantic similarity for query classification
Core to the dense embedding approach used in SetFit.
domain assumption Density-based clustering identifies meaningful query categories
Applied to derive the 88 categories from embeddings.

pith-pipeline@v0.9.0 · 5535 in / 1480 out tokens · 49485 ms · 2026-05-13T01:16:28.542616+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 3 internal anchors

[1]

2 Dimo Angelov

Springer International Publishing.doi:10.1007/978-3-030-51935-3_34. 2 Dimo Angelov. Top2Vec: Distributed Representations of Topics, August

work page doi:10.1007/978-3-030-51935-3_34
[2]

3 David Arthur and Sergei Vassilvitskii

URL:https: //arxiv.org/abs/2008.09470v1. 3 David Arthur and Sergei Vassilvitskii. k-means++: the advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, SODA ’07, pages 1027–1035, USA, January

work page arXiv 2008
[3]

k-means++: The advantages of careful seeding,

Society for Industrial and Applied Mathematics. URL:https://dl.acm.org/doi/10.5555/1283383.1283494. 4 Ira Assent. Clustering high dimensional data.WIREs Data Mining and Knowledge Discovery, 2(4):340–350, 2012.doi:10.1002/widm.1062. 5 Prafulla Bafna, Dhanya Pramod, and Anagha Vaidya. Document clustering: TF-IDF approach. In2016 International Conference on ...

work page doi:10.5555/1283383.1283494 2012
[4]

URL:https://dl.acm.org/ doi/10.1145/3589132.3625625,doi:10.1145/3589132.3625625

Association for Computing Machinery. URL:https://dl.acm.org/ doi/10.1145/3589132.3625625,doi:10.1145/3589132.3625625. 8 Ricardo J. G. B. Campello, Peer Kröger, Jörg Sander, and Arthur Zimek. Density-based clustering.WIREs Data Mining and Knowledge Discovery, 10(2):e1343,

work page doi:10.1145/3589132.3625625
[5]

9 Ricardo J

URL: https: //onlinelibrary.wiley.com/doi/abs/10.1002/widm.1343,doi:10.1002/widm.1343. 9 Ricardo J. G. B. Campello, Davoud Moulavi, and Joerg Sander. Density-Based Clustering Based on Hierarchical Density Estimates. In Jian Pei, Vincent S. Tseng, Longbing Cao, Hiroshi Motoda, and Guandong Xu, editors,Advances in Knowledge Discovery and Data Mining, pages ...

work page doi:10.1002/widm.1343
[6]

In: Advances in Knowledge Discovery and Data Mining

Springer.doi:10.1007/978-3-642-37456-2_14. 10 Jacob Cohen. A Coefficient of Agreement for Nominal Scales.Educational and Psychological Measurement, 20(1):37–46, April 1960.doi:10.1177/001316446002000104. 11Tim Cresswell.Place: An Introduction. John Wiley & Sons, August

work page doi:10.1007/978-3-642-37456-2_14 1960
[7]

doi:10.1007/978-3-540-76925-5_8

Springer. doi:10.1007/978-3-540-76925-5_8. 13 Aaron Grattafiori, Abhimanyu Dubey, et al. The Llama 3 herd of models,

work page doi:10.1007/978-3-540-76925-5_8
[8]

The Llama 3 Herd of Models

URL: https://arxiv.org/abs/2407.21783,arXiv:2407.21783. 14 MaartenGrootendorst. BERTopic: Neuraltopicmodelingwithaclass-basedTF-IDFprocedure. arXiv preprint arXiv:2203.05794,

work page internal anchor Pith review Pith/arXiv arXiv
[9]

16 Much of Geospatial Web Search Is Beyond Traditional GIS 15 Stefan Hahmann and Dirk Burghardt. How much information is geospatially referenced? Networks and cognition.International Journal of Geographical Information Science, 27(6):1171– 1189, June 2013.doi:10.1080/13658816.2012.743664. 16 Ehsan Hamzei, Haonan Li, Maria Vasardani, Timothy Baldwin, Steph...

work page doi:10.1080/13658816.2012.743664 2013
[10]

18 Ehsan Hamzei, Stephan Winter, and Martin Tomko

doi:10.4230/LIPICS.COSIT.2019.12. 18 Ehsan Hamzei, Stephan Winter, and Martin Tomko. Place facets: a systematic literature review.Spatial Cognition & Computation, 20(1):33–81, January 2020.doi:10.1080/13875868. 2019.1688332. 19 Andreas Henrich and Volker Luedecke. Characteristics of geographic information needs. InProceedings of the 4th ACM workshop on Ge...

work page doi:10.4230/lipics.cosit.2019.12 2019
[11]

doi:10.1145/1316948.1316950

Association for Computing Machinery. doi:10.1145/1316948.1316950. 20 Ilya Ilyankou, Meihui Wang, Stefano Cavazzi, and James Haworth. Quantifying Geospatial in the Common Crawl Corpus. InProceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems, SIGSPATIAL ’24, pages 585–588, New York, NY, USA, November

work page doi:10.1145/1316948.1316950
[12]

URL:https://dl.acm

Association for Computing Machinery. URL:https://dl.acm. org/doi/10.1145/3678717.3691286,doi:10.1145/3678717.3691286. 21 Rosie Jones, Wei V. Zhang, Benjamin Rey, Pradhuman Jhala, and Eugene Stipp. Geographic intention and modification in web search.International Journal of Geographical Information Science, 22(3):229–246, March 2008.doi:10.1080/13658810701...

work page doi:10.1145/3678717.3691286 2008
[13]

23 Werner Kuhn, Ehsan Hamzei, Martin Tomko, Stephan Winter, and Haonan Li

URL:https://www.sciencedirect.com/ science/article/pii/S1569843224005594,doi:10.1016/j.jag.2024.104203. 23 Werner Kuhn, Ehsan Hamzei, Martin Tomko, Stephan Winter, and Haonan Li. The semantics of place-related questions.Journal of Spatial Information Science, (23):157–168,

work page doi:10.1016/j.jag.2024.104203 2024
[14]

doi: 10.5311/JOSIS.2021.23.161. 24 J. R. Landis and G. G. Koch. The measurement of observer agreement for categorical data. Biometrics, 33(1):159–174, March

work page doi:10.5311/josis.2021.23.161 2021
[15]

26 Rohin Manvi, Samar Khanna, Gengchen Mai, Marshall Burke, David Lobell, and Stefano Ermon

URL:https://agile-giss.copernicus.org/articles/2/8/2021/, doi:10.5194/agile-giss-2-8-2021. 26 Rohin Manvi, Samar Khanna, Gengchen Mai, Marshall Burke, David Lobell, and Stefano Ermon. GeoLLM: Extracting Geospatial Knowledge from Large Language Models, February

work page doi:10.5194/agile-giss-2-8-2021 2021
[16]

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

arXiv:2310.06213 [cs]. URL:http://arxiv.org/abs/2310.06213. 27 Leland McInnes, John Healy, and James Melville. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, September 2020.doi:10.48550/arXiv.1802.03426. 28 Peter Mooney, Wencong Cui, Boyuan Guan, and Levente Juhász. Towards Understanding the Geospatial Skills of ChatGPT: Taki...

work page internal anchor Pith review doi:10.48550/arxiv.1802.03426 2020
[17]

URL:https://dl.acm.org/doi/10.1145/3615886.3627745, doi:10.1145/3615886.3627745

Associ- ation for Computing Machinery. URL:https://dl.acm.org/doi/10.1145/3615886.3627745, doi:10.1145/3615886.3627745. I. Ilyankou and S. Cavazzi and J. Haworth 17 29 Davoud Moulavi, Pablo A. Jaskowiak, Ricardo J. G. B. Campello, Arthur Zimek, and Jörg Sander. Density-Based Clustering Validation. InProceedings of the 2014 SIAM International Conference on...

work page doi:10.1145/3615886.3627745 2014
[18]

URL:http://arxiv.org/abs/2210.07316. 31 D. Punjani, K. Singh, A. Both, M. Koubarakis, I. Angelidis, K. Bereta, T. Beris, D. Bilidas, T. Ioannidis, N. Karalis, C. Lange, D. Pantazi, C. Papaloukas, and G. Stamoulis. Template- Based Question Answering over Linked Geospatial Data. InProceedings of the 12th Workshop on Geographic Information Retrieval, pages 1...

work page internal anchor Pith review arXiv
[19]

doi:10.1145/3281354.3281362

ACM. doi:10.1145/3281354.3281362. 32 Ross S. Purves, Stephan Winter, and Werner Kuhn. Places in Information Science.Journal of the Association for Information Science and Technology, 70(11):1173–1182,

work page doi:10.1145/3281354.3281362
[20]

33 Mansi Radke, Prarthana Das, Kristin Stock, and Christopher B

doi: 10.1002/asi.24194. 33 Mansi Radke, Prarthana Das, Kristin Stock, and Christopher B. Jones. Detecting the Geospa- tialness of Prepositions from Natural Language Text. In Sabine Timpf, Christoph Schlieder, Markus Kattenbeck, Bernd Ludwig, and Kathleen Stewart, editors,14th International Con- ference on Spatial Information Theory (COSIT 2019), volume 14...

work page doi:10.1002/asi.24194 2019
[21]

34 Nils Reimers and Iryna Gurevych

Schloss Dagstuhl – Leibniz-Zentrum für Informatik.doi:10.4230/LIPIcs.COSIT.2019.11. 34 Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence Embeddings using Siamese BERT- Networks. In Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan, editors,Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th Interna...

work page doi:10.4230/lipics.cosit.2019.11 2019
[22]

Sentence- BERT : Sentence Embeddings using S iamese BERT -Networks

Association for Computational Linguistics. doi:10.18653/v1/D19-1410. 35 Jonathan Roberts, Timo Lüddecke, Sowmen Das, Kai Han, and Samuel Albanie. GPT4GEO: How a Language Model Sees the World’s Geography, May

work page doi:10.18653/v1/d19-1410
[23]

URL: http://arxiv.org/abs/2306.00020

arXiv:2306.00020 [cs]. URL: http://arxiv.org/abs/2306.00020. 36 Mark Sanderson and Janet Kohler. Analyzing geographic queries. InProceedings of the Work- shop on Geographic Information Retrieval. 27th Annual International ACM SIGIR Conference,

work page arXiv
[24]

Answers to where-questions.Discourse Processes, 6(4):319–352, 1983.doi: 10.1080/01638538309544571

37 Benny Shanon. Answers to where-questions.Discourse Processes, 6(4):319–352, 1983.doi: 10.1080/01638538309544571. 38 Suzanna Sia, Ayush Dalmia, and Sabrina J. Mielke. Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too! InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (E...

work page doi:10.1080/01638538309544571 1983
[25]

URL:https: //aclanthology.org/2020.emnlp-main.135,doi:10.18653/v1/2020.emnlp-main.135

Association for Computational Linguistics. URL:https: //aclanthology.org/2020.emnlp-main.135,doi:10.18653/v1/2020.emnlp-main.135. 39 Lewis Tunstall, Nils Reimers, Unso Eun Seo Jo, Luke Bates, Daniel Korat, Moshe Wasserblat, and Oren Pereg. Efficient Few-Shot Learning Without Prompts, September

work page doi:10.18653/v1/2020.emnlp-main.135 2020
[26]

Efficient few-shot learn- ing without prompts,

doi: 10.48550/arXiv.2209.11055. 40 Haiqi Xu, Ehsan Hamzei, Enkhbold Nyamsuren, Han Kruiger, Stephan Winter, Martin Tomko, and Simon Scheider. Extracting interrogative intents and concepts from geo-analytic questions. AGILE: GIScience Series, 1:1–21, July 2020.doi:10.5194/agile-giss-1-23-2020. 18 Much of Geospatial Web Search Is Beyond Traditional GIS A LL...

work page doi:10.48550/arxiv.2209.11055 2020