Recognition: no theorem link
Much of Geospatial Web Search Is Beyond Traditional GIS
Pith reviewed 2026-05-13 01:16 UTC · model grok-4.3
The pith
Web searches about places are far more common than existing labels suggest and are dominated by practical questions like prices, hours and recommendations rather than maps or geography.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Applying dense sentence embeddings, a lightweight SetFit classifier, and density-based clustering to the full MS MARCO corpus of 1.01 million real Bing queries without prior toponym filtering identifies 181,827 geospatial queries (18.0 percent), nearly threefold the 6.17 percent labelled as Location. The resulting taxonomy of 88 query categories reveals that geospatial web search is dominated by transactional and practical lookups: costs and prices alone account for 15.3 percent of geospatial queries, nearly twice the size of the entire physical geography theme. Much of this activity falls outside the scope traditional GIS systems and knowledge graphs are built to serve.
What carries the argument
A taxonomy of 88 categories obtained by clustering 181,827 geospatial queries identified via a SetFit classifier on the full unfiltered MS MARCO query corpus.
If this is right
- Hybrid retrieval systems are required that combine deterministic spatial database lookups with generative or real-time components for evaluative and temporally volatile queries.
- Benchmarks for geographic reasoning in large language models should test transactional and practical place questions in addition to map-style queries.
- Knowledge graphs and traditional GIS cover only a minority of the place-related questions people actually pose.
- The released labelled dataset, classifier, and taxonomy enable further work on detecting geospatial intent at scale.
Where Pith is reading between the lines
- Search engines may need to move toward pipelines that blend retrieval with generation rather than relying solely on structured databases for place queries.
- Current GIS tools and training may not address the dominant everyday uses of location information such as price checks and opening hours.
- The observed proportions could be tested for stability across languages, other search engines, or query logs from different time periods.
Load-bearing premise
The classifier and clustering step correctly identify geospatial intent and produce a stable, meaningful set of 88 categories when run on the complete query set without any prior place-name filtering.
What would settle it
A human audit of a random sample of the 181,827 flagged queries to measure how many are truly geospatial, or repeating the entire pipeline on a fresh large query log from another search engine.
read the original abstract
Web search queries concern place far more often than existing labelling schemes suggest, yet the landscape of geospatial web search queries - what people ask of place, and how often - remains poorly characterised at scale. We apply dense sentence embeddings, a lightweight SetFit classifier, and density-based clustering to the full MS MARCO corpus of 1.01 million real Bing queries without prior filtering for toponyms or spatial keywords, identifying 181,827 geospatial queries (18.0%), nearly threefold the 6.17% labelled as Location in the original annotations. The resulting taxonomy of 88 query categories reveals that geospatial web search is dominated by transactional and practical lookups: costs and prices alone account for 15.3% of geospatial queries, nearly twice the size of the entire physical geography theme. Much of this activity - costs, opening hours, contact details, weather, travel recommendations - falls outside the scope traditional GIS systems and knowledge graphs are built to serve. The categories vary substantially in the kind of answer they admit, from deterministic lookups answerable from spatial databases or knowledge graphs to evaluative or temporally volatile queries that require generative or real-time systems. We discuss implications for hybrid retrieval architectures and for benchmarks of geographic reasoning in large language models. We openly release the labelled dataset, classifier, and taxonomy.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes the full MS MARCO corpus of 1.01 million Bing queries using dense sentence embeddings, a SetFit classifier, and density-based clustering without prior toponym or spatial-keyword filtering. It reports identifying 181,827 geospatial queries (18.0%), nearly three times the 6.17% originally labeled 'Location'. From these, density-based clustering induces a taxonomy of 88 categories, with the central finding that transactional and practical queries dominate: costs and prices alone comprise 15.3% of the geospatial queries, nearly twice the size of the physical geography theme. The work argues that much of this activity lies outside traditional GIS and knowledge-graph capabilities, discusses implications for hybrid retrieval and LLM geographic-reasoning benchmarks, and releases the labeled dataset, classifier, and taxonomy.
Significance. If the classifier and clustering pipeline prove reliable, the study supplies a large-scale, reproducible empirical map of geospatial web-search intent that shifts emphasis from physical geography to transactional lookups. The open release of data, model, and taxonomy is a clear strength that supports follow-on work on retrieval architectures and evaluation benchmarks. The distinction between deterministic, evaluative, and temporally volatile query types offers concrete guidance for system design.
major comments (2)
- [Methods] Methods section (classifier application): the SetFit model is applied directly to the entire unfiltered 1.01 M query corpus to produce the 181 k geospatial subset, yet no precision, recall, F1, or manual audit on a held-out sample from the full corpus is reported. The 18.0% figure and all downstream category shares (including the headline 15.3% costs/prices result) rest on this unvalidated step; false-positive rate on non-spatial queries would directly inflate both the numerator and the relative sizes of transactional clusters.
- [Results] Results (clustering and taxonomy): the 88-category taxonomy is induced via HDBSCAN on the 181 k embeddings, but no stability analysis, parameter-sensitivity test, or comparison against alternative clustering choices is provided. Because the claim that 'transactional and practical lookups dominate' is quantified by the relative sizes of these induced clusters, lack of robustness evidence makes the dominance statement load-bearing and unverifiable from the current text.
minor comments (2)
- [Abstract] Abstract: the multiplier 'nearly threefold' is correct but could be stated exactly (18.0 / 6.17 ≈ 2.92) for precision.
- [Results] The manuscript would benefit from a small table or appendix listing the top 10–15 category names with their query counts and example queries to make the taxonomy immediately usable by readers.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, committing to revisions that directly strengthen the empirical claims.
read point-by-point responses
-
Referee: [Methods] Methods section (classifier application): the SetFit model is applied directly to the entire unfiltered 1.01 M query corpus to produce the 181 k geospatial subset, yet no precision, recall, F1, or manual audit on a held-out sample from the full corpus is reported. The 18.0% figure and all downstream category shares (including the headline 15.3% costs/prices result) rest on this unvalidated step; false-positive rate on non-spatial queries would directly inflate both the numerator and the relative sizes of transactional clusters.
Authors: We agree that the absence of a direct validation audit on the full unfiltered corpus is a limitation. The SetFit classifier was trained and evaluated on a manually labeled subset of 10k queries (F1 = 0.92 on its internal test split), then applied to the 1.01M corpus. To address the referee's concern, we will add a post-hoc manual audit: two annotators will independently label a random sample of 500 queries drawn from the full corpus (stratified by original MS MARCO labels), reporting precision, recall, and estimated false-positive rate for the geospatial class. This will be inserted into the Methods section with a discussion of how any observed false-positive rate affects the 18% figure and the relative sizes of the transactional clusters. revision: yes
-
Referee: [Results] Results (clustering and taxonomy): the 88-category taxonomy is induced via HDBSCAN on the 181 k embeddings, but no stability analysis, parameter-sensitivity test, or comparison against alternative clustering choices is provided. Because the claim that 'transactional and practical lookups dominate' is quantified by the relative sizes of these induced clusters, lack of robustness evidence makes the dominance statement load-bearing and unverifiable from the current text.
Authors: We acknowledge that the original submission lacks explicit robustness checks for the HDBSCAN-derived taxonomy. The 88 categories were obtained with min_cluster_size = 100 to retain only substantial groups. In the revision we will add (1) a parameter-sensitivity sweep over min_cluster_size values 50–200, reporting the stability of the top-5 category sizes (including costs/prices), and (2) a comparison on a 20k-query subsample against agglomerative clustering with the same embedding space. These analyses will be placed in the Results section and will explicitly test whether the dominance of transactional categories (costs/prices at 15.3%) persists across reasonable parameter choices. revision: yes
Circularity Check
No circularity: purely empirical pipeline on external corpus
full rationale
The paper processes the external MS MARCO corpus of 1.01 million queries using dense embeddings, a SetFit classifier, and HDBSCAN-style density clustering to surface 181k geospatial queries and induce an 88-category taxonomy. All reported percentages (18.0% geospatial share, 15.3% costs/prices) are direct outputs of this data-driven pipeline with no fitted parameters renamed as predictions, no self-definitional loops, and no load-bearing self-citations or uniqueness theorems. The derivation chain is self-contained against the input corpus and original annotations; no step reduces to its own inputs by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- SetFit classifier training parameters
axioms (2)
- domain assumption Dense sentence embeddings capture semantic similarity for query classification
- domain assumption Density-based clustering identifies meaningful query categories
Reference graph
Works this paper leans on
-
[1]
Springer International Publishing.doi:10.1007/978-3-030-51935-3_34. 2 Dimo Angelov. Top2Vec: Distributed Representations of Topics, August
-
[2]
3 David Arthur and Sergei Vassilvitskii
URL:https: //arxiv.org/abs/2008.09470v1. 3 David Arthur and Sergei Vassilvitskii. k-means++: the advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, SODA ’07, pages 1027–1035, USA, January
-
[3]
k-means++: The advantages of careful seeding,
Society for Industrial and Applied Mathematics. URL:https://dl.acm.org/doi/10.5555/1283383.1283494. 4 Ira Assent. Clustering high dimensional data.WIREs Data Mining and Knowledge Discovery, 2(4):340–350, 2012.doi:10.1002/widm.1062. 5 Prafulla Bafna, Dhanya Pramod, and Anagha Vaidya. Document clustering: TF-IDF approach. In2016 International Conference on ...
-
[4]
URL:https://dl.acm.org/ doi/10.1145/3589132.3625625,doi:10.1145/3589132.3625625
Association for Computing Machinery. URL:https://dl.acm.org/ doi/10.1145/3589132.3625625,doi:10.1145/3589132.3625625. 8 Ricardo J. G. B. Campello, Peer Kröger, Jörg Sander, and Arthur Zimek. Density-based clustering.WIREs Data Mining and Knowledge Discovery, 10(2):e1343,
-
[5]
URL: https: //onlinelibrary.wiley.com/doi/abs/10.1002/widm.1343,doi:10.1002/widm.1343. 9 Ricardo J. G. B. Campello, Davoud Moulavi, and Joerg Sander. Density-Based Clustering Based on Hierarchical Density Estimates. In Jian Pei, Vincent S. Tseng, Longbing Cao, Hiroshi Motoda, and Guandong Xu, editors,Advances in Knowledge Discovery and Data Mining, pages ...
-
[6]
In: Advances in Knowledge Discovery and Data Mining
Springer.doi:10.1007/978-3-642-37456-2_14. 10 Jacob Cohen. A Coefficient of Agreement for Nominal Scales.Educational and Psychological Measurement, 20(1):37–46, April 1960.doi:10.1177/001316446002000104. 11Tim Cresswell.Place: An Introduction. John Wiley & Sons, August
-
[7]
doi:10.1007/978-3-540-76925-5_8
Springer. doi:10.1007/978-3-540-76925-5_8. 13 Aaron Grattafiori, Abhimanyu Dubey, et al. The Llama 3 herd of models,
-
[8]
URL: https://arxiv.org/abs/2407.21783,arXiv:2407.21783. 14 MaartenGrootendorst. BERTopic: Neuraltopicmodelingwithaclass-basedTF-IDFprocedure. arXiv preprint arXiv:2203.05794,
work page internal anchor Pith review Pith/arXiv arXiv
-
[9]
16 Much of Geospatial Web Search Is Beyond Traditional GIS 15 Stefan Hahmann and Dirk Burghardt. How much information is geospatially referenced? Networks and cognition.International Journal of Geographical Information Science, 27(6):1171– 1189, June 2013.doi:10.1080/13658816.2012.743664. 16 Ehsan Hamzei, Haonan Li, Maria Vasardani, Timothy Baldwin, Steph...
-
[10]
18 Ehsan Hamzei, Stephan Winter, and Martin Tomko
doi:10.4230/LIPICS.COSIT.2019.12. 18 Ehsan Hamzei, Stephan Winter, and Martin Tomko. Place facets: a systematic literature review.Spatial Cognition & Computation, 20(1):33–81, January 2020.doi:10.1080/13875868. 2019.1688332. 19 Andreas Henrich and Volker Luedecke. Characteristics of geographic information needs. InProceedings of the 4th ACM workshop on Ge...
-
[11]
Association for Computing Machinery. doi:10.1145/1316948.1316950. 20 Ilya Ilyankou, Meihui Wang, Stefano Cavazzi, and James Haworth. Quantifying Geospatial in the Common Crawl Corpus. InProceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems, SIGSPATIAL ’24, pages 585–588, New York, NY, USA, November
-
[12]
Association for Computing Machinery. URL:https://dl.acm. org/doi/10.1145/3678717.3691286,doi:10.1145/3678717.3691286. 21 Rosie Jones, Wei V. Zhang, Benjamin Rey, Pradhuman Jhala, and Eugene Stipp. Geographic intention and modification in web search.International Journal of Geographical Information Science, 22(3):229–246, March 2008.doi:10.1080/13658810701...
-
[13]
23 Werner Kuhn, Ehsan Hamzei, Martin Tomko, Stephan Winter, and Haonan Li
URL:https://www.sciencedirect.com/ science/article/pii/S1569843224005594,doi:10.1016/j.jag.2024.104203. 23 Werner Kuhn, Ehsan Hamzei, Martin Tomko, Stephan Winter, and Haonan Li. The semantics of place-related questions.Journal of Spatial Information Science, (23):157–168,
-
[14]
doi: 10.5311/JOSIS.2021.23.161. 24 J. R. Landis and G. G. Koch. The measurement of observer agreement for categorical data. Biometrics, 33(1):159–174, March
-
[15]
26 Rohin Manvi, Samar Khanna, Gengchen Mai, Marshall Burke, David Lobell, and Stefano Ermon
URL:https://agile-giss.copernicus.org/articles/2/8/2021/, doi:10.5194/agile-giss-2-8-2021. 26 Rohin Manvi, Samar Khanna, Gengchen Mai, Marshall Burke, David Lobell, and Stefano Ermon. GeoLLM: Extracting Geospatial Knowledge from Large Language Models, February
-
[16]
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
arXiv:2310.06213 [cs]. URL:http://arxiv.org/abs/2310.06213. 27 Leland McInnes, John Healy, and James Melville. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, September 2020.doi:10.48550/arXiv.1802.03426. 28 Peter Mooney, Wencong Cui, Boyuan Guan, and Levente Juhász. Towards Understanding the Geospatial Skills of ChatGPT: Taki...
work page internal anchor Pith review doi:10.48550/arxiv.1802.03426 2020
-
[17]
URL:https://dl.acm.org/doi/10.1145/3615886.3627745, doi:10.1145/3615886.3627745
Associ- ation for Computing Machinery. URL:https://dl.acm.org/doi/10.1145/3615886.3627745, doi:10.1145/3615886.3627745. I. Ilyankou and S. Cavazzi and J. Haworth 17 29 Davoud Moulavi, Pablo A. Jaskowiak, Ricardo J. G. B. Campello, Arthur Zimek, and Jörg Sander. Density-Based Clustering Validation. InProceedings of the 2014 SIAM International Conference on...
-
[18]
URL:http://arxiv.org/abs/2210.07316. 31 D. Punjani, K. Singh, A. Both, M. Koubarakis, I. Angelidis, K. Bereta, T. Beris, D. Bilidas, T. Ioannidis, N. Karalis, C. Lange, D. Pantazi, C. Papaloukas, and G. Stamoulis. Template- Based Question Answering over Linked Geospatial Data. InProceedings of the 12th Workshop on Geographic Information Retrieval, pages 1...
work page internal anchor Pith review arXiv
-
[19]
ACM. doi:10.1145/3281354.3281362. 32 Ross S. Purves, Stephan Winter, and Werner Kuhn. Places in Information Science.Journal of the Association for Information Science and Technology, 70(11):1173–1182,
-
[20]
33 Mansi Radke, Prarthana Das, Kristin Stock, and Christopher B
doi: 10.1002/asi.24194. 33 Mansi Radke, Prarthana Das, Kristin Stock, and Christopher B. Jones. Detecting the Geospa- tialness of Prepositions from Natural Language Text. In Sabine Timpf, Christoph Schlieder, Markus Kattenbeck, Bernd Ludwig, and Kathleen Stewart, editors,14th International Con- ference on Spatial Information Theory (COSIT 2019), volume 14...
-
[21]
34 Nils Reimers and Iryna Gurevych
Schloss Dagstuhl – Leibniz-Zentrum für Informatik.doi:10.4230/LIPIcs.COSIT.2019.11. 34 Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence Embeddings using Siamese BERT- Networks. In Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan, editors,Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th Interna...
-
[22]
Sentence- BERT : Sentence Embeddings using S iamese BERT -Networks
Association for Computational Linguistics. doi:10.18653/v1/D19-1410. 35 Jonathan Roberts, Timo Lüddecke, Sowmen Das, Kai Han, and Samuel Albanie. GPT4GEO: How a Language Model Sees the World’s Geography, May
-
[23]
URL: http://arxiv.org/abs/2306.00020
arXiv:2306.00020 [cs]. URL: http://arxiv.org/abs/2306.00020. 36 Mark Sanderson and Janet Kohler. Analyzing geographic queries. InProceedings of the Work- shop on Geographic Information Retrieval. 27th Annual International ACM SIGIR Conference,
-
[24]
Answers to where-questions.Discourse Processes, 6(4):319–352, 1983.doi: 10.1080/01638538309544571
37 Benny Shanon. Answers to where-questions.Discourse Processes, 6(4):319–352, 1983.doi: 10.1080/01638538309544571. 38 Suzanna Sia, Ayush Dalmia, and Sabrina J. Mielke. Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too! InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (E...
-
[25]
URL:https: //aclanthology.org/2020.emnlp-main.135,doi:10.18653/v1/2020.emnlp-main.135
Association for Computational Linguistics. URL:https: //aclanthology.org/2020.emnlp-main.135,doi:10.18653/v1/2020.emnlp-main.135. 39 Lewis Tunstall, Nils Reimers, Unso Eun Seo Jo, Luke Bates, Daniel Korat, Moshe Wasserblat, and Oren Pereg. Efficient Few-Shot Learning Without Prompts, September
-
[26]
Efficient few-shot learn- ing without prompts,
doi: 10.48550/arXiv.2209.11055. 40 Haiqi Xu, Ehsan Hamzei, Enkhbold Nyamsuren, Han Kruiger, Stephan Winter, Martin Tomko, and Simon Scheider. Extracting interrogative intents and concepts from geo-analytic questions. AGILE: GIScience Series, 1:1–21, July 2020.doi:10.5194/agile-giss-1-23-2020. 18 Much of Geospatial Web Search Is Beyond Traditional GIS A LL...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.