Do Sentence Transformers Learn Quasi-Geospatial Concepts from General Text?

Aldo Lipani; Ilya Ilyankou; James Haworth; Stefano Cavazzi; Xiaowei Gao

arXiv: 2404.04169 · v1 · submitted 2024-04-05 · cs.CL · cs.LG

Pith reviewed 2026-05-24 01:53 UTC · model grok-4.3

keywords: sentence transformers, zero-shot learning, quasi-geospatial concepts, semantic search, route recommendation, hiking queries, embedding similarity

The pith

Sentence transformers fine-tuned on general QA data show some zero-shot ability to link route descriptions with hiking queries via concepts such as difficulty and route type.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether sentence transformers, trained only for asymmetric semantic search on ordinary question-answering text, can match descriptions of real hiking routes across Great Britain to the kinds of queries people use when planning hikes. It asks whether this matching reflects grasp of quasi-geospatial ideas such as route type and difficulty. A sympathetic reader would care if the answer is yes, because it would mean general-purpose language models already carry enough spatial and experiential knowledge to support downstream tasks like route recommendation without extra domain training. The authors report partial success in the zero-shot setting.

Core claim

Sentence transformers fine-tuned on general question-answering datasets demonstrate some zero-shot ability to associate human-generated route descriptions with hiking queries by recognizing quasi-geospatial concepts such as route types and difficulty levels.

What carries the argument

Embedding similarity scores between route description texts and hiking experience queries, produced by sentence transformers trained for asymmetric semantic search.
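The mechanism can be illustrated with a minimal sketch. It assumes cosine similarity as the score and uses hand-written toy vectors in place of real embeddings; the route names, vector values, and the `rank_routes` helper are hypothetical and are not taken from the paper. In practice the vectors would come from a sentence transformer such as the multi-qa MiniLM model listed in the references.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_routes(query_vec, route_vecs):
    """Return route ids sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), route_id)
              for route_id, vec in route_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [route_id for _, route_id in scored]

# Toy 3-dimensional "embeddings" (hypothetical values, not model output).
route_vecs = {
    "coastal_loop": [0.9, 0.1, 0.0],
    "mountain_ridge": [0.1, 0.9, 0.2],
    "town_stroll": [0.5, 0.0, 0.8],
}
# Stands in for an embedded query such as "a strenuous climb".
query_vec = [0.2, 1.0, 0.1]

ranking = rank_routes(query_vec, route_vecs)
print(ranking)  # -> ['mountain_ridge', 'coastal_loop', 'town_stroll']
```

The ranking depends only on the angle between vectors, which is why the load-bearing premise below matters: a high score shows proximity in embedding space, not why the two texts ended up close.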

If this is right

Route recommendation systems could use these pre-trained models directly to suggest trails matching user text descriptions of desired difficulty or scenery.
Zero-shot performance on geospatial concepts implies that general text corpora already encode enough relational knowledge for basic spatial reasoning tasks.
Specialized fine-tuning on geospatial data may not be required for initial deployment of text-based hiking or routing tools.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the observed similarity truly tracks conceptual grasp, the same models could be probed for other domain concepts such as urban mobility patterns or environmental risk factors without new training.
Performance gaps across different sentence transformer architectures might reveal which training objectives best preserve latent spatial structure in embeddings.

Load-bearing premise

Higher similarity scores between route texts and queries reflect genuine understanding of quasi-geospatial properties rather than surface word overlap or dataset artifacts.

What would settle it

A controlled test in which difficulty and type descriptors are swapped across otherwise lexically similar route-query pairs, with the models showing no corresponding drop in ranking accuracy, would indicate the original results stem from lexical patterns alone.
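One way such swapped inputs might be constructed is sketched below. The antonym map and the `swap_descriptors` helper are hypothetical illustrations; a real control would need a vetted vocabulary of difficulty and route-type terms, and nothing here reflects a procedure used in the paper.

```python
import re

# Hypothetical descriptor pairs; each term maps to its opposite.
SWAPS = {
    "easy": "strenuous",
    "strenuous": "easy",
    "circular": "linear",
    "linear": "circular",
}

def swap_descriptors(text, swaps=SWAPS):
    """Replace each whole-word descriptor with its counterpart in one pass."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in swaps) + r")\b",
        re.IGNORECASE,
    )
    # A single regex pass avoids swapping a word back after it was replaced.
    return pattern.sub(lambda m: swaps[m.group(0).lower()], text)

original = "This easy circular walk follows the river"
swapped = swap_descriptors(original)
print(swapped)  # -> This strenuous linear walk follows the river
```

Scoring the same queries against both the original and the swapped descriptions would then show whether rankings respond to the descriptors at all.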

Figures

Figures reproduced from arXiv: 2404.04169 by Aldo Lipani, Ilya Ilyankou, James Haworth, Stefano Cavazzi, Xiaowei Gao.

**Figure 2.** Cumulative mean length, grade, and elevation gain for what should be harder routes.

Peculiarly, even when fine-tuned on the same Multi-QA dataset, MiniLM, DistilBERT, and MPNet models are in total disagreement over the walks that can be completed in under an hour.

**Figure 3.** Three models, even though fine-tuned on the same dataset, disagree on which walks can be completed in under 1 hour.

Original abstract

Sentence transformers are language models designed to perform semantic search. This study investigates the capacity of sentence transformers, fine-tuned on general question-answering datasets for asymmetric semantic search, to associate descriptions of human-generated routes across Great Britain with queries often used to describe hiking experiences. We find that sentence transformers have some zero-shot capabilities to understand quasi-geospatial concepts, such as route types and difficulty, suggesting their potential utility for routing recommendation systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper applies sentence transformers to GB hiking routes in a zero-shot setup and claims limited conceptual grasp of route type and difficulty, but the abstract gives no metrics or controls so the claim stays untested.

The main thing here is a simple test: take sentence transformers fine-tuned on general QA data and see if they rank hiking route descriptions against typical user queries better than chance. They report some success on quasi-geospatial notions like difficulty and route type. That domain choice is the only real novelty; the method itself is standard semantic search evaluation moved to a new corpus of British hiking trails. Nothing in the abstract suggests new architecture or theory, just an off-the-shelf model applied to route data. Credit for picking a concrete, narrow task where spatial language matters and checking whether general embeddings already capture it. The soft spot is obvious from the abstract alone. No numbers, no baselines, no mention of lexical overlap controls, no dataset sizes, and no statistical tests. The stress-test concern lands: if shared words like “steep,” “loop,” or place names drive the similarity scores, the result does not show concept learning. Without those checks the central claim cannot be evaluated. The work is for people who build recommendation systems that mix text queries with route metadata and want a quick check on whether existing embeddings help. It does not look ready for a serious referee yet because the evidence is missing, but a full version with proper controls and numbers could be worth sending out once those gaps are filled.

Referee Report

3 major / 1 minor

Summary. The paper investigates whether sentence transformers fine-tuned on general QA datasets can acquire quasi-geospatial concepts (route types, difficulty) in zero-shot settings by measuring embedding similarity between Great Britain route descriptions and hiking queries, concluding that the models show some such capabilities with potential utility for routing recommendation systems.

Significance. If the central empirical claim holds after controls, the result would indicate that general-purpose sentence transformers can transfer to domain-specific geospatial reasoning without targeted training, supporting broader use in recommendation systems. The zero-shot design and use of held-out route-query pairs provide a direct test of the hypothesis.

major comments (3)

Abstract and §4 (Results): positive results are asserted without any reported metrics, baselines, dataset sizes, or statistical tests, so the support for the claim of zero-shot conceptual understanding cannot be evaluated.
§3 (Methods): no controls or ablations are described to isolate semantic structure from lexical/n-gram overlap (shared route-type terms, difficulty adjectives, location names), which directly undercuts the interpretation that higher similarity reflects learned quasi-geospatial concepts rather than surface cues.
§4 (Results): the evaluation omits comparison against simple lexical baselines (e.g., TF-IDF or n-gram overlap rankings), leaving open whether the reported similarities exceed what surface patterns alone would produce.
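A lexical baseline of the kind the last comment asks for could look roughly like the following sketch. It is a small pure-Python TF-IDF ranker with smoothed inverse document frequency; the whitespace tokenizer, the smoothing formula, and the example route texts are all assumptions made for illustration, not details from the paper.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()

def tfidf_vectors(docs):
    """Map each document to a {term: tf * idf} dict, using smoothed idf."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1.0
           for term, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({term: (count / len(tokens)) * idf[term]
                        for term, count in counts.items()})
    return vectors, idf

def sparse_cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_by_tfidf(query, docs):
    """Return document indices ordered by TF-IDF cosine similarity to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the corpus carry no weight.
    q_counts = Counter(t for t in tokenize(query) if t in idf)
    q_total = sum(q_counts.values())
    q_vec = {t: (c / q_total) * idf[t] for t, c in q_counts.items()} if q_total else {}
    scores = [sparse_cosine(q_vec, vec) for vec in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

# Hypothetical route descriptions.
routes = [
    "steep ridge scramble with exposed summit",
    "flat riverside loop suitable for families",
    "gentle woodland path near the village",
]
order = rank_by_tfidf("flat loop for families", routes)
print(order[0])  # index of the best lexical match
```

If a ranker like this reproduces most of the transformer's rankings on the same route-query pairs, the similarity scores add little beyond word overlap; a clear gap in the transformer's favour would support the conceptual reading.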

minor comments (1)

[§3] Notation for embedding similarity and query-route pairing could be formalized with an equation in §3 for clarity.
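One possible formalization is sketched below. The symbols are assumptions introduced here, not the paper's notation: $f$ is the sentence transformer, $q$ a query, $r$ a route description, and $\mathcal{R}$ the set of candidate routes. Cosine similarity is assumed as the scoring function; a dot-product score would drop the normalization.

```latex
\[
  \mathbf{e}_q = f(q), \qquad \mathbf{e}_r = f(r), \qquad
  s(q, r) = \frac{\mathbf{e}_q^{\top}\mathbf{e}_r}
                 {\lVert \mathbf{e}_q \rVert \, \lVert \mathbf{e}_r \rVert},
  \qquad
  \hat{r}(q) = \operatorname*{arg\,max}_{r \in \mathcal{R}} \; s(q, r).
\]
```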

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the current manuscript version requires additional quantitative details, controls, and baselines to properly support the claims. We will revise accordingly to strengthen the empirical presentation.

Referee: Abstract and §4 (Results): positive results are asserted without any reported metrics, baselines, dataset sizes, or statistical tests, so the support for the claim of zero-shot conceptual understanding cannot be evaluated.

Authors: We acknowledge this limitation in the submitted version. The full manuscript contains similarity computations over held-out route-query pairs from Great Britain hiking data, but these were not reported with sufficient numerical detail or tests in the abstract and §4. In revision we will add the specific metrics (e.g., mean similarity scores and rank statistics), exact dataset sizes, and any statistical tests used to evaluate the zero-shot performance. revision: yes
Referee: §3 (Methods): no controls or ablations are described to isolate semantic structure from lexical/n-gram overlap (shared route-type terms, difficulty adjectives, location names), which directly undercuts the interpretation that higher similarity reflects learned quasi-geospatial concepts rather than surface cues.

Authors: We agree that explicit controls are needed to rule out surface-form explanations. The current methods section focuses on the zero-shot embedding similarity protocol but does not include ablations for lexical overlap. We will add such controls (e.g., synonym substitution or term-masking experiments) in the revised methods to better isolate semantic from lexical contributions. revision: yes
Referee: §4 (Results): the evaluation omits comparison against simple lexical baselines (e.g., TF-IDF or n-gram overlap rankings), leaving open whether the reported similarities exceed what surface patterns alone would produce.

Authors: We accept the point. The results section currently presents only the sentence-transformer similarities without lexical baselines. In the revision we will include direct comparisons to TF-IDF and n-gram overlap rankings on the same route-query pairs, allowing readers to assess whether the transformer performance exceeds surface-level matching. revision: yes
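The term-masking control mentioned in the responses could be prototyped along these lines. This is a sketch under stated assumptions: the `[MASK]` placeholder, the `shared_terms` and `mask_terms` helpers, and the example texts are hypothetical, and the rebuttal does not specify how masking would actually be done.

```python
import re

def shared_terms(query, route_text):
    """Words that appear in both the query and the route description."""
    def words(s):
        return set(re.findall(r"[a-z']+", s.lower()))
    return words(query) & words(route_text)

def mask_terms(text, terms, mask_token="[MASK]"):
    """Replace whole-word occurrences of any term with mask_token."""
    if not terms:
        return text
    # Longest terms first so longer alternatives win over their prefixes.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(mask_token, text)

# Hypothetical query and route description.
query = "steep circular hike"
route = "A steep climb on a circular route above the loch"

overlap = shared_terms(query, route)
masked = mask_terms(route, overlap)
print(masked)  # -> A [MASK] climb on a [MASK] route above the loch
```

Comparing similarity scores for `route` and `masked` against the same query gives a rough measure of how much of the score depends on the shared surface words.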

Circularity Check

0 steps flagged

No circularity: purely empirical evaluation of embedding similarities on held-out pairs

The paper reports zero-shot performance of sentence transformers on associating route descriptions with hiking queries after fine-tuning on unrelated QA data. No equations, parameters fitted to the target task, or self-citation chains are invoked to derive the central claim. The result is a direct comparison of model outputs against external data, with no reduction of predictions to inputs by construction. This matches the default expectation of a non-circular empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that embedding proximity reflects conceptual understanding of route attributes. No free parameters are fitted, no new entities are postulated, and no additional axioms beyond standard NLP evaluation assumptions are invoked.

axioms (1)

Domain assumption: embedding similarity between route descriptions and hiking queries indicates understanding of quasi-geospatial concepts. Invoked in the interpretation of zero-shot performance as evidence of conceptual grasp.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 4 internal anchors

[1] N. Reimers, I. Gurevych, Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, 2019. URL: http://arxiv.org/abs/1908.10084, arXiv:1908.10084 [cs]

[2] W. Wei, P. M. Barnaghi, A. Bargiela, Search with Meanings: An Overview of Semantic Search Systems (2008)

[3] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019. URL: http://arxiv.org/abs/1810.04805, arXiv:1810.04805 [cs]

[4] K. Zhang, W. Wu, H. Wu, Z. Li, M. Zhou, Question Retrieval with High Quality Answers in Community Question Answering, in: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, ACM, Shanghai China, 2014, pp. 371–380. URL: https://dl.acm.org/doi/10.1145/2661829.2661908. doi:10.1145/2661829.2661908

[5] N. Davies, Who walks, where and why? Practitioners’ observations and perspectives on recreational walkers at UK tourist destinations, Annals of Leisure Research 21 (2018) 553–

[6] URL: https://doi.org/10.1080/11745398.2016.1250648. doi:10.1080/11745398.2016.1250648, publisher: Routledge

[7] M. Molokáč, J. Hlaváčová, D. Tometzová, E. Liptáková, The Preference Analysis for Hikers’ Choice of Hiking Trail, Sustainability 14 (2022) 6795. URL: https://www.mdpi.com/2071-1050/14/11/6795. doi:10.3390/su14116795, number: 11, publisher: Multidisciplinary Digital Publishing Institute

[8] R. B. Hull, W. P. Stewart, The Landscape Encountered and Experienced While Hiking, Environment and Behavior 27 (1995) 404–426. URL: https://doi.org/10.1177/0013916595273007. doi:10.1177/0013916595273007, publisher: SAGE Publications Inc

[9] A. Ballatore, S. Cavazzi, J. Morley, The context of outdoor walking: A classification of user-generated routes, The Geographical Journal 189 (2023) 485–500. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/geoj.12511. doi:10.1111/geoj.12511

[10] GeoPandas 0.dev+untagged — GeoPandas 0+untagged.50.g5558c35.dirty documentation,

[11] URL: https://geopandas.org/en/stable/index.html

[12] D. Petrak, N. S. Moosavi, I. Gurevych, Arithmetic-Based Pretraining – Improving Numeracy of Pretrained Language Models, 2023. URL: http://arxiv.org/abs/2205.06733, arXiv:2205.06733 [cs]

[13] W. Wang, F. Wei, L. Dong, H. Bao, N. Yang, M. Zhou, MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers, 2020. URL: http://arxiv.org/abs/2002.10957. doi:10.48550/arXiv.2002.10957, arXiv:2002.10957 [cs]

[14] V. Sanh, L. Debut, J. Chaumond, T. Wolf, DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2020. URL: http://arxiv.org/abs/1910.01108. doi:10.48550/arXiv.1910.01108, arXiv:1910.01108 [cs]

[15] K. Song, X. Tan, T. Qin, J. Lu, T.-Y. Liu, MPNet: Masked and Permuted Pre-training for Language Understanding (2020)

[16] P. Bajaj, D. Campos, N. Craswell, L. Deng, J. Gao, X. Liu, R. Majumder, A. McNamara, B. Mitra, T. Nguyen, M. Rosenberg, X. Song, A. Stoica, S. Tiwary, T. Wang, MS MARCO: A Human Generated MAchine Reading COmprehension Dataset, 2018. URL: http://arxiv.org/abs/1611.09268. doi:10.48550/arXiv.1611.09268, arXiv:1611.09268 [cs]

[17] sentence-transformers/multi-qa-MiniLM-L6-cos-v1 · Hugging Face, 2024. URL: https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1

[18] L. T. Sarjakoski, P. Kettunen, H.-M. Flink, M. Laakso, M. Rönneberg, T. Sarjakoski, Analysis of verbal route descriptions and landmarks for hiking, Personal and Ubiquitous Computing 16 (2012) 1001–1011. URL: https://doi.org/10.1007/s00779-011-0460-7. doi:10.1007/s00779-011-0460-7

[19] J.-P. Calbimonte, S. Martin, D. Calvaresi, N. Zappelaz, A. Cotting, Semantic Data Models for Hiking Trail Difficulty Assessment, in: J. Neidhardt, W. Wörndl (Eds.), Information and Communication Technologies in Tourism 2020, Springer International Publishing, Cham, 2020, pp. 295–306. doi:10.1007/978-3-030-36737-4_24

[20] A. Bellogín, P. Castells, I. Cantador, Statistical biases in Information Retrieval metrics for recommender systems, Information Retrieval Journal 20 (2017) 606–634. URL: https://doi.org/10.1007/s10791-017-9312-z. doi:10.1007/s10791-017-9312-z

[21] S. M. Club, Scottish Mountaineering Club Journal, Scottish Mountaineering Club., 1893.

A. Tables

Table 1. Route attributes

| Attribute | Description | Dataset |
| --- | --- | --- |
| length_m | Route length in metres | - |
| total_gain | Total elevation gain (ascent) | OS Terrain 5 |
| total_loss | Total elevation loss (descent) | OS Terrain 5 |
| grade | Hiking grade, calculated as (total_gain ÷ length_m × 100... | |
