Hierarchical Retrieval with Out-Of-Vocabulary Queries: A Case Study on SNOMED CT
Pith reviewed 2026-05-17 21:22 UTC · model grok-4.3
The pith
Language-model embeddings in hyperbolic space enable effective hierarchical retrieval from SNOMED CT for queries using out-of-vocabulary terms.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is that projecting language model-based embeddings of SNOMED CT concepts into hyperbolic space supports efficient and accurate subsumption inference between an arbitrary out-of-vocabulary textual query and any concept in the ontology, leading to better hierarchical retrieval performance than existing embedding and lexical baselines.
What carries the argument
Hyperbolic embeddings of ontology concepts generated from language models, used to model and infer subsumption relations for OOV queries.
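As a hedged illustration of what placing an embedding in hyperbolic space can mean, the sketch below applies the exponential map at the origin of a Poincaré ball, a standard construction. It is an assumption made here for illustration and is not confirmed to be the paper's procedure; the function name `exp_map_origin`, the curvature parameter `c`, and the toy vector are all hypothetical.

```python
import math


def exp_map_origin(v, c=1.0):
    """Map a Euclidean vector v into the Poincare ball of curvature -c.

    Uses the exponential map at the origin:
        exp_0(v) = tanh(sqrt(c) * |v|) * v / (sqrt(c) * |v|)
    """
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        # The origin maps to itself.
        return list(v)
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * a for a in v]


# Toy stand-in for a language-model embedding (illustrative values only).
euclidean = [3.0, 4.0]
point = exp_map_origin(euclidean)

# Euclidean radius of the mapped point; it equals tanh(sqrt(c) * |v|) / sqrt(c).
radius = math.sqrt(sum(a * a for a in point))
```

Mathematically, every Euclidean vector lands inside the ball of radius 1/sqrt(c) with its direction unchanged. Very long vectors end up numerically close to the boundary, which is why practical implementations typically clip the result.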
If this is right
- Retrieval of the most specific subsumers and less relevant ancestors for OOV queries outperforms SBERT, SapBERT, and lexical methods on the constructed datasets.
- The approach supports efficient subsumption inference in hyperbolic space for hierarchical concept matching.
- Evaluation datasets and code are made available to support further research on ontology retrieval.
- The method extends in principle to other biomedical or hierarchical ontologies.
Where Pith is reading between the lines
- Such retrieval could enhance search capabilities in medical databases where users phrase questions in everyday language.
- Integrating this with electronic health record systems might reduce errors from mismatched terminology.
- Testing on queries from different medical subfields could reveal where the hyperbolic projection provides the largest gains over flat embeddings.
Load-bearing premise
The assumption that language-model embeddings projected into hyperbolic space reliably encode the subsumption relation between an arbitrary OOV textual query and an arbitrary SNOMED CT concept.
What would settle it
A new test set of OOV queries with expert-annotated correct subsumers where the hyperbolic method does not outperform or match the performance of standard Euclidean embeddings or lexical matching.
Original abstract
SNOMED CT is a biomedical ontology with a hierarchical representation, modelling terminological concepts at a large scale. Knowledge retrieval in SNOMED CT is critical for its application but often proves challenging due to linguistic ambiguity, synonymy, polysemy, and so on. This problem is exacerbated when the queries are out-of-vocabulary (OOV), i.e., lacking any equivalent matches in the ontology. In this work, we focus on the problem of hierarchical concept retrieval from SNOMED CT with OOV queries, and propose an approach driven by utilising language model-based ontology embeddings, which represent hierarchical concepts in a hyperbolic space for enabling efficient subsumption inference between a textual query and an arbitrary concept. For evaluation, we construct three datasets where OOV queries are annotated against SNOMED CT concepts, testing the retrieval of the most specific subsumers and their less relevant ancestors. We find that our method outperforms the baselines, including SBERT, SapBERT, and two lexical matching methods. While evaluated against SNOMED CT, the approach is generalisable and can be extended to other ontologies. We release all the experiment codes and datasets at https://github.com/jonathondilworth/HR-OOV-SNOMED-CT.
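The abstract describes evaluation as retrieving the most specific subsumers for each OOV query, but this page does not say which ranking metrics the paper reports. The sketch below assumes two common choices, reciprocal rank and hits@k, purely for illustration. The function names and the placeholder concept identifiers are hypothetical and are not taken from the paper or its released code.

```python
def reciprocal_rank(ranked, relevant):
    """Return 1/position of the first relevant concept, or 0.0 if none is ranked."""
    for position, concept in enumerate(ranked, start=1):
        if concept in relevant:
            return 1.0 / position
    return 0.0


def hits_at_k(ranked, relevant, k):
    """Return 1.0 if any relevant concept appears in the top k, else 0.0."""
    return 1.0 if any(concept in relevant for concept in ranked[:k]) else 0.0


# Placeholder identifiers standing in for ontology concepts.
ranked = ["concept_b", "concept_a", "concept_c"]  # system output, best first
gold = {"concept_a"}                              # annotated target subsumer(s)

rr = reciprocal_rank(ranked, gold)
h1 = hits_at_k(ranked, gold, 1)
h3 = hits_at_k(ranked, gold, 3)
```

Averaging `rr` over all queries gives mean reciprocal rank. Running the same functions with the ancestors of the target as the gold set would cover the second retrieval setting the abstract mentions.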
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes using language-model embeddings of SNOMED CT concepts and OOV textual queries, projected into hyperbolic space, to perform hierarchical retrieval by inferring subsumption relations. It constructs three evaluation datasets with annotated OOV queries, reports that the method outperforms SBERT, SapBERT, and two lexical baselines on retrieval of most-specific subsumers and ancestors, and claims generalizability to other ontologies while releasing code and data.
Significance. If the central empirical claim holds, the work offers a practical approach to OOV query handling in large biomedical ontologies, where linguistic variation is common. The open release of datasets and code strengthens reproducibility. However, the significance is tempered by the lack of detailed quantitative results or validation of the hyperbolic projection's ability to recover the partial order for arbitrary unseen queries.
major comments (2)
- [Abstract and §4] Abstract and §4 (Evaluation): The claim that the method 'outperforms the baselines' is presented without any numerical results, error bars, statistical significance tests, or details on OOV query selection and annotation in the abstract; the central empirical claim therefore cannot be assessed for magnitude or reliability from the provided summary.
- [§3.2] §3.2 (Hyperbolic Projection): The assumption that off-the-shelf LM embeddings projected into hyperbolic space will place arbitrary OOV query vectors such that the most specific subsumer lies in the correct hierarchical position is load-bearing for the outperformance claim, yet no theoretical argument or ablation is given to show why standard Euclidean LM training plus projection recovers the SNOMED CT DAG partial order for unseen biomedical phrasing.
minor comments (2)
- [§2] §2 (Related Work): The discussion of prior hyperbolic embeddings for ontologies could usefully cite additional recent work on Poincaré embeddings for taxonomies to better situate the contribution.
- [Figure 1] Figure 1: The diagram illustrating the embedding and projection pipeline would benefit from explicit labels on the hyperbolic space component to clarify the subsumption inference step.
Simulated Author's Rebuttal
Thank you for the opportunity to respond to the referee's comments. We have carefully considered each point and provide our responses below. We agree with several suggestions and will make revisions to improve the clarity and completeness of the manuscript.
Point-by-point responses
- Referee: [Abstract and §4] Abstract and §4 (Evaluation): The claim that the method 'outperforms the baselines' is presented without any numerical results, error bars, statistical significance tests, or details on OOV query selection and annotation in the abstract; the central empirical claim therefore cannot be assessed for magnitude or reliability from the provided summary.
  Authors: We thank the referee for pointing this out. While the full evaluation details and numerical results are presented in Section 4 with tables comparing our method to SBERT, SapBERT, and lexical baselines, we agree that the abstract should provide a more concrete summary of the key findings. In the revised version, we will include specific metrics, such as the improvement in retrieval performance for most-specific subsumers, along with a brief mention of the dataset construction. For Section 4, we will ensure error bars are included if not already present and add statistical significance tests to the comparisons. Details on OOV query selection and annotation are described in the dataset construction subsection, but we will add a summary sentence in the abstract.
  Revision: yes
- Referee: [§3.2] §3.2 (Hyperbolic Projection): The assumption that off-the-shelf LM embeddings projected into hyperbolic space will place arbitrary OOV query vectors such that the most specific subsumer lies in the correct hierarchical position is load-bearing for the outperformance claim, yet no theoretical argument or ablation is given to show why standard Euclidean LM training plus projection recovers the SNOMED CT DAG partial order for unseen biomedical phrasing.
  Authors: We recognize the importance of justifying the use of hyperbolic projection for recovering hierarchical relations with OOV queries. Our work is motivated by prior literature on hyperbolic embeddings for hierarchical data, where the geometry naturally accommodates tree-like structures. The empirical results across three datasets demonstrate that this approach yields superior performance compared to Euclidean baselines. We will add a more detailed discussion in Section 3.2 explaining the rationale, including references to works on hyperbolic space for ontologies. Regarding ablations, we have conducted experiments comparing hyperbolic and Euclidean projections, which we will highlight more explicitly in the revised manuscript or appendix.
  Revision: partial
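The statistical significance tests requested by the referee could take several forms, and this page does not say which one the authors intend to use. One common option is a paired bootstrap over per-query scores, sketched below under that assumption. The function name `paired_bootstrap`, its parameters, and the toy score lists are hypothetical; the numbers are made up for illustration and are not results from the paper.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=2000, seed=0):
    """Estimate how often system A fails to beat system B under resampling.

    scores_a and scores_b hold per-query scores for the same queries in the
    same order. Returns the fraction of bootstrap resamples in which the
    summed difference (A minus B) is zero or negative.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must cover the same queries")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    not_better = 0
    for _ in range(n_resamples):
        # Resample queries with replacement, keeping A/B scores paired.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) <= 0:
            not_better += 1
    return not_better / n_resamples


# Made-up per-query scores for two systems (illustrative only).
system_a = [1.0, 0.5, 1.0, 0.33, 1.0]
system_b = [0.5, 0.5, 0.25, 0.33, 0.5]
p = paired_bootstrap(system_a, system_b)
```

A small returned value suggests the advantage of system A is stable across resampled query sets. Pairing matters here because both systems are scored on the same queries, so per-query difficulty cancels out in the differences.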
Circularity Check
Empirical evaluation against external baselines; no derivation reduces to internal fit or self-citation
The paper describes an embedding-based retrieval method (LM embeddings projected to hyperbolic space) and reports empirical outperformance on three constructed OOV query datasets against published baselines (SBERT, SapBERT, lexical matchers). No equations, predictions, or uniqueness claims are shown to reduce by construction to parameters fitted inside the paper or to prior self-citations that bear the central load. The evaluation is externally falsifiable via the released code and datasets.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Hyperbolic space can faithfully embed hierarchical relations so that subsumption corresponds to a geometric relation between embeddings.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean, theorem alexander_duality_circle_linking
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "They both jointly encode text labels by taking a language model and efficiently embed the hierarchy in a hyperbolic space, allowing direct subsumption inference ... s(C ⊑ D) := −(d_κ(x_C, x_D) + λ(∥x_D∥_κ − ∥x_C∥_κ))"
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "HiT incorporates textual information by tuning a pre-trained language model (PLM) using contrastive hyperbolic objectives"
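The subsumption score quoted in the first passage above, s(C ⊑ D) := −(d_κ(x_C, x_D) + λ(∥x_D∥_κ − ∥x_C∥_κ)), can be sketched in a few lines. The sketch assumes that d_κ is the geodesic distance in a Poincaré ball of curvature −κ and that ∥x∥_κ is the hyperbolic distance of x from the origin; neither definition is given on this page, so both are assumptions. The function names, default parameter values, and toy points are hypothetical.

```python
import math


def _sq_norm(x):
    """Squared Euclidean norm of a vector given as a list of floats."""
    return sum(a * a for a in x)


def hyperbolic_distance(x, y, kappa=1.0):
    """Geodesic distance in the Poincare ball of curvature -kappa (assumed d_kappa)."""
    diff = _sq_norm([a - b for a, b in zip(x, y)])
    denom = (1.0 - kappa * _sq_norm(x)) * (1.0 - kappa * _sq_norm(y))
    return math.acosh(1.0 + 2.0 * kappa * diff / denom) / math.sqrt(kappa)


def hyperbolic_norm(x, kappa=1.0):
    """Hyperbolic distance of x from the origin (assumed meaning of the kappa-norm)."""
    return hyperbolic_distance(x, [0.0] * len(x), kappa)


def subsumption_score(x_c, x_d, lam=1.0, kappa=1.0):
    """Score for 'C is subsumed by D'; higher means more plausible."""
    depth_gap = hyperbolic_norm(x_d, kappa) - hyperbolic_norm(x_c, kappa)
    return -(hyperbolic_distance(x_c, x_d, kappa) + lam * depth_gap)


# Toy points: the 'parent' sits nearer the origin than the 'child'.
parent = [0.1, 0.0]
child = [0.5, 0.0]

forward = subsumption_score(child, parent)   # child subsumed by parent
backward = subsumption_score(parent, child)  # parent subsumed by child
```

Under these assumptions the score is asymmetric: `forward` exceeds `backward` because the norm term rewards a candidate subsumer that lies closer to the origin than the query. Ranking all ontology concepts by this score for a given query embedding is one way such a function could drive retrieval.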
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- ClinQueryAgent: A Conversational Agent for Population Health Management
  The paper introduces ClinQueryAgent, a conversational agent that converts natural language queries into database queries for population health management while keeping patient data secure, and reports its use by 128 s...
Reference graph
Works this paper leans on
- [1] Ferishta Bakhshi-Raiez, NF de Keizer, Ronald Cornet, M Dorrepaal, Dave Dongelmans, and Monique WM Jaspers. 2012. A usability evaluation of a SNOMED CT based compositional interface terminology for intensive care. International journal of medical informatics 81, 5 (2012), 351–362.
- [2] Jiaoyan Chen, Pan Hu, Ernesto Jimenez-Ruiz, Ole Magnus Holter, Denvar Antonyrajah, and Ian Horrocks. 2021. OWL2Vec*: Embedding of OWL Ontologies. doi:10.48550/arXiv.2009.14654
- [3] Jiaoyan Chen, Olga Mashkova, Fernando Zhapa-Camacho, Robert Hoehndorf, Yuan He, and Ian Horrocks. 2025. Ontology embedding: a survey of methods, applications and resources. IEEE Knowledge and Data Engineering (2025).
- [4] NHS England. 2025. SNOMED CT (Terminology and Classifications).
- [5] Yuan He, Jiaoyan Chen, Hang Dong, Ian Horrocks, Carlo Allocca, Taehun Kim, and Brahmananda Sapkota. 2024. DeepOnto: A Python package for ontology engineering with deep learning. Semantic Web 15, 5 (2024), 1991–2004.
- [6] Yuan He, Zhangdie Yuan, Jiaoyan Chen, and Ian Horrocks. 2024. Language Models as Hierarchy Encoders. doi:10.48550/arXiv.2401.11374
- [7] Pascal Hitzler, Markus Krötzsch, Bijan Parsia, Peter F. Patel-Schneider, and Sebastian Rudolph. 2012. OWL 2 web ontology language: Primer (second edition). W3C Recommendation. World Wide Web Consortium (W3C).
- [8] Maximilian Nickel and Douwe Kiela. 2017. Poincaré Embeddings for Learning Hierarchical Representations. doi:10.48550/arXiv.1705.08039
- [9] Sampo Pyysalo, Tomoko Ohta, Rafal Rak, Andrew Rowley, Hong-Woo Chun, Sung-Jae Jung, Sung-Pil Choi, Jun’ichi Tsujii, and Sophia Ananiadou. 2015. Overview of the cancer genetics and pathway curation tasks of bionlp shared task 2013. BMC bioinformatics 16, Suppl 10 (2015), S2. Publisher: Springer.
- [10] Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. doi:10.48550/arXiv.1908.10084
- [11] Guangzhi Xiong, Qiao Jin, Zhiyong Lu, and Aidong Zhang. 2024. Benchmarking retrieval-augmented generation for medicine. In Findings of the association for computational linguistics ACL 2024, Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Bangkok, Thailand and virtual meeting, 6233–6251.
- [12] Hui Yang, Jiaoyan Chen, Yuan He, Yongsheng Gao, and Ian Horrocks. 2025. Language models as ontology encoders. arXiv preprint arXiv:2507.14334 (2025).
Hierarchical Retrieval with Out-Of-Vocabulary Queries: A Case Study on SNOMED CT, submission to WWW’26, April 13–17, 2026, Dubai, UAE