Semantic Recall for Vector Search
Pith reviewed 2026-05-09 23:28 UTC · model grok-4.3
The pith
Semantic recall is a new metric for vector search that counts only the retrieval of semantically relevant nearest neighbors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce Semantic Recall, a novel metric to assess the quality of approximate nearest neighbor search algorithms by considering only semantically relevant objects that are theoretically retrievable via exact nearest neighbor search. Unlike traditional recall, semantic recall does not penalize algorithms for failing to retrieve objects that are semantically irrelevant to the query, even if those objects are among their nearest neighbors. We demonstrate that semantic recall is particularly useful for assessing retrieval quality on queries that have few relevant results among their nearest neighbors, a scenario we uncover to be common within embedding datasets. Additionally, we introduce Tolerant Recall, a proxy metric that approximates semantic recall when semantically relevant objects cannot be identified.
What carries the argument
Semantic Recall, the metric that evaluates retrieval quality solely on semantically relevant objects reachable by exact nearest neighbor search.
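The restriction is easy to make concrete. The sketch below is illustrative only (not the paper's code): it assumes exact nearest-neighbor IDs and a set of relevance labels are given as inputs, and computes traditional recall over all exact neighbors versus semantic recall over only the relevant ones.

```python
def traditional_recall(retrieved, exact_nn):
    """Fraction of the exact k nearest neighbors that were retrieved."""
    return len(set(retrieved) & set(exact_nn)) / len(exact_nn)

def semantic_recall(retrieved, exact_nn, relevant):
    """Recall computed only over the exact nearest neighbors that are
    semantically relevant to the query (labels assumed given)."""
    relevant_nn = set(exact_nn) & set(relevant)
    if not relevant_nn:  # no relevant object is reachable by exact k-NN search
        return None      # metric undefined for this query
    return len(set(retrieved) & relevant_nn) / len(relevant_nn)

# Example: exact 5-NN of a query, of which only two are relevant.
exact_nn  = [3, 7, 1, 9, 4]
relevant  = {7, 9, 42}   # 42 is relevant but lies outside the exact 5-NN
retrieved = [3, 7, 9]    # an ANN algorithm's result

print(traditional_recall(retrieved, exact_nn))         # 0.6
print(semantic_recall(retrieved, exact_nn, relevant))  # 1.0
```

The example shows the point of the metric: the ANN result misses two exact neighbors (recall 0.6) but retrieves every relevant one (semantic recall 1.0), so it is not penalized for skipping irrelevant geometry.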
If this is right
- Algorithms can be tuned to retrieve fewer but more relevant neighbors, improving cost-quality tradeoffs without inflating recall scores.
- Evaluation on embedding datasets will reveal that many current high-recall methods perform worse under semantic recall on queries with sparse relevant results.
- Tolerant recall provides a usable approximation when semantic labels are absent, enabling immediate application of the idea.
- Benchmarking practices in vector search shift toward metrics that separate geometric proximity from semantic utility.
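The tolerant-recall bullet above can be sketched, with a caveat: the abstract does not spell out the formula, so the distance-tolerance rule below (a retrieved point counts as a hit if it lies within a (1 + eps) factor of the exact k-th neighbor distance) is one plausible reading for illustration, not the paper's definition.

```python
import numpy as np

def tolerant_recall(query, retrieved_ids, data, k, eps=0.05):
    # Hypothetical construction; the paper's exact formula is not given here.
    # A retrieved point counts as a hit if its distance to the query is
    # within (1 + eps) of the exact k-th nearest-neighbor distance.
    dists = np.linalg.norm(data - query, axis=1)
    kth = np.sort(dists)[k - 1]                  # exact k-NN radius
    hits = sum(dists[i] <= (1.0 + eps) * kth for i in retrieved_ids)
    return min(hits, k) / k

# Toy example: 1-D points at 0, 1, 2, 3; query at the origin, k = 2.
data = np.array([[0.0], [1.0], [2.0], [3.0]])
query = np.array([0.0])
print(tolerant_recall(query, [0, 1], data, k=2))  # exact top-2 retrieved -> 1.0
print(tolerant_recall(query, [0, 2], data, k=2))  # point 2 outside tolerance -> 0.5
```

Any construction in this spirit needs no relevance labels at all, which is what makes it usable as a proxy when semantic labels are absent.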
Where Pith is reading between the lines
- Search system designers could de-emphasize exact embedding distance in favor of semantic filters, potentially changing index construction.
- The same distinction between relevance and proximity may apply to other similarity-based tasks such as recommendation or clustering.
- Future work could test whether training embeddings explicitly to increase the density of relevant neighbors raises semantic recall ceilings.
Load-bearing premise
Semantically relevant objects can be reliably identified or approximated for the queries in typical embedding datasets, and missing irrelevant neighbors should not count against performance.
What would settle it
An experiment that measures user satisfaction or task success on a set of real queries and tests whether algorithms ranked higher by semantic recall produce better outcomes than those ranked higher by traditional recall; if they do not, the metric's claim to track retrieval quality fails.
read the original abstract
We introduce Semantic Recall, a novel metric to assess the quality of approximate nearest neighbor search algorithms by considering only semantically relevant objects that are theoretically retrievable via exact nearest neighbor search. Unlike traditional recall, semantic recall does not penalize algorithms for failing to retrieve objects that are semantically irrelevant to the query, even if those objects are among their nearest neighbors. We demonstrate that semantic recall is particularly useful for assessing retrieval quality on queries that have few relevant results among their nearest neighbors-a scenario we uncover to be common within embedding datasets. Additionally, we introduce Tolerant Recall, a proxy metric that approximates semantic recall when semantically relevant objects cannot be identified. We empirically show that our metrics are more effective indicators of retrieval quality, and that optimizing search algorithms for these metrics can lead to improved cost-quality tradeoffs.
Editorial analysis
A structured set of objections, weighed in public.
Circularity Check
No circularity: metric is a direct definitional restriction with independent empirical support
full rationale
The paper defines semantic recall explicitly as standard recall computed only over the subset of nearest neighbors that are semantically relevant to the query. This is a straightforward restriction rather than a reduction of any derived quantity back to fitted parameters or self-referential equations. The observation that queries with few relevant neighbors are common is presented as an empirical finding obtained via external labeling, not as a mathematical consequence derived from the metric itself. Tolerant Recall is introduced as a separate proxy approximation without any shown dependency that loops back to the primary metric's outputs. No self-citations, uniqueness theorems, or ansatzes are invoked in the abstract or description to justify core claims. The derivation chain remains self-contained against external benchmarks for relevance labeling.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Semantically relevant objects can be identified independently of the nearest-neighbor geometry.
Reference graph
Works this paper leans on
- [1]
- [2] Martin Aumüller, Erik Bernhardsson, and Alexander Faithfull. 2020. ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Information Systems 87 (2020), 101374.
- [3] Federico Cabitza, Andrea Campagner, and Valerio Basile. 2023. Toward a perspectivist turn in ground truthing for predictive computing. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 6860–6868.
- [4] Manos Chatzakis, Yannis Papakonstantinou, and Themis Palpanas. 2025. DARTH: Declarative Recall Through Early Termination for Approximate Nearest Neighbor Search. Proceedings of the ACM on Management of Data 3, 4 (2025), 1–26.
- [5]
- [6] Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. 2025. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261 (2025).
- [7] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171–4186.
- [8] Yifan Ding, Nicholas Botzer, and Tim Weninger. 2022. Posthoc verification and the fallibility of the ground truth. In Proceedings of the First Workshop on Dynamic Adversarial Data Collection. 23–29.
- [9] Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. 2025. The Faiss library. IEEE Transactions on Big Data (2025).
- [10] Jianyang Gao, Yutong Gou, Yuexuan Xu, Yongyi Yang, Cheng Long, and Raymond Chi-Wing Wong. 2025. Practical and asymptotically optimal quantization of high-dimensional vectors in Euclidean space for approximate nearest neighbor search. Proceedings of the ACM on Management of Data 3, 3 (2025), 1–26.
- [11] Tianyu Gao, Xingcheng Yao, and Danqi Chen. 2021. SimCSE: Simple contrastive learning of sentence embeddings. arXiv preprint arXiv:2104.08821 (2021).
- [12] Daniel Golovin, Benjamin Solnik, Subhodeep Moitra, Greg Kochanski, John Karro, and D. Sculley. 2017. Google Vizier: A Service for Black-Box Optimization. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1487–1495. doi:10.1145/3097983.3098043.
- [13] Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating large-scale inference with anisotropic vector quantization. In International Conference on Machine Learning. PMLR, 3887–3896.
- [14]
- [15] Leonardo Kuffo, Elena Krippner, and Peter Boncz. 2025. PDX: A Data Layout for Vector Similarity Search. Proceedings of the ACM on Management of Data 3, 3 (2025), 1–26.
- [16] Leonardo Kuffo, Ioanna Tsakalidou, Roberta De Viti, Albert Angel, Jiří Iša, and Rastislav Lenhardt. 2026. Reproducibility: Semantic Recall for Vector Search. Google Colaboratory notebook. https://colab.research.google.com/drive/1cUnvdRP7CjeJvx5eaAzjA-J5d_d3oUk7.
- [17] Jinhyuk Lee, Feiyang Chen, Sahil Dua, Daniel Cer, Madhuri Shanbhogue, Iftekhar Naim, Gustavo Hernández Ábrego, Zhe Li, Kaifeng Chen, Henrique Schechter Vera, et al. 2025. Gemini embedding: Generalizable embeddings from Gemini. arXiv preprint arXiv:2503.07891 (2025).
- [18] Xianming Li, Aamir Shakir, Rui Huang, Julius Lipp, and Jing Li. 2025. ProRank: Prompt Warmup via Reinforcement Learning for Small Language Models Reranking. arXiv preprint arXiv:2506.03487 (2025).
- [19] Yu A. Malkov and Dmitry A. Yashunin. 2018. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 4 (2018), 824–836.
- [20]
- [21] James Jie Pan, Jianguo Wang, and Guoliang Li. 2024. Survey of vector database management systems. The VLDB Journal 33, 5 (2024), 1591–1615.
- [22] Yannis Papakonstantinou, Alan Li, Ruiqi Guo, Sanjiv Kumar, and Phil Sun. 2024. ScaNN for AlloyDB. https://services.google.com/fh/files/misc/scann_for_alloydb_whitepaper.pdf.
- [23] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1532–1543.
- [24] Jeffrey Pound, Floris Chabert, Arjun Bhushan, Ankur Goswami, Anil Pacaci, and Shihabur Rahman Chowdhury. 2025. MicroNN: An On-device Disk-resident Updatable Vector Database. In Companion of the 2025 International Conference on Management of Data. 608–621.
- [25] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21, 140 (2020), 1–67.
- [26] Nils Reimers, Elliott Choi, Amr Kayid, Alekhya Nandula, Manoj Govindassamy, and Abdullah Elkady. 2023. Introducing Embed v3. Cohere Blog, November 2, 2023. https://cohere.com/blog/introducing-embed-v3.
- [27] Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084 (2019).
- [28] Harsha Vardhan Simhadri, George Williams, Martin Aumüller, Matthijs Douze, Artem Babenko, Dmitry Baranchuk, Qi Chen, Lucas Hosseini, Ravishankar Krishnaswamy, Gopal Srinivasa, et al. 2022. Results of the NeurIPS'21 challenge on billion-scale approximate nearest neighbor search. In NeurIPS 2021 Competitions and Demonstrations Track. PMLR, 177–189.
- [29] Philip Sun, Ruiqi Guo, and Sanjiv Kumar. 2023. Automating Nearest Neighbor Search Configuration with Constrained Optimization. In The Eleventh International Conference on Learning Representations (ICLR 2023). OpenReview.net. https://openreview.net/forum?id=KfptQCEKVW4.
- [30] Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, et al. 2023. Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805 (2023).
- [31] Wenping Wang, Yunxi Guo, Chiyao Shen, Shuai Ding, Guangdeng Liao, Hao Fu, and Pramodh Karanth Prabhakar. 2023. Integrity and junkiness failure handling for embedding-based retrieval: A case study in social network search. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. 3250–3254.
- [32]
- [33]
- [34] Xinyu Zhang, Nandan Thakur, Odunayo Ogundepo, Ehsan Kamalloo, David Alfonso-Hermelo, Xiaoguang Li, Qun Liu, Mehdi Rezagholizadeh, and Jimmy Lin. 2023. MIRACL: A multilingual retrieval dataset covering 18 diverse languages. Transactions of the Association for Computational Linguistics 11 (2023), 1114–1131.
- [35] Keneilwe Zuva and Tranos Zuva. 2012. Evaluation of information retrieval systems. International Journal of Computer Science & Information Technology 4, 3 (2012), 35.