Vector Retrieval with Similarity and Diversity: How Hard Is It?
Pith reviewed 2026-05-25 08:40 UTC · model grok-4.3
The pith
Maximizing the dot product between a query vector and the sum of selected vectors defines an NP-complete retrieval problem.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that the VRSD problem—maximizing the dot-product similarity between a query vector and the sum of a selected set of candidate vectors—is NP-complete. It introduces a parameter-free heuristic to solve it and shows through evaluations that the heuristic outperforms baselines such as MMR and k-DPP on multiple datasets using both objective metrics and subjective LLM assessments.
What carries the argument
The VRSD optimization problem, which encodes both similarity and diversity by maximizing the query's dot product with the sum of selected vectors.
If this is right
- Exact solutions for balanced similarity-diversity retrieval cannot be computed efficiently in the worst case.
- Practical systems must rely on heuristics or approximations for this objective.
- A parameter-free method removes the need for manual tuning that affects methods like MMR.
- The heuristic delivers measurable gains over MMR and k-DPP on standard retrieval benchmarks.
Where Pith is reading between the lines
- The sum-based objective may relate to existing set-selection problems studied in submodular optimization.
- Approximation algorithms with provable ratios could be developed specifically for this formulation.
- The approach might transfer to other embedding spaces where collective coverage matters, such as in recommendation or summarization tasks.
Load-bearing premise
That the dot product between the query and the sum of selected vectors sufficiently captures both individual similarity to the query and diversity among the selected items.
What would settle it
Discovery of a polynomial-time exact algorithm for the VRSD problem on general instances, or repeated cases where the proposed heuristic underperforms a well-tuned MMR baseline on held-out data.
Figures
read the original abstract
Dense vector retrieval is an important building block of modern machine learning systems, underlying applications ranging from semantic search to retrieval-augmented generation and knowledge-intensive reasoning. Beyond retrieving items that are individually similar to a query, many applications require a set of results that is also diverse, complementary, and collectively informative. Balancing similarity and diversity is therefore central to effective retrieval, but remains challenging to optimize in a stable and theoretically grounded way. Maximal Marginal Relevance (MMR) is a widely adopted heuristic for this problem, yet its reliance on a manually tuned parameter leads to optimization fluctuations and unpredictable retrieval results. More broadly, existing methods provide limited theoretical insight into how similarity and diversity interact in dense vector spaces, leaving the joint optimization problem insufficiently understood. To address these challenges, this paper introduces a novel approach that characterizes both constraints simultaneously by maximizing the similarity between the query vector and the sum of the selected candidate vectors. We formally define this optimization problem, Vector Retrieval with Similarity and Diversity (VRSD), and prove that it is NP-complete, establishing a rigorous theoretical bound on the inherent difficulty of this dual-objective retrieval. Subsequently, we present a parameter-free heuristic algorithm to solve VRSD. Extensive evaluations on multiple datasets, incorporating both objective geometric metrics and LLM-simulated subjective assessments, demonstrate that our VRSD heuristic consistently outperforms established baselines, including MMR and Determinantal Point Processes (k-DPP).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper defines Vector Retrieval with Similarity and Diversity (VRSD) as the problem of selecting a cardinality-k set S of vectors to maximize the dot product between a query q and the sum of the vectors in S. It claims to prove that VRSD is NP-complete and proposes a parameter-free heuristic for solving it. The heuristic is evaluated on multiple datasets against baselines including MMR and k-DPP, using both geometric metrics and LLM-simulated subjective assessments, with claims of consistent outperformance.
Significance. If the NP-completeness result held, the work would supply a formal complexity characterization of a dual-objective retrieval task and a tuning-free algorithmic alternative to MMR, both of which would be useful contributions to the information-retrieval literature. The parameter-free property of the heuristic and the use of LLM-based evaluation would be additional strengths.
major comments (1)
- [Abstract and §3] Abstract and §3 (VRSD definition): the objective is stated as maximizing dot(q, sum_{v in S} v) subject to |S|=k. By linearity of the inner product this equals sum_{v in S} dot(q, v), whose optimum is obtained by sorting all candidates by their individual dot products with q and taking the top k. The problem is therefore solvable in O(n log n) time and lies in P, rendering any NP-completeness proof impossible for the stated formulation. The same linearity also shows that the objective contains no cross terms or penalties that could enforce diversity.
minor comments (1)
- [Experiments] The experimental section would benefit from explicit pseudocode or a reference implementation link for the proposed heuristic to support reproducibility.
Simulated Author's Rebuttal
We thank the referee for the detailed review and for identifying this critical issue in our formulation. We acknowledge the validity of the observation and will revise the manuscript to correct the definition of VRSD, the complexity analysis, and related claims.
read point-by-point responses
-
Referee: [Abstract and §3] Abstract and §3 (VRSD definition): the objective is stated as maximizing dot(q, sum_{v in S} v) subject to |S|=k. By linearity of the inner product this equals sum_{v in S} dot(q, v), whose optimum is obtained by sorting all candidates by their individual dot products with q and taking the top k. The problem is therefore solvable in O(n log n) time and lies in P, rendering any NP-completeness proof impossible for the stated formulation. The same linearity also shows that the objective contains no cross terms or penalties that could enforce diversity.
Authors: We agree with the referee's analysis. By linearity of the inner product, the stated objective reduces exactly to selecting the k vectors with the largest individual dot products with q. This is solvable in O(n log n) time via sorting and contains no mechanism for diversity. Consequently, the NP-completeness claim cannot hold for the formulation as written in the abstract and §3. In the revised manuscript we will (1) replace the current VRSD objective with one that explicitly incorporates a diversity term (e.g., subtracting a pairwise similarity penalty among selected vectors), (2) re-establish the complexity result under the corrected definition, (3) update the abstract, §3, the heuristic description, and all experimental claims, and (4) re-run the evaluations to ensure the new formulation is meaningfully distinct from simple top-k retrieval. We view this as a necessary correction rather than a defense of the original text. revision: yes
Circularity Check
No circularity detected in derivation chain
full rationale
The paper defines VRSD via a new objective (maximizing dot-product similarity between query and vector sum of selected set) and claims an independent NP-completeness proof for that formulation. No steps reduce by construction to fitted parameters, self-citations, or prior ansatzes from the authors; the objective and complexity claim are presented as self-contained definitions without load-bearing reductions to inputs. This is the normal case of an independent theoretical contribution.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Sumithra Bhakthavatsalam, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, and Peter Clark. Think you have solved direct-answer question answering? try arc-da, the direct-answer ai2 reasoning challenge. arXiv preprint arXiv:2102.03315,
-
[2]
Language models are few-shot learners
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901,
work page 1901
-
[3]
Reading wikipedia to answer open- domain questions
Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. Reading wikipedia to answer open- domain questions. In 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, pp. 1870–1879. Association for Computational Linguistics (ACL),
work page 2017
-
[4]
Uprise: Universal prompt retrieval for improving zero-shot evaluation
Daixuan Cheng, Shaohan Huang, Junyu Bi, Yuefeng Zhan, Jianfeng Liu, Yujing Wang, Hao Sun, Furu Wei, Weiwei Deng, and Qi Zhang. Uprise: Universal prompt retrieval for improving zero-shot evaluation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 12318–12337,
work page 2023
-
[5]
Bert: Pre-training of deep bidirectional transformers for language understanding
10 Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186,
work page 2019
-
[6]
Leveraging passage retrieval with generative models for open domain question answering
Gautier Izacard and Edouard Grave. Leveraging passage retrieval with generative models for open domain question answering. In EACL 2021-16th Conference of the European Chapter of the Association for Computational Linguistics, pp. 874–880. Association for Computational Linguistics,
work page 2021
-
[7]
Dense passage retrieval for open-domain question answering
Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics,
work page 2020
-
[8]
Jiachang Liu, Dinghan Shen, Yizhe Zhang, William B Dolan, Lawrence Carin, and Weizhu Chen. What makes good in-context examples for gpt-3? In Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, pp. 100–114,
work page 2022
- [9]
-
[10]
Can a suit of armor conduct electricity? a new dataset for open book question answering
Todor Mihaylov, Peter Clark, Tushar Khot, and Ashish Sabharwal. Can a suit of armor conduct electricity? a new dataset for open book question answering. InProceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2381–2391,
work page 2018
-
[11]
Controllable semantic parsing via retrieval augmentation
Panupong Pasupat, Yuan Zhang, and Kelvin Guu. Controllable semantic parsing via retrieval augmentation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 7683–7698,
work page 2021
-
[12]
Yingqi Qu, Yuchen Ding, Jing Liu, Kai Liu, Ruiyang Ren, Wayne Xin Zhao, Daxiang Dong, Hua Wu, and Haifeng Wang. Rocketqa: An optimized training approach to dense passage retrieval for open-domain question answering. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technolo...
work page 2021
-
[13]
Sentence-bert: Sentence embeddings using siamese bert-networks
Nils Reimers and Iryna Gurevych. Sentence-bert: Sentence embeddings using siamese bert-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3982–3992,
work page 2019
-
[14]
Rocketqav2: A joint training method for dense passage retrieval and passage re-ranking
Ruiyang Ren, Yingqi Qu, Jing Liu, Wayne Xin Zhao, Qiaoqiao She, Hua Wu, Haifeng Wang, and Ji-Rong Wen. Rocketqav2: A joint training method for dense passage retrieval and passage re-ranking. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 2825–2835,
work page 2021
-
[15]
Learning to retrieve prompts for in-context learning
Ohad Rubin, Jonathan Herzig, and Jonathan Berant. Learning to retrieve prompts for in-context learning. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 2655–2671,
work page 2022
-
[16]
Constrained language models yield few-shot semantic parsers
Richard Shin, Christopher Lin, Sam Thomson, Charles Chen Jr, Subhro Roy, Emmanouil Antonios Platanios, Adam Pauls, Dan Klein, Jason Eisner, and Benjamin Van Durme. Constrained language models yield few-shot semantic parsers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 7699–7715,
work page 2021
-
[17]
Text Embeddings by Weakly-Supervised Contrastive Pre-training
Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei. Text embeddings by weakly-supervised contrastive pre-training. arXiv preprint arXiv:2212.03533,
work page internal anchor Pith review Pith/arXiv arXiv
-
[18]
Learning to retrieve in-context examples for large language models
Liang Wang, Nan Yang, and Furu Wei. Learning to retrieve in-context examples for large language models. arXiv preprint arXiv:2307.07164,
-
[19]
Complementary explanations for effective in-context learning
Xi Ye, Srinivasan Iyer, Asli Celikyilmaz, Ves Stoyanov, Greg Durrett, and Ramakanth Pasunuru. Complementary explanations for effective in-context learning. Findings of the Association for Computational Linguistics: ACL 2023,
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.