Vector Retrieval with Similarity and Diversity: How Hard Is It?

Dong Deng; Hang Gao; Yongfeng Zhang

arxiv: 2407.04573 · v4 · pith:RYVYPHH3new · submitted 2024-07-05 · 💻 cs.IR · cs.CL

Vector Retrieval with Similarity and Diversity: How Hard Is It?

Hang Gao , Dong Deng , Yongfeng Zhang This is my paper

Pith reviewed 2026-05-25 08:40 UTC · model grok-4.3

classification 💻 cs.IR cs.CL

keywords vector retrievalsimilarity and diversityNP-completeheuristic algorithmmaximal marginal relevancedeterminantal point processesdense vector search

0 comments

The pith

Maximizing the dot product between a query vector and the sum of selected vectors defines an NP-complete retrieval problem.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines Vector Retrieval with Similarity and Diversity as the task of selecting a fixed-size set of vectors that maximizes the dot product of the query with their vector sum. This single objective is shown to encode both individual similarity and collective diversity. The authors prove that solving this optimization exactly is NP-complete. They then introduce a parameter-free heuristic to find good solutions in practice. Experiments across datasets indicate the heuristic improves on Maximal Marginal Relevance and determinantal point processes using both geometric scores and LLM-based judgments.

Core claim

The paper establishes that the VRSD problem—maximizing the dot-product similarity between a query vector and the sum of a selected set of candidate vectors—is NP-complete. It introduces a parameter-free heuristic to solve it and shows through evaluations that the heuristic outperforms baselines such as MMR and k-DPP on multiple datasets using both objective metrics and subjective LLM assessments.

What carries the argument

The VRSD optimization problem, which encodes both similarity and diversity by maximizing the query's dot product with the sum of selected vectors.

If this is right

Exact solutions for balanced similarity-diversity retrieval cannot be computed efficiently in the worst case.
Practical systems must rely on heuristics or approximations for this objective.
A parameter-free method removes the need for manual tuning that affects methods like MMR.
The heuristic delivers measurable gains over MMR and k-DPP on standard retrieval benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The sum-based objective may relate to existing set-selection problems studied in submodular optimization.
Approximation algorithms with provable ratios could be developed specifically for this formulation.
The approach might transfer to other embedding spaces where collective coverage matters, such as in recommendation or summarization tasks.

Load-bearing premise

That the dot product between the query and the sum of selected vectors sufficiently captures both individual similarity to the query and diversity among the selected items.

What would settle it

Discovery of a polynomial-time exact algorithm for the VRSD problem on general instances, or repeated cases where the proposed heuristic underperforms a well-tuned MMR baseline on held-out data.

Figures

Figures reproduced from arXiv: 2407.04573 by Dong Deng, Hang Gao, Yongfeng Zhang.

read the original abstract

Dense vector retrieval is an important building block of modern machine learning systems, underlying applications ranging from semantic search to retrieval-augmented generation and knowledge-intensive reasoning. Beyond retrieving items that are individually similar to a query, many applications require a set of results that is also diverse, complementary, and collectively informative. Balancing similarity and diversity is therefore central to effective retrieval, but remains challenging to optimize in a stable and theoretically grounded way. Maximal Marginal Relevance (MMR) is a widely adopted heuristic for this problem, yet its reliance on a manually tuned parameter leads to optimization fluctuations and unpredictable retrieval results. More broadly, existing methods provide limited theoretical insight into how similarity and diversity interact in dense vector spaces, leaving the joint optimization problem insufficiently understood. To address these challenges, this paper introduces a novel approach that characterizes both constraints simultaneously by maximizing the similarity between the query vector and the sum of the selected candidate vectors. We formally define this optimization problem, Vector Retrieval with Similarity and Diversity (VRSD), and prove that it is NP-complete, establishing a rigorous theoretical bound on the inherent difficulty of this dual-objective retrieval. Subsequently, we present a parameter-free heuristic algorithm to solve VRSD. Extensive evaluations on multiple datasets, incorporating both objective geometric metrics and LLM-simulated subjective assessments, demonstrate that our VRSD heuristic consistently outperforms established baselines, including MMR and Determinantal Point Processes (k-DPP).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The claimed NP-completeness for VRSD cannot hold because the objective reduces exactly to top-k selection by individual similarity.

read the letter

The main thing your colleague needs to know is that the paper's central theoretical claim is incorrect. VRSD is defined as maximizing dot(q, sum of vectors in S). This expression equals the sum of the individual dot(q, v_i) terms with no cross terms or interactions. The optimum is therefore the k vectors with the highest separate similarities to q, found by a single sort. The problem is in P, so no valid NP-completeness proof exists for this objective regardless of the attempted reduction. The formulation also supplies no penalty or constraint that could enforce diversity; any set of k similar items scores the same as a diverse set as long as each item is close to q on its own. The stress-test note is right on this point. The paper does attempt something useful by seeking a parameter-free objective that avoids the tuning problems of MMR. It presents a heuristic and reports better results than MMR and k-DPP on geometric metrics plus LLM judgments across datasets. That motivation around stable diversity in retrieval is relevant for semantic search and RAG. The abstract cites the right prior baselines. The soft spot is load-bearing: the math does not support the stated claims about hardness or about capturing both similarity and diversity. The heuristic may be doing something different in practice, but then it is not optimizing the VRSD objective as written. Experiments are described only at high level, so their details cannot be checked here. This work targets researchers in information retrieval who want theoretically grounded alternatives to tuned heuristics. Because of the direct contradiction between the objective and the complexity result, it does not deliver reliable new insight. I would not bring it to a reading group and would not cite it. A serious editor should desk-reject rather than send to peer review.

Referee Report

1 major / 1 minor

Summary. The paper defines Vector Retrieval with Similarity and Diversity (VRSD) as the problem of selecting a cardinality-k set S of vectors to maximize the dot product between a query q and the sum of the vectors in S. It claims to prove that VRSD is NP-complete and proposes a parameter-free heuristic for solving it. The heuristic is evaluated on multiple datasets against baselines including MMR and k-DPP, using both geometric metrics and LLM-simulated subjective assessments, with claims of consistent outperformance.

Significance. If the NP-completeness result held, the work would supply a formal complexity characterization of a dual-objective retrieval task and a tuning-free algorithmic alternative to MMR, both of which would be useful contributions to the information-retrieval literature. The parameter-free property of the heuristic and the use of LLM-based evaluation would be additional strengths.

major comments (1)

[Abstract and §3] Abstract and §3 (VRSD definition): the objective is stated as maximizing dot(q, sum_{v in S} v) subject to |S|=k. By linearity of the inner product this equals sum_{v in S} dot(q, v), whose optimum is obtained by sorting all candidates by their individual dot products with q and taking the top k. The problem is therefore solvable in O(n log n) time and lies in P, rendering any NP-completeness proof impossible for the stated formulation. The same linearity also shows that the objective contains no cross terms or penalties that could enforce diversity.

minor comments (1)

[Experiments] The experimental section would benefit from explicit pseudocode or a reference implementation link for the proposed heuristic to support reproducibility.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the detailed review and for identifying this critical issue in our formulation. We acknowledge the validity of the observation and will revise the manuscript to correct the definition of VRSD, the complexity analysis, and related claims.

read point-by-point responses

Referee: [Abstract and §3] Abstract and §3 (VRSD definition): the objective is stated as maximizing dot(q, sum_{v in S} v) subject to |S|=k. By linearity of the inner product this equals sum_{v in S} dot(q, v), whose optimum is obtained by sorting all candidates by their individual dot products with q and taking the top k. The problem is therefore solvable in O(n log n) time and lies in P, rendering any NP-completeness proof impossible for the stated formulation. The same linearity also shows that the objective contains no cross terms or penalties that could enforce diversity.

Authors: We agree with the referee's analysis. By linearity of the inner product, the stated objective reduces exactly to selecting the k vectors with the largest individual dot products with q. This is solvable in O(n log n) time via sorting and contains no mechanism for diversity. Consequently, the NP-completeness claim cannot hold for the formulation as written in the abstract and §3. In the revised manuscript we will (1) replace the current VRSD objective with one that explicitly incorporates a diversity term (e.g., subtracting a pairwise similarity penalty among selected vectors), (2) re-establish the complexity result under the corrected definition, (3) update the abstract, §3, the heuristic description, and all experimental claims, and (4) re-run the evaluations to ensure the new formulation is meaningfully distinct from simple top-k retrieval. We view this as a necessary correction rather than a defense of the original text. revision: yes

Circularity Check

0 steps flagged

No circularity detected in derivation chain

full rationale

The paper defines VRSD via a new objective (maximizing dot-product similarity between query and vector sum of selected set) and claims an independent NP-completeness proof for that formulation. No steps reduce by construction to fitted parameters, self-citations, or prior ansatzes from the authors; the objective and complexity claim are presented as self-contained definitions without load-bearing reductions to inputs. This is the normal case of an independent theoretical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced beyond standard computational-complexity assumptions for the NP-completeness proof.

pith-pipeline@v0.9.0 · 5773 in / 1064 out tokens · 33423 ms · 2026-05-25T08:40:36.478450+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · 1 internal anchor

[1]

Think you have solved direct-answer question answering? try arc-da, the direct-answer ai2 reasoning challenge

Sumithra Bhakthavatsalam, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, and Peter Clark. Think you have solved direct-answer question answering? try arc-da, the direct-answer ai2 reasoning challenge. arXiv preprint arXiv:2102.03315,

work page arXiv
[2]

Language models are few-shot learners

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901,

work page 1901
[3]

Reading wikipedia to answer open- domain questions

Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. Reading wikipedia to answer open- domain questions. In 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, pp. 1870–1879. Association for Computational Linguistics (ACL),

work page 2017
[4]

Uprise: Universal prompt retrieval for improving zero-shot evaluation

Daixuan Cheng, Shaohan Huang, Junyu Bi, Yuefeng Zhan, Jianfeng Liu, Yujing Wang, Hao Sun, Furu Wei, Weiwei Deng, and Qi Zhang. Uprise: Universal prompt retrieval for improving zero-shot evaluation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 12318–12337,

work page 2023
[5]

Bert: Pre-training of deep bidirectional transformers for language understanding

10 Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186,

work page 2019
[6]

Leveraging passage retrieval with generative models for open domain question answering

Gautier Izacard and Edouard Grave. Leveraging passage retrieval with generative models for open domain question answering. In EACL 2021-16th Conference of the European Chapter of the Association for Computational Linguistics, pp. 874–880. Association for Computational Linguistics,

work page 2021
[7]

Dense passage retrieval for open-domain question answering

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics,

work page 2020
[8]

Jiachang Liu, Dinghan Shen, Yizhe Zhang, William B Dolan, Lawrence Carin, and Weizhu Chen. What makes good in-context examples for gpt-3? In Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, pp. 100–114,

work page 2022
[9]

Man Luo, Xin Xu, Zhuyun Dai, Panupong Pasupat, Mehran Kazemi, Chitta Baral, Vaiva Imbra- saite, and Vincent Y Zhao. Dr. icl: Demonstration-retrieved in-context learning. arXiv preprint arXiv:2305.14128,

work page arXiv
[10]

Can a suit of armor conduct electricity? a new dataset for open book question answering

Todor Mihaylov, Peter Clark, Tushar Khot, and Ashish Sabharwal. Can a suit of armor conduct electricity? a new dataset for open book question answering. InProceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2381–2391,

work page 2018
[11]

Controllable semantic parsing via retrieval augmentation

Panupong Pasupat, Yuan Zhang, and Kelvin Guu. Controllable semantic parsing via retrieval augmentation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 7683–7698,

work page 2021
[12]

Rocketqa: An optimized training approach to dense passage retrieval for open-domain question answering

Yingqi Qu, Yuchen Ding, Jing Liu, Kai Liu, Ruiyang Ren, Wayne Xin Zhao, Daxiang Dong, Hua Wu, and Haifeng Wang. Rocketqa: An optimized training approach to dense passage retrieval for open-domain question answering. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technolo...

work page 2021
[13]

Sentence-bert: Sentence embeddings using siamese bert-networks

Nils Reimers and Iryna Gurevych. Sentence-bert: Sentence embeddings using siamese bert-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3982–3992,

work page 2019
[14]

Rocketqav2: A joint training method for dense passage retrieval and passage re-ranking

Ruiyang Ren, Yingqi Qu, Jing Liu, Wayne Xin Zhao, Qiaoqiao She, Hua Wu, Haifeng Wang, and Ji-Rong Wen. Rocketqav2: A joint training method for dense passage retrieval and passage re-ranking. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 2825–2835,

work page 2021
[15]

Learning to retrieve prompts for in-context learning

Ohad Rubin, Jonathan Herzig, and Jonathan Berant. Learning to retrieve prompts for in-context learning. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 2655–2671,

work page 2022
[16]

Constrained language models yield few-shot semantic parsers

Richard Shin, Christopher Lin, Sam Thomson, Charles Chen Jr, Subhro Roy, Emmanouil Antonios Platanios, Adam Pauls, Dan Klein, Jason Eisner, and Benjamin Van Durme. Constrained language models yield few-shot semantic parsers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 7699–7715,

work page 2021
[17]

Text Embeddings by Weakly-Supervised Contrastive Pre-training

Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei. Text embeddings by weakly-supervised contrastive pre-training. arXiv preprint arXiv:2212.03533,

work page internal anchor Pith review Pith/arXiv arXiv
[18]

Learning to retrieve in-context examples for large language models

Liang Wang, Nan Yang, and Furu Wei. Learning to retrieve in-context examples for large language models. arXiv preprint arXiv:2307.07164,

work page arXiv
[19]

Complementary explanations for effective in-context learning

Xi Ye, Srinivasan Iyer, Asli Celikyilmaz, Ves Stoyanov, Greg Durrett, and Ramakanth Pasunuru. Complementary explanations for effective in-context learning. Findings of the Association for Computational Linguistics: ACL 2023,

work page 2023

[1] [1]

Think you have solved direct-answer question answering? try arc-da, the direct-answer ai2 reasoning challenge

Sumithra Bhakthavatsalam, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, and Peter Clark. Think you have solved direct-answer question answering? try arc-da, the direct-answer ai2 reasoning challenge. arXiv preprint arXiv:2102.03315,

work page arXiv

[2] [2]

Language models are few-shot learners

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901,

work page 1901

[3] [3]

Reading wikipedia to answer open- domain questions

Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. Reading wikipedia to answer open- domain questions. In 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, pp. 1870–1879. Association for Computational Linguistics (ACL),

work page 2017

[4] [4]

Uprise: Universal prompt retrieval for improving zero-shot evaluation

Daixuan Cheng, Shaohan Huang, Junyu Bi, Yuefeng Zhan, Jianfeng Liu, Yujing Wang, Hao Sun, Furu Wei, Weiwei Deng, and Qi Zhang. Uprise: Universal prompt retrieval for improving zero-shot evaluation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 12318–12337,

work page 2023

[5] [5]

Bert: Pre-training of deep bidirectional transformers for language understanding

10 Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186,

work page 2019

[6] [6]

Leveraging passage retrieval with generative models for open domain question answering

Gautier Izacard and Edouard Grave. Leveraging passage retrieval with generative models for open domain question answering. In EACL 2021-16th Conference of the European Chapter of the Association for Computational Linguistics, pp. 874–880. Association for Computational Linguistics,

work page 2021

[7] [7]

Dense passage retrieval for open-domain question answering

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics,

work page 2020

[8] [8]

Jiachang Liu, Dinghan Shen, Yizhe Zhang, William B Dolan, Lawrence Carin, and Weizhu Chen. What makes good in-context examples for gpt-3? In Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, pp. 100–114,

work page 2022

[9] [9]

Man Luo, Xin Xu, Zhuyun Dai, Panupong Pasupat, Mehran Kazemi, Chitta Baral, Vaiva Imbra- saite, and Vincent Y Zhao. Dr. icl: Demonstration-retrieved in-context learning. arXiv preprint arXiv:2305.14128,

work page arXiv

[10] [10]

Can a suit of armor conduct electricity? a new dataset for open book question answering

Todor Mihaylov, Peter Clark, Tushar Khot, and Ashish Sabharwal. Can a suit of armor conduct electricity? a new dataset for open book question answering. InProceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2381–2391,

work page 2018

[11] [11]

Controllable semantic parsing via retrieval augmentation

Panupong Pasupat, Yuan Zhang, and Kelvin Guu. Controllable semantic parsing via retrieval augmentation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 7683–7698,

work page 2021

[12] [12]

Rocketqa: An optimized training approach to dense passage retrieval for open-domain question answering

Yingqi Qu, Yuchen Ding, Jing Liu, Kai Liu, Ruiyang Ren, Wayne Xin Zhao, Daxiang Dong, Hua Wu, and Haifeng Wang. Rocketqa: An optimized training approach to dense passage retrieval for open-domain question answering. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technolo...

work page 2021

[13] [13]

Sentence-bert: Sentence embeddings using siamese bert-networks

Nils Reimers and Iryna Gurevych. Sentence-bert: Sentence embeddings using siamese bert-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3982–3992,

work page 2019

[14] [14]

Rocketqav2: A joint training method for dense passage retrieval and passage re-ranking

Ruiyang Ren, Yingqi Qu, Jing Liu, Wayne Xin Zhao, Qiaoqiao She, Hua Wu, Haifeng Wang, and Ji-Rong Wen. Rocketqav2: A joint training method for dense passage retrieval and passage re-ranking. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 2825–2835,

work page 2021

[15] [15]

Learning to retrieve prompts for in-context learning

Ohad Rubin, Jonathan Herzig, and Jonathan Berant. Learning to retrieve prompts for in-context learning. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 2655–2671,

work page 2022

[16] [16]

Constrained language models yield few-shot semantic parsers

Richard Shin, Christopher Lin, Sam Thomson, Charles Chen Jr, Subhro Roy, Emmanouil Antonios Platanios, Adam Pauls, Dan Klein, Jason Eisner, and Benjamin Van Durme. Constrained language models yield few-shot semantic parsers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 7699–7715,

work page 2021

[17] [17]

Text Embeddings by Weakly-Supervised Contrastive Pre-training

Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei. Text embeddings by weakly-supervised contrastive pre-training. arXiv preprint arXiv:2212.03533,

work page internal anchor Pith review Pith/arXiv arXiv

[18] [18]

Learning to retrieve in-context examples for large language models

Liang Wang, Nan Yang, and Furu Wei. Learning to retrieve in-context examples for large language models. arXiv preprint arXiv:2307.07164,

work page arXiv

[19] [19]

Complementary explanations for effective in-context learning

Xi Ye, Srinivasan Iyer, Asli Celikyilmaz, Ves Stoyanov, Greg Durrett, and Ramakanth Pasunuru. Complementary explanations for effective in-context learning. Findings of the Association for Computational Linguistics: ACL 2023,

work page 2023