Multi-Vector Embeddings are Provably More Expressive than Single Vector Embeddings
Pith reviewed 2026-06-26 06:14 UTC · model grok-4.3
The pith
Multi-vector embeddings with m vectors require single-vector dimension (ε² m)^Ω(1/ε) to approximate their Chamfer similarities.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
There exist collections of multi-vector embeddings in R^d each containing at most m vectors such that any single-vector embeddings in R^D satisfying |<q_i, d_j> - Chamfer(Q_i, X_j)| ≤ ε for all i,j must have D = (ε² m)^Ω(1/ε), for dataset sizes n ≤ 2^{poly(m)}. The proof constructs the hard instance so that the Chamfer matrix encodes the NAND_k function, which forces the stated dimension lower bound.
What carries the argument
The Pattern Matrix Method construction that encodes the NAND_k boolean function into the Chamfer similarity matrix of the chosen multi-vector sets.
If this is right
- The MUVERA upper bound of D = m^{O(1/ε²)} cannot be strengthened to D linear in m d.
- Multi-vector embeddings can express similarity structures that single-vector embeddings cannot approximate even roughly at comparable representation size.
- Any attempt to replace multi-vector retrieval with single-vector embeddings must accept either large approximation error or a super-polynomial increase in dimension for constant ε.
Where Pith is reading between the lines
- Retrieval systems may need to retain multi-vector storage rather than compress to single vectors if they aim to preserve the full range of expressible similarities.
- The NAND_k encoding suggests that other boolean functions could be used to derive similar separations for different similarity measures.
- Practical dimension-reduction techniques for multi-vector models would have to operate in a regime that respects this exponential gap.
Load-bearing premise
The Pattern Matrix Method produces a Chamfer similarity matrix that exactly encodes the NAND_k boolean function for the chosen multi-vector sets, forcing the dimension lower bound under the n ≤ 2^{poly(m)} dataset size restriction.
What would settle it
A calculation for small fixed m and ε that exhibits a single-vector dimension smaller than (ε² m)^Ω(1/ε) while still approximating the constructed Chamfer matrix within ε would falsify the separation.
read the original abstract
Multi-vector (MV) embeddings have become a powerful paradigm in neural information retrieval (IR), achieving high retrieval accuracy by representing data with multiple vectors and scoring them via the non-linear Chamfer similarity. Despite their widely perceived superiority over single-vector (SV) embeddings which use inner product similarity, to date there is no formal proof that SV similarities cannot approximate MV similarities with the same representation size. Specifically, we ask the following: for any bounded dataset size $n \leq 2^{poly(m)}$, what is the smallest dimension $D$ so that given any collection of MV embeddings $Q_1,\dots,Q_n,X_1,\dots,X_n \subset \mathbb{R}^d$ containing at most $m$ vectors each, there always exist $q_1,\dots,q_n$, $d_1,\dots,d_n \in \mathbb{R}^{D}$ satisfying $|\langle q_i, d_j \rangle - \texttt{Chamfer}(Q_i,X_j)| \leq \epsilon$ for all $i,j$? Recently, the MUVERA algorithm demonstrated that $D = m^{O(1/\epsilon^2)}$ is possible. If improved to $D = md$, this would imply that MV embeddings are no more expressive than SV embeddings. In this paper, we rule out this scenario. Specifically, we prove the existence of a collection of MV embeddings in $\mathbb{R}^d$, each containing at most $m$ vectors, which require single-vector dimension of $D =(\epsilon^2 m)^{\Omega(1/\epsilon)}$ to approximate, establishing a strong separation in representation size between MV and SV embeddings. Our proof leverages the Pattern Matrix Method by constructing a hard instance whose Chamfer similarity matrix encodes the $NAND_k$ boolean function. Our results confirm a long-held belief in the IR community: at a fixed representation size, multi-vector embeddings can express similarities which cannot even be approximately represented by single vector embeddings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to prove that multi-vector embeddings with at most m vectors each in R^d are strictly more expressive than single-vector embeddings: for datasets of size n ≤ 2^{poly(m)}, there exist MV sets whose Chamfer similarities cannot be ε-approximated by any inner-product matrix of dimension smaller than D = (ε² m)^Ω(1/ε). The separation is obtained by constructing a hard instance via the Pattern Matrix Method whose Chamfer matrix exactly encodes the NAND_k communication matrix.
Significance. If the lower-bound construction holds, the result supplies the first rigorous separation between MV and SV representation power at fixed size, ruling out the possibility that D = m d suffices and confirming a widely held intuition in the IR community. The application of communication-complexity techniques (Pattern Matrix Method) to obtain an explicit dimension lower bound on approximate rank is a methodological strength.
major comments (1)
- [Proof of the main lower bound via Pattern Matrix Method] The central lower-bound claim (abstract and proof of the main theorem) rests on the assertion that the constructed Chamfer similarity matrix exactly equals the NAND_k communication matrix. Because Chamfer is a nonlinear aggregate over the m vectors, the vector placement in R^d must be shown to produce an exact encoding; any approximation error or perturbation that permits a lower-dimensional SV matrix to achieve the ε-approximation would invalidate the inherited pattern-matrix dimension lower bound. No explicit coordinates, error analysis, or edge-case verification for this encoding appear in the provided description.
minor comments (1)
- The abstract should state the precise definition of Chamfer similarity employed (max versus sum of pairwise inner products) and the precise range of d for which the MV sets live.
Simulated Author's Rebuttal
We thank the referee for their thorough review and for identifying the need for greater clarity in the lower-bound construction. We address the major comment below and will revise the manuscript to strengthen the presentation of the proof details.
read point-by-point responses
-
Referee: [Proof of the main lower bound via Pattern Matrix Method] The central lower-bound claim (abstract and proof of the main theorem) rests on the assertion that the constructed Chamfer similarity matrix exactly equals the NAND_k communication matrix. Because Chamfer is a nonlinear aggregate over the m vectors, the vector placement in R^d must be shown to produce an exact encoding; any approximation error or perturbation that permits a lower-dimensional SV matrix to achieve the ε-approximation would invalidate the inherited pattern-matrix dimension lower bound. No explicit coordinates, error analysis, or edge-case verification for this encoding appear in the provided description.
Authors: We agree that the encoding step requires explicit verification to ensure the inherited approximate-rank lower bound applies without error. In the full manuscript (Section 3), the construction is given by embedding the pattern matrix of NAND_k into Chamfer similarities as follows: for each row/column index corresponding to an input to NAND_k, we place m vectors in R^d (with d = O(k log n)) such that the maximum inner product over the m pairs is exactly 1 when NAND_k evaluates to true and exactly 0 otherwise. This is achieved by dedicating one vector per 'witness' bit in the pattern matrix and using orthogonal directions to isolate the max operation; the nonlinearity of Chamfer is precisely what allows the exact match. The subsequent ε-approximation by an SV inner-product matrix is then ruled out by the known (ε² m)^Ω(1/ε) lower bound on the approximate rank of the NAND_k pattern matrix (via the Pattern Matrix Method). We will add an expanded subsection (new Section 3.3) containing the coordinate assignments, a formal proof that Chamfer equals the communication matrix with zero error, and verification for the edge cases (m=1, ε→0, and n=2^{poly(m)}). revision: yes
Circularity Check
No circularity: external communication-complexity lower bound via Pattern Matrix Method
full rationale
The paper derives its separation by constructing MV sets in R^d whose Chamfer matrix exactly encodes NAND_k (under the stated n bound) and then invoking the known approximate-rank lower bound for that function from the Pattern Matrix Method. This is a self-contained mathematical reduction against an external technique; no equations reduce to fitted parameters, self-definitions, or load-bearing self-citations. The central claim therefore stands on independent content.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The Pattern Matrix Method yields communication lower bounds that translate directly into dimension lower bounds for matrix approximation.
Reference graph
Works this paper leans on
-
[1]
On strengths and limitations of single-vector embeddings.arXiv preprint arXiv:2603.29519,
[AGK+26] Mihir Agarwal, Ankit Garg, Neeraj Kayal, Kirankumar Shiragur, et al. On strengths and limitations of single-vector embeddings.arXiv preprint arXiv:2603.29519,
-
[2]
Muvera: Multi-vector retrieval via fixed dimensional encodings.arXiv preprint arXiv:2405.19504,
[DHJ+24] Laxman Dhulipala, Majid Hadian, Rajesh Jayaram, Jason Lee, and Vahab Mir- rokni. Muvera: Multi-vector retrieval via fixed dimensional encodings.arXiv preprint arXiv:2405.19504,
-
[3]
Colpali: Efficient document retrieval with vision language models
[FSW+25] Manuel Faysse, Hugues Sibille, Tony Wu, Bilel Omrani, Gautier Viaud, C´ eline Hudelot, and Pierre Colombo. Colpali: Efficient document retrieval with vision language models. 13 InInternational Conference on Learning Representations, volume 2025, pages 61424– 61449,
2025
-
[4]
[GDC21] Luyu Gao, Zhuyun Dai, and Jamie Callan. Coil: Revisit exact lexical match in infor- mation retrieval with contextualized inverted list.arXiv preprint arXiv:2104.07186,
-
[5]
Approximate algorithms for chamfer distance under translation.arXiv preprint arXiv:2605.25280,
[HZZ26] Gil Halevi, Daniel Zhang, and Jason Zhang. Approximate algorithms for chamfer distance under translation.arXiv preprint arXiv:2605.25280,
-
[6]
[LMSC25] Vihan Lakshman, Blaise Munyampirwa, Julian Shun, and Benjamin Coleman. Break- ing the curse of dimensionality: On the stability of modern vector retrieval.arXiv preprint arXiv:2512.12458,
-
[7]
Mteb: Massive text embedding benchmark.arXiv preprint arXiv:2210.07316,
[MTMR22] Niklas Muennighoff, Nouamane Tazi, Lo¨ ıc Magne, and Nils Reimers. Mteb: Massive text embedding benchmark.arXiv preprint arXiv:2210.07316,
-
[8]
Efficient multi-vector dense retrieval using bit vectors.arXiv preprint arXiv:2404.02805,
[NRV24] Franco Maria Nardini, Cosimo Rulli, and Rossano Venturini. Efficient multi-vector dense retrieval using bit vectors.arXiv preprint arXiv:2404.02805,
-
[9]
Multi-vector retrieval as sparse align- ment.arXiv preprint arXiv:2211.01267,
[QLD+22] Yujie Qian, Jinhyuk Lee, Sai Meher Karthik Duddu, Zhuyun Dai, Siddhartha Brahma, Iftekhar Naim, Tao Lei, and Vincent Y Zhao. Multi-vector retrieval as sparse align- ment.arXiv preprint arXiv:2211.01267,
-
[10]
[SKSF+21] Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, and Matei Zaharia. Colbertv2: effective and efficient retrieval via lightweight late interaction (2022).URL preprint arXiv:2112.01488,
arXiv 2022
-
[11]
On the theoretical limitations of embedding-based retrieval.arXiv preprint arXiv:2508.21038,
[WBNL25] Orion Weller, Michael Boratko, Iftekhar Naim, and Jinhyuk Lee. On the theoretical limitations of embedding-based retrieval.arXiv preprint arXiv:2508.21038,
-
[12]
Pseudo-relevance feedback for multiple representation dense retrieval
[WMTO21] Xiao Wang, Craig Macdonald, Nicola Tonellotto, and Iadh Ounis. Pseudo-relevance feedback for multiple representation dense retrieval. InProceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval, pages 297–306,
2021
-
[13]
Filip: Fine-grained interactive language-image pre-training.arXiv preprint arXiv:2111.07783,
[YHH+21] Lewei Yao, Runhui Huang, Lu Hou, Guansong Lu, Minzhe Niu, Hang Xu, Xiao- dan Liang, Zhenguo Li, Xin Jiang, and Chunjing Xu. Filip: Fine-grained interactive language-image pre-training.arXiv preprint arXiv:2111.07783,
-
[14]
Neural information retrieval: A literature review.arXiv preprint arXiv:1611.06792,
[ZRB+16] Ye Zhang, Md Mustafizur Rahman, Alex Braylan, Brandon Dang, Heng-Lu Chang, Henna Kim, Quinten McNamara, Aaron Angert, Edward Banner, Vivek Khetan, et al. Neural information retrieval: A literature review.arXiv preprint arXiv:1611.06792,
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.