Is Dimensionality a Barrier for Retrieval Models?
Pith reviewed 2026-05-25 04:53 UTC · model grok-4.3
The pith
The infinite-dimension maximal margin for any relevance matrix A is nearly achieved already in dimension O(m^{-2} log n).
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For any relevance matrix A the quantity m^rd(+∞, A) can be nearly recovered by unit-norm embeddings whose dimension is only O(m^rd(+∞, A)^{-2} log n). In the special case where A contains every possible k-sparse row exactly once, dimension O(k log(n/k)) is necessary and sufficient to attain the optimal margin Θ(k^{-1/2}).
What carries the argument
The margin m^rd(d, A) is the largest value m such that there exist unit-norm query vectors U_j and document vectors V_i satisfying signed inner-product separation exactly according to the entries of A.
If this is right
- For all-k-sparse queries the dimension O(k log(n/k)) is both necessary and sufficient to reach margin Θ(k^{-1/2}).
- Modern embedding sizes around 1000 already suffice for near-optimal margins on data sets with trillions of documents under the inner-product model.
- Explicit constructions exist that produce large margins even when dimension is o(k log(n/k)).
- Sigmoid loss yields larger empirical margins than InfoNCE on the tested synthetic instances.
Where Pith is reading between the lines
- If margin governs robustness and compositional generalization, then training objectives that directly target signed inner-product separation should remain effective at moderate dimensions.
- The dimension lower bound for the k-sparse case may be used to derive concrete sample-complexity requirements for learning retrieval representations.
- The communication-complexity origin of the model suggests analogous dimension bounds could apply to other separation tasks such as nearest-neighbor search under Hamming or Euclidean distance.
Load-bearing premise
The retrieval model requires exact signed inner-product separation with a uniform margin between unit-norm vectors.
What would settle it
An explicit matrix A for which every embedding achieving margin within a constant factor of m^rd(+∞, A) requires dimension larger than C m^{-2} log n for any fixed C would falsify the main upper bound.
Figures
read the original abstract
Why does the low dimensionality of representations, typically $d\approx 1000$, not prevent modern embedding-based retrieval models from scaling to billions, or even trillions, of data points? To answer this question, we study maximal-margin embeddings in the following retrieval model, classically studied in communication complexity [PS86] and more recently in embedding-based retrieval [WBNL26]. Let $A\in \{0,1\}^{N\times n}$ be a matrix indicating whether each of $N$ queries is relevant to each of $n$ documents. We are interested in the largest margin $m>0,$ denoted by $\mathsf{m}^{\mathsf{rd}}(d, A),$ for which there exist unit norm embeddings of the queries and documents $\{U_j\}_{j = 1}^N, \{V_i\}_{i = 1}^n$ with the following property. $\langle U_j, V_i\rangle \ge m$ whenever $A_{ji} = 1$ and $\langle U_j, V_i\rangle \le -m$ otherwise. A large margin is a key proxy for representation quality: it controls both robustness to perturbations and compositional generalization across queries. Our main theorem establishes that the best possible margin without a restriction on the dimension, $\mathsf{m}^{\mathsf{rd}}(+\infty, A),$ can be nearly achieved in dimension $d = O(\mathsf{m}^{\mathsf{rd}}(+\infty, A)^{-2}\log n)$ which improves a theorem of [BDES02]. Together with a matching lower bound in Theorem 1.5, we conclude that when $A\in \{0,1\}^{\binom{n}{k}\times n}$ is the matrix containing all possible $k$-sparse rows once, dimension $d = O(k\log (n/k))$ is necessary and sufficient for the maximal possible margin $\mathsf{m}^{\mathsf{rd}}(+\infty, A) = \Theta(k^{-1/2})$ in this setting. This fully resolves the setup of [WBNL26]. We also give several constructions for large margins when $d = o(k\log (n/k)).$ Finally, we empirically test the InfoNCE and sigmoid losses for producing large margin embeddings and demonstrate a clear advantage of the sigmoid loss.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies maximal-margin retrieval embeddings for a 0-1 relevance matrix A, where unit-norm query and document vectors must satisfy signed inner-product separation of margin m. It proves that the infinite-dimensional optimum m^rd(+∞, A) can be recovered up to (1-o(1)) factors in dimension d = O(m^{-2} log n), improving the dimension bound of [BDES02]. For the all-k-sparse query matrix it supplies a matching lower bound showing that d = O(k log(n/k)) is necessary and sufficient to achieve the optimal margin Θ(k^{-1/2}). Additional constructions are given for d = o(k log(n/k)) and an empirical comparison of InfoNCE versus sigmoid loss is reported.
Significance. If the stated theorems hold, the work supplies a tight, parameter-free characterization of the dimension needed for optimal-margin retrieval embeddings and resolves the open question posed in [WBNL26]. The dimension-reduction result is obtained via an explicit construction together with a matching lower bound for the sparse-query case; the empirical section provides a concrete, falsifiable comparison between two standard losses. These elements together give a self-contained theoretical and practical account of why modest embedding dimensions suffice at scale.
minor comments (3)
- [Abstract] Abstract: the statement that the sigmoid loss shows 'a clear advantage' is not accompanied by any numerical values or statistical test; the claim should be supported by the specific margins or loss curves reported in the experimental section.
- [Abstract] The notation m^rd(d, A) is introduced without an explicit reference to the section containing its formal definition; a forward pointer would improve readability.
- [§1] The improvement over [BDES02] is stated only in the abstract; the introduction or related-work section should contain a one-sentence comparison of the new dimension exponent versus the prior bound.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our manuscript, the accurate summary of our results on dimension bounds for maximal-margin retrieval embeddings, and the recommendation for minor revision. The report correctly identifies the resolution of the open question from [WBNL26] via matching upper and lower bounds. No specific major comments were raised in the report.
Circularity Check
No significant circularity; mathematical theorems are self-contained
full rationale
The paper's core claims consist of mathematical theorems establishing dimension bounds for achieving near-optimal margins m^rd(∞, A) via constructions that improve on the cited result of [BDES02], together with a matching lower bound for the k-sparse case. These rest on the classical inner-product retrieval model from [PS86] and [WBNL26] but do not reduce any claimed margin or dimension to a fitted quantity, self-definition, or load-bearing self-citation chain. The derivation is independent of the present paper's own inputs and is externally verifiable as a margin-preserving embedding result.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Unit-norm query and document embeddings exist that realize the optimal margin m^rd(+∞, A) for any fixed relevance matrix A
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Our main theorem establishes that the best possible margin without a restriction on the dimension, m^rd(+∞, A), can be nearly achieved in dimension d = O(m^rd(+∞, A)^{-2} log n) which improves a theorem of [BDES02].
-
IndisputableMonolith/Foundation/DimensionForcing.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Our proofs are based on connections between this problem and the literature on compressed sensing and the restricted isometry property.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Combinatorics, Probability and Computing , volume=
Learning complexity vs communication complexity , author=. Combinatorics, Probability and Computing , volume=. 2009 , publisher=
work page 2009
-
[2]
International conference on machine learning , pages=
A simple framework for contrastive learning of visual representations , author=. International conference on machine learning , pages=. 2020 , organization=
work page 2020
-
[3]
Proceedings of the IEEE/CVF international conference on computer vision , pages=
With a little help from my friends: Nearest-neighbor contrastive learning of visual representations , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=
-
[4]
An algorithmic theory of learning: Robust concepts and random projection , author=. Machine Learning , volume=. 2006 , publisher=
work page 2006
-
[5]
Proceedings of the Symposium on the Mathematical Theory of Automata , volume=
On convergence proofs on perceptrons , author=. Proceedings of the Symposium on the Mathematical Theory of Automata , volume=. 1962 , organization=
work page 1962
-
[6]
Kernels as features: On kernels, margins, and low-dimensional mappings , author=. Machine Learning , volume=. 2006 , publisher=
work page 2006
-
[7]
IEEE Transactions on Information Theory , volume=
Compressed sensing , author=. IEEE Transactions on Information Theory , volume=. 2006 , publisher=
work page 2006
-
[8]
IEEE Transactions on Big Data , volume=
Billion-scale similarity search with GPUs , author=. IEEE Transactions on Big Data , volume=. 2019 , publisher=
work page 2019
-
[9]
IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=
Product Quantization for Nearest Neighbor Search , author=. IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=. 2011 , publisher=
work page 2011
-
[10]
IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=
Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs , author=. IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=. 2020 , publisher=
work page 2020
-
[11]
Dinov3 , author=. arXiv preprint arXiv:2508.10104 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[12]
B. Laurent and P. Massart , title =. The Annals of Statistics , number =. 2000 , doi =
work page 2000
-
[13]
Lower bounds in communication complexity and learning theory via analytic methods , author=
-
[14]
Geometrical realization of set systems and probabilistic communication complexity , booktitle =
Noga Alon and Peter Frankl and Vojt. Geometrical realization of set systems and probabilistic communication complexity , booktitle =. 1985 , publisher =
work page 1985
-
[15]
Estimating the Optimal Margins of Embeddings in Euclidean Half Spaces , journal =
J. Estimating the Optimal Margins of Embeddings in Euclidean Half Spaces , journal =. 2003 , doi =
work page 2003
-
[16]
Corinna Cortes and Vladimir Vapnik , title =. Machine Learning , volume =. 1995 , doi =
work page 1995
-
[17]
Peter L. Bartlett , title =. IEEE Transactions on Information Theory , volume =
-
[18]
Barman, Siddharth , title =. 2018 , issue_date =. doi:10.1137/15M1050574 , journal =
-
[19]
Conference on Learning Theory , pages=
Approximate is good enough: Probabilistic variants of dimensional and margin complexity , author=. Conference on Learning Theory , pages=. 2020 , organization=
work page 2020
-
[20]
Schapire and Yoav Freund and Peter Bartlett and Wee Sun Lee , title =
Robert E. Schapire and Yoav Freund and Peter Bartlett and Wee Sun Lee , title =. The Annals of Statistics , volume =. 1998 , doi =
work page 1998
- [21]
-
[22]
Annals of the Institute of Statistical Mathematics , volume=
Some geometric applications of the beta distribution , author=. Annals of the Institute of Statistical Mathematics , volume=. 1990 , publisher=
work page 1990
-
[23]
Peter L. Bartlett and Dylan J. Foster and Matus J. Telgarsky , title =. Advances in Neural Information Processing Systems 30 , editor =
-
[24]
Contemporary mathematics , volume=
Extensions of Lipschitz mappings into a Hilbert space , author=. Contemporary mathematics , volume=
-
[25]
Journal of Machine Learning Research , volume=
Limitations of learning via embeddings in Euclidean half spaces , author=. Journal of Machine Learning Research , volume=
-
[26]
Conference on Learning Theory , pages=
Sign rank versus VC dimension , author=. Conference on Learning Theory , pages=. 2016 , organization=
work page 2016
- [27]
-
[28]
Journal of Machine Learning Research , volume =
Daniel Soudry and Elad Hoffer and Mor Shpigel Nacson and Suriya Gunasekar and Nathan Srebro , title =. Journal of Machine Learning Research , volume =. 2018 , url =
work page 2018
-
[29]
Robust Large Margin Deep Neural Networks , journal =
Jure Sokoli. Robust Large Margin Deep Neural Networks , journal =. 2017 , doi =
work page 2017
-
[30]
Proceedings of the 55th Annual ACM Symposium on Theory of Computing , pages=
Local and global expansion in random geometric graphs , author=. Proceedings of the 55th Annual ACM Symposium on Theory of Computing , pages=
-
[31]
An exponential improvement for Ramsey lower bounds
An exponential improvement for Ramsey lower bounds , author=. arXiv preprint arXiv:2507.12926 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[32]
Communications on Pure and Applied Mathematics , volume=
Explicit Construction of RIP Matrices Is Ramsey-Hard , author=. Communications on Pure and Applied Mathematics , volume=. 2020 , publisher=
work page 2020
-
[33]
Elsayed and Dilip Krishnan and Hossein Mobahi and Kevin Regan and Samy Bengio , title =
Gamaleldin F. Elsayed and Dilip Krishnan and Hossein Mobahi and Kevin Regan and Samy Bengio , title =. Advances in Neural Information Processing Systems 31 , editor =
-
[34]
Maria-Florina Balcan and Avrim Blum and Nathan Srebro , title =. Machine Learning , volume =. 2008 , doi =
work page 2008
-
[35]
Kilian Q. Weinberger and Lawrence K. Saul , title =. Journal of Machine Learning Research , volume =. 2009 , url =
work page 2009
-
[36]
Raia Hadsell and Sumit Chopra and Yann LeCun , title =. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06) , volume =. 2006 , doi =
work page 2006
-
[37]
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pages =
Florian Schroff and Dmitry Kalenichenko and James Philbin , title =. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pages =. 2015 , doi =
work page 2015
-
[38]
Proceedings of the 33rd International Conference on Machine Learning , series =
Weiyang Liu and Yandong Wen and Zhiding Yu and Meng Yang , title =. Proceedings of the 33rd International Conference on Machine Learning , series =
-
[39]
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , year =
Jiankang Deng and Jia Guo and Niannan Xue and Stefanos Zafeiriou , title =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , year =
-
[40]
arXiv preprint arXiv:2408.00995 , year =
Kiril Bangachev and Guy Bresler , title =. arXiv preprint arXiv:2408.00995 , year =
-
[41]
J. H. Conway and N. J. A. Sloane , title =. 1999 , edition =
work page 1999
-
[42]
Emmanuel J. Cand. Decoding by Linear Programming , journal =
- [43]
-
[44]
Charles M. Fiduccia and Edward R. Scheinerman and Ann N. Trenk and Jennifer S. Zito , title =. Discrete Mathematics , volume =
-
[45]
A Linear Lower Bound on the Unbounded Error Probabilistic Communication Complexity , journal =
J. A Linear Lower Bound on the Unbounded Error Probabilistic Communication Complexity , journal =
-
[46]
Dense Passage Retrieval for Open-Domain Question Answering , booktitle =
Vladimir Karpukhin and Barlas O. Dense Passage Retrieval for Open-Domain Question Answering , booktitle =
-
[47]
Journal of Functional Analysis , volume =
Bo'az Klartag and Shahar Mendelson , title =. Journal of Functional Analysis , volume =
-
[48]
What You Will Gain By Rounding: Theory and Algorithms for Rounding Rank
Stefan Neumann and Rainer Gemulla and Pauli Miettinen , title =. arXiv preprint arXiv:1609.05034 , year =
work page internal anchor Pith review Pith/arXiv arXiv
-
[50]
Transactions on Machine Learning Research , year =
Maxime Oquab and others , title =. Transactions on Machine Learning Research , year =
-
[51]
Proceedings of the 41st International Conference on Machine Learning (ICML) , pages =
Kiho Park and Yo Joong Choe and Victor Veitch , title =. Proceedings of the 41st International Conference on Machine Learning (ICML) , pages =
-
[52]
Journal of Computer and System Sciences , volume =
Ramamohan Paturi and Janos Simon , title =. Journal of Computer and System Sciences , volume =
-
[53]
Proceedings of the 38th International Conference on Machine Learning (ICML) , pages =
Alec Radford and Jong Wook Kim and Chris Hallacy and Aditya Ramesh and Gabriel Goh and Sandhini Agarwal and Girish Sastry and Amanda Askell and Pamela Mishkin and Jack Clark and Gretchen Krueger and Ilya Sutskever , title =. Proceedings of the 38th International Conference on Machine Learning (ICML) , pages =
-
[54]
Nils Reimers and Iryna Gurevych , title =. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP-IJCNLP) , pages =
work page 2019
- [55]
-
[56]
Constructive approximation , volume=
A simple proof of the restricted isometry property for random matrices , author=. Constructive approximation , volume=. 2008 , publisher=
work page 2008
-
[57]
Representation Learning with Contrastive Predictive Coding
Representation learning with contrastive predictive coding , author=. arXiv preprint arXiv:1807.03748 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[58]
Roman Vershynin , title =
-
[59]
arXiv preprint arXiv:2601.20844 , year =
Zihao Wang and Hang Yin and Lihui Liu and Hanghang Tong and Yangqiu Song and Ginny Wong and Simon See , title =. arXiv preprint arXiv:2601.20844 , year =
-
[60]
Stephen J. Young and Edward R. Scheinerman , title =. Lecture Notes in Computer Science , volume =
-
[61]
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai and Basil Mustafa and Alexander Kolesnikov and Lucas Beyer , title =. arXiv preprint arXiv:2303.15343 , year =
work page internal anchor Pith review Pith/arXiv arXiv
-
[62]
Proceedings of the 57th annual meeting of the association for computational linguistics , pages=
Latent retrieval for weakly supervised open domain question answering , author=. Proceedings of the 57th annual meeting of the association for computational linguistics , pages=
-
[63]
Proceedings of the 37th International Conference on Machine Learning , articleno =
Wang, Tongzhou and Isola, Phillip , title =. Proceedings of the 37th International Conference on Machine Learning , articleno =. 2020 , publisher =
work page 2020
-
[64]
Journal of Approximation Theory , volume=
Optimal recovery and n-widths for convex classes of functions , author=. Journal of Approximation Theory , volume=. 1995 , publisher=
work page 1995
-
[65]
Xi Chen and Xiao Wang and Soravit Changpinyo and AJ Piergiovanni and Piotr Padlewski and Daniel Salz and Sebastian Goodman and Adam Grycner and Basil Mustafa and Lucas Beyer and Alexander Kolesnikov and Joan Puigcerver and Nan Ding and Keran Rong and Hassan Akbari and Gaurav Mishra and Linting Xue and Ashish V Thapliyal and James Bradbury and Weicheng Kuo...
work page 2023
-
[66]
Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles
Noroozi, Mehdi and Favaro, Paolo. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles. Computer Vision -- ECCV 2016. 2016
work page 2016
-
[67]
Wyner, A. D. , title =. Bell System Technical Journal , volume =. doi:https://doi.org/10.1002/j.1538-7305.1968.tb00062.x , year =
-
[68]
Proceedings of the Glasgow Mathematical Association , author=
The Closest Packing of Spherical Caps in n Dimensions , volume=. Proceedings of the Glasgow Mathematical Association , author=. 1955 , pages=. doi:10.1017/S2040618500033219 , number=
-
[69]
Shannon, Claude E. , title =. Bell System Technical Journal , volume =. doi:https://doi.org/10.1002/j.1538-7305.1959.tb03905.x , year =
-
[70]
On the origin of number and arrangement of the places of exit on the surface of pollen-grains
Tammes, Pieter Merkus Lambertus. On the origin of number and arrangement of the places of exit on the surface of pollen-grains. 1930
work page 1930
-
[71]
Sphere packings, lattices and groups , volume =
Conway, J.H and Sloane, N.J.A and Bannai, E and Borcherds, R.E and Leech, J and Norton, S.P and Odlyzko, A.M and Parker, R.A and Queen, L and Venkov, B.B , edition =. Sphere packings, lattices and groups , volume =
-
[72]
Zhai, Xiaohua and Mustafa, Basil and Kolesnikov, Alexander and Beyer, Lucas , year =
-
[73]
A Theoretical Analysis of Contrastive Unsupervised Representation Learning , booktitle =
Nikunj Saunshi and Orestis Plevrakis and Sanjeev Arora and Mikhail Khodak and Hrishikesh Khandeparkar , editor =. A Theoretical Analysis of Contrastive Unsupervised Representation Learning , booktitle =. 2019 , url =
work page 2019
-
[74]
Anna Van Elst and Debarghya Ghoshdastidar , title =. CoRR , volume =. 2024 , url =. doi:10.48550/ARXIV.2412.03486 , eprinttype =. 2412.03486 , timestamp =
-
[75]
Srinivasan, Krishna and Raman, Karthik and Chen, Jiecao and Bendersky, Michael and Najork, Marc , title =. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval , pages =. 2021 , isbn =. doi:10.1145/3404835.3463257 , abstract =
-
[76]
International Conference on Learning Representations , year=
Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval , author=. International Conference on Learning Representations , year=
-
[77]
Representation Learning with Contrastive Predictive Coding , author=. 2019 , eprint=
work page 2019
-
[78]
Yiwei Lu and Guojun Zhang and Sun Sun and Hongyu Guo and Yaoliang Yu , journal=. \ f\ -. 2023 , url=
work page 2023
-
[79]
Khattab, Omar and Zaharia, Matei , title =. 2020 , isbn =. doi:10.1145/3397271.3401075 , booktitle =
-
[80]
Journal of Computer and System Sciences , volume=
Probabilistic communication complexity , author=. Journal of Computer and System Sciences , volume=. 1986 , publisher=
work page 1986
-
[81]
Random Structures & Algorithms , volume=
Testing for high-dimensional geometry in random graphs , author=. Random Structures & Algorithms , volume=. 2016 , publisher=
work page 2016
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.