Coordinate Heterogeneity Governs Binary Quantization: From InfoNCE to Recall

Wenxuan Xiao

arxiv: 2605.17524 · v1 · pith:2BJBSUTWnew · submitted 2026-05-17 · 💻 cs.LG · cs.DB

Pith reviewed 2026-05-20 14:18 UTC · model grok-4.3

keywords: binary quantization, coordinate heterogeneity, InfoNCE, contrastive embeddings, nearest neighbor search, ranking fidelity, scaling law, embedding compression

The pith

Coordinate heterogeneity in InfoNCE embeddings determines binary quantization performance and strategy choice.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Binary quantization compresses high-dimensional embeddings into one or two bits per coordinate for extreme-speed nearest neighbor search. The paper connects the Gaussian structure of InfoNCE-trained representations to a full analytical framework that explains when and why BQ works. The central finding is that coordinate heterogeneity, meaning non-uniform variances across coordinates, controls ranking fidelity, the information carried by the magnitude bit, and whether random rotation helps or hurts. Closed-form expressions and a two-parameter scaling law predict performance across models and dimensions. Experiments on 13 datasets and 6 embedding families confirm the predictions and give a design guide for choosing between rotation and axis-preserving methods.
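
To make the one-bit case concrete, here is a minimal sketch of sign-based binary quantization with Hamming-distance ranking. It is not the paper's implementation: the function names, the synthetic Gaussian data, and the dimensions are assumptions chosen for illustration, and the optional second (magnitude) bit is left out.

```python
import random

def sign_code(vec):
    """Pack the sign bit of each coordinate into one Python int (1 bit per coordinate)."""
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code

def hamming(a, b):
    """Number of coordinates whose sign bits differ between two codes."""
    return bin(a ^ b).count("1")

def rank_by_hamming(query, database):
    """Return database indices sorted by Hamming distance to the query code."""
    q = sign_code(query)
    codes = [sign_code(v) for v in database]
    return sorted(range(len(database)), key=lambda i: hamming(q, codes[i]))

# Synthetic example (hypothetical data): the query is a noisy copy of item 7.
rng = random.Random(0)
dim = 64
database = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(100)]
query = [x + 0.1 * rng.gauss(0.0, 1.0) for x in database[7]]
print(rank_by_hamming(query, database)[:5])
```

The speed of such a scheme comes from replacing floating-point inner products with XOR and popcount on packed bits; a production system would use fixed-width machine words rather than Python integers.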

Core claim

We resolve this puzzle by connecting the Gaussian structure recently established for InfoNCE-trained representations to a complete analytical framework for BQ quality. The key insight is that coordinate heterogeneity -- the non-uniformity of per-coordinate variances -- governs the key aspects of BQ performance. We derive closed-form expressions for ranking fidelity, prove that the magnitude bit carries information proportional to heterogeneity, and show that random rotation destroys precisely the signal that one paradigm exploits while creating the isotropy that the other requires. A two-parameter scaling law predicts fidelity across models and dimensions.

What carries the argument

Coordinate heterogeneity, the non-uniformity of per-coordinate variances in the embedding vectors.
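
One simple way to put a number on this idea is the coefficient of variation of the per-coordinate variances. The sketch below uses that measure purely as an illustration; the summary does not state which statistic the paper uses, and the names `coordinate_variances` and `heterogeneity_cv` as well as the synthetic scales are assumptions.

```python
import random
import statistics

def coordinate_variances(vectors):
    """Population variance of each coordinate across a list of equal-length vectors."""
    return [statistics.pvariance(col) for col in zip(*vectors)]

def heterogeneity_cv(variances):
    """Coefficient of variation of the per-coordinate variances.

    0 means perfectly uniform variances. This is an illustrative measure only;
    the paper's own definition of heterogeneity may differ.
    """
    mean = statistics.fmean(variances)
    return statistics.pstdev(variances) / mean

# Hypothetical per-coordinate standard deviations for a 4-dimensional toy example.
rng = random.Random(0)
scales = [0.5, 1.0, 2.0, 4.0]
sample = [[rng.gauss(0.0, s) for s in scales] for _ in range(5000)]
print(heterogeneity_cv(coordinate_variances(sample)))
```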

If this is right

Closed-form expressions for ranking fidelity allow direct prediction of BQ performance from embedding statistics.
The information value of the magnitude bit scales directly with the degree of coordinate heterogeneity.
Random rotation removes the heterogeneous signal that axis-preserving quantization exploits.
The two-parameter scaling law for fidelity holds across different models and embedding dimensions.
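
The rotation claim in the list above can be illustrated with a small calculation. Assuming uncorrelated coordinates with variances s_j (a simplifying assumption made here, not a statement from the paper), an orthogonal matrix Q gives rotated coordinate i the variance sum_j Q[i][j]^2 * s_j, which is a weighted average of the original variances. The sketch below builds a random orthogonal matrix by Gram-Schmidt and compares the spread of variances before and after; the variance profile is hypothetical.

```python
import math
import random

def random_orthogonal(dim, rng):
    """Random orthogonal matrix via Gram-Schmidt on Gaussian vectors (rows are orthonormal)."""
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:  # remove components along the rows already accepted
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows

def rotated_variances(variances, q):
    """Per-coordinate variances after rotation, assuming uncorrelated input coordinates:
    Var[(Qx)_i] = sum_j Q[i][j]**2 * variances[j]."""
    return [sum(qij * qij * s for qij, s in zip(row, variances)) for row in q]

rng = random.Random(0)
variances = [2.0 ** (-k / 4) for k in range(32)]  # hypothetical decaying variance profile
after = rotated_variances(variances, random_orthogonal(32, rng))
# Ratio of largest to smallest variance, before and after rotation.
print(max(variances) / min(variances), max(after) / min(after))
```

Total variance is unchanged by the rotation, but each rotated coordinate receives a mixture of all the original variances, so the per-coordinate profile flattens. That flattening is the isotropy one paradigm needs and the loss of signal the other suffers.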

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

System designers could compute coordinate heterogeneity on a sample of embeddings to decide automatically whether to apply random rotation.
The same analytical approach might extend to multi-bit or product quantization if similar variance structures appear.
Non-contrastive embeddings could be tested to determine whether coordinate heterogeneity remains the dominant factor outside the InfoNCE setting.

Load-bearing premise

InfoNCE-trained representations exhibit a Gaussian structure that can be used to derive closed-form expressions for ranking fidelity and magnitude-bit value.

What would settle it

Measure actual recall after binary quantization on a new collection of embeddings whose per-coordinate variances have been artificially equalized or made extremely varied, and check whether the observed fidelity matches the closed-form prediction.
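
A rough sketch of the measurement side of such a test follows. It uses synthetic Gaussian data with a hypothetical variance profile, equalizes the variances by rescaling each coordinate, and reports recall@10 of sign-bit Hamming ranking against exact inner-product ranking in each space. The closed-form prediction itself is not reproduced in this summary, so the comparison against it is left out; all names and sizes are assumptions.

```python
import random
import statistics

def rescale_to_unit_variance(vectors):
    """Divide each coordinate by its standard deviation so all variances become equal."""
    stds = [statistics.pstdev(col) for col in zip(*vectors)]
    return [[x / s for x, s in zip(v, stds)] for v in vectors]

def sign_bits(vec):
    """One bit per coordinate: True where the coordinate is positive."""
    return [x > 0 for x in vec]

def recall_at_k(queries, database, k):
    """Fraction of exact inner-product top-k neighbours recovered by sign-bit Hamming ranking."""
    codes = [sign_bits(v) for v in database]
    hits = 0
    for q in queries:
        exact = sorted(range(len(database)),
                       key=lambda i: -sum(a * b for a, b in zip(q, database[i])))[:k]
        qc = sign_bits(q)
        approx = sorted(range(len(database)),
                        key=lambda i: sum(a != b for a, b in zip(qc, codes[i])))[:k]
        hits += len(set(exact) & set(approx))
    return hits / (k * len(queries))

rng = random.Random(0)
scales = [2.0 ** (-j / 8) for j in range(32)]  # hypothetical heterogeneous std profile
data = [[rng.gauss(0.0, s) for s in scales] for _ in range(320)]
equalized = rescale_to_unit_variance(data)
# Exact neighbours are recomputed in each space, so each recall is self-consistent.
print(recall_at_k(data[300:], data[:300], 10),
      recall_at_k(equalized[300:], equalized[:300], 10))
```

Making the variances more extreme instead of equal works the same way: multiply each coordinate by a chosen factor before measuring recall.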

Figures

Figures reproduced from arXiv: 2605.17524 by Wenxuan Xiao.

Original abstract

Binary quantization (BQ) compresses high-dimensional embeddings into one or two bits per coordinate, enabling nearest neighbor search at extreme speed. Yet a striking puzzle persists: BQ achieves competitive recall on contrastive embeddings but fails on others -- and two leading systems adopt diametrically opposite strategies (random rotation vs. preserving coordinate axes) without a common theory explaining when each is appropriate. We resolve this puzzle by connecting the Gaussian structure recently established for InfoNCE-trained representations to a complete analytical framework for BQ quality. The key insight is that coordinate heterogeneity -- the non-uniformity of per-coordinate variances -- governs the key aspects of BQ performance. We derive closed-form expressions for ranking fidelity, prove that the magnitude bit carries information proportional to heterogeneity, and show that random rotation destroys precisely the signal that one paradigm exploits while creating the isotropy that the other requires. A two-parameter scaling law predicts fidelity across models and dimensions. Experiments on 13 datasets and 6 embedding families validate all predictions and provide the first principled design guide for binary quantization systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript develops an analytical framework for binary quantization (BQ) performance on high-dimensional embeddings. It connects the Gaussian structure of InfoNCE-trained representations (imported from prior work) to coordinate heterogeneity—the non-uniformity of per-coordinate variances—as the governing factor. The authors derive closed-form expressions for ranking fidelity, prove that the magnitude bit carries information proportional to heterogeneity, explain why random rotation versus axis-preserving strategies succeed or fail in different regimes, and present a two-parameter scaling law that predicts BQ fidelity across models and dimensions. These predictions are tested on 13 datasets spanning 6 embedding families.

Significance. If the derivations are shown to be robust, the work supplies a principled explanation for the observed effectiveness of BQ on contrastive embeddings and supplies concrete design guidance for extreme-speed nearest-neighbor systems. The extensive multi-dataset validation and the explicit link between heterogeneity and the magnitude bit are strengths that could influence both theory and practice in representation compression.

major comments (2)

[§3] §3 (analytical framework): The closed-form expressions for ranking fidelity and the claimed proportionality of magnitude-bit information to heterogeneity are obtained by integrating under the assumption of Gaussian per-coordinate marginals. The manuscript imports this Gaussianity from prior work but does not report independent verification (normality tests, kurtosis, or QQ-plots) on the 13 evaluation datasets. Without such checks or error bounds for non-Gaussian deviations, the exactness of the integral expressions and the proportionality result remain conditional.
[§5.2] §5.2 (scaling law): The two-parameter scaling law is presented as predictive across models and dimensions. The manuscript must clarify whether the two parameters are derived from the heterogeneity model in closed form or fitted to the experimental data used for validation. If the latter, the law functions as an empirical description rather than a parameter-free consequence of the theory, weakening the central claim that heterogeneity alone governs performance.
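
The kurtosis check named in the §3 comment is cheap to run per coordinate. The sketch below is one possible form of it, using only the standard library: it computes sample excess kurtosis (0 for a Gaussian) and flags coordinates that deviate by more than a tolerance. The tolerance value and the function names are placeholders chosen here, not values from the paper or the report.

```python
import random
import statistics

def excess_kurtosis(values):
    """Sample excess kurtosis: fourth standardized moment minus 3 (0 for a Gaussian)."""
    mean = statistics.fmean(values)
    var = statistics.fmean([(x - mean) ** 2 for x in values])
    fourth = statistics.fmean([(x - mean) ** 4 for x in values])
    return fourth / (var * var) - 3.0

def flag_non_gaussian(vectors, tol=0.5):
    """Indices of coordinates whose |excess kurtosis| exceeds tol (tol is an arbitrary placeholder)."""
    return [i for i, col in enumerate(zip(*vectors)) if abs(excess_kurtosis(col)) > tol]

rng = random.Random(0)
# Coordinate 0 is Gaussian; coordinate 1 is uniform (true excess kurtosis -1.2).
sample = [[rng.gauss(0.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(20000)]
print(flag_non_gaussian(sample))
```

Kurtosis only captures tail weight; a fuller check would add a formal normality test and QQ-plots, as the comment suggests.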

minor comments (1)

[Table 2] Table 2: the reported confidence intervals for recall@10 appear to be computed without accounting for multiple-comparison correction across the 13 datasets; a brief note on the statistical procedure would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for your thorough review and valuable suggestions. We respond to each major comment point-by-point and outline the revisions we will make to address the concerns raised.

Referee: [§3] §3 (analytical framework): The closed-form expressions for ranking fidelity and the claimed proportionality of magnitude-bit information to heterogeneity are obtained by integrating under the assumption of Gaussian per-coordinate marginals. The manuscript imports this Gaussianity from prior work but does not report independent verification (normality tests, kurtosis, or QQ-plots) on the 13 evaluation datasets. Without such checks or error bounds for non-Gaussian deviations, the exactness of the integral expressions and the proportionality result remain conditional.

Authors: We acknowledge the importance of verifying the Gaussian assumption on our specific datasets. Although the Gaussian structure is supported by prior literature on InfoNCE-trained embeddings, we will include in the revised version independent checks such as Shapiro-Wilk tests, kurtosis values, and QQ-plots for the coordinate marginals across all 13 datasets. Furthermore, we will derive and report approximate error bounds for the ranking fidelity expressions under mild deviations from Gaussianity to demonstrate the robustness of our results.

revision: yes

Referee: [§5.2] §5.2 (scaling law): The two-parameter scaling law is presented as predictive across models and dimensions. The manuscript must clarify whether the two parameters are derived from the heterogeneity model in closed form or fitted to the experimental data used for validation. If the latter, the law functions as an empirical description rather than a parameter-free consequence of the theory, weakening the central claim that heterogeneity alone governs performance.

Authors: The parameters of the scaling law are derived in closed form from the heterogeneity model and the analytical expressions for ranking fidelity, rather than being fitted to the experimental validation data. They correspond to summary statistics of the coordinate variance distribution (e.g., mean and variance of the heterogeneity). To address the concern, we will expand the derivation in §5.2 to explicitly show how these parameters arise from the theoretical framework, confirming that the scaling law is a direct consequence of the heterogeneity-governed model.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivations rest on external Gaussian assumption plus new heterogeneity analysis.

The paper imports the Gaussian structure of InfoNCE representations as a recently established result from prior work and uses it as the foundation for deriving closed-form ranking fidelity expressions, proving magnitude-bit information proportionality to heterogeneity, and obtaining a two-parameter scaling law. These steps are presented as logical consequences of the imported structure combined with the coordinate heterogeneity insight, followed by experimental validation across 13 datasets. No self-definitional reductions, fitted parameters explicitly renamed as first-principles predictions, or load-bearing self-citations appear in the abstract or described chain; the central claims remain independent of the present paper's own fitted values or internal loops.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on the external Gaussian structure result for InfoNCE embeddings and introduces a two-parameter scaling law whose parameters appear to be chosen to match observed fidelity.

free parameters (1)

two parameters of the scaling law
Used to predict ranking fidelity across models and dimensions; their origin (derived or fitted) is not specified in the abstract.

axioms (1)

domain assumption: InfoNCE-trained representations possess Gaussian structure
Invoked as the recently established foundation that enables the closed-form derivations for BQ quality.
