Asymmetric Random Projections

Edo Liberty; Nick Ryder; Zohar Karnin

arxiv: 1906.09489 · v1 · pith:Q347OZX2new · submitted 2019-06-22 · 💻 cs.LG · stat.ML

Asymmetric Random Projections

Nick Ryder , Zohar Karnin , Edo Liberty This is my paper

Pith reviewed 2026-05-25 17:55 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords random projectionsdimensionality reductionasymmetric projectionsinner product estimationmatrix multiplicationlinear regressiondata-dependent sketching

0 comments

The pith

By using statistics from a known data set one can build asymmetric random projections that better estimate inner products between two different vector sources than standard oblivious projections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to extract simple statistics from a data set available in advance and use them to design random projections tailored to pairs of vectors coming from two possibly distinct sources. This data-dependent approach targets tasks such as matrix multiplication and linear regression or classification where inner-product estimates matter. Because the projections exploit differences between the sources they achieve lower error than projections chosen without any data information. The method remains computationally light and includes proofs that it outperforms oblivious random projections. Experiments on matrix multiplication, regression, and classification confirm measurable gains in accuracy for the same projection dimension.

Core claim

The central claim is that a light preprocessing step can compute statistics from a given data set and produce asymmetric random projections whose rows are chosen to respect the empirical second-moment structure of each source separately. These projections then yield unbiased inner-product estimators whose variance is strictly smaller than that of oblivious random projections whenever the two sources differ in their second moments.

What carries the argument

Asymmetric random projection matrix whose rows are scaled and signed according to per-source second-moment estimates extracted before projection.

If this is right

Matrix multiplication can be approximated faster or more accurately by first projecting the two factors with source-specific projections.
Linear regression or classification on data from two sources can use lower-dimensional representations while retaining more accurate inner-product geometry.
Any algorithm whose cost scales with inner-product computations between two distinct collections benefits from the reduced variance.
The same preprocessing can be reused across multiple projection dimensions or multiple downstream tasks on the same pair of sources.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same idea of precomputing source statistics could be applied to other linear sketching methods that currently treat all vectors identically.
In streaming or distributed settings where one source arrives first, its statistics could be stored and later combined with a second source.
The approach suggests testing whether similar per-source adjustments improve other geometry-preserving maps such as Johnson-Lindenstrauss embeddings when sources are known to be heterogeneous.

Load-bearing premise

The full data set is available ahead of time so that source-specific statistics can be computed before the projection is chosen.

What would settle it

A controlled experiment on synthetic or real pairs of matrices in which the mean squared error of inner-product estimates using the data-dependent projections is not smaller than the error using standard oblivious projections of the same dimension.

Figures

Figures reproduced from arXiv: 1906.09489 by Edo Liberty, Nick Ryder, Zohar Karnin.

**Figure 1.** Figure 1: Dense Data FMM. Left ARCENE-data. Right Isolet-data. X-axis is the target dimension, Y-axis is the log MSE, dotted line represent lower and upper confidence bounds according to a single standard deviation. 6 [PITH_FULL_IMAGE:figures/full_fig_p006_1.png] view at source ↗

**Figure 2.** Figure 2: Sparse Data FMM. Top left is Go_SF-data. Top right is Go_SF-feature. Bottom left is TW_OCdata. Bottom right is TW_OC-feature. X-axis is the target dimension, Y-axis is the log MSE, dotted line represent lower and upper confidence bounds according to a single standard deviation. 5.2 Regression 5.2.1 Linear Regression We used two data sets from the UCI Machine Learning Repository [Smi06, FG11]. Slice Locali… view at source ↗

**Figure 3.** Figure 3: Synthetic Data FMM. Detailed in §B.1. Top left compares diag to diag. Top right compares unif to diag. Bottom left compares unif to unifskew. Bottom right compares unif to unif. 14 [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗

read the original abstract

Random projections (RP) are a popular tool for reducing dimensionality while preserving local geometry. In many applications the data set to be projected is given to us in advance, yet the current RP techniques do not make use of information about the data. In this paper, we provide a computationally light way to extract statistics from the data that allows designing a data dependent RP with superior performance compared to data-oblivious RP. We tackle scenarios such as matrix multiplication and linear regression/classification in which we wish to estimate inner products between pairs of vectors from two possibly different sources. Our technique takes advantage of the difference between the sources and is provably superior to oblivious RPs. Additionally, we provide extensive experiments comparing RPs with our approach showing significant performance lifts in fast matrix multiplication, regression and classification problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives a clean way to make random projections asymmetric by pulling light statistics from two separate data sources, yielding provable gains on inner-product tasks when the data is available upfront.

read the letter

The core contribution is the asymmetric construction: instead of a single oblivious projection matrix, they compute cheap statistics from each of the two sources and build a projection that exploits the difference between them. This is scoped exactly to settings like cross-matrix multiplication or regression where you have both datasets in advance, and the abstract claims a strict improvement over standard random projections for inner-product estimation. That formulation looks new relative to the oblivious RP literature. The experiments are reported to show clear lifts in matrix multiplication, regression, and classification without changing the asymptotic cost, which is the practical payoff. The approach stays computationally light, which matters for the workloads they target. The main limitation is the upfront data requirement; if either source arrives online or you cannot afford the precomputation, the method does not apply. Gains will also shrink if the two sources have similar distributions. Within the stated scope there is no circularity or load-bearing gap visible from the description. The work is a targeted, incremental improvement rather than a broad overhaul. It is worth a serious referee because the construction is explicit, the claims are scoped, and the experiments address real use cases. People working on fast linear algebra or two-source dimensionality reduction would get value from it.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes asymmetric random projections that extract statistics from two data sources available in advance to construct data-dependent projections for inner-product estimation. It claims a computationally light method that is provably superior to oblivious random projections, with applications to fast matrix multiplication and linear regression/classification, supported by theoretical analysis and extensive experiments showing performance improvements.

Significance. If the central construction and superiority claims hold, the work provides a practical enhancement to random projections by exploiting data availability and source differences, with potential utility in paired-source estimation tasks. The explicit data-dependent construction and reported experimental gains constitute strengths.

minor comments (2)

[Abstract] Abstract: the phrase 'provably superior' would benefit from a one-sentence qualifier on the precise metric (e.g., variance of the estimator) and the assumptions under which the guarantee holds.
The experimental section would be clearer if it included a brief statement of the random-projection dimension relative to the original data dimension and the number of trials used for averaging.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work on asymmetric random projections and for recommending minor revision. No major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper describes a data-dependent asymmetric random projection constructed by extracting statistics from a pre-available data set, then using source differences to achieve provably superior inner-product estimation compared to oblivious RPs. The abstract presents this as an independent construction and empirical method without any derivation step that reduces by definition or self-citation to its own inputs. No fitted parameter is renamed as a prediction, no uniqueness theorem is imported from the authors' prior work, and the superiority claim is scoped to the setting where full data is known in advance. The central argument remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no concrete free parameters, axioms, or invented entities; ledger left empty.

pith-pipeline@v0.9.0 · 5653 in / 1003 out tokens · 38074 ms · 2026-05-25T17:55:04.099705+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

[1]

The fast johnson--lindenstrauss transform and approximate nearest neighbors

Nir Ailon and Bernard Chazelle. The fast johnson--lindenstrauss transform and approximate nearest neighbors. SIAM Journal on computing , 39(1):302--322, 2009

work page 2009
[2]

Database-friendly random projections: Johnson-lindenstrauss with binary coins

Dimitris Achlioptas. Database-friendly random projections: Johnson-lindenstrauss with binary coins. Journal of computer and System Sciences , 66(4):671--687, 2003

work page 2003
[3]

Reuters rcv1 rcv2 multilingual, multiview text categorization test collection data set

Massih-Reza Amini and Cyril Goutte. Reuters rcv1 rcv2 multilingual, multiview text categorization test collection data set. UCI Machine Learning Repository , 2013

work page 2013
[4]

Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions

Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Foundations of Computer Science, 2006. FOCS'06. 47th Annual IEEE Symposium on , pages 459--468. IEEE, 2006

work page 2006
[5]

The space complexity of approximating the frequency moments

Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and system sciences , 58(1):137--147, 1999

work page 1999
[6]

Dimensionality reduction for k-means clustering and low rank approximation

Michael B Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina Persu. Dimensionality reduction for k-means clustering and low rank approximation. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing , pages 163--172. ACM, 2015

work page 2015
[7]

Isolet data set

Ron Cole and Mark Fanty. Isolet data set. UCI Machine Learning Repository , 1994

work page 1994
[8]

Linear dimensionality reduction: Survey, insights, and generalizations

John P Cunningham and Zoubin Ghahramani. Linear dimensionality reduction: Survey, insights, and generalizations. The Journal of Machine Learning Research , 16(1):2859--2900, 2015

work page 2015
[9]

Numerical linear algebra in the streaming model

Kenneth L Clarkson and David P Woodruff. Numerical linear algebra in the streaming model. In Proceedings of the forty-first annual ACM symposium on Theory of computing , pages 205--214. ACM, 2009

work page 2009
[10]

An elementary proof of a theorem of johnson and lindenstrauss

Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of johnson and lindenstrauss. Random Structures & Algorithms , 22(1):60--65, 2003

work page 2003
[11]

Adaptive subgradient methods for online learning and stochastic optimization

John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research , 12(Jul):2121--2159, 2011

work page 2011
[12]

Repeat consumption matrices data set

Moshe Lichman Dimitrios Kotzias and Padhraic Smyth. Repeat consumption matrices data set. UCI Machine Learning Repository , 2018

work page 2018
[13]

Schubert S

M. Schubert S. Poelsterl A. Cavallaro F. Graf, H.P. Kriegel. Relative location of ct slices on axial axis data set. UCI Machine Learning Repository , 2011

work page 2011
[14]

Experiments with random projections for machine learning

Dmitriy Fradkin and David Madigan. Experiments with random projections for machine learning. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining , pages 517--522. ACM, 2003

work page 2003
[15]

Arcene data set

Isabelle Guyon. Arcene data set. UCI Machine Learning Repository , 2008

work page 2008
[16]

Extensions of lipschitz mappings into a hilbert space

William B Johnson and Joram Lindenstrauss. Extensions of lipschitz mappings into a hilbert space. Contemporary mathematics , 26(189-206):1, 1984

work page 1984
[17]

Control variates as a variance reduction technique for random projections

Keegan Kang and Giles Hooker. Control variates as a variance reduction technique for random projections. In International Conference on Pattern Recognition Applications and Methods , pages 1--20. Springer, 2017

work page 2017
[18]

Online pca with spectral bounds

Zohar Karnin and Edo Liberty. Online pca with spectral bounds. In Conference on Learning Theory , pages 1129--1140, 2015

work page 2015
[19]

The cifar-10 dataset

Alex Krizhevsky. The cifar-10 dataset. Alex Krizhevsky's Personal Webpage , 2009

work page 2009
[20]

Improving random projections using marginal information

Ping Li, Trevor J Hastie, and Kenneth W Church. Improving random projections using marginal information. In International Conference on Computational Learning Theory , pages 635--649. Springer, 2006

work page 2006
[21]

Linear regression with random projections

Odalric-Ambrym Maillard and R \'e mi Munos. Linear regression with random projections. Journal of Machine Learning Research , 13(Sep):2735--2772, 2012

work page 2012
[22]

Osnap: Faster numerical linear algebra algorithms via sparser subspace embeddings

Jelani Nelson and Huy L Nguy \^e n. Osnap: Faster numerical linear algebra algorithms via sparser subspace embeddings. In Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on , pages 117--126. IEEE, 2013

work page 2013
[23]

Random features for large-scale kernel machines

Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In Advances in neural information processing systems , pages 1177--1184, 2008

work page 2008
[24]

Improved approximation algorithms for large matrices via random projections

Tamas Sarlos. Improved approximation algorithms for large matrices via random projections. FOCS , pages 143---152, 2006

work page 2006
[25]

Informed weighted random projection for dimension reduction

Jaydeep Sen and Harish Karnick. Informed weighted random projection for dimension reduction. In International Conference on Advanced Data Mining and Applications , pages 433--442. Springer, 2013

work page 2013
[26]

10-k corpus

Noah Smith. 10-k corpus. UCI Machine Learning Repository , 2006

work page 2006
[27]

Woodruff

D. Woodruff. Sketching as a tool for numerical linear algebra. Found. Trends Theor. Comput. Sci. , 10(1–2):1---157, 2014

work page 2014

[1] [1]

The fast johnson--lindenstrauss transform and approximate nearest neighbors

Nir Ailon and Bernard Chazelle. The fast johnson--lindenstrauss transform and approximate nearest neighbors. SIAM Journal on computing , 39(1):302--322, 2009

work page 2009

[2] [2]

Database-friendly random projections: Johnson-lindenstrauss with binary coins

Dimitris Achlioptas. Database-friendly random projections: Johnson-lindenstrauss with binary coins. Journal of computer and System Sciences , 66(4):671--687, 2003

work page 2003

[3] [3]

Reuters rcv1 rcv2 multilingual, multiview text categorization test collection data set

Massih-Reza Amini and Cyril Goutte. Reuters rcv1 rcv2 multilingual, multiview text categorization test collection data set. UCI Machine Learning Repository , 2013

work page 2013

[4] [4]

Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions

Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Foundations of Computer Science, 2006. FOCS'06. 47th Annual IEEE Symposium on , pages 459--468. IEEE, 2006

work page 2006

[5] [5]

The space complexity of approximating the frequency moments

Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and system sciences , 58(1):137--147, 1999

work page 1999

[6] [6]

Dimensionality reduction for k-means clustering and low rank approximation

Michael B Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina Persu. Dimensionality reduction for k-means clustering and low rank approximation. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing , pages 163--172. ACM, 2015

work page 2015

[7] [7]

Isolet data set

Ron Cole and Mark Fanty. Isolet data set. UCI Machine Learning Repository , 1994

work page 1994

[8] [8]

Linear dimensionality reduction: Survey, insights, and generalizations

John P Cunningham and Zoubin Ghahramani. Linear dimensionality reduction: Survey, insights, and generalizations. The Journal of Machine Learning Research , 16(1):2859--2900, 2015

work page 2015

[9] [9]

Numerical linear algebra in the streaming model

Kenneth L Clarkson and David P Woodruff. Numerical linear algebra in the streaming model. In Proceedings of the forty-first annual ACM symposium on Theory of computing , pages 205--214. ACM, 2009

work page 2009

[10] [10]

An elementary proof of a theorem of johnson and lindenstrauss

Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of johnson and lindenstrauss. Random Structures & Algorithms , 22(1):60--65, 2003

work page 2003

[11] [11]

Adaptive subgradient methods for online learning and stochastic optimization

John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research , 12(Jul):2121--2159, 2011

work page 2011

[12] [12]

Repeat consumption matrices data set

Moshe Lichman Dimitrios Kotzias and Padhraic Smyth. Repeat consumption matrices data set. UCI Machine Learning Repository , 2018

work page 2018

[13] [13]

Schubert S

M. Schubert S. Poelsterl A. Cavallaro F. Graf, H.P. Kriegel. Relative location of ct slices on axial axis data set. UCI Machine Learning Repository , 2011

work page 2011

[14] [14]

Experiments with random projections for machine learning

Dmitriy Fradkin and David Madigan. Experiments with random projections for machine learning. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining , pages 517--522. ACM, 2003

work page 2003

[15] [15]

Arcene data set

Isabelle Guyon. Arcene data set. UCI Machine Learning Repository , 2008

work page 2008

[16] [16]

Extensions of lipschitz mappings into a hilbert space

William B Johnson and Joram Lindenstrauss. Extensions of lipschitz mappings into a hilbert space. Contemporary mathematics , 26(189-206):1, 1984

work page 1984

[17] [17]

Control variates as a variance reduction technique for random projections

Keegan Kang and Giles Hooker. Control variates as a variance reduction technique for random projections. In International Conference on Pattern Recognition Applications and Methods , pages 1--20. Springer, 2017

work page 2017

[18] [18]

Online pca with spectral bounds

Zohar Karnin and Edo Liberty. Online pca with spectral bounds. In Conference on Learning Theory , pages 1129--1140, 2015

work page 2015

[19] [19]

The cifar-10 dataset

Alex Krizhevsky. The cifar-10 dataset. Alex Krizhevsky's Personal Webpage , 2009

work page 2009

[20] [20]

Improving random projections using marginal information

Ping Li, Trevor J Hastie, and Kenneth W Church. Improving random projections using marginal information. In International Conference on Computational Learning Theory , pages 635--649. Springer, 2006

work page 2006

[21] [21]

Linear regression with random projections

Odalric-Ambrym Maillard and R \'e mi Munos. Linear regression with random projections. Journal of Machine Learning Research , 13(Sep):2735--2772, 2012

work page 2012

[22] [22]

Osnap: Faster numerical linear algebra algorithms via sparser subspace embeddings

Jelani Nelson and Huy L Nguy \^e n. Osnap: Faster numerical linear algebra algorithms via sparser subspace embeddings. In Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on , pages 117--126. IEEE, 2013

work page 2013

[23] [23]

Random features for large-scale kernel machines

Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In Advances in neural information processing systems , pages 1177--1184, 2008

work page 2008

[24] [24]

Improved approximation algorithms for large matrices via random projections

Tamas Sarlos. Improved approximation algorithms for large matrices via random projections. FOCS , pages 143---152, 2006

work page 2006

[25] [25]

Informed weighted random projection for dimension reduction

Jaydeep Sen and Harish Karnick. Informed weighted random projection for dimension reduction. In International Conference on Advanced Data Mining and Applications , pages 433--442. Springer, 2013

work page 2013

[26] [26]

10-k corpus

Noah Smith. 10-k corpus. UCI Machine Learning Repository , 2006

work page 2006

[27] [27]

Woodruff

D. Woodruff. Sketching as a tool for numerical linear algebra. Found. Trends Theor. Comput. Sci. , 10(1–2):1---157, 2014

work page 2014