pith. sign in

arxiv: 1906.09489 · v1 · pith:Q347OZX2new · submitted 2019-06-22 · 💻 cs.LG · stat.ML

Asymmetric Random Projections

Pith reviewed 2026-05-25 17:55 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords random projectionsdimensionality reductionasymmetric projectionsinner product estimationmatrix multiplicationlinear regressiondata-dependent sketching
0
0 comments X

The pith

By using statistics from a known data set one can build asymmetric random projections that better estimate inner products between two different vector sources than standard oblivious projections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to extract simple statistics from a data set available in advance and use them to design random projections tailored to pairs of vectors coming from two possibly distinct sources. This data-dependent approach targets tasks such as matrix multiplication and linear regression or classification where inner-product estimates matter. Because the projections exploit differences between the sources they achieve lower error than projections chosen without any data information. The method remains computationally light and includes proofs that it outperforms oblivious random projections. Experiments on matrix multiplication, regression, and classification confirm measurable gains in accuracy for the same projection dimension.

Core claim

The central claim is that a light preprocessing step can compute statistics from a given data set and produce asymmetric random projections whose rows are chosen to respect the empirical second-moment structure of each source separately. These projections then yield unbiased inner-product estimators whose variance is strictly smaller than that of oblivious random projections whenever the two sources differ in their second moments.

What carries the argument

Asymmetric random projection matrix whose rows are scaled and signed according to per-source second-moment estimates extracted before projection.

If this is right

  • Matrix multiplication can be approximated faster or more accurately by first projecting the two factors with source-specific projections.
  • Linear regression or classification on data from two sources can use lower-dimensional representations while retaining more accurate inner-product geometry.
  • Any algorithm whose cost scales with inner-product computations between two distinct collections benefits from the reduced variance.
  • The same preprocessing can be reused across multiple projection dimensions or multiple downstream tasks on the same pair of sources.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same idea of precomputing source statistics could be applied to other linear sketching methods that currently treat all vectors identically.
  • In streaming or distributed settings where one source arrives first, its statistics could be stored and later combined with a second source.
  • The approach suggests testing whether similar per-source adjustments improve other geometry-preserving maps such as Johnson-Lindenstrauss embeddings when sources are known to be heterogeneous.

Load-bearing premise

The full data set is available ahead of time so that source-specific statistics can be computed before the projection is chosen.

What would settle it

A controlled experiment on synthetic or real pairs of matrices in which the mean squared error of inner-product estimates using the data-dependent projections is not smaller than the error using standard oblivious projections of the same dimension.

Figures

Figures reproduced from arXiv: 1906.09489 by Edo Liberty, Nick Ryder, Zohar Karnin.

Figure 1
Figure 1. Figure 1: Dense Data FMM. Left ARCENE-data. Right Isolet-data. X-axis is the target dimension, Y-axis is the log MSE, dotted line represent lower and upper confidence bounds according to a single standard deviation. 6 [PITH_FULL_IMAGE:figures/full_fig_p006_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Sparse Data FMM. Top left is Go_SF-data. Top right is Go_SF-feature. Bottom left is TW_OC￾data. Bottom right is TW_OC-feature. X-axis is the target dimension, Y-axis is the log MSE, dotted line represent lower and upper confidence bounds according to a single standard deviation. 5.2 Regression 5.2.1 Linear Regression We used two data sets from the UCI Machine Learning Repository [Smi06, FG11]. Slice Locali… view at source ↗
Figure 3
Figure 3. Figure 3: Synthetic Data FMM. Detailed in §B.1. Top left compares diag to diag. Top right compares unif to diag. Bottom left compares unif to unifskew. Bottom right compares unif to unif. 14 [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗
read the original abstract

Random projections (RP) are a popular tool for reducing dimensionality while preserving local geometry. In many applications the data set to be projected is given to us in advance, yet the current RP techniques do not make use of information about the data. In this paper, we provide a computationally light way to extract statistics from the data that allows designing a data dependent RP with superior performance compared to data-oblivious RP. We tackle scenarios such as matrix multiplication and linear regression/classification in which we wish to estimate inner products between pairs of vectors from two possibly different sources. Our technique takes advantage of the difference between the sources and is provably superior to oblivious RPs. Additionally, we provide extensive experiments comparing RPs with our approach showing significant performance lifts in fast matrix multiplication, regression and classification problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes asymmetric random projections that extract statistics from two data sources available in advance to construct data-dependent projections for inner-product estimation. It claims a computationally light method that is provably superior to oblivious random projections, with applications to fast matrix multiplication and linear regression/classification, supported by theoretical analysis and extensive experiments showing performance improvements.

Significance. If the central construction and superiority claims hold, the work provides a practical enhancement to random projections by exploiting data availability and source differences, with potential utility in paired-source estimation tasks. The explicit data-dependent construction and reported experimental gains constitute strengths.

minor comments (2)
  1. [Abstract] Abstract: the phrase 'provably superior' would benefit from a one-sentence qualifier on the precise metric (e.g., variance of the estimator) and the assumptions under which the guarantee holds.
  2. The experimental section would be clearer if it included a brief statement of the random-projection dimension relative to the original data dimension and the number of trials used for averaging.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work on asymmetric random projections and for recommending minor revision. No major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper describes a data-dependent asymmetric random projection constructed by extracting statistics from a pre-available data set, then using source differences to achieve provably superior inner-product estimation compared to oblivious RPs. The abstract presents this as an independent construction and empirical method without any derivation step that reduces by definition or self-citation to its own inputs. No fitted parameter is renamed as a prediction, no uniqueness theorem is imported from the authors' prior work, and the superiority claim is scoped to the setting where full data is known in advance. The central argument remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no concrete free parameters, axioms, or invented entities; ledger left empty.

pith-pipeline@v0.9.0 · 5653 in / 1003 out tokens · 38074 ms · 2026-05-25T17:55:04.099705+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

  1. [1]

    The fast johnson--lindenstrauss transform and approximate nearest neighbors

    Nir Ailon and Bernard Chazelle. The fast johnson--lindenstrauss transform and approximate nearest neighbors. SIAM Journal on computing , 39(1):302--322, 2009

  2. [2]

    Database-friendly random projections: Johnson-lindenstrauss with binary coins

    Dimitris Achlioptas. Database-friendly random projections: Johnson-lindenstrauss with binary coins. Journal of computer and System Sciences , 66(4):671--687, 2003

  3. [3]

    Reuters rcv1 rcv2 multilingual, multiview text categorization test collection data set

    Massih-Reza Amini and Cyril Goutte. Reuters rcv1 rcv2 multilingual, multiview text categorization test collection data set. UCI Machine Learning Repository , 2013

  4. [4]

    Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions

    Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Foundations of Computer Science, 2006. FOCS'06. 47th Annual IEEE Symposium on , pages 459--468. IEEE, 2006

  5. [5]

    The space complexity of approximating the frequency moments

    Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and system sciences , 58(1):137--147, 1999

  6. [6]

    Dimensionality reduction for k-means clustering and low rank approximation

    Michael B Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina Persu. Dimensionality reduction for k-means clustering and low rank approximation. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing , pages 163--172. ACM, 2015

  7. [7]

    Isolet data set

    Ron Cole and Mark Fanty. Isolet data set. UCI Machine Learning Repository , 1994

  8. [8]

    Linear dimensionality reduction: Survey, insights, and generalizations

    John P Cunningham and Zoubin Ghahramani. Linear dimensionality reduction: Survey, insights, and generalizations. The Journal of Machine Learning Research , 16(1):2859--2900, 2015

  9. [9]

    Numerical linear algebra in the streaming model

    Kenneth L Clarkson and David P Woodruff. Numerical linear algebra in the streaming model. In Proceedings of the forty-first annual ACM symposium on Theory of computing , pages 205--214. ACM, 2009

  10. [10]

    An elementary proof of a theorem of johnson and lindenstrauss

    Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of johnson and lindenstrauss. Random Structures & Algorithms , 22(1):60--65, 2003

  11. [11]

    Adaptive subgradient methods for online learning and stochastic optimization

    John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research , 12(Jul):2121--2159, 2011

  12. [12]

    Repeat consumption matrices data set

    Moshe Lichman Dimitrios Kotzias and Padhraic Smyth. Repeat consumption matrices data set. UCI Machine Learning Repository , 2018

  13. [13]

    Schubert S

    M. Schubert S. Poelsterl A. Cavallaro F. Graf, H.P. Kriegel. Relative location of ct slices on axial axis data set. UCI Machine Learning Repository , 2011

  14. [14]

    Experiments with random projections for machine learning

    Dmitriy Fradkin and David Madigan. Experiments with random projections for machine learning. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining , pages 517--522. ACM, 2003

  15. [15]

    Arcene data set

    Isabelle Guyon. Arcene data set. UCI Machine Learning Repository , 2008

  16. [16]

    Extensions of lipschitz mappings into a hilbert space

    William B Johnson and Joram Lindenstrauss. Extensions of lipschitz mappings into a hilbert space. Contemporary mathematics , 26(189-206):1, 1984

  17. [17]

    Control variates as a variance reduction technique for random projections

    Keegan Kang and Giles Hooker. Control variates as a variance reduction technique for random projections. In International Conference on Pattern Recognition Applications and Methods , pages 1--20. Springer, 2017

  18. [18]

    Online pca with spectral bounds

    Zohar Karnin and Edo Liberty. Online pca with spectral bounds. In Conference on Learning Theory , pages 1129--1140, 2015

  19. [19]

    The cifar-10 dataset

    Alex Krizhevsky. The cifar-10 dataset. Alex Krizhevsky's Personal Webpage , 2009

  20. [20]

    Improving random projections using marginal information

    Ping Li, Trevor J Hastie, and Kenneth W Church. Improving random projections using marginal information. In International Conference on Computational Learning Theory , pages 635--649. Springer, 2006

  21. [21]

    Linear regression with random projections

    Odalric-Ambrym Maillard and R \'e mi Munos. Linear regression with random projections. Journal of Machine Learning Research , 13(Sep):2735--2772, 2012

  22. [22]

    Osnap: Faster numerical linear algebra algorithms via sparser subspace embeddings

    Jelani Nelson and Huy L Nguy \^e n. Osnap: Faster numerical linear algebra algorithms via sparser subspace embeddings. In Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on , pages 117--126. IEEE, 2013

  23. [23]

    Random features for large-scale kernel machines

    Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In Advances in neural information processing systems , pages 1177--1184, 2008

  24. [24]

    Improved approximation algorithms for large matrices via random projections

    Tamas Sarlos. Improved approximation algorithms for large matrices via random projections. FOCS , pages 143---152, 2006

  25. [25]

    Informed weighted random projection for dimension reduction

    Jaydeep Sen and Harish Karnick. Informed weighted random projection for dimension reduction. In International Conference on Advanced Data Mining and Applications , pages 433--442. Springer, 2013

  26. [26]

    10-k corpus

    Noah Smith. 10-k corpus. UCI Machine Learning Repository , 2006

  27. [27]

    Woodruff

    D. Woodruff. Sketching as a tool for numerical linear algebra. Found. Trends Theor. Comput. Sci. , 10(1–2):1---157, 2014