Bidirectional Random Projections

Chao Lan; Luyuan Yang

arxiv: 2606.10377 · v1 · pith:GDB5DYURnew · submitted 2026-06-09 · 🧮 math.ST · cs.LG· stat.TH

Bidirectional Random Projections

Chao Lan , Luyuan Yang This is my paper

Pith reviewed 2026-06-27 11:41 UTC · model grok-4.3

classification 🧮 math.ST cs.LGstat.TH

keywords random projectionsordinary least squaresexcess loss boundbidirectional projectionsfixed designdimension reductionregression

0 comments

The pith

Bidirectional random projections produce an OLS excess loss bound that improves over row-only projections when the row ratio n1/n is small.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an expected excess loss bound for the ordinary least squares estimator fitted to data after random projections are applied to both rows and columns. It compares this bound to an existing one that uses only row projection and shows that the difference behaves like O(p1 + C/p1), where the constant C depends on the ratio of projected rows to original rows and can become negative for small values of that ratio. A reader would care because random projections reduce the size of large regression problems while attempting to preserve statistical performance, so knowing when the extra column projection helps the bound matters for practical dimension reduction.

Core claim

The paper establishes an expected excess loss bound for the OLS estimator built on the bidirectionally projected pair (W X R, W Y). Relative to the established bound for the row-projected estimator on (X R, Y), the new bound differs by a gap that is approximately O(p1 + C 1/p1), where C scales with n1/n and can be negative for small n1/n.

What carries the argument

The expected excess loss bound for the OLS estimator on the bidirectionally projected data (WXR, WY), obtained under fixed design with properly distributed random projections R and W.

If this is right

The bidirectional approach can produce a strictly smaller expected excess loss bound than row-only projection when the ratio n1/n is sufficiently small.
The gap expression implies an optimal choice of p1 that balances the linear and inverse terms for a given C.
Numerical experiments on real-world data confirm that the theoretical gap behavior appears in practice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The result suggests testing whether the same gap form appears when the projections are applied to other linear estimators such as ridge regression.
It raises the question of whether an optimal p1 can be chosen in a data-driven way without knowing C in advance.
The analysis may extend to streaming or online settings where both dimensions are reduced sequentially.

Load-bearing premise

The random projections R and W are properly distributed.

What would settle it

A numerical check on synthetic fixed-design data that computes the realized gap in excess loss for several values of p1 and n1/n and finds it deviates from the predicted O(p1 + C/p1) scaling when n1/n is small.

Figures

Figures reproduced from arXiv: 2606.10377 by Chao Lan, Luyuan Yang.

**Figure 1.** Figure 1: Results on the Digit Data Set. Proof. First observe that 𝔼𝑅,𝑊 ,𝜃̂ 𝑅𝑊 [𝐿𝑅𝑊 (𝜃̂ 𝑅𝑊 )] = 1 𝑛1 𝔼𝑅,𝑊 ,𝜃̂ 𝑅𝑊 ,𝑌 ||𝑊 𝑋𝑅𝜃̂ 𝑅𝑊 − 𝑊 𝑌 ||2 ≥ 1 𝑛1 𝔼𝜆 2 𝑤↓ ||𝑋𝑅𝜃̂ 𝑅𝑊 − 𝑌 ||2 (34) By the weighted mean value theorem for integrals, there exists a 𝑊0 ∈ ℝ𝑛1×𝑛 such that 1 𝑛1 𝔼𝑅,𝑊 ,𝜃̂ 𝑅𝑊 ,𝑌 [𝜆 2 𝑤↓ ||𝑋𝑅𝜃̂ 𝑅𝑊 − 𝑌 ||2 ] = 1 𝑛1 𝜆 2 𝑤0↓ 𝔼𝑅,𝜃̂ 𝑅𝑊 ,𝑌 ||𝑋𝑅𝜃̂ 𝑅𝑊 − 𝑌 ||2 = 𝑛 𝑛1 𝜆 2 𝑤0↓ 𝔼𝑅,𝜃̂ 𝑅𝑊 [𝐿𝑅(𝜃̂ 𝑅𝑊 )], (35) where the last equali… view at source ↗

**Figure 2.** Figure 2: Loss Gap versus 𝑛1∕𝑛 on the Digit Data Set [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Results on the Superconductivty Data Set. A. Additional Numerical Results We first tested on the Digit data set with more choices of 𝑛1∕𝑛 [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Results on the MNIST Data Set. : Preprint submitted to Elsevier Page 8 of 7 [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

read the original abstract

This paper analyzes bidirectional random projections for ordinary least squares (OLS) regression under the fixed design setting. Let $(X,Y) \in \mathbb{R}^{n \times p} \times \mathbb{R}^n$ be a sample and $R \in \mathbb{R}^{n_1 \times n}, W \in \mathbb{R}^{p \times p_1}$ be two properly distributed random projections. We develop an expected excess loss bound for the OLS estimator built on $(WXR, WY)$. Compared to an established bound for OLS estimator built on $(XR, Y)$, the gap is approximately $O\left( p_1 + C \frac{1}{p_1} \right)$, where $C$ scales with $n_1/n$ and can be negative for small $n_1/n$. Its implications are confirmed by numerical results on real-world data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper extends the one-way random projection bound for OLS excess loss to the bidirectional case and derives a gap of order O(p1 + C/p1) that can turn negative, but the key assumptions stay too loose to verify the sign and rate cleanly.

read the letter

The main takeaway is that the authors derive an expected excess loss bound for OLS after projecting both rows and columns, then compare it to the established one-directional bound and obtain a gap of roughly O(p1 + C/p1) where C depends on n1/n and can be negative when the row projection is light. That negative-gap observation is the concrete new piece.

The work is a direct technical extension of existing unidirectional results. The numerical checks on real data are useful because they show the claimed ordering can appear outside synthetic settings. Credit is due for keeping the fixed-design OLS setting and for reporting the possibility that the second projection helps rather than hurts.

The soft spot is exactly the one flagged in the stress-test note. The phrase “properly distributed” for R and W is never unpacked with explicit moment or tail conditions in the abstract, so the calculations that produce the 1/p1 term and the sign of C cannot be reproduced from the given information. If the full paper supplies a precise law for the entries and shows the concentration steps, the gap claim becomes checkable; otherwise the comparison rests on an unstated hypothesis. That is a moderate rather than fatal issue for a short technical note.

The paper is aimed at people already working on sketched or projected regression. A reader who needs the precise bidirectional gap expression will find it here, but the result is too narrow to change how most people think about random projections. It is worth sending to referees because the core claim is stated cleanly and the numerics provide some external check, even if the assumption list needs tightening in revision.

Referee Report

1 major / 1 minor

Summary. The paper analyzes bidirectional random projections for OLS regression in the fixed-design setting. With data (X, Y) and random projections R ∈ R^{n1×n}, W ∈ R^{p×p1} that are 'properly distributed,' it derives an expected excess-loss bound for the OLS estimator on the doubly projected data (WXR, WY). It then compares this bound to an established bound for the singly projected estimator on (XR, Y) and claims that the gap is approximately O(p1 + C/p1), where the constant C scales with the ratio n1/n and may be negative when n1/n is small. The claim is supported by numerical experiments on real data.

Significance. If the gap analysis holds under explicit distributional assumptions on R and W, the result would indicate that adding a second projection dimension p1 can sometimes reduce excess loss relative to unidirectional projection, with the improvement controlled by the sample-size ratio n1/n. This would be a modest but concrete contribution to the literature on sketched least squares, particularly if the O(1/p1) term and the sign of C can be made rigorous.

major comments (1)

[Abstract / main bound derivation] Abstract and main derivation: the central gap claim O(p1 + C/p1) with C allowed to change sign rests entirely on the unstated moment calculations for the quadratic forms involving the projected Gram matrices. The only hypothesis supplied is that R and W are 'properly distributed'; without an explicit entry-wise law, variance scaling, or independence structure, neither the 1/p1 term nor the dependence of C on n1/n can be verified. This assumption is load-bearing for both the excess-loss bound and the comparison.

minor comments (1)

[Numerical results] The numerical experiments on real-world data are mentioned but no details are given on the choice of p1, n1, or how the excess loss is estimated; adding a small table or figure caption would improve reproducibility.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful review and for identifying the need to make the distributional assumptions explicit. We address the major comment below.

read point-by-point responses

Referee: [Abstract / main bound derivation] Abstract and main derivation: the central gap claim O(p1 + C/p1) with C allowed to change sign rests entirely on the unstated moment calculations for the quadratic forms involving the projected Gram matrices. The only hypothesis supplied is that R and W are 'properly distributed'; without an explicit entry-wise law, variance scaling, or independence structure, neither the 1/p1 term nor the dependence of C on n1/n can be verified. This assumption is load-bearing for both the excess-loss bound and the comparison.

Authors: We agree that the current phrasing 'properly distributed' is insufficient to support the claimed moment calculations and the sign behavior of C. In the revised manuscript we will replace this with explicit assumptions: the entries of R and W are i.i.d. sub-Gaussian with mean zero and variance scaled to 1/n and 1/p respectively, together with the required independence between R and W. We will also insert the intermediate moment bounds on the quadratic forms (E[||W X R||^2] and the cross terms) that yield the O(p1) and O(C/p1) contributions, making the dependence of C on n1/n transparent. These additions will be placed immediately after the statement of the excess-loss bound. revision: yes

Circularity Check

0 steps flagged

No circularity: bound extends external established result under explicit distributional assumption

full rationale

The derivation claims an expected excess loss bound for OLS on (WXR, WY) by extending an established external bound for OLS on (XR, Y). The gap expression O(p1 + C/p1) is presented as following from the difference under the stated assumption that R and W are properly distributed. No quoted step shows a prediction reducing to a fitted parameter defined inside the paper, a self-citation chain supplying the uniqueness or ansatz, or any self-definitional equivalence. The assumption is load-bearing but external and falsifiable; the comparison target is described as established and independent. The numerical confirmation on real data is post-hoc validation, not part of the derivation. The chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Only abstract available; analysis rests on standard random-projection assumptions for fixed-design OLS whose details are not supplied.

axioms (1)

domain assumption Random projections R and W are properly distributed
Required for the excess-loss bound to hold, stated directly in the abstract.

pith-pipeline@v0.9.1-grok · 5673 in / 986 out tokens · 22834 ms · 2026-06-27T11:41:29.810013+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

9 extracted references

[1]

Artificial intelligence and statistics , pages=

New bounds on compressive linear least squares regression , author=. Artificial intelligence and statistics , pages=. 2014 , organization=

2014
[2]

Limit of the smallest eigenvalue of a large dimensional sample covariance matrix , author=. Ann. Probab , volume=. 1993 , publisher=

1993
[3]

2002 , publisher=

A distribution-free theory of nonparametric regression , author=. 2002 , publisher=

2002
[4]

Advances in neural information processing systems , volume=

Compressed least-squares regression , author=. Advances in neural information processing systems , volume=
[5]

Artificial Intelligence and Statistics , pages=

Compressed least squares regression revisited , author=. Artificial Intelligence and Statistics , pages=. 2017 , organization=

2017
[6]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Compressed least-squares regression on sparse spaces , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
[7]

, author=

Introduction to the non-asymptotic analysis of random matrices. , author=
[8]

Advances in Neural Information Processing Systems , volume=

Compressed regression , author=. Advances in Neural Information Processing Systems , volume=
[9]

IEEE Transactions on Information Theory , volume=

Distributed sketching for randomized optimization: Exact characterization, concentration, and lower bounds , author=. IEEE Transactions on Information Theory , volume=. 2023 , publisher=

2023

[1] [1]

Artificial intelligence and statistics , pages=

New bounds on compressive linear least squares regression , author=. Artificial intelligence and statistics , pages=. 2014 , organization=

2014

[2] [2]

Limit of the smallest eigenvalue of a large dimensional sample covariance matrix , author=. Ann. Probab , volume=. 1993 , publisher=

1993

[3] [3]

2002 , publisher=

A distribution-free theory of nonparametric regression , author=. 2002 , publisher=

2002

[4] [4]

Advances in neural information processing systems , volume=

Compressed least-squares regression , author=. Advances in neural information processing systems , volume=

[5] [5]

Artificial Intelligence and Statistics , pages=

Compressed least squares regression revisited , author=. Artificial Intelligence and Statistics , pages=. 2017 , organization=

2017

[6] [6]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Compressed least-squares regression on sparse spaces , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

[7] [7]

, author=

Introduction to the non-asymptotic analysis of random matrices. , author=

[8] [8]

Advances in Neural Information Processing Systems , volume=

Compressed regression , author=. Advances in Neural Information Processing Systems , volume=

[9] [9]

IEEE Transactions on Information Theory , volume=

Distributed sketching for randomized optimization: Exact characterization, concentration, and lower bounds , author=. IEEE Transactions on Information Theory , volume=. 2023 , publisher=

2023