Approximating Uniform Random Rotations by Two-Block Structured Hadamard Rotations in High Dimensions

Gal Mendelson; Tomer Zilca

arxiv: 2604.23418 · v1 · submitted 2026-04-25 · 💻 cs.LG · cs.PF

Approximating Uniform Random Rotations by Two-Block Structured Hadamard Rotations in High Dimensions

Tomer Zilca , Gal Mendelson This is my paper

Pith reviewed 2026-05-08 08:11 UTC · model grok-4.3

classification 💻 cs.LG cs.PF

keywords structured Hadamard rotationsuniform random rotationsKolmogorov distanceWasserstein distancehigh-dimensional approximationJohnson-Lindenstraussrandom projectionskernel approximation

0 comments

The pith

Two-block Hadamard rotations match uniform random rotations on any single coordinate with error shrinking like d to the minus one-fifth but diverge on the full vector distribution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that composing two Walsh-Hadamard transforms with random signs produces a cheap rotation whose individual output coordinates become statistically close to those of a true uniform random rotation as dimension grows. The closeness holds uniformly over every possible input vector and is quantified by an explicit Kolmogorov-distance bound. At the same time the joint law of the entire output vector remains separated from the uniform rotation law by a positive Wasserstein distance that does not vanish with dimension. This separation matters because structured Hadamard rotations are already deployed in fast embeddings, kernel methods, and model compression, so the result clarifies when their marginal accuracy is sufficient and when the missing high-dimensional geometry becomes a problem.

Core claim

Every fixed coordinate of the two-block transform converges uniformly, over all inputs, to the corresponding coordinate of a uniformly rotated vector, with an explicit Kolmogorov-distance bound of order d^{-1/5}. On the negative side, an explicit lower bound on the Wasserstein distance between the full vector distributions shows that the two-block transform is not a globally accurate surrogate for a uniform random rotation in the worst case. For the extremal input used in the lower bound, a matching asymptotic upper bound establishes that the scale of the discrepancy is sharp.

What carries the argument

the two-block structured Hadamard rotation formed by two Walsh-Hadamard matrices interleaved with random sign diagonal matrices

If this is right

Single-coordinate statistics produced by the two-block method become reliable approximations once dimension is large.
The marginal approximation error improves at the explicit rate of order d to the power of negative one-fifth.
Tasks that depend on the full geometry of the rotated vector cannot treat the two-block method as an exact replacement.
For the worst-case input the distributional gap remains of constant size independent of dimension.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Methods that only need coordinate-wise properties, such as certain one-dimensional kernel or projection steps, can safely use two blocks.
Adding more blocks or alternative mixing patterns could be tested to reduce the persistent full-vector gap.
Numerical checks at moderate dimensions could verify whether the d^{-1/5} marginal rate is visible before the very-large-d regime.

Load-bearing premise

The dimension must be large enough for the concentration and asymptotic approximations used in the proofs to hold.

What would settle it

Compute the Wasserstein distance between the two-block output distribution and the uniform rotation distribution for the extremal input vector as dimension increases and check whether the distance stays bounded away from zero or drops to zero.

Figures

Figures reproduced from arXiv: 2604.23418 by Gal Mendelson, Tomer Zilca.

**Figure 1.** Figure 1: Empirical one-coordinate Kolmogorov distance versus dimension, based on 1000 random view at source ↗

**Figure 2.** Figure 2: Numerical illustration of the lower-bound phenomenon behind theorem 5.1. The quantity view at source ↗

read the original abstract

Uniform random rotations are a useful primitive in applications such as fast Johnson-Lindenstrauss embeddings, kernel approximation, communication-efficient learning, and recent AI compression pipelines, but they are computationally expensive to generate and apply in high dimensions. A common practical replacement is repeated structured random rotations built from Walsh-Hadamard transforms and random sign diagonals. Applying the structured random rotation twice has been shown empirically to be useful, but the supporting theory is still limited. In this paper we study the approximation quality achieved when using this two-block structured Hadamard rotation. Our results are both positive and negative. On the positive side, we prove that every fixed coordinate of the two-block transform converges uniformly, over all inputs, to the corresponding coordinate of a uniformly rotated vector, with an explicit Kolmogorov-distance bound of order $d^{-1/5}$. On the negative side, we prove an explicit lower bound on the Wasserstein distance between the full vector distributions, showing that the two-block transform is not a globally accurate surrogate for a uniform random rotation in the worst case. For the extremal input used in the lower bound, we also prove a matching asymptotic upper bound, showing that the lower-bound scale is sharp for that input. Taken together, the results identify a clear separation between one-dimensional marginal behavior, where approximation improves with dimension, and full high-dimensional geometry, where a nonvanishing discrepancy remains. This provides a partial theoretical explanation for the empirical success of structured Hadamard rotations in some algorithms, while also clarifying the limitations of treating them as drop-in replacements for true uniform random rotations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives explicit rates showing two-block Hadamard rotations match uniform rotations on every marginal with Kolmogorov distance O(d^{-1/5}) but retain a non-vanishing Wasserstein gap on the full vector for the worst-case input.

read the letter

Two things stand out. The paper proves an explicit uniform Kolmogorov bound of order d^{-1/5} for every fixed coordinate of the two-block transform versus a uniform rotation, and it pairs this with a Wasserstein lower bound that stays bounded away from zero, plus a matching asymptotic upper bound on the extremal input. These quantitative results for the two-block case are new relative to the cited literature on structured rotations. The work does a clean job separating the marginal behavior, which improves with dimension, from the joint geometry, which does not. This split supplies a partial explanation for why two-block Hadamard rotations have worked empirically in some embedding and kernel tasks while still falling short as full replacements. The derivations rest on the standard Walsh-Hadamard-plus-signs construction and ordinary concentration arguments, so the claims track without circularity or extra assumptions. The bounds are stated with explicit dependence on dimension, which is useful. One soft spot is the marginal rate itself. Order d^{-1/5} is slow, so the approximation only tightens at quite large d; the paper is upfront about this but does not explore faster alternatives or average-case inputs. The negative result is also tied to one extremal vector, which is the right way to prove it is not globally accurate, yet leaves the typical-case gap unquantified. This paper is for people who already use or analyze structured random rotations in high-dimensional learning pipelines and need concrete guidance on when the approximation is safe. A reader working on Johnson-Lindenstrauss embeddings, kernel approximation, or compression methods will find the rates directly applicable. It deserves a serious referee because the results are quantitative, the positive and negative sides are balanced, and the topic sits at the intersection of theory and current practice. I would send it out for review.

Referee Report

0 major / 2 minor

Summary. The paper analyzes two-block structured Hadamard rotations (Walsh-Hadamard transforms combined with random sign diagonals) as practical surrogates for uniform random rotations in high dimensions. It proves positive results on marginal approximation: every fixed coordinate of the two-block transform converges uniformly to the corresponding coordinate of a uniform rotation, with an explicit Kolmogorov-distance bound of order d^{-1/5}. It also proves negative results: an explicit lower bound on the Wasserstein distance between the full vector distributions, demonstrating that the two-block construction is not a globally accurate surrogate in the worst case, together with a matching asymptotic upper bound for the extremal input.

Significance. If the stated bounds hold, the results supply a precise theoretical account of when and why structured Hadamard rotations succeed empirically in applications such as Johnson-Lindenstrauss embeddings, kernel approximation, and communication-efficient learning. The explicit rates, the matching lower/upper pair on Wasserstein distance, and the clean separation between improving one-dimensional marginals and persistently discrepant high-dimensional geometry are concrete strengths that can guide algorithm design.

minor comments (2)

[Abstract] Abstract: the Kolmogorov bound is stated as 'order d^{-1/5}'; if the leading constant or the precise exponent (rather than the big-O) is available from the proof, stating it would improve precision without lengthening the abstract.
[Main theorems] Notation: confirm that the dimension parameter is uniformly denoted by d (or n) across all statements of the main theorems and that the input vector is normalized to the unit sphere in the Wasserstein lower-bound construction.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for recommending acceptance. The referee summary accurately reflects both the positive marginal approximation results and the negative results on the full distributional distance.

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper establishes explicit Kolmogorov-distance bounds of order d^{-1/5} for fixed marginals and Wasserstein-distance bounds for the full vector by direct analysis of the standard two-block Walsh-Hadamard-plus-random-signs construction. All steps rely on explicit high-dimensional concentration arguments and properties of the Hadamard matrix that are stated independently of the target approximation result. No parameter is fitted to data and then relabeled as a prediction, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz or definition is smuggled in via prior work by the same authors. The derivation is therefore self-contained against external mathematical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The analysis rests on standard properties of orthogonal matrices, Walsh-Hadamard transforms, and uniform distributions on the sphere; no new free parameters or invented entities are introduced.

axioms (1)

domain assumption Walsh-Hadamard transforms combined with random sign diagonals form orthogonal matrices
Standard construction used throughout the structured random rotation literature.

pith-pipeline@v0.9.0 · 5595 in / 1204 out tokens · 89170 ms · 2026-05-08T08:11:50.658750+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

[1]

The fast johnson–lindenstrauss transform and approximate nearest neighbors.SIAM Journal on Computing, 39(1):302–322, 2009

Nir Ailon and Bernard Chazelle. The fast johnson–lindenstrauss transform and approximate nearest neighbors.SIAM Journal on Computing, 39(1):302–322, 2009

work page 2009
[2]

Fast dimension reduction using rademacher series on dual bch codes.Discrete & Computational Geometry, 42(4):615–630, 2009

Nir Ailon and Edo Liberty. Fast dimension reduction using rademacher series on dual bch codes.Discrete & Computational Geometry, 42(4):615–630, 2009

work page 2009
[3]

Unifying orthogonal monte carlo methods

Krzysztof Choromanski, Mark Rowland, Wenyu Chen, and Adrian Weller. Unifying orthogonal monte carlo methods. InProceedings of the 36th International Conference on Machine Learning, volume 97 ofProceedings of Machine Learning Research, pages 1203–1212, 2019

work page 2019
[4]

The unreason- able effectiveness of structured random orthogonal embeddings.arXiv preprint arXiv:1703.00864, 2018

Krzysztof Choromanski, Mark Rowland, Valerii Likhosherstov, and Adrian Weller. The unreason- able effectiveness of structured random orthogonal embeddings.arXiv preprint arXiv:1703.00864, 2018

work page arXiv 2018
[5]

The kac walk and fast mixing of structured orthogonal matrices

Krzysztof Choromanski, Mark Rowland, and Adrian Weller. The kac walk and fast mixing of structured orthogonal matrices. InConference on Learning Theory, volume 65 ofProceedings of Machine Learning Research, pages 1–29, 2017

work page 2017
[6]

Cambridge university press, 2019

Rick Durrett.Probability: theory and examples, volume 49. Cambridge university press, 2019

work page 2019
[7]

B. J. Fino and V. R. Algazi. Unified matrix treatment of the fast walsh–hadamard transform. IEEE Transactions on Computers, C-25(11):1142–1146, 1976

work page 1976
[8]

Rabitq: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search.Proceedings of the ACM on Management of Data, 2(3):1–27, 2024

Jianyang Gao and Cheng Long. Rabitq: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search.Proceedings of the ACM on Management of Data, 2(3):1–27, 2024

work page 2024
[9]

John wiley & sons, 1994

Norman L Johnson, Samuel Kotz, and Narayanaswamy Balakrishnan.Continuous univariate distributions, volume 1, volume 1. John wiley & sons, 1994

work page 1994
[10]

Le, Tam´ as Sarl´ os, and Alexander J

Quoc V. Le, Tam´ as Sarl´ os, and Alexander J. Smola. Fastfood: Approximating kernel expansions in loglinear time. InProceedings of the 30th International Conference on Machine Learning, volume 28 ofProceedings of Machine Learning Research, pages 244–252, 2013

work page 2013
[11]

Number 89

Michel Ledoux.The concentration of measure phenomenon. Number 89. American Mathematical Soc., 2001

work page 2001
[12]

Nist digital library of mathematical functions.Annals of Mathematics and Artificial Intelligence, 38(1):105–119, 2003

Daniel W Lozier. Nist digital library of mathematical functions.Annals of Mathematics and Artificial Intelligence, 38(1):105–119, 2003

work page 2003
[13]

Bounds for the ratio of two gamma functions.Journal of Inequalities and Applications, 2010(1):493058, 2010

Feng Qi. Bounds for the ratio of two gamma functions.Journal of Inequalities and Applications, 2010(1):493058, 2010

work page 2010
[14]

Fundamentals of stein’s method

Nathan Ross. Fundamentals of stein’s method. 2011

work page 2011
[15]

Eden: Communication-efficient and robust distributed mean estimation for federated learning

Shay Vargaftik, Ran Ben-Basat, Amit Portnoy, Gal Mendelson, Yaniv Ben-Itzhak, and Michael Mitzenmacher. Eden: Communication-efficient and robust distributed mean estimation for federated learning. InProceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learning Research, pages 21984–22014, 2022. 25

work page 2022
[16]

Drive: One-bit distributed mean estimation

Shay Vargaftik, Ran Ben-Basat, Amit Portnoy, Gal Mendelson, and Michael Mitzenmacher. Drive: One-bit distributed mean estimation. InAdvances in Neural Information Processing Systems, volume 34, pages 362–377, 2021

work page 2021
[17]

Springer, Berlin, 2009

C´ edric Villani.Optimal Transport: Old and New. Springer, Berlin, 2009

work page 2009
[18]

Yu, Ananda Theertha Suresh, Krzysztof Choromanski, Daniel Holtmann-Rice, and Sanjiv Kumar

Felix X. Yu, Ananda Theertha Suresh, Krzysztof Choromanski, Daniel Holtmann-Rice, and Sanjiv Kumar. Orthogonal random features. InAdvances in Neural Information Processing Systems, volume 29, pages 1975–1983, 2016

work page 1975
[19]

Turboquant: Online vector quantization with near-optimal distortion rate

Amir Zandieh, Majid Daliri, Majid Hadian, and Vahab Mirrokni. Turboquant: Online vector quantization with near-optimal distortion rate. InInternational Conference on Learning Representations (ICLR) 2026, 2026. Poster; published on OpenReview, January 26, 2026. 26

work page 2026

[1] [1]

The fast johnson–lindenstrauss transform and approximate nearest neighbors.SIAM Journal on Computing, 39(1):302–322, 2009

Nir Ailon and Bernard Chazelle. The fast johnson–lindenstrauss transform and approximate nearest neighbors.SIAM Journal on Computing, 39(1):302–322, 2009

work page 2009

[2] [2]

Fast dimension reduction using rademacher series on dual bch codes.Discrete & Computational Geometry, 42(4):615–630, 2009

Nir Ailon and Edo Liberty. Fast dimension reduction using rademacher series on dual bch codes.Discrete & Computational Geometry, 42(4):615–630, 2009

work page 2009

[3] [3]

Unifying orthogonal monte carlo methods

Krzysztof Choromanski, Mark Rowland, Wenyu Chen, and Adrian Weller. Unifying orthogonal monte carlo methods. InProceedings of the 36th International Conference on Machine Learning, volume 97 ofProceedings of Machine Learning Research, pages 1203–1212, 2019

work page 2019

[4] [4]

The unreason- able effectiveness of structured random orthogonal embeddings.arXiv preprint arXiv:1703.00864, 2018

Krzysztof Choromanski, Mark Rowland, Valerii Likhosherstov, and Adrian Weller. The unreason- able effectiveness of structured random orthogonal embeddings.arXiv preprint arXiv:1703.00864, 2018

work page arXiv 2018

[5] [5]

The kac walk and fast mixing of structured orthogonal matrices

Krzysztof Choromanski, Mark Rowland, and Adrian Weller. The kac walk and fast mixing of structured orthogonal matrices. InConference on Learning Theory, volume 65 ofProceedings of Machine Learning Research, pages 1–29, 2017

work page 2017

[6] [6]

Cambridge university press, 2019

Rick Durrett.Probability: theory and examples, volume 49. Cambridge university press, 2019

work page 2019

[7] [7]

B. J. Fino and V. R. Algazi. Unified matrix treatment of the fast walsh–hadamard transform. IEEE Transactions on Computers, C-25(11):1142–1146, 1976

work page 1976

[8] [8]

Rabitq: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search.Proceedings of the ACM on Management of Data, 2(3):1–27, 2024

Jianyang Gao and Cheng Long. Rabitq: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search.Proceedings of the ACM on Management of Data, 2(3):1–27, 2024

work page 2024

[9] [9]

John wiley & sons, 1994

Norman L Johnson, Samuel Kotz, and Narayanaswamy Balakrishnan.Continuous univariate distributions, volume 1, volume 1. John wiley & sons, 1994

work page 1994

[10] [10]

Le, Tam´ as Sarl´ os, and Alexander J

Quoc V. Le, Tam´ as Sarl´ os, and Alexander J. Smola. Fastfood: Approximating kernel expansions in loglinear time. InProceedings of the 30th International Conference on Machine Learning, volume 28 ofProceedings of Machine Learning Research, pages 244–252, 2013

work page 2013

[11] [11]

Number 89

Michel Ledoux.The concentration of measure phenomenon. Number 89. American Mathematical Soc., 2001

work page 2001

[12] [12]

Nist digital library of mathematical functions.Annals of Mathematics and Artificial Intelligence, 38(1):105–119, 2003

Daniel W Lozier. Nist digital library of mathematical functions.Annals of Mathematics and Artificial Intelligence, 38(1):105–119, 2003

work page 2003

[13] [13]

Bounds for the ratio of two gamma functions.Journal of Inequalities and Applications, 2010(1):493058, 2010

Feng Qi. Bounds for the ratio of two gamma functions.Journal of Inequalities and Applications, 2010(1):493058, 2010

work page 2010

[14] [14]

Fundamentals of stein’s method

Nathan Ross. Fundamentals of stein’s method. 2011

work page 2011

[15] [15]

Eden: Communication-efficient and robust distributed mean estimation for federated learning

Shay Vargaftik, Ran Ben-Basat, Amit Portnoy, Gal Mendelson, Yaniv Ben-Itzhak, and Michael Mitzenmacher. Eden: Communication-efficient and robust distributed mean estimation for federated learning. InProceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learning Research, pages 21984–22014, 2022. 25

work page 2022

[16] [16]

Drive: One-bit distributed mean estimation

Shay Vargaftik, Ran Ben-Basat, Amit Portnoy, Gal Mendelson, and Michael Mitzenmacher. Drive: One-bit distributed mean estimation. InAdvances in Neural Information Processing Systems, volume 34, pages 362–377, 2021

work page 2021

[17] [17]

Springer, Berlin, 2009

C´ edric Villani.Optimal Transport: Old and New. Springer, Berlin, 2009

work page 2009

[18] [18]

Yu, Ananda Theertha Suresh, Krzysztof Choromanski, Daniel Holtmann-Rice, and Sanjiv Kumar

Felix X. Yu, Ananda Theertha Suresh, Krzysztof Choromanski, Daniel Holtmann-Rice, and Sanjiv Kumar. Orthogonal random features. InAdvances in Neural Information Processing Systems, volume 29, pages 1975–1983, 2016

work page 1975

[19] [19]

Turboquant: Online vector quantization with near-optimal distortion rate

Amir Zandieh, Majid Daliri, Majid Hadian, and Vahab Mirrokni. Turboquant: Online vector quantization with near-optimal distortion rate. InInternational Conference on Learning Representations (ICLR) 2026, 2026. Poster; published on OpenReview, January 26, 2026. 26

work page 2026