pith. sign in

arxiv: 2604.23418 · v1 · submitted 2026-04-25 · 💻 cs.LG · cs.PF

Approximating Uniform Random Rotations by Two-Block Structured Hadamard Rotations in High Dimensions

Pith reviewed 2026-05-08 08:11 UTC · model grok-4.3

classification 💻 cs.LG cs.PF
keywords structured Hadamard rotationsuniform random rotationsKolmogorov distanceWasserstein distancehigh-dimensional approximationJohnson-Lindenstraussrandom projectionskernel approximation
0
0 comments X

The pith

Two-block Hadamard rotations match uniform random rotations on any single coordinate with error shrinking like d to the minus one-fifth but diverge on the full vector distribution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that composing two Walsh-Hadamard transforms with random signs produces a cheap rotation whose individual output coordinates become statistically close to those of a true uniform random rotation as dimension grows. The closeness holds uniformly over every possible input vector and is quantified by an explicit Kolmogorov-distance bound. At the same time the joint law of the entire output vector remains separated from the uniform rotation law by a positive Wasserstein distance that does not vanish with dimension. This separation matters because structured Hadamard rotations are already deployed in fast embeddings, kernel methods, and model compression, so the result clarifies when their marginal accuracy is sufficient and when the missing high-dimensional geometry becomes a problem.

Core claim

Every fixed coordinate of the two-block transform converges uniformly, over all inputs, to the corresponding coordinate of a uniformly rotated vector, with an explicit Kolmogorov-distance bound of order d^{-1/5}. On the negative side, an explicit lower bound on the Wasserstein distance between the full vector distributions shows that the two-block transform is not a globally accurate surrogate for a uniform random rotation in the worst case. For the extremal input used in the lower bound, a matching asymptotic upper bound establishes that the scale of the discrepancy is sharp.

What carries the argument

the two-block structured Hadamard rotation formed by two Walsh-Hadamard matrices interleaved with random sign diagonal matrices

If this is right

  • Single-coordinate statistics produced by the two-block method become reliable approximations once dimension is large.
  • The marginal approximation error improves at the explicit rate of order d to the power of negative one-fifth.
  • Tasks that depend on the full geometry of the rotated vector cannot treat the two-block method as an exact replacement.
  • For the worst-case input the distributional gap remains of constant size independent of dimension.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Methods that only need coordinate-wise properties, such as certain one-dimensional kernel or projection steps, can safely use two blocks.
  • Adding more blocks or alternative mixing patterns could be tested to reduce the persistent full-vector gap.
  • Numerical checks at moderate dimensions could verify whether the d^{-1/5} marginal rate is visible before the very-large-d regime.

Load-bearing premise

The dimension must be large enough for the concentration and asymptotic approximations used in the proofs to hold.

What would settle it

Compute the Wasserstein distance between the two-block output distribution and the uniform rotation distribution for the extremal input vector as dimension increases and check whether the distance stays bounded away from zero or drops to zero.

Figures

Figures reproduced from arXiv: 2604.23418 by Gal Mendelson, Tomer Zilca.

Figure 1
Figure 1. Figure 1: Empirical one-coordinate Kolmogorov distance versus dimension, based on 1000 random view at source ↗
Figure 2
Figure 2. Figure 2: Numerical illustration of the lower-bound phenomenon behind theorem 5.1. The quantity view at source ↗
read the original abstract

Uniform random rotations are a useful primitive in applications such as fast Johnson-Lindenstrauss embeddings, kernel approximation, communication-efficient learning, and recent AI compression pipelines, but they are computationally expensive to generate and apply in high dimensions. A common practical replacement is repeated structured random rotations built from Walsh-Hadamard transforms and random sign diagonals. Applying the structured random rotation twice has been shown empirically to be useful, but the supporting theory is still limited. In this paper we study the approximation quality achieved when using this two-block structured Hadamard rotation. Our results are both positive and negative. On the positive side, we prove that every fixed coordinate of the two-block transform converges uniformly, over all inputs, to the corresponding coordinate of a uniformly rotated vector, with an explicit Kolmogorov-distance bound of order $d^{-1/5}$. On the negative side, we prove an explicit lower bound on the Wasserstein distance between the full vector distributions, showing that the two-block transform is not a globally accurate surrogate for a uniform random rotation in the worst case. For the extremal input used in the lower bound, we also prove a matching asymptotic upper bound, showing that the lower-bound scale is sharp for that input. Taken together, the results identify a clear separation between one-dimensional marginal behavior, where approximation improves with dimension, and full high-dimensional geometry, where a nonvanishing discrepancy remains. This provides a partial theoretical explanation for the empirical success of structured Hadamard rotations in some algorithms, while also clarifying the limitations of treating them as drop-in replacements for true uniform random rotations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper analyzes two-block structured Hadamard rotations (Walsh-Hadamard transforms combined with random sign diagonals) as practical surrogates for uniform random rotations in high dimensions. It proves positive results on marginal approximation: every fixed coordinate of the two-block transform converges uniformly to the corresponding coordinate of a uniform rotation, with an explicit Kolmogorov-distance bound of order d^{-1/5}. It also proves negative results: an explicit lower bound on the Wasserstein distance between the full vector distributions, demonstrating that the two-block construction is not a globally accurate surrogate in the worst case, together with a matching asymptotic upper bound for the extremal input.

Significance. If the stated bounds hold, the results supply a precise theoretical account of when and why structured Hadamard rotations succeed empirically in applications such as Johnson-Lindenstrauss embeddings, kernel approximation, and communication-efficient learning. The explicit rates, the matching lower/upper pair on Wasserstein distance, and the clean separation between improving one-dimensional marginals and persistently discrepant high-dimensional geometry are concrete strengths that can guide algorithm design.

minor comments (2)
  1. [Abstract] Abstract: the Kolmogorov bound is stated as 'order d^{-1/5}'; if the leading constant or the precise exponent (rather than the big-O) is available from the proof, stating it would improve precision without lengthening the abstract.
  2. [Main theorems] Notation: confirm that the dimension parameter is uniformly denoted by d (or n) across all statements of the main theorems and that the input vector is normalized to the unit sphere in the Wasserstein lower-bound construction.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for recommending acceptance. The referee summary accurately reflects both the positive marginal approximation results and the negative results on the full distributional distance.

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper establishes explicit Kolmogorov-distance bounds of order d^{-1/5} for fixed marginals and Wasserstein-distance bounds for the full vector by direct analysis of the standard two-block Walsh-Hadamard-plus-random-signs construction. All steps rely on explicit high-dimensional concentration arguments and properties of the Hadamard matrix that are stated independently of the target approximation result. No parameter is fitted to data and then relabeled as a prediction, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz or definition is smuggled in via prior work by the same authors. The derivation is therefore self-contained against external mathematical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The analysis rests on standard properties of orthogonal matrices, Walsh-Hadamard transforms, and uniform distributions on the sphere; no new free parameters or invented entities are introduced.

axioms (1)
  • domain assumption Walsh-Hadamard transforms combined with random sign diagonals form orthogonal matrices
    Standard construction used throughout the structured random rotation literature.

pith-pipeline@v0.9.0 · 5595 in / 1204 out tokens · 89170 ms · 2026-05-08T08:11:50.658750+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

  1. [1]

    The fast johnson–lindenstrauss transform and approximate nearest neighbors.SIAM Journal on Computing, 39(1):302–322, 2009

    Nir Ailon and Bernard Chazelle. The fast johnson–lindenstrauss transform and approximate nearest neighbors.SIAM Journal on Computing, 39(1):302–322, 2009

  2. [2]

    Fast dimension reduction using rademacher series on dual bch codes.Discrete & Computational Geometry, 42(4):615–630, 2009

    Nir Ailon and Edo Liberty. Fast dimension reduction using rademacher series on dual bch codes.Discrete & Computational Geometry, 42(4):615–630, 2009

  3. [3]

    Unifying orthogonal monte carlo methods

    Krzysztof Choromanski, Mark Rowland, Wenyu Chen, and Adrian Weller. Unifying orthogonal monte carlo methods. InProceedings of the 36th International Conference on Machine Learning, volume 97 ofProceedings of Machine Learning Research, pages 1203–1212, 2019

  4. [4]

    The unreason- able effectiveness of structured random orthogonal embeddings.arXiv preprint arXiv:1703.00864, 2018

    Krzysztof Choromanski, Mark Rowland, Valerii Likhosherstov, and Adrian Weller. The unreason- able effectiveness of structured random orthogonal embeddings.arXiv preprint arXiv:1703.00864, 2018

  5. [5]

    The kac walk and fast mixing of structured orthogonal matrices

    Krzysztof Choromanski, Mark Rowland, and Adrian Weller. The kac walk and fast mixing of structured orthogonal matrices. InConference on Learning Theory, volume 65 ofProceedings of Machine Learning Research, pages 1–29, 2017

  6. [6]

    Cambridge university press, 2019

    Rick Durrett.Probability: theory and examples, volume 49. Cambridge university press, 2019

  7. [7]

    B. J. Fino and V. R. Algazi. Unified matrix treatment of the fast walsh–hadamard transform. IEEE Transactions on Computers, C-25(11):1142–1146, 1976

  8. [8]

    Rabitq: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search.Proceedings of the ACM on Management of Data, 2(3):1–27, 2024

    Jianyang Gao and Cheng Long. Rabitq: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search.Proceedings of the ACM on Management of Data, 2(3):1–27, 2024

  9. [9]

    John wiley & sons, 1994

    Norman L Johnson, Samuel Kotz, and Narayanaswamy Balakrishnan.Continuous univariate distributions, volume 1, volume 1. John wiley & sons, 1994

  10. [10]

    Le, Tam´ as Sarl´ os, and Alexander J

    Quoc V. Le, Tam´ as Sarl´ os, and Alexander J. Smola. Fastfood: Approximating kernel expansions in loglinear time. InProceedings of the 30th International Conference on Machine Learning, volume 28 ofProceedings of Machine Learning Research, pages 244–252, 2013

  11. [11]

    Number 89

    Michel Ledoux.The concentration of measure phenomenon. Number 89. American Mathematical Soc., 2001

  12. [12]

    Nist digital library of mathematical functions.Annals of Mathematics and Artificial Intelligence, 38(1):105–119, 2003

    Daniel W Lozier. Nist digital library of mathematical functions.Annals of Mathematics and Artificial Intelligence, 38(1):105–119, 2003

  13. [13]

    Bounds for the ratio of two gamma functions.Journal of Inequalities and Applications, 2010(1):493058, 2010

    Feng Qi. Bounds for the ratio of two gamma functions.Journal of Inequalities and Applications, 2010(1):493058, 2010

  14. [14]

    Fundamentals of stein’s method

    Nathan Ross. Fundamentals of stein’s method. 2011

  15. [15]

    Eden: Communication-efficient and robust distributed mean estimation for federated learning

    Shay Vargaftik, Ran Ben-Basat, Amit Portnoy, Gal Mendelson, Yaniv Ben-Itzhak, and Michael Mitzenmacher. Eden: Communication-efficient and robust distributed mean estimation for federated learning. InProceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learning Research, pages 21984–22014, 2022. 25

  16. [16]

    Drive: One-bit distributed mean estimation

    Shay Vargaftik, Ran Ben-Basat, Amit Portnoy, Gal Mendelson, and Michael Mitzenmacher. Drive: One-bit distributed mean estimation. InAdvances in Neural Information Processing Systems, volume 34, pages 362–377, 2021

  17. [17]

    Springer, Berlin, 2009

    C´ edric Villani.Optimal Transport: Old and New. Springer, Berlin, 2009

  18. [18]

    Yu, Ananda Theertha Suresh, Krzysztof Choromanski, Daniel Holtmann-Rice, and Sanjiv Kumar

    Felix X. Yu, Ananda Theertha Suresh, Krzysztof Choromanski, Daniel Holtmann-Rice, and Sanjiv Kumar. Orthogonal random features. InAdvances in Neural Information Processing Systems, volume 29, pages 1975–1983, 2016

  19. [19]

    Turboquant: Online vector quantization with near-optimal distortion rate

    Amir Zandieh, Majid Daliri, Majid Hadian, and Vahab Mirrokni. Turboquant: Online vector quantization with near-optimal distortion rate. InInternational Conference on Learning Representations (ICLR) 2026, 2026. Poster; published on OpenReview, January 26, 2026. 26