AQ-Stacker: An Adaptive Quantum Matrix Multiplication Algorithm with Scaling via Parallel Hadamard Stacking

Wladimir Silva

arxiv: 2604.02530 · v2 · submitted 2026-04-02 · 🪐 quant-ph

AQ-Stacker: An Adaptive Quantum Matrix Multiplication Algorithm with Scaling via Parallel Hadamard Stacking

Wladimir Silva This is my paper

Pith reviewed 2026-05-13 20:44 UTC · model grok-4.3

classification 🪐 quant-ph

keywords quantum matrix multiplicationHadamard testQRAMadaptive stackingquantum machine learninginner product complexityhybrid quantum-classical algorithm

0 comments

The pith

A hybrid quantum algorithm reduces vector inner product complexity to O(log N) by pairing QRAM state preparation with adaptive Hadamard tests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AQ-Stacker, a hybrid quantum-classical scheme for matrix multiplication that adapts its Hadamard-test configuration to available qubit resources. By relying on QRAM to prepare input states, it claims the inner product between two N-dimensional vectors can be evaluated in O(log N) time. The algorithm switches between sequential horizontal stacking and parallel vertical stacking, producing a tunable complexity that ranges from near-term hardware compatibility up to O(N^2) on fault-tolerant systems. Numerical experiments on MNIST digit classification reach 96 percent accuracy, suggesting the method preserves stability while targeting super-classical scaling in high-dimensional linear algebra.

Core claim

The central claim is that an Adaptive Stacking framework built around parallel Hadamard tests, when supplied with QRAM-based state preparation, lowers the cost of computing inner products to O(log N) while allowing the same circuit family to be reconfigured for higher parallelism on larger qubit registers, thereby delivering a single algorithm family whose time complexity can be dialed from near-term to fault-tolerant regimes.

What carries the argument

The Adaptive Stacking framework, which reconfigures Hadamard-test execution from sequential horizontal stacking to massive vertical parallelism according to qubit count and dynamically incorporates QRAM state preparation to achieve the reduced inner-product complexity.

If this is right

Matrix multiplication becomes a candidate primitive for quantum advantage in machine-learning workloads once QRAM is available.
The same circuit family can be deployed on present-day hardware in low-parallelism mode and upgraded to higher parallelism without redesign.
Numerical stability is preserved at the level needed for classification tasks such as MNIST digit recognition.
The approach supplies a concrete route to O(N^2) quantum matrix multiplication on fault-tolerant architectures.
Tunable resource usage allows the algorithm to match varying hardware constraints without changing the underlying mathematical formulation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the QRAM assumption holds, the same stacking technique may generalize to other linear-algebra kernels such as matrix-vector multiplication or low-rank updates.
The adaptive switching mechanism could serve as a template for resource-aware compilation of other variational quantum algorithms.
Demonstrating the claimed scaling on a small but concrete QRAM simulator would provide an immediate experimental testbed for the inner-product claim.

Load-bearing premise

Efficient, scalable Quantum Random Access Memory exists and can be used for state preparation without incurring costs that cancel the claimed logarithmic speedup.

What would settle it

An explicit construction or lower-bound proof showing that any QRAM implementation for N-dimensional vectors requires at least linear or super-logarithmic resources in N would falsify the O(log N) inner-product claim.

Figures

Figures reproduced from arXiv: 2604.02530 by Wladimir Silva.

**Figure 2.** Figure 2: Persistence of the Entropy Dividend across Sampling Scales. In both the low-shot (a) and high-shot (b) regimes, the Expected Variance (shot noise) for Normal States (red) shows no statistically significant correlation with entropy (p = 0.218). Conversely, Uniform States (green) exhibit a powerful and statistically significant negative correlation (r = −0.9365, p < 0.0001), demonstrating that high-entropy … view at source ↗

**Figure 3.** Figure 3: Distributional Efficiency and Convergence Thresholds. These results demonstrate that while the Entropy Dividend is a universal property of high-entropy states, the geometric profile of the distribution dictates the entropy threshold required to mitigate the shot-noise limit. 6 Appendix A: Formal Proof of The Entropy Dividend 6.1 Lemma 1: The Entropy-Variance Suppression Bound Objective: To prove that for … view at source ↗

read the original abstract

Matrix multiplication (MatMul) is the computational backbone of modern machine learning, yet its classical complexity remains a bottleneck for large-scale data processing. We propose a hybrid quantum-classical algorithm for matrix multiplication based on an adaptive configuration of Hadamard tests. By leveraging Quantum Random Access Memory (QRAM) for state preparation, we demonstrate that the complexity of computing the inner product of two vectors can be reduced to $O(\log N)$. We introduce an "Adaptive Stacking" framework that allows the algorithm to dynamically reconfigure its execution pattern from sequential horizontal stacking to massive vertical parallelism based on available qubit resources. This flexibility enables a tunable time-complexity range, theoretically reaching $O(N^2)$ on fault-tolerant systems while maintaining compatibility with near-term hardware. We validate the numerical stability of our approach through a Quantum Machine Learning (QML) simulation, achieving 96% accuracy on the MNIST handwritten digit dataset. Our results suggest that adaptive quantum MatMul provides a viable path toward super-classical efficiency in high-dimensional linear algebra operations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The adaptive stacking idea has some engineering merit but the O(log N) claim ignores standard QRAM loading costs and lacks supporting derivations.

read the letter

The paper's headline reduction of inner-product complexity to O(log N) only holds if QRAM state preparation is treated as an O(log N) black box. Standard bucket-brigade constructions require linear resources to load the vectors first, so the overall scaling does not improve on classical methods once those costs are counted. The abstract presents this as a genuine advance without showing the full gate or depth accounting, which leaves the central claim unsupported on its own terms. The adaptive reconfiguration between horizontal and vertical Hadamard stacking modes is the part that actually feels new. It lets the algorithm tune its parallelism to the available qubit count, which is a practical consideration for near-term devices that most fixed quantum linear-algebra proposals overlook. The MNIST simulation reaching 96% accuracy gives at least one concrete check that the numerical behavior stays stable in simulation. That is useful data even if it does not prove asymptotic advantage. The main soft spot is the missing complexity breakdown. No derivation appears that separates query cost from preparation cost, and there is no comparison to prior Hadamard-test or swap-test inner-product routines. Without those pieces the O(log N) statement reads as an assumption rather than a result. The paper is aimed at quantum machine-learning researchers who want to explore resource-adaptive circuit designs rather than at complexity theorists looking for tight bounds. A reader already working on QRAM-based algorithms might find the stacking heuristic worth testing, but anyone expecting a complete complexity proof will need to ask for major revisions. I would send it to peer review so the authors can supply the missing accounting and baselines; the adaptive idea is concrete enough to be worth referee time even if the scaling claim needs work.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces AQ-Stacker, a hybrid quantum-classical algorithm for matrix multiplication that employs an adaptive configuration of Hadamard tests together with QRAM-assisted state preparation. It claims an O(log N) complexity for vector inner products, an 'Adaptive Stacking' framework that dynamically switches between sequential and parallel modes to achieve tunable time complexity (theoretically O(N^2) on fault-tolerant hardware), and reports 96% accuracy on MNIST via QML simulation.

Significance. If the complexity accounting were completed, the adaptive resource-tuning mechanism could represent a useful engineering contribution for deploying quantum linear-algebra primitives across hardware regimes, and the MNIST result would supply a concrete numerical sanity check. The absence of any explicit gate-count or qubit-count derivation, however, prevents the work from currently establishing a new complexity frontier.

major comments (2)

[Abstract] Abstract: the claim that inner-product complexity 'can be reduced to O(log N)' by leveraging QRAM treats state preparation as an O(log N) black box. Standard bucket-brigade QRAM constructions require Θ(N) qubits and Θ(N) gates to load a dense vector; without an explicit inclusion of these costs inside the adaptive Hadamard-stacking routine, the stated scaling is unsupported.
[Abstract] No section or equation is supplied that derives the overall gate complexity or depth of the parallel-stacking procedure once QRAM overhead is restored; the tunable O(N^2) upper bound therefore cannot be verified as a genuine improvement over classical MatMul.

minor comments (2)

[Abstract] The abstract contains no equations or pseudocode, making it impossible for a reader to reconstruct the claimed complexity reduction from the given text.
[Abstract] The MNIST experiment is described only as a 'QML simulation'; the manuscript should state the circuit depth, number of shots, and classical baseline accuracy for the same task.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and valuable feedback on our manuscript. We address each of the major comments below and have made revisions to clarify the complexity analysis as suggested.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that inner-product complexity 'can be reduced to O(log N)' by leveraging QRAM treats state preparation as an O(log N) black box. Standard bucket-brigade QRAM constructions require Θ(N) qubits and Θ(N) gates to load a dense vector; without an explicit inclusion of these costs inside the adaptive Hadamard-stacking routine, the stated scaling is unsupported.

Authors: The O(log N) scaling for the inner product is presented assuming access to QRAM for state preparation, which is a standard assumption in quantum algorithms literature. The adaptive Hadamard stacking contributes O(log N) to the complexity after state preparation. We agree that QRAM costs should be explicitly included for a complete analysis. In the revised manuscript, we will add a detailed discussion and equations incorporating the Θ(N) QRAM overhead into the overall complexity, explaining how the adaptive modes allow optimization across hardware constraints. revision: yes
Referee: [Abstract] No section or equation is supplied that derives the overall gate complexity or depth of the parallel-stacking procedure once QRAM overhead is restored; the tunable O(N^2) upper bound therefore cannot be verified as a genuine improvement over classical MatMul.

Authors: We acknowledge the absence of an explicit gate-count derivation in the current version. The O(N^2) upper bound corresponds to the sequential execution mode with per-operation QRAM costs. The parallel mode leverages additional qubits to reduce time complexity. We will revise the manuscript by adding a new section with equations that derive the full gate complexity and circuit depth for the parallel-stacking procedure, including restored QRAM costs, to substantiate the tunable complexity and its relation to classical MatMul. revision: yes

Circularity Check

0 steps flagged

No circularity: O(log N) claim follows from standard QRAM assumption, not internal redefinition

full rationale

The paper presents the O(log N) inner-product complexity as a direct consequence of leveraging existing QRAM for state preparation, which is an external standard assumption rather than a quantity derived or fitted inside the manuscript. The adaptive stacking framework and tunable complexity range are introduced as additional algorithmic contributions that build on this assumption without reducing to it by construction. No equations, self-definitional loops, fitted parameters renamed as predictions, or self-citation chains that bear the central claim are visible in the abstract or described structure. The derivation chain therefore remains self-contained against external benchmarks such as known QRAM query costs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

Central claims rest on the unproven availability of QRAM and on the newly introduced Adaptive Stacking framework; no independent evidence for either is supplied in the abstract.

axioms (2)

domain assumption Efficient Quantum Random Access Memory (QRAM) exists and can be used for state preparation
Invoked to reach O(log N) inner-product complexity
domain assumption Sufficient qubit resources are available to support massive vertical parallelism when desired
Required for the claimed tunable time-complexity range

invented entities (1)

Adaptive Stacking framework no independent evidence
purpose: Dynamically switch between sequential horizontal and parallel vertical execution modes
New control structure introduced to achieve flexibility across hardware regimes

pith-pipeline@v0.9.0 · 5476 in / 1419 out tokens · 52839 ms · 2026-05-13T20:44:57.193348+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We define the effective variance σ²_eff ≤ 1−e^{-H(ψ)}/S ... Entropy Dividend

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1]

A refined laser method for efficient matrix multiplication

Josh Alman and Virginia Vassilevska Williams. A refined laser method for efficient matrix multiplication. InProceedings of the 2021 ACM- SIAM Symposium on Discrete Algorithms (SODA), 2021. 21

work page 2021
[2]

FABLE: Fast approximate quan- tum circuits for block-encodings, 2022

Daan Camps and Roel Van Beeumen. FABLE: Fast approximate quan- tum circuits for block-encodings, 2022

work page 2022
[3]

Childs and Nathan Wiebe

Andrew M. Childs and Nathan Wiebe. Design and application of linear control units. Technical report, Technical Report, 2026

work page 2026
[4]

Cover and Joy A

Thomas M. Cover and Joy A. Thomas.Elements of Information The- ory. John Wiley & Sons, 2012

work page 2012
[5]

UCI machine learning repository, 2017

Dheeru Dua and Casey Graff. UCI machine learning repository, 2017

work page 2017
[6]

Quantum random access memory.Physical Review Letters, 100(16):160501, 2008

Vittorio Giovannetti, Seth Lloyd, and Lorenzo Maccone. Quantum random access memory.Physical Review Letters, 100(16):160501, 2008

work page 2008
[7]

Golub and Charles F

Gene H. Golub and Charles F. Van Loan.Matrix Computations. Johns Hopkins University Press, 4 edition, 2013

work page 2013
[8]

Harrow, Avinatan Hassidim, and Seth Lloyd

Aram W. Harrow, Avinatan Hassidim, and Seth Lloyd. Quantum algorithm for linear systems of equations.Physical Review Letters, 103(15):150502, 2009

work page 2009
[9]

Daniel F. V. James, Paul G. Kwiat, William J. Munro, and Andrew G. White. Measurement of qubits.Physical Review A, 64(5):052312, 2001

work page 2001
[10]

Yann LeCun, Corinna Cortes, and Christopher J. C. Burges. The MNIST database of handwritten digits. AT&T Labs [Online], 2010

work page 2010
[11]

Circuit-centric quantum classifiers.Physical Review A, 97(4):042314, 2018

Maria Schuld, Mark Fingerhuth, and Francesco Blance. Circuit-centric quantum classifiers.Physical Review A, 97(4):042314, 2018

work page 2018
[12]

Shende, Stephen S

Vivek V. Shende, Stephen S. Bullock, and Igor L. Markov. Synthesis of quantum-logic circuits.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25(6):1000–1010, 2006. 22

work page 2006

[1] [1]

A refined laser method for efficient matrix multiplication

Josh Alman and Virginia Vassilevska Williams. A refined laser method for efficient matrix multiplication. InProceedings of the 2021 ACM- SIAM Symposium on Discrete Algorithms (SODA), 2021. 21

work page 2021

[2] [2]

FABLE: Fast approximate quan- tum circuits for block-encodings, 2022

Daan Camps and Roel Van Beeumen. FABLE: Fast approximate quan- tum circuits for block-encodings, 2022

work page 2022

[3] [3]

Childs and Nathan Wiebe

Andrew M. Childs and Nathan Wiebe. Design and application of linear control units. Technical report, Technical Report, 2026

work page 2026

[4] [4]

Cover and Joy A

Thomas M. Cover and Joy A. Thomas.Elements of Information The- ory. John Wiley & Sons, 2012

work page 2012

[5] [5]

UCI machine learning repository, 2017

Dheeru Dua and Casey Graff. UCI machine learning repository, 2017

work page 2017

[6] [6]

Quantum random access memory.Physical Review Letters, 100(16):160501, 2008

Vittorio Giovannetti, Seth Lloyd, and Lorenzo Maccone. Quantum random access memory.Physical Review Letters, 100(16):160501, 2008

work page 2008

[7] [7]

Golub and Charles F

Gene H. Golub and Charles F. Van Loan.Matrix Computations. Johns Hopkins University Press, 4 edition, 2013

work page 2013

[8] [8]

Harrow, Avinatan Hassidim, and Seth Lloyd

Aram W. Harrow, Avinatan Hassidim, and Seth Lloyd. Quantum algorithm for linear systems of equations.Physical Review Letters, 103(15):150502, 2009

work page 2009

[9] [9]

Daniel F. V. James, Paul G. Kwiat, William J. Munro, and Andrew G. White. Measurement of qubits.Physical Review A, 64(5):052312, 2001

work page 2001

[10] [10]

Yann LeCun, Corinna Cortes, and Christopher J. C. Burges. The MNIST database of handwritten digits. AT&T Labs [Online], 2010

work page 2010

[11] [11]

Circuit-centric quantum classifiers.Physical Review A, 97(4):042314, 2018

Maria Schuld, Mark Fingerhuth, and Francesco Blance. Circuit-centric quantum classifiers.Physical Review A, 97(4):042314, 2018

work page 2018

[12] [12]

Shende, Stephen S

Vivek V. Shende, Stephen S. Bullock, and Igor L. Markov. Synthesis of quantum-logic circuits.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25(6):1000–1010, 2006. 22

work page 2006