Pith · machine review for the scientific record

arxiv: 2605.05942 · v1 · submitted 2026-05-07 · 🪐 quant-ph · cs.LG


Architecture Shape Governs QNN Trainability: Jacobian Null Space Growth and Parameter Efficiency


Pith reviewed 2026-05-08 11:35 UTC · model grok-4.3

classification 🪐 quant-ph cs.LG
keywords: variational quantum circuits · trainability · Jacobian · gradient starvation · architecture shape · quantum neural networks · parameter efficiency · Fourier series

The pith

The shape of variational quantum circuit architectures controls their trainability through the rank of the coefficient Jacobian.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Even when different arrangements of qubits and encoding layers use the same total encoding operations and produce the same frequency spectra in their output functions, their ability to be trained differs markedly. Serial single-qubit designs create a Jacobian matrix whose rank is limited to twice the layer count plus one, regardless of how many parameters are added, causing an ever-growing number of parameters to have no influence on the loss. Parallel designs keep the Jacobian full rank for parameter counts up to the spectrum size, so all parameters remain effective. This structural difference explains observed variations in training success and shows that adding encoding layers improves performance more efficiently than adding extra trainable parameters.

Core claim

For architectures with fixed encoding budget E = N L, serial single-qubit stacks have rank(J) ≤ 2L + 1 independent of P, so dim(ker J) ≥ P - (2L + 1) grows unbounded, termed structural gradient starvation. Parallel architectures ensure σ_min(J^(par)) > 0 generically for P ≤ 2E + 1, leaving no parameter in the kernel. Adding feature map layers strengthens the Jacobian spectrum and reaches high R² fits with 1.6 to 2.2 times fewer parameters than adding trainable blocks.

What carries the argument

The coefficient-matching Jacobian J, the derivative of the Fourier coefficients with respect to the circuit parameters; its null space determines which parameters can affect the loss.
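A minimal formalization may help fix the object in mind (our notation, inferred from the abstract rather than quoted from the paper): the circuit output is a truncated Fourier series, and J is the derivative of its coefficient vector with respect to the trainable parameters.

```latex
f_\theta(x) = \sum_{\omega = -E}^{E} c_\omega(\theta)\, e^{i \omega x},
\qquad
J_{\omega j} = \frac{\partial c_\omega(\theta)}{\partial \theta_j},
\qquad
J \in \mathbb{C}^{(2E+1) \times P}.
```

Any parameter direction v with Jv = 0 leaves every coefficient, and hence the model function, unchanged to first order; that is the sense in which kernel directions cannot affect the loss.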

If this is right

  • Adding feature map layers monotonically improves the Jacobian eigenvalue spectrum and achieves target accuracy with fewer total parameters.
  • Serial architectures suffer structural gradient starvation where increasing P at fixed L decouples more parameters from the loss.
  • Parallel architectures maintain generic full rank of the Jacobian up to the maximum useful parameter count.
  • Trainable blocks provide only classical interpolation benefits without quantum-specific gains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Circuit designers may need to favor parallel or mixed architectures to avoid gradient starvation when scaling parameter counts.
  • The Jacobian rank bound may limit trainability in other variational quantum models with similar encoding.
  • Testing the rank growth in numerical simulations of serial circuits would confirm the starvation effect.

Load-bearing premise

The coefficient-matching Jacobian captures the main trainability differences because the loss landscape is governed by this linear map rather than higher-order terms or optimizer behavior.
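The premise amounts to a single chain-rule step. Assuming a least-squares loss on the real coefficient vector c(θ) ∈ ℝ^{2E+1} (an illustrative choice; the paper's exact loss may differ):

```latex
\mathcal{L}(\theta) = \tfrac{1}{2} \left\lVert c(\theta) - c^{*} \right\rVert^{2}
\quad \Longrightarrow \quad
\nabla_{\theta} \mathcal{L} = J^{\top} \big( c(\theta) - c^{*} \big),
```

so every gradient lies in the row space of J, and directions in ker J receive no first-order training signal regardless of the residual. Higher-order terms and optimizer behavior are precisely what this premise sets aside.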

What would settle it

For a serial single-qubit circuit with L layers, compute the rank of J as P is increased beyond 2L+1; if the rank stays ≤ 2L+1 the claim holds, otherwise it fails.
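This test can be run in simulation with nothing beyond NumPy. The sketch below is our own minimal setup, not the paper's code: it builds a serial single-qubit data re-uploading circuit with L encoding layers and `tbl` trainable SU(2) blocks per slot (so P = 3·tbl·(L+1)), recovers the 2L+1 real Fourier coefficients of the output by DFT, and estimates rank(J) by central finite differences. The function names (`model`, `coeffs`, `jacobian_rank`) and the Rz·Ry·Rz block parameterization are our assumptions.

```python
import numpy as np

def rz(a):
    # Rz(a) = exp(-i a Z / 2)
    return np.diag([np.exp(-0.5j * a), np.exp(0.5j * a)])

def ry(a):
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def block(p):
    # generic single-qubit ansatz block Rz·Ry·Rz (3 parameters)
    return rz(p[2]) @ ry(p[1]) @ rz(p[0])

def model(x, theta, L, tbl):
    # serial re-uploading circuit: ansatz blocks, then L repetitions of
    # the encoding S(x) = exp(-i x Z / 2) followed by more blocks
    Zop = np.diag([1.0, -1.0]).astype(complex)
    params = theta.reshape(L + 1, tbl, 3)
    psi = np.array([1.0, 0.0], dtype=complex)
    for t in range(tbl):
        psi = block(params[0, t]) @ psi
    for l in range(1, L + 1):
        psi = rz(x) @ psi
        for t in range(tbl):
            psi = block(params[l, t]) @ psi
    return float(np.real(psi.conj() @ Zop @ psi))

def coeffs(theta, L, tbl):
    # f is a degree-<=L trigonometric polynomial, so 2L+1 equispaced
    # samples determine its real coefficient vector exactly via the DFT
    K = 2 * L + 1
    xs = 2 * np.pi * np.arange(K) / K
    c = np.fft.fft([model(x, theta, L, tbl) for x in xs]) / K
    return np.concatenate(([c[0].real], c[1:L + 1].real, c[1:L + 1].imag))

def jacobian_rank(L, tbl, seed=0, eps=1e-5):
    # central finite differences of the coefficient map at random parameters
    rng = np.random.default_rng(seed)
    P = 3 * tbl * (L + 1)
    theta = rng.uniform(0.0, 2 * np.pi, P)
    J = np.empty((2 * L + 1, P))
    for j in range(P):
        tp, tm = theta.copy(), theta.copy()
        tp[j] += eps
        tm[j] -= eps
        J[:, j] = (coeffs(tp, L, tbl) - coeffs(tm, L, tbl)) / (2 * eps)
    return P, np.linalg.matrix_rank(J, tol=1e-6)

if __name__ == "__main__":
    L = 3  # rank ceiling is 2L + 1 = 7
    for tbl in (1, 2, 4):
        P, r = jacobian_rank(L, tbl)
        print(f"tbl={tbl}: P={P:3d} rank(J)={r} dim(ker J)>={P - r}")
```

If the claim holds, the printed rank should plateau at 2L+1 while P grows with `tbl`, so dim(ker J) increases; a rank exceeding 2L+1 at any `tbl` would falsify the bound.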

Figures

Figures reproduced from arXiv:2605.05942 by Claudia Linnhoff-Popien, David Bucher, Jonas Stein, Markus Baumann, Maximilian Zorn, Michael Poppel, Philipp Altmann, Sebastian Wölckert.

Figure 1: Trainability of data re-uploading QNNs at fixed encoding budget.

Figure 2: Phase-locking in serial circuits vs. independent phase trajectories in parallel circuits. Each curve shows g_j(x) = ⟨ψ⊥|∂_jψ⟩ ∈ ℂ, normalized by its maximum magnitude, as x sweeps [0, 2π] and plotted in the complex plane. Left two panels (serial, N = 1, L = 4): parameters from the first four ansatz blocks W_0, …, W_3 trace curves lying entirely on the real axis of ℂ, confirming that g_j = h_j ∈ ℝ exactly …

Figure 3: FM route vs. trainable blocks route for N = 1 (gradient starvation) and N = 6 (high-qubit regime). Left panels: mean R² (test) vs. parameter count P for the FM route (blue, solid) and the trainable blocks route (red, dashed). Teal shading: expressivity gap (L < L_min). Dashed gray line: R² = 0.95 threshold. Right panels: normalized Jacobian QFIM eigenvalue spectra λ_i/λ_1 at selected L values (FM, solid blue) and …

Figure 4: Gradient variance Var_θ[∂_k L] vs. parameter count P for degree-12 targets, separated by architecture. Median over random initializations; shaded bands show IQR. Variance is approximately constant within each architecture, with no sign of depth-dependent exponential decay.

Figure 5: Validates Proposition 3.3(i) empirically across encoding budgets E ∈ {2, 4, 6, 8, 10, 12, 16} and architectures N ∈ {1, 2, 4} at matched parameter counts P ≈ 3E. All architectures reach the rank ceiling 2E + 1 at matched P, confirming that the ceiling is tight: at P > 2E + 1, a growing fraction (P − (2E + 1))/P of parameters lies in ker J regardless of architecture. For N = 4, only E divisible by 4 are used …

Figure 6: Structural comparison of coefficient matching equations at encoding budget …

Figure 7: (caption not extracted)

Figure 8: (caption not extracted)

Figure 9: Quantifies the spectral knee position (the number of eigenvalues satisfying λ_i/λ_1 > 10⁻⁶) as a function of parameter count P along both routes for N = 1 and N = 6. For N = 1 (left panel), the FM route shifts the knee monotonically from rank 25 at L = L_min = 12 to rank 52 at L = 89: each extra FM layer expands the ceiling 2L + 1 and makes genuinely new Fourier directions accessible. The trainable blocks route …

Figure 10: Shows the FM and classical routes for N = 2, deg = 20 and N = 4, deg = 28, the two architectures in the well-conditioned regime where both routes reach mean R² ≥ 0.95 within the tested range. These complement the main-text results for N = 1 and N = 6 …

Figure 11: Mean R² (test, ± std) vs. target Fourier degree for each architecture, at matched encoding budget E = deg and tbl = 1. Dashed vertical lines mark the degree used in …
Original abstract

Variational quantum circuits with angle encoding implement truncated Fourier series, and architectures arranging $N$ qubits with $L$ encoding layers each -- sharing encoding budget $E = NL$ -- generate identical frequency spectra, identical frequency redundancy, and require the same minimum parameter count for coefficient control. Despite this equivalence, trainability varies substantially with architecture shape $(N,L)$ at fixed $E$. We identify structural rank deficiency of the coefficient matching Jacobian $J$ as the mechanism responsible. For serial single-qubit architectures, we prove $\mathrm{rank}(J) \leq 2L+1$ regardless of parameter count $P$, with $\dim(\ker J) \geq P-(2L+1)$ growing without bound -- a phenomenon we term \emph{structural gradient starvation}: a growing fraction of parameters become structurally decoupled from the loss as $P$ increases at fixed $L$. Parallel architectures avoid this via independent phase trajectories, ensuring $\sigma_{\min}(J^{(\mathrm{par})}) > 0$ generically for $P \leq 2E+1$, so no parameter lies in $\ker J$. For practitioners, we further show that the two natural routes to increasing parameter count have fundamentally different effects: adding feature map (FM) layers monotonically strengthens the Jacobian QFIM eigenvalue spectrum and achieves $R^2 \geq 0.95$ with $1.6$--$2.2\times$ fewer parameters than adding trainable blocks across all tested architectures, while trainable blocks improve training only through the classical interpolation mechanism with no quantum-specific benefit.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper claims that despite equivalent frequency spectra for QNN architectures with fixed encoding budget E = NL, the shape (N,L) governs trainability through the rank of the coefficient-matching Jacobian J. For serial single-qubit setups, rank(J) ≤ 2L+1 independent of parameter count P, so dim(ker J) grows with P, producing structural gradient starvation. Parallel architectures maintain full rank up to P ≤ 2E+1. Scaling by adding feature map layers is shown to be more efficient than adding trainable blocks, reaching R² ≥ 0.95 with 1.6 to 2.2 times fewer parameters.

Significance. If the central claims hold, this provides a fundamental, architecture-induced explanation for trainability differences in QNNs, going beyond optimization dynamics. The parameter-free rank bound for serial architectures and the empirical evidence for FM-layer superiority are valuable for practitioners. The work is strengthened by its focus on falsifiable structural properties and explicit Jacobian analysis, offering a clear path for architecture optimization in quantum machine learning.

major comments (2)
  1. The proof that rank(J) ≤ 2L+1 for serial single-qubit architectures (as stated in the abstract) is central to the structural gradient starvation claim. The full step-by-step derivation should be included to confirm that the bound holds regardless of P and is not dependent on specific parameter choices or higher-order terms.
  2. In the section reporting R² ≥ 0.95 for FM-layer scaling versus trainable blocks, the manuscript should provide more details on the experimental controls, such as the specific architectures tested, the number of runs, and whether classical neural network baselines were used to isolate quantum effects from general interpolation benefits.
minor comments (3)
  1. The Jacobian J is referred to as the 'coefficient matching Jacobian'; a precise mathematical definition early in the text would improve clarity for readers.
  2. The phrase 'structural gradient starvation' is evocative but should be formally defined in terms of the growth of dim(ker J) at fixed L.
  3. The QFIM eigenvalue spectra figures would benefit from including the corresponding classical counterparts or additional metrics to highlight quantum-specific advantages.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive feedback on our manuscript. We address each of the major comments below and have made revisions to incorporate the suggested improvements.

Point-by-point responses
  1. Referee: The proof that rank(J) ≤ 2L+1 for serial single-qubit architectures (as stated in the abstract) is central to the structural gradient starvation claim. The full step-by-step derivation should be included to confirm that the bound holds regardless of P and is not dependent on specific parameter choices or higher-order terms.

    Authors: We agree that providing the complete derivation will enhance the clarity and verifiability of our central claim. The proof proceeds by induction on the number of layers L, showing that the image of the Jacobian is spanned by the partial derivatives corresponding to the 2L+1 independent frequency components in the serial composition. Specifically, each additional encoding layer introduces at most two new independent directions in the function space (sin and cos of the cumulative phase), and the trainable parameters' gradients are linear combinations within this space. This bound is independent of the specific parameter values because it relies on the algebraic structure of the trigonometric polynomials and the chain rule in the serial architecture, without invoking approximations or higher-order terms. In the revised manuscript, we have included the full step-by-step proof in a new appendix section, with explicit matrix representations for small L to illustrate the rank deficiency. revision: yes

  2. Referee: In the section reporting R² ≥ 0.95 for FM-layer scaling versus trainable blocks, the manuscript should provide more details on the experimental controls, such as the specific architectures tested, the number of runs, and whether classical neural network baselines were used to isolate quantum effects from general interpolation benefits.

    Authors: We appreciate this suggestion for improving the experimental rigor. The experiments were conducted on serial, parallel, and hybrid (N=2, L=E/2) architectures with encoding budgets E=4,6,8,10. For each architecture and scaling method (FM layers vs. trainable blocks), we performed 100 independent training runs with random parameter initializations drawn from a uniform distribution. Test R² was evaluated on held-out data, and the two routes were compared by the parameter count required to reach the R² ≥ 0.95 threshold. To address the isolation of quantum effects, we have added classical baselines consisting of polynomial regression models of degree up to 2L (matching the frequency content) and shallow neural networks with equivalent parameter counts. The results show that FM-layer scaling in QNNs outperforms these classical methods in parameter efficiency, while trainable block addition aligns with classical interpolation. These additional details, including a new table summarizing the controls and an updated figure with error bars, have been incorporated into the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No significant circularity; central rank bound is a structural proof

full rationale

The paper's load-bearing claim is the mathematical proof that rank(J) ≤ 2L+1 for serial single-qubit architectures (with dim(ker J) growing in P at fixed L), derived directly from the coefficient-matching Jacobian structure and circuit parameterization. This does not reduce to any fitted quantity, self-citation, or ansatz; the bound follows from the explicit form of the partial derivatives of the Fourier coefficients. Parallel-architecture full-rank statements and the FM-layer vs. trainable-block comparisons are supported by explicit spectra and empirical R² values that remain independent of the proof. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that angle-encoded VQCs implement truncated Fourier series whose coefficients are matched by the Jacobian J, plus standard linear-algebra facts about matrix rank. No new physical entities are introduced.

axioms (2)
  • domain assumption Angle encoding produces a truncated Fourier series whose coefficients are the relevant objects for the loss
    Stated in the first sentence of the abstract; underpins the definition of J as the coefficient-matching Jacobian.
  • domain assumption Rank deficiency of J directly governs trainability via gradient starvation
    The paper equates structural rank(J) with the observed training differences without additional justification in the abstract.

pith-pipeline@v0.9.0 · 5616 in / 1416 out tokens · 40226 ms · 2026-05-08T11:35:24.345412+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

48 extracted references · 15 canonical work pages · 3 internal anchors
