pith. sign in

arxiv: 2507.06344 · v3 · pith:6UOTSUIZnew · submitted 2025-07-08 · 🪐 quant-ph · cs.CC· cs.LG

Gradient Scalability and Taylor Surrogation of Quantum Cost Landscapes

Pith reviewed 2026-05-21 23:42 UTC · model grok-4.3

classification 🪐 quant-ph cs.CCcs.LG
keywords variational quantum algorithmsbarren plateausgradient scalingclassical simulabilityTaylor surrogateLinear Clifford Encoderquantum complexity
0
0 comments X

The pith

A Linear Clifford Encoder produces constant-scaling gradients near Clifford circuits while keeping super-polynomial complexity outside classically simulable regions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how variational quantum algorithms can maintain usable gradients without becoming fully classically simulable. It first develops a Taylor surrogate that matches known simulation runtimes near Clifford circuits and proves that complexity is at least super-polynomial beyond those regions. The central contribution is the Linear Clifford Encoder, an efficient ansatz change that holds gradients constant close to Clifford points. Numerical runs on the modified landscapes then show a transition region in which gradients appear to fall polynomially rather than exponentially while complexity remains hard. The work therefore points toward possible cases where non-vanishing gradients and quantum hardness can occur together.

Core claim

Beyond previously established classically simulable regions, variational quantum algorithm complexity is at least super-polynomial, and the Linear Clifford Encoder ensures constant-scaling gradients within landscape regions close to Clifford circuits, with numerical evidence indicating polynomial rather than exponential gradient decay in the transition to these harder regions.

What carries the argument

The Linear Clifford Encoder, a classically efficient ansatz modifier that enforces constant-scaling gradients near Clifford circuits.

If this is right

  • Gradients remain constant in scale near Clifford circuits after the ansatz modification.
  • Computational complexity stays at least super-polynomial outside the regions that admit efficient classical simulation.
  • In the transition zone, gradients can decay polynomially instead of exponentially while complexity remains hard.
  • Non-vanishing gradients and super-polynomial complexity may coexist in certain landscape instances.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • These transition zones could be targets for hybrid algorithms that combine classical pre-processing with quantum evaluation.
  • Extending the Taylor surrogate to cover more of the transition region might yield practical simulation tools for near-term devices.
  • The findings weaken the strict link between avoiding barren plateaus and full classical simulability, suggesting the need for refined complexity measures.

Load-bearing premise

The Linear Clifford Encoder keeps gradients from vanishing near Clifford circuits without lowering the landscape complexity below super-polynomial outside the previously simulable areas.

What would settle it

Direct computation showing that gradients still vanish exponentially with system size in the modified landscapes even near Clifford circuits would refute the constant-scaling claim.

Figures

Figures reproduced from arXiv: 2507.06344 by Aurelien Lucchi, Francesco Scala, Francesco Tacchino, Sabri Meyer.

Figure 1
Figure 1. Figure 1: FIG. 1. Illustration of our decomposition of the theoretical link between complexity and trainability in VQAs. These two [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: FIG. 2. (a) Three possible interpretations of the [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: FIG. 3. Worst-case analysis truncation threshold [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: FIG. 4. Comparing the computational complexity of the Tay [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: FIG. 5. Scaling behavior of the initial gradient norm at [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: FIG. 6. Scaling behavior of the gradient norm on various [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: FIG. 7. VQE Optimization using vanilla gradient descent with a learning rate of 0.01, averaging over 5 runs of random Gaussian [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: FIG. 8. Examples of two-layer PQCs after LCE transformation. (a) [PITH_FULL_IMAGE:figures/full_fig_p023_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: FIG. 9. Stabilizer rank [PITH_FULL_IMAGE:figures/full_fig_p026_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: FIG. 10. Simulation time of [PITH_FULL_IMAGE:figures/full_fig_p026_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: FIG. 11. Illustrating the landscape transformation effect of LCE for a [PITH_FULL_IMAGE:figures/full_fig_p034_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: FIG. 12. Behavior of the required quantities [PITH_FULL_IMAGE:figures/full_fig_p061_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: FIG. 13. Scaling behavior of the initial gradient norm at [PITH_FULL_IMAGE:figures/full_fig_p063_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: FIG. 14. Scaling behavior of the initial gradient norm at [PITH_FULL_IMAGE:figures/full_fig_p064_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: FIG. 15. Scaling behavior of the gradient norm on various surrogate patches for PQC models with [PITH_FULL_IMAGE:figures/full_fig_p065_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: FIG. 16. Scaling behavior of the gradient norm on various surrogate patches for PQC models with [PITH_FULL_IMAGE:figures/full_fig_p066_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: FIG. 17. VQE Optimization using vanilla gradient descent with various learning rates of 0.1 (top row), 0.01 (second row), 0.001 [PITH_FULL_IMAGE:figures/full_fig_p067_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: FIG. 18. VQE Optimization using vanilla gradient descent with a learning rate of 0.01, estimating gradients with 8 shots (top [PITH_FULL_IMAGE:figures/full_fig_p068_18.png] view at source ↗
Figure 19
Figure 19. Figure 19: FIG. 19. VQE Optimization using vanilla gradient descent with a learning rate of 0.01, averaging over 5 runs of random [PITH_FULL_IMAGE:figures/full_fig_p069_19.png] view at source ↗
Figure 20
Figure 20. Figure 20: FIG. 20. VQE Optimization using vanilla gradient descent with a learning rate of 0.01, averaging over 5 runs of random [PITH_FULL_IMAGE:figures/full_fig_p070_20.png] view at source ↗
read the original abstract

Variational Quantum Algorithms are promising candidates for near-term quantum computing, yet they face scalability challenges due to barren plateaus, where gradients vanish exponentially relative to system size. Recent conjectures suggest that avoiding these plateaus might inherently lead to classical simulability, thereby limiting the opportunities for quantum advantage. In this work, we advance the theoretical understanding of the relationship between gradient scalability at initialization and the computational complexity of variational quantum algorithms. We first present the Taylor surrogate, a classical simulation technique that matches Pauli path runtime guarantees on near-Clifford regions while offering runtime advantages in specific regimes. Leveraging this surrogate, we prove that beyond previously established classically simulable regions, the computational complexity is at least super-polynomial. Next, we introduce the Linear Clifford Encoder, a classically efficient ansatz modifier that ensures constant-scaling gradients within landscape regions close to Clifford circuits. Finally, numerical experiments on these modified landscapes provide preliminary empirical evidence of a transition zone where constant-scaling gradients may decay polynomially in super-polynomially complex regions rather than exponentially. These findings suggest speculative instances where non-vanishing gradients and super-polynomial complexity could potentially coexist, vindicating the need for future formal proofs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces the Taylor surrogate, a classical simulation technique matching Pauli path runtime guarantees near Clifford circuits with potential runtime advantages in specific regimes. It proves that beyond previously established classically simulable regions, the computational complexity of variational quantum algorithms is at least super-polynomial. The work then defines the Linear Clifford Encoder, a classically efficient ansatz modifier ensuring constant-scaling gradients near Clifford circuits, and presents numerical experiments on modified landscapes providing preliminary evidence of a transition zone in which constant-scaling gradients decay polynomially rather than exponentially within super-polynomially complex regions.

Significance. If the central claims are substantiated, the results would offer a concrete counter-example to conjectures that non-vanishing gradients necessarily imply classical simulability, thereby opening a speculative window for quantum advantage in variational algorithms. The Taylor surrogate is a positive contribution with explicit runtime comparisons, and the numerical identification of a polynomial-decay transition zone, while preliminary, supplies falsifiable empirical signatures that can guide future proofs. The overall framing strengthens the theoretical link between gradient scaling at initialization and complexity class.

major comments (3)
  1. [Linear Clifford Encoder definition and complexity analysis] Linear Clifford Encoder section: the central claim that this classically efficient modifier preserves super-polynomial complexity outside previously simulable regions while enforcing constant gradient scaling is load-bearing for the transition-zone interpretation, yet the manuscript supplies no explicit reduction or simulation argument showing that the encoder does not inadvertently introduce additional structure permitting efficient classical approximation or polynomial-time simulation.
  2. [Complexity proof after Taylor surrogate] Proof of super-polynomial complexity (following the Taylor surrogate introduction): the argument that complexity remains at least super-polynomial beyond simulable regions relies on the surrogate matching Pauli-path guarantees, but it is not shown how the surrogate avoids reducing the effective complexity class or how the complexity lower bound is independent of the surrogate's own fitting parameters.
  3. [Numerical experiments and transition-zone evidence] Numerical experiments section: the reported transition zone with polynomial rather than exponential gradient decay is the key empirical support, but the text provides no quantitative definition of the transition (e.g., fitting procedure, system-size scaling, or statistical test distinguishing polynomial from exponential), nor details on ansatz depth, number of samples, or error analysis, rendering the evidence difficult to assess or reproduce.
minor comments (2)
  1. [Abstract and Taylor surrogate runtime discussion] The abstract states that the surrogate 'matches Pauli path runtime guarantees'; a direct runtime comparison table or equation would clarify the claimed advantage in specific regimes.
  2. [Ansatz modification description] Notation for the Linear Clifford Encoder should include an explicit statement of how the encoder is applied to the original ansatz (e.g., as a pre- or post-processing layer) to avoid ambiguity in the numerical setup.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments and for recognizing the potential significance of our results. We address each of the major comments in detail below, indicating the revisions we plan to make to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Linear Clifford Encoder definition and complexity analysis] Linear Clifford Encoder section: the central claim that this classically efficient modifier preserves super-polynomial complexity outside previously simulable regions while enforcing constant gradient scaling is load-bearing for the transition-zone interpretation, yet the manuscript supplies no explicit reduction or simulation argument showing that the encoder does not inadvertently introduce additional structure permitting efficient classical approximation or polynomial-time simulation.

    Authors: We agree that an explicit argument for complexity preservation is important for the claim. The Linear Clifford Encoder is defined as a modification that applies a linear map over Clifford basis states to the variational parameters, which is classically computable but does not simplify the non-Clifford components of the original ansatz. We will add a new paragraph in the section providing a high-level argument that any efficient classical simulation of the modified circuit would imply one for the original, thus preserving the super-polynomial hardness. This will be a partial revision as a full formal reduction may require additional technical work. revision: partial

  2. Referee: [Complexity proof after Taylor surrogate] Proof of super-polynomial complexity (following the Taylor surrogate introduction): the argument that complexity remains at least super-polynomial beyond simulable regions relies on the surrogate matching Pauli-path guarantees, but it is not shown how the surrogate avoids reducing the effective complexity class or how the complexity lower bound is independent of the surrogate's own fitting parameters.

    Authors: The Taylor surrogate is employed solely as a tool to analyze the landscape near Clifford circuits and does not alter the underlying variational circuit for the complexity analysis. The super-polynomial complexity lower bound is derived from a direct reduction to known hard instances of variational quantum algorithms outside the simulable regime, independent of the surrogate's approximation parameters. We will revise the proof section to explicitly state this independence and clarify that the surrogate is not used in establishing the lower bound itself. revision: yes

  3. Referee: [Numerical experiments and transition-zone evidence] Numerical experiments section: the reported transition zone with polynomial rather than exponential gradient decay is the key empirical support, but the text provides no quantitative definition of the transition (e.g., fitting procedure, system-size scaling, or statistical test distinguishing polynomial from exponential), nor details on ansatz depth, number of samples, or error analysis, rendering the evidence difficult to assess or reproduce.

    Authors: We acknowledge the need for more rigorous presentation of the numerical results. In the revised manuscript, we will include: (1) a quantitative definition of the transition zone based on fitting the gradient norm scaling to both polynomial (O(1/n^k)) and exponential (O(exp(-c n))) forms and comparing goodness-of-fit via R^2 or AIC; (2) specification of ansatz depths used (typically 4-12 layers); (3) number of samples per data point (at least 5000 shots per gradient estimate); and (4) error analysis using standard deviation over 10 independent runs. These additions will make the evidence more reproducible and allow better assessment of the polynomial decay claim. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on introduced techniques and external benchmarks

full rationale

The paper defines the Taylor surrogate as a new classical simulation method matching Pauli-path guarantees near Clifford regions and leverages it to prove super-polynomial complexity outside previously simulable areas. It separately introduces the Linear Clifford Encoder as a classically efficient ansatz modification that enforces constant-scaling gradients near Clifford circuits by construction of the modifier itself. Numerical experiments then examine the resulting modified landscapes for a transition zone. No quoted equations or steps reduce the central claims to fitted inputs, self-definitions, or load-bearing self-citations; the complexity lower bound and gradient scaling are established via the new surrogate and encoder definitions plus external simulability results, making the chain self-contained against benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 2 invented entities

The central claims depend on newly introduced simulation and encoding constructs whose properties are asserted rather than derived from prior independent results; numerical evidence relies on unstated modeling choices.

free parameters (1)
  • Numerical experiment parameters
    Choices in landscape modification and sampling likely fitted or selected to demonstrate the transition zone, though unspecified in abstract.
axioms (2)
  • domain assumption Near-Clifford regions admit efficient classical simulation with Pauli path guarantees
    Invoked to establish runtime matching for the Taylor surrogate.
  • ad hoc to paper Linear Clifford Encoder preserves constant gradient scaling without changing complexity class
    Central to the ansatz modification and transition zone claim.
invented entities (2)
  • Taylor surrogate no independent evidence
    purpose: Classical simulation technique for quantum cost landscapes
    New method introduced to match and extend existing simulation runtimes.
  • Linear Clifford Encoder no independent evidence
    purpose: Ansatz modifier to ensure constant-scaling gradients near Clifford circuits
    New construct proposed to address gradient vanishing.

pith-pipeline@v0.9.0 · 5744 in / 1454 out tokens · 45671 ms · 2026-05-21T23:42:22.588185+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages

  1. [1]

    Theorem 1) to establish a constant lower bound

    Constant Gradient Lower Bound in Small Patches Having derived a lower bound for the expected squared gradient norm with an error term perturbing a given component (∂θk C)(0), we combine this result with the LCE (cf. Theorem 1) to establish a constant lower bound. In particular, this guarantees a tight escape from barren-plateaus as per Eq. (9): Corollary ...

  2. [2]

    This outcome holds with probability one for all observablesH as defined in Eq

    Polynomial Gradient Lower Bound beyond Small Patches Utilizing the LCE method in conjunction with Lebesgue measure theory and previous results, we are ready to rigorously show that barren plateaus are mitigated beyond the threshold σ = O(D−1/2), a landscape patch lacking any known classical surrogate [18]. This outcome holds with probability one for all o...

  3. [3]

    Worst-Case Error: m = m(θ) is a deterministic function of a fixed configuration θ ∈ [−π, π]D

  4. [4]

    Mean-Squared Error: m = m(Pθ) is defined probabilistically based on an initialization distribution Pθ

  5. [5]

    Worst-Case Error Analysis The following lemma quantifies the (deterministic) worst-case error of the approximation Cm(θ) ≈ C(θ) as a function of θ. Lemma 7. Consider an expectation of the form Eq. (E1) with a variational ansatz U(θ) involving D ≥ 1 parameters, and suppose we simulate it classically using the Taylor series Cm(θ) defined in Eq. (D9). Let θ ...

  6. [6]

    balls and cells

    Mean-Squared Error Analysis Next, we quantify the mean squared error of the approximation Cm(θ) ≈ C(θ) with respect to the local distribution Pθ ∈ {N (0, σ2ID), Unif([−σ, σ]D)} for some σ > 0. However, we first require the following counting argument to conclude with the result: Lemma 10. Let m = O(1) be of constant order, i.e., independent of D. Then, X ...

  7. [7]

    one-size-fits-all

    The Complexity Class of LCE Hamiltonians In order to successfully mitigate barren plateaus of arbitrarily expressive PQC unitaries U(θ) on large parameter landscape patches with LCE (cf. Theorem 5), we are restricted to the class of observables, H = X i∈I ciPi with ∥H∥ = O(1) and ci0 = Ω(1) for some i0 ∈ I , (J1) where I ⊂ { I, X, Y, Z }⊗N \{I2N } is an a...

  8. [8]

    balls and cells

    Deterministic Worst-Case Simulation Complexity The next parts leverage the results of the previous section, and focuses on the computational complexity of sim- ulating the cost function C(θ) in Eq. (E1) with the help of the Taylor surrogate Cm(θ) in Eq. (D9). We shall use Time Cm(θ) to denote the time complexity of evaluating Cm(θ), which is, by definitio...

  9. [9]

    Theorem 4

    Probabilistic Mean-Squared Simulation Complexity Next, we use the mean-squared error analysis in order to determine the Gaussian/uniform initialization strategy Pθ that covers the largest landscape patch where Taylor surrogating the cost C(·) remains polynomially efficient. Theorem 4. Consider an expectation of the form in Eq. (E1) with a variational ansa...

  10. [10]

    [18], as long as the patch is sufficiently small: Corollary 3

    Complexity Comparison with the Pauli Path Surrogate We conclude our complexity analysis by showing that our Taylor surrogate outperforms the Pauli path surrogate employed by Lerch et al. [18], as long as the patch is sufficiently small: Corollary 3. Consider an expectation of the form in Eq. (E1) with a variational ansatz U(θ) involving D ≥ 1 parameters, ...

  11. [11]

    Efficient Recursive Classical Implementation In a practical implementation of our simulation strategy, one has to iterate over the summation {∥α∥1 < m } in Eq. (D9). Concretely, we are interested in finding all multi-index combinations in the set: Am,D := {α = (α1, . . . , αD) ∈ ND 0 : ∥α∥1 < m}. (J58) When using a for-loop naively, one will be confronted...

  12. [12]

    Gradient Scaling Experiments We repeat the initial gradient scaling experiments (cf. Figs. 5 and 6) comparing all models considered in this work: mHEA, fHEA, and rPQC (defined in Appendix C). Fig. 13 illustrates the expected behavior of the initial gradient norm at θ = 0 employing various PQC models with L = 1 layer. Gradients are computed analytically vi...

  13. [13]

    Furthermore, we address the issue of finite sampling noise [58]

    VQE Optimization with different Learning Rates and Finite Sampling Noise Here, we demonstrate that LCE is effective for smaller learning rates in vanilla gradient descent. Furthermore, we address the issue of finite sampling noise [58]. We use a small number of finite shots to evaluate the gradients in order to emphasize that LCE remains relevant for near...

  14. [14]

    VQE Optimization of the Heisenberg Model In this experiment, we leverage the benchmarked initial gradient scaling statistics to optimize the VQE of the N = 12 qubit Heisenberg model [59], HHeisenberg := N −1X j=1 {XiXi+1 + YiYi+1 + ZiZi+1} . (K1) Eq. (K1) defines a local observable and thus is less prone to the barren plateau problem [57]. In order to inc...