Structural $f$-divergence: Tight universal bounds for cost function moments and gradients in parameterized quantum circuits

Tomohiro Nishiyama; Yoshihiko Hasegawa

arxiv: 2605.18051 · v1 · pith:B6V6CZYEnew · submitted 2026-05-18 · 🪐 quant-ph

Pith reviewed 2026-05-20 11:17 UTC · model grok-4.3

classification 🪐 quant-ph

keywords: barren plateau, f-divergence, parameterized quantum circuit, variational quantum algorithm, gradient magnitude, cost concentration, trade-off inequality

The pith

Structural f-divergence establishes tight universal bounds on gradients and cost moments for parameterized quantum circuits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the structural f-divergence as a symmetric measure between probability distributions defined over the parameters of a parameterized quantum circuit. Using this measure, the authors derive analytical trade-off inequalities that limit the differences in average gradient sizes and in the variation of cost function values when comparing any chosen distribution to a reference one. These inequalities become equalities for the simplest possible circuit consisting of a single qubit and a single layer, suggesting the bounds apply broadly. If correct, the result supplies necessary conditions that a probability measure must satisfy to prevent gradients from vanishing exponentially and to keep cost values from concentrating, while also offering sufficient conditions under which noise effects remain controlled.

Core claim

We introduce the structural f-divergence between probability distributions on the parameter space of parameterized quantum circuits. This leads to analytically derived trade-off inequalities that bound the discrepancies between a given distribution and a reference distribution in terms of the expected magnitude of the cost-function gradient and the moments of the cost function itself. The bounds are tight, with equality achieved by the minimal one-qubit, one-layer ansatz. These results yield necessary conditions on probability measures for avoiding barren plateaus and cost concentration, along with sufficient conditions for suppressing noise-induced deviations.

What carries the argument

The structural f-divergence, defined as a symmetric f-divergence on probability distributions over circuit parameters, which serves as the basis for proving the trade-off inequalities bounding gradient and moment discrepancies.
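The shape of the argument can be written out using the notation of the appendix excerpts quoted in the reference graph of this page (Eqs. (B1) and (B15)). This is a sketch only: the definition of $\tilde{D}_f$ itself is not reproduced on this page, and the second inequality assumes that $d_f$ is nondecreasing in its argument.

```latex
% d_f is the smallest divergence compatible with a given total variation distance (Eq. (B1)).
d_f(t) = \inf_{d_{\mathrm{TV}}(P,Q) = t} \tilde{D}_f(P, Q)

% For a bounded random variable 0 \le X \le X_{\max} (Eq. (B15)):
\tilde{D}_f(P, Q)
  \;\ge\; d_f\bigl(d_{\mathrm{TV}}(P, Q)\bigr)
  \;\ge\; d_f\!\left(\frac{\lvert \mathbb{E}_P[X] - \mathbb{E}_Q[X] \rvert}{X_{\max}}\right)
```

Taking $X$ to be the gradient magnitude or a cost moment turns this chain into a bound on how far the two expectations can drift apart for a given divergence.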

Load-bearing premise

The structural f-divergence functions as a well-defined symmetric measure on probability distributions over the parameter space of parameterized quantum circuits, and equality in the minimal one-qubit one-layer ansatz confirms the bounds are universal.

What would settle it

A numerical check on the one-qubit one-layer circuit to see if the trade-off equalities hold exactly for chosen distributions, or a counter-example distribution on a more complex circuit that violates the predicted bounds on gradient magnitudes.
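A minimal sketch of such a numerical check is given below, restricted to the total variation case $f(x) = |x-1|/2$, where the bound reduces to the standard fact $|\mathbb{E}_P[X] - \mathbb{E}_Q[X]| \le X_{\max}\, d_{\mathrm{TV}}(P, Q)$ for $0 \le X \le X_{\max}$. The circuit (an RY rotation on $|0\rangle$ measured in $Z$), the grid discretization, and the two distributions are illustrative assumptions, not the paper's own setup.

```python
import numpy as np

# Hypothetical stand-in for the minimal ansatz: RY(theta)|0> measured in Z.
# Cost: <Z>(theta) = cos(theta); gradient: d<Z>/dtheta = -sin(theta).
def cost(theta):
    return np.cos(theta)

def grad_abs(theta):
    return np.abs(np.sin(theta))

def total_variation(p, q):
    # d_TV(P, Q) = (1/2) * sum_i |p_i - q_i| on a common grid.
    return 0.5 * np.sum(np.abs(p - q))

def normalize(w):
    return w / np.sum(w)

# Discretize the parameter space [0, 2*pi).
thetas = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)

# Reference distribution Q: uniform over the grid (assumption).
q = normalize(np.ones_like(thetas))
# Candidate distribution P: concentrated near theta = 0 (assumption).
p = normalize(np.exp(4.0 * np.cos(thetas)))

x_max = 1.0  # both |sin(theta)| and |cos(theta)| are bounded by 1
d_tv = total_variation(p, q)

# Gradient magnitude takes values in [0, x_max]: gap <= x_max * d_TV.
grad_gap = abs(np.dot(p, grad_abs(thetas)) - np.dot(q, grad_abs(thetas)))
grad_bound = x_max * d_tv

# The cost takes values in [-x_max, x_max]: gap <= 2 * x_max * d_TV.
cost_gap = abs(np.dot(p, cost(thetas)) - np.dot(q, cost(thetas)))
cost_bound = 2.0 * x_max * d_tv

print(f"gradient gap = {grad_gap:.4f} <= {grad_bound:.4f}")
print(f"cost gap     = {cost_gap:.4f} <= {cost_bound:.4f}")
```

A violation of either inequality on a finer grid or for a different pair of distributions would point to an error in the discretization rather than in the bound, since the total variation case is elementary. Testing the general $f$ case would require the paper's definition of the structural f-divergence.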

Figures

Figures reproduced from arXiv: 2605.18051 by Tomohiro Nishiyama, Yoshihiko Hasegawa.

Original abstract

The barren plateau phenomenon, in which cost-function gradients of variational quantum algorithms vanish exponentially, remains a central obstacle for near-term quantum computing. Existing analyses typically depend on t-design or Haar-random assumptions and bound quantities at the level of unitary distributions, offering limited insight for designing probability measures on the parameter space of parameterized quantum circuits. In this paper, we introduce the structural $f$-divergence, a symmetric $f$-divergence-based measure between probability distributions on the parameter space. We establish analytically trade-off inequalities that bound the discrepancies in the expected gradient magnitude and in the cost-function moments between a distribution on PQC and a reference distribution; equality is attained by a minimal one-qubit, one-layer ansatz. As applications, we derive necessary conditions on probability measures for avoiding BPs and cost concentration, and sufficient conditions that suppress noise-induced deviations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Structural f-divergence shifts focus to parameter distributions for bounding gradients and moments in PQCs, but universality depends on verifying the minimal ansatz equality case.

The paper defines a structural f-divergence on parameter distributions for parameterized quantum circuits and uses it to derive analytical trade-off inequalities that bound differences in gradient magnitudes and cost moments relative to a reference distribution. The inequalities are saturated by a minimal one-qubit, one-layer ansatz, which supports the claimed necessary conditions for avoiding barren plateaus.

The new element is moving the analysis to the parameter space rather than assuming t-designs or Haar randomness at the unitary level. This gives more direct insight into how to choose probability measures on the parameters so as to suppress barren plateaus and cost concentration. The conditions for noise suppression are also a plus, since they address practical issues in near-term devices.

The work does well in providing explicit analytical results and grounding their tightness in a concrete minimal ansatz. It avoids circularity by using an external reference distribution, and the derivations appear to be parameter-free in the sense that nothing is fitted to data.

The soft spots concern the claimed universality. The stress-test note highlights that the symmetry of the structural f-divergence under the parameter-to-unitary map and the generality of the equality case need confirmation. If the bounds are tight only for that specific small circuit and do not extend cleanly to larger systems or varied cost functions, the guidance for avoiding barren plateaus loses some of its strength. Since the full proofs are in the paper, those steps should be verified carefully.

The paper is aimed at researchers in quantum information and variational quantum algorithms who work on trainability. Someone working on theoretical bounds or practical initialization strategies could get value from the derived inequalities and conditions. I would recommend sending it to peer review: the core ideas are solid enough to warrant referee input, particularly on the scope of the results.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces the structural f-divergence, a symmetric f-divergence-based measure between probability distributions on the parameter space of parameterized quantum circuits (PQCs). It derives analytical trade-off inequalities bounding discrepancies in expected gradient magnitudes and cost-function moments relative to a reference distribution, with equality attained in a minimal one-qubit, one-layer ansatz. Applications yield necessary conditions on probability measures to avoid barren plateaus and cost concentration, plus sufficient conditions to suppress noise-induced deviations.

Significance. If the derivations hold, the work offers a parameter-space perspective on barren plateaus that avoids t-design or Haar assumptions, potentially guiding the design of initial distributions for PQCs. The explicit tight equality case in a simple ansatz is a positive feature that could make the bounds useful for practical ansatz and sampling choices in variational quantum algorithms.

major comments (3)

Definition of structural f-divergence (Section 2): The symmetry of this new measure under the map from parameter distributions to induced unitary distributions must be shown explicitly; without this, the claimed universality of the subsequent trade-off inequalities cannot be established.
Theorem on trade-off inequalities (Section 4, main result): The equality case for the minimal one-qubit one-layer ansatz should be derived in full detail for arbitrary cost functions, not merely asserted, to confirm that the bounds on gradient magnitude and moments are indeed tight and universal rather than ansatz-specific.
Application to barren plateaus (Section 5): The necessary conditions derived for avoiding BPs must be checked against at least one known non-trivial cost function or multi-qubit example to verify they are not vacuous or overly restrictive.

minor comments (2)

Notation for the reference distribution should be introduced once and used consistently throughout the derivations.
Figure captions for any illustrative circuits or plots should explicitly state the cost function and number of qubits used.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments, which have helped us identify areas for improvement. We address each major comment point by point below, indicating the revisions we will make.

Referee: Definition of structural f-divergence (Section 2): The symmetry of this new measure under the map from parameter distributions to induced unitary distributions must be shown explicitly; without this, the claimed universality of the subsequent trade-off inequalities cannot be established.

Authors: We agree that an explicit proof of symmetry is required to rigorously support the universality claim. In the revised manuscript, we will insert a dedicated proposition in Section 2 deriving the symmetry of the structural f-divergence under the parameter-to-unitary distribution map, using the definition and properties of f-divergences. This addition will directly underpin the generality of the trade-off inequalities in Section 4.
Revision: yes

Referee: Theorem on trade-off inequalities (Section 4, main result): The equality case for the minimal one-qubit one-layer ansatz should be derived in full detail for arbitrary cost functions, not merely asserted, to confirm that the bounds on gradient magnitude and moments are indeed tight and universal rather than ansatz-specific.

Authors: The referee correctly notes that a full derivation strengthens the tightness claim. We will expand the proof of the main theorem in Section 4 to include a complete, step-by-step calculation of the equality case for the one-qubit one-layer ansatz, explicitly computing the expected gradient magnitudes and cost moments for arbitrary cost functions and showing how equality is attained with the reference distribution.
Revision: yes

Referee: Application to barren plateaus (Section 5): The necessary conditions derived for avoiding BPs must be checked against at least one known non-trivial cost function or multi-qubit example to verify they are not vacuous or overly restrictive.

Authors: We will add a new illustrative example in Section 5 applying the necessary conditions to a non-trivial cost function (e.g., a two-qubit Heisenberg model Hamiltonian) with a multi-qubit ansatz. This will explicitly verify that the conditions hold for parameter distributions known to avoid barren plateaus, demonstrating their practical relevance without being overly restrictive.
Revision: yes

Circularity Check

0 steps flagged

No circularity: analytical derivation from newly introduced structural f-divergence is self-contained

The paper introduces the structural f-divergence as a symmetric measure on probability distributions over the PQC parameter space and analytically derives trade-off inequalities bounding discrepancies in expected gradient magnitudes and cost-function moments relative to a reference distribution. Equality is shown for a concrete minimal one-qubit one-layer ansatz to establish tightness. No steps reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations; the bounds follow from the properties of the introduced divergence without the outputs being presupposed in the inputs.
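The tightness claim can be illustrated with the two-point distributions that appear in the Lemma 5 excerpt quoted in the reference graph of this page (Eqs. (C1) to (C3)). The sketch below checks that they saturate $|\mathbb{E}_P[X] - \mathbb{E}_Q[X]| \le 2 X_{\max}\, d_{\mathrm{TV}}(P, Q)$. Reading the two support points as the parameters $\theta \in \{\pi, 0\}$ of an RY rotation with cost $\langle Z \rangle = \cos\theta$ is an assumption made for illustration, as is the value of $r$.

```python
import numpy as np

x_max = 1.0
r = 0.3  # illustrative value; any r in [0, 1] works

# Two-point support X in {-x_max, +x_max}. For <Z> on RY(theta)|0> this
# corresponds to theta in {pi, 0} (hypothetical stand-in for the ansatz).
support = np.array([-x_max, x_max])

# Extremal pair (P_B, Q_B) from the quoted Lemma 5.
p_b = np.array([(1.0 - r) / 2.0, (1.0 + r) / 2.0])
q_b = np.array([(1.0 + r) / 2.0, (1.0 - r) / 2.0])

# Total variation distance between the two distributions.
d_tv = 0.5 * np.sum(np.abs(p_b - q_b))

# Both sides of the inequality |E_P[X] - E_Q[X]| <= 2 * x_max * d_TV.
lhs = abs(np.dot(p_b, support) - np.dot(q_b, support))
rhs = 2.0 * x_max * d_tv

print(f"d_TV = {d_tv:.3f}, lhs = {lhs:.3f}, rhs = {rhs:.3f}")
```

Here the total variation distance equals $r$ and both sides equal $2 X_{\max} r$, so the inequality holds with equality for every admissible $r$.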

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the introduction of the structural f-divergence and the analytical derivation of trade-off inequalities. No explicit free parameters are mentioned in the abstract. The new measure itself functions as an invented construct whose properties enable the bounds.

axioms (1)

standard math: f-divergences admit symmetric variants suitable for measuring discrepancies between probability distributions on parameter spaces.
The abstract builds directly on f-divergence concepts from information theory to define the structural variant.
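The axiom can be made concrete with a standard construction: for a generator $f$, the conjugate $f^*(x) = x f(1/x)$ swaps the arguments of the divergence, so $f + f^*$ yields a symmetric f-divergence. The sketch below uses the Kullback-Leibler generator as an example. It is not claimed to be the paper's structural f-divergence, whose definition is not given on this page, and the function names are hypothetical.

```python
import numpy as np

def f_divergence(p, q, f):
    # D_f(P||Q) = sum_i q_i * f(p_i / q_i), assuming all p_i, q_i > 0.
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / q)))

def kl_generator(x):
    # f(x) = x log x generates the Kullback-Leibler divergence;
    # it satisfies f(1) = 0 and f''(x) = 1/x > 0.
    return x * np.log(x)

def conjugate(f):
    # f*(x) = x f(1/x) satisfies D_{f*}(P||Q) = D_f(Q||P).
    return lambda x: x * f(1.0 / x)

def symmetrize(f):
    # f + f* generates a divergence that is symmetric in (P, Q).
    f_star = conjugate(f)
    return lambda x: f(x) + f_star(x)

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.3, 0.3, 0.4])

sym = symmetrize(kl_generator)
d_pq = f_divergence(p, q, sym)
d_qp = f_divergence(q, p, sym)

print(f"D_sym(P||Q) = {d_pq:.6f}, D_sym(Q||P) = {d_qp:.6f}")
```

For the Kullback-Leibler generator the symmetrized divergence is the sum of the two directed divergences, which is why both printed values agree.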

invented entities (1)

structural f-divergence (no independent evidence)
purpose: Symmetric f-divergence-based measure between probability distributions on the parameter space of parameterized quantum circuits
Newly introduced to establish the trade-off inequalities and conditions for avoiding barren plateaus.

pith-pipeline@v0.9.0 · 5682 in / 1559 out tokens · 55084 ms · 2026-05-20T11:17:00.060321+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes
echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
Passage: "introduce the structural f-divergence, a symmetric f-divergence-based measure... $f \in C^2(0,\infty)$ with $f(1)=0$, $f''>0$"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
Passage: "tight trade-off inequalities... equality attained by a minimal one-qubit, one-layer ansatz"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

[1] Lemmas. Before the proof, we prove the following lemmas. Lemma 2. For any function $f \in F$, $d_f(t) = \inf_{d_{\mathrm{TV}}(P,Q)=t} \tilde{D}_f(P, Q)$ (B1). The infimum is attained by $(P_B, Q_B)$ defined by Eq. (6). The proof for differentiable $f$ is provided in Refs. [12–14]. For completeness, we briefly present an alternative proof below. For the full derivation, see the original papers. Proof....

[2] Proof. For a random variable $0 \le X \le X_{\max}$, by applying Lemmas 1–3, it follows that $\tilde{D}_f(P, Q) \ge d_f(d_{\mathrm{TV}}(P, Q)) \ge d_f\!\left(\frac{|\mathbb{E}_P[X] - \mathbb{E}_Q[X]|}{X_{\max}}\right)$ (B15). Substituting $X = |\partial_j \langle O \rangle|$, $P = P_\Theta$, and $Q = Q_\Theta$ into Eq. (B15) and using Lemma 1 and Lemma 4, we obtain $\tilde{D}_f(P_\Theta, Q_\Theta) \ge d_f\!\left(\frac{|\mathbb{E}_{P_\Theta}[|\partial_j \langle O \rangle|] - \mathbb{E}_{Q_\Theta}[|\partial_j \langle O \rangle|]|}{2 \|H_j\|_R \|O\|_\infty}\right)$ (B16). When $f(x) = |x-1|/2$, the same inequality follows fro...

[3] Lemmas. Lemma 5. Let $-X_{\max} \le X \le X_{\max}$ be a random variable. For probability measures $P$ and $Q$ for the random variable $X$, $|\mathbb{E}_P[X] - \mathbb{E}_Q[X]| \le 2 X_{\max}\, d_{\mathrm{TV}}(P, Q)$ (C1). The equality holds if $P_B(X = -X_{\max}) = \frac{1-r}{2}$, $P_B(X = X_{\max}) = \frac{1+r}{2}$ (C2), $Q_B(X = -X_{\max}) = \frac{1+r}{2}$, $Q_B(X = X_{\max}) = \frac{1-r}{2}$ (C3). Applying Lemma 3 for a random variable $Y = X + X_{\max}$, the result immediately follows. ...

[4] Proof. In the following, we first provide a proof for the probability measures $(P_U, Q_U)$. Letting $X = \langle O \rangle^k$, the proof is analogous to that of Eq. (17). We consider the case where $k$ is even. Since $\langle O \rangle^k \ge 0$, by applying Lemmas 1, 2, 3 and 6, we obtain Eq. (18) for $C(k) = 1$. Consider the ansatz $C_{1,1}$. From Eq. (C6), Lemmas 2, 3, and 6, the equality holds for Eq. (21)...

[5] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Barren plateaus in quantum neural network training landscapes, Nature Communications 9, 4812 (2018).

[6] M. Ragone, B. N. Bakalov, F. Sauvage, A. F. Kemper, C. Ortiz Marrero, M. Larocca, and M. Cerezo, A Lie algebraic theory of barren plateaus for deep parameterized quantum circuits, Nature Communications 15, 7172 (2024).

[7] N. Diaz, D. García-Martín, S. Kazi, M. Larocca, and M. Cerezo, Showcasing a barren plateau theory beyond the dynamical Lie algebra, arXiv preprint arXiv:2310.11505 (2023).

[8] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, Cost function dependent barren plateaus in shallow parametrized quantum circuits, Nature Communications 12, 1791 (2021).

[9] E. Grant, L. Wossnig, M. Ostaszewski, and M. Benedetti, An initialization strategy for addressing barren plateaus in parametrized quantum circuits, Quantum 3, 214 (2019).

[10] A. Skolik, J. R. McClean, M. Mohseni, P. Van Der Smagt, and M. Leib, Layerwise learning for quantum neural networks, Quantum Machine Intelligence 3, 5 (2021).

[11] S. Sim, P. D. Johnson, and A. Aspuru-Guzik, Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms, Advanced Quantum Technologies 2, 1900070 (2019).

[12] Z. Holmes, K. Sharma, M. Cerezo, and P. J. Coles, Connecting ansatz expressibility to gradient magnitudes and barren plateaus, PRX Quantum 3, 010313 (2022).

[13] A. Arrasmith, Z. Holmes, M. Cerezo, and P. J. Coles, Equivalence of quantum barren plateaus to cost concentration and narrow gorges, Quantum Science & Technology 7, 045015 (2022).

[14] I. Sason and S. Verdú, f-divergence inequalities, IEEE Transactions on Information Theory 62, 5973 (2016).

[15] L. Le Cam, Asymptotic methods in statistical decision theory (Springer Science & Business Media, 2012).

[16] I. Sason, Tight bounds for symmetric divergence measures and a new inequality relating f-divergences, in 2015 IEEE Information Theory Workshop (ITW) (2015), pp. 1–5.

[17] G. L. Gilardoni, On Pinsker's and Vajda's type inequalities for Csiszár's f-divergences, IEEE Transactions on Information Theory 56, 5377 (2010).

[18] G. L. Gilardoni, On the minimum f-divergence for given total variation, Comptes Rendus. Mathématique 343, 763 (2006).
