Hessian-vector products for tensor networks via recursive tangent-state propagation
Pith reviewed 2026-05-10 00:21 UTC · model grok-4.3
The pith
Recursive tangent-state propagation computes scalable Hessian-vector products for tensor networks without building the full matrix.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An analytical Hessian-vector product kernel is introduced for arbitrary compositions of linear maps. The kernel is realized as a two-pass algorithm that performs recursive tangent-state propagation while keeping the virtual bond dimension bounded, thereby guaranteeing scalability. When this kernel is placed inside a Riemannian trust-region optimizer for quantum circuit compression, the resulting procedure produces up to four orders of magnitude higher fidelity than naive Trotterization on time-evolution circuits for spin chains and exhibits markedly smoother and faster convergence than standard first-order Riemannian methods.
What carries the argument
Recursive tangent-state propagation, the two-pass mechanism that carries tangent vectors through successive linear maps while enforcing a fixed virtual bond dimension to evaluate the Hessian-vector product.
If this is right
- Second-order Riemannian optimization becomes practical for tensor networks whose size previously ruled out explicit Hessian construction.
- Quantum circuit compression of time-evolution operators on spin chains reaches fidelities four orders of magnitude above naive Trotterization.
- Convergence trajectories become smoother and require fewer iterations than those produced by first-order Riemannian ADAM.
- The same kernel applies without modification to any composition of linear maps that can be represented as a tensor network.
Where Pith is reading between the lines
- The bounded-dimension propagation technique may extend to the computation of higher-order derivative-vector products in the same tensor-network setting.
- The method could be tested on tensor-network ansatzes used in classical machine-learning models to check whether the same scalability gains appear outside quantum circuits.
- If the bounded virtual bond dimension remains accurate for deeper or wider networks, the approach would directly support optimization of larger many-body systems without additional truncation analysis.
Load-bearing premise
Bounding the virtual bond dimension during recursive tangent-state propagation still yields the exact Hessian-vector product rather than an uncontrolled approximation.
What would settle it
On a small tensor network where the full Hessian can be stored and multiplied explicitly, compare the output of the recursive kernel against a finite-difference approximation of the directional derivative of the gradient.
Figures
read the original abstract
Optimizing tensor networks with standard first-order methods often leads to slow convergence and entrapment in local minima. Although second-order optimization offers enhanced robustness, explicitly constructing the full Hessian matrix is computationally prohibitive for large-scale systems. In this work, we bypass this bottleneck by introducing an analytical Hessian-vector product kernel designed for arbitrary compositions of linear maps. This two-pass algorithm leverages recursive tangent-state propagation with a bounded virtual bond dimension to guarantee scalability. We demonstrate the practical utility of this kernel by integrating it into a Riemannian trust-region framework for quantum circuit compression. Evaluated on time-evolution circuits for various spin chains, our second-order approach achieves up to a four-order-of-magnitude improvement in fidelity over naive Trotterization, while delivering significantly smoother, faster convergence than conventional first-order methods such as Riemannian ADAM.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces an analytical Hessian-vector product (HVP) kernel for tensor networks formed by arbitrary compositions of linear maps. The kernel is realized via a two-pass recursive tangent-state propagation algorithm that deliberately bounds the virtual bond dimension for scalability. This HVP is embedded in a Riemannian trust-region optimizer and applied to quantum circuit compression of time-evolution circuits on spin chains, where the second-order method reportedly yields up to four orders of magnitude higher fidelity than naive Trotterization and smoother convergence than first-order Riemannian ADAM.
Significance. If the HVP is shown to be exact (or with rigorously controlled error) despite the bond-dimension bound, the work supplies a missing computational primitive for second-order optimization of tensor networks. This would be valuable for variational quantum algorithms and many-body simulations where first-order methods converge slowly. The reported fidelity gains on spin-chain circuits provide initial evidence of practical impact, though quantitative validation details are needed to assess the strength of the empirical claims.
major comments (2)
- [Abstract and recursive tangent-state propagation section (likely §3)] The central claim of an 'analytical' (exact) HVP is load-bearing, yet the abstract and method description rely on recursive tangent-state propagation with an explicitly bounded virtual bond dimension. Any such bound introduces a truncation whose error vanishes only if the intermediate tangent states remain exactly within the truncated manifold. The manuscript must demonstrate (via projection identities, closed-form cancellation, or an explicit error analysis) that the recursion preserves exactness; otherwise the 'analytical' guarantee does not hold. This issue directly affects the weakest assumption identified in the stress-test note.
- [Application to quantum circuit compression and numerical results] The integration into the Riemannian trust-region framework assumes the supplied HVP is sufficiently accurate for the trust-region subproblem solver. No quantitative error analysis or comparison against finite-difference or full-Hessian baselines is referenced in the provided abstract; such validation is required to confirm that the observed fidelity improvements are attributable to the second-order information rather than to other implementation details.
minor comments (2)
- [Abstract] The abstract states 'large fidelity gains' and 'four-order-of-magnitude improvement' without specifying the system sizes, circuit depths, or exact fidelity metric used; these details should be added for reproducibility.
- [Method description] Notation for the tangent states and the two-pass algorithm would benefit from an explicit pseudocode listing or a small illustrative diagram showing the forward and backward recursions.
Simulated Author's Rebuttal
We thank the referee for their constructive and insightful comments, which help clarify the presentation of our analytical HVP kernel and its validation. We address each major comment below, indicating the revisions we will incorporate.
read point-by-point responses
-
Referee: [Abstract and recursive tangent-state propagation section (likely §3)] The central claim of an 'analytical' (exact) HVP is load-bearing, yet the abstract and method description rely on recursive tangent-state propagation with an explicitly bounded virtual bond dimension. Any such bound introduces a truncation whose error vanishes only if the intermediate tangent states remain exactly within the truncated manifold. The manuscript must demonstrate (via projection identities, closed-form cancellation, or an explicit error analysis) that the recursion preserves exactness; otherwise the 'analytical' guarantee does not hold. This issue directly affects the weakest assumption identified in the stress-test note.
Authors: We agree that demonstrating exactness of the HVP with respect to the truncated manifold is essential for the 'analytical' claim. The recursive tangent-state propagation is constructed so that each tangent vector is explicitly projected onto the tangent space of the current bond-dimension truncation at every step; this ensures the computed product is exact for the truncated network rather than an approximation to the untruncated Hessian. We will add a new subsection to §3 containing a formal proof via projection identities showing that the two-pass recursion commutes with these projections and therefore preserves exactness on the truncated manifold. We will also include a short discussion of the (controlled) difference from the untruncated case. revision: yes
-
Referee: [Application to quantum circuit compression and numerical results] The integration into the Riemannian trust-region framework assumes the supplied HVP is sufficiently accurate for the trust-region subproblem solver. No quantitative error analysis or comparison against finite-difference or full-Hessian baselines is referenced in the provided abstract; such validation is required to confirm that the observed fidelity improvements are attributable to the second-order information rather than to other implementation details.
Authors: We acknowledge that stronger quantitative validation is needed to attribute the reported fidelity gains specifically to the second-order information. In the revised manuscript we will add a dedicated validation subsection (within the numerical results) that (i) compares the analytical HVP against finite-difference approximations on small spin-chain instances where the full Hessian remains tractable, (ii) reports observed relative errors and their scaling with bond dimension, and (iii) includes an ablation comparing trust-region steps using the analytical HVP versus first-order Riemannian ADAM on the same circuits. These additions will directly address the attribution question. revision: yes
Circularity Check
No circularity: analytical derivation is self-contained
full rationale
The paper introduces an analytical Hessian-vector product via recursive tangent-state propagation for tensor networks. No load-bearing step reduces by construction to a fitted parameter, self-definition, or self-citation chain; the bounded bond dimension is explicitly a scalability choice rather than a hidden redefinition of the target quantity. The derivation chain remains independent of its inputs and does not rely on renaming known results or smuggling ansatzes through citations.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Tensor networks consist of compositions of linear maps that admit analytical differentiation.
Reference graph
Works this paper leans on
-
[1]
Optimizing Trotter circuits for spin chains To evaluate the performance of the HVP kernel, we op- timize Trotter circuits using a second-order Riemannian trust-region algorithm [15]. We benchmark its efficacy against the first-order Riemannian ADAM optimizer uti- lized in our recent study [13]. As a primary benchmark, we consider the transverse- field Isi...
-
[2]
Comparing convergence behavior To evaluate optimization efficiency, we compare the convergence speed of the Riemannian ADAM and trust- region optimizers in Fig. 5. For both the Ising and Heisenberg models, we optimize a quantum circuit com- posed of 11 layers. We enforce translational invariance across the circuit, meaning each layerℓis constructed by rep...
work page 2000
-
[3]
S. R. White, Density matrix formulation for quantum renormalization groups, Phys. Rev. Lett.69, 2863 (1992)
work page 1992
-
[4]
Schollw¨ ock, The density-matrix renormalization group in the age of matrix product states, Ann
U. Schollw¨ ock, The density-matrix renormalization group in the age of matrix product states, Ann. Phys.326, 96 (2011)
work page 2011
-
[5]
R. Or´ us, A practical introduction to tensor networks: Matrix product states and projected entangled pair states, Ann. Phys.349, 117 (2014)
work page 2014
-
[6]
G. K.-L. Chan and S. Sharma, The density matrix renor- malization group in quantum chemistry, Annu. Rev. Phys. Chem.62, 465 (2011)
work page 2011
-
[7]
G. K. Chan, A. Keselman, N. Nakatani, Z. Li, and S. R. White, Matrix product operators, matrix product states, and ab initio density matrix renormalization group algo- rithms, J. Chem. Phys.145(2016)
work page 2016
-
[8]
E. Stoudenmire and D. J. Schwab, Supervised learning with tensor networks, Adv. Neural Inf. Process. Syst.29 (2016)
work page 2016
-
[9]
J. A. Reyes and E. M. Stoudenmire, Multi-scale tensor network architecture for machine learning, Mach. Learn.: Sci. Technol.2, 035036 (2021)
work page 2021
-
[10]
I. L. Markov and Y. Shi, Simulating quantum computa- tion by contracting tensor networks, SIAM J. Comput. 38, 963 (2008)
work page 2008
- [11]
-
[12]
J. Haegeman, J. I. Cirac, T. J. Osborne, I. Pizorn, H. Ver- schelde, and F. Verstraete, Time-dependent variational principle for quantum lattices, Phys. Rev. Lett.107, 070601 (2011)
work page 2011
-
[13]
J. Haegeman, C. Lubich, I. Oseledets, B. Vandereycken, and F. Verstraete, Unifying time evolution and optimiza- tion with matrix product states, Phys. Rev. B94, 165116 (2016)
work page 2016
-
[14]
V. Zauner-Stauber, L. Vanderstraeten, M. T. Fishman, F. Verstraete, and J. Haegeman, Variational optimization algorithms for uniform matrix product states, Phys. Rev. B97, 045145 (2018)
work page 2018
-
[15]
I. N. M. Le, S. Sun, and C. B. Mendl, Riemannian quan- tum circuit optimization based on matrix product oper- ators, Quantum9, 1833 (2025)
work page 2025
- [16]
- [17]
-
[18]
J. Haegeman, T. J. Osborne, and F. Verstraete, Post- matrix product state methods: To tangent space and beyond, Phys. Rev. B88, 075133 (2013). 12
work page 2013
-
[19]
L. Vanderstraeten, J. Haegeman, P. Corboz, and F. Ver- straete, Gradient methods for variational optimization of projected entangled-pair states, Phys. Rev. B94, 155123 (2016)
work page 2016
- [20]
-
[21]
F. Putterer, M. M. Zumpe, I. N. M. Le, Q. Huang, and C. B. Mendl, High-performance contraction of quan- tum circuits for Riemannian optimization, arXiv preprint arXiv:2506.23775 (2025)
-
[22]
arXiv preprint arXiv:2501.18691 , year=
M. Ben-Dov and J. Chen, Regularized second-order optimization of tensor-network Born machines, arXiv preprint arXiv:2501.18691 (2025)
- [23]
-
[24]
B. A. Pearlmutter, Fast exact multiplication by the Hes- sian, Neural Comput.6, 147 (1994)
work page 1994
-
[25]
A. Griewank and A. Walther,Evaluating derivatives: principles and techniques of algorithmic differentiation (SIAM, 2008)
work page 2008
-
[26]
H.-J. Liao, J.-G. Liu, L. Wang, and T. Xiang, Differ- entiable programming tensor networks, Phys. Rev. X9, 031041 (2019)
work page 2019
-
[27]
A. Francuz, N. Schuch, and B. Vanhecke, Stable and effi- cient differentiation of tensor network algorithms, Phys. Rev. Research7, 013237 (2025)
work page 2025
- [29]
-
[30]
M. C. Caro, H.-Y. Huang, N. Ezzell, J. Gibbs, A. T. Sornborger, L. Cincio, P. J. Coles, and Z. Holmes, Out- of-distribution generalization for learning quantum dy- namics, Nat. Commun.14, 3751 (2023)
work page 2023
-
[31]
J. Gibbs and L. Cincio, Deep circuit compression for quantum dynamics via tensor networks, Quantum9, 1789 (2025)
work page 2025
-
[32]
N. Robertson, A. Akhriev, J. Vala, and S. Zhuk, Ap- proximate quantum compiling for quantum simulation: A tensor network based approach, ACM Trans. Quan- tum Comput.6(2025)
work page 2025
- [33]
-
[34]
A short tutorial on Wirtinger Calculus with applications in quantum information,
M. D’Anna, Y. Zhang, R. Wiersema, M. S. Rudolph, and J. Carrasquilla, Circuit compression for 2d quantum dynamics, arXiv preprint arXiv:2312.04858 (2025)
-
[35]
I. A. Luchnikov, A. Ryzhov, S. N. Filippov, and H. Ouer- dane, QGOpt: Riemannian optimization for quantum technologies, SciPost Phys.10, 079 (2021)
work page 2021
-
[36]
I. A. Luchnikov, M. E. Krechetov, and S. N. Filippov, Riemannian geometry and automatic differentiation for optimization problems of quantum physics and quantum technologies, New J. Phys.23, 073006 (2021)
work page 2021
-
[37]
M. Herlihy, The art of multiprocessor programming, in Proceedings of the Twenty-Fifth Annual ACM Sympo- sium on Principles of Distributed Computing, PODC ’06 (Association for Computing Machinery, New York, NY, USA, 2006) p. 1–2
work page 2006
-
[38]
J. Gibbs and L. Cincio, Learning circuits with infi- nite tensor networks, arXiv preprint arXiv:2506.02105 (2025)
-
[39]
A. W. Sandvik and G. Vidal, Variational quantum Monte Carlo simulations with tensor-network states, Phys. Rev. Lett.99, 220602 (2007)
work page 2007
- [40]
- [41]
- [42]
- [43]
-
[44]
R. J. Webber and M. Lindsey, Rayleigh-Gauss-Newton optimization with enhanced sampling for variational Monte Carlo, Phys. Rev. Research4, 033099 (2022)
work page 2022
-
[45]
C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral op- erators, J. Res. Natl. Bur. Stand.45, 255 (1950)
work page 1950
-
[46]
ADAM implementation by qiskit v0.18 (2021)
work page 2021
-
[47]
trust-region implementation by rqcopt v1.0.0 (2022)
work page 2022
-
[48]
J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. Van- derPlas, S. Wanderman-Milne, and Q. Zhang, JAX: com- posable transformations of Python+NumPy programs (2018)
work page 2018
-
[49]
C. T. Kelley,Iterative methods for optimization(SIAM, 1999)
work page 1999
-
[50]
Hessian-vector products for tensor networks via recursive tangent-state propagation
J. Shewchuk,An introduction to the conjugate gradi- ent method without the agonizing pain, Technical Report CMU-CS-94-125 (School of Computer Science, Carnegie Mellon University, 1994). 1 Supplementary Material for: “Hessian-vector products for tensor networks via recursive tangent-state propagation” Appendix A: Complex derivatives In this section, we sum...
work page 1994
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.