arxiv: 2606.23274 · v1 · pith:TLD7TH22new · submitted 2026-06-22 · ⚛️ physics.comp-ph

Parallelized contraction of tensor trains or matrix product operators

Pith reviewed 2026-06-26 06:12 UTC · model grok-4.3

classification ⚛️ physics.comp-ph
keywords tensor trains · matrix product operators · fit algorithm · MPO contraction · parallelization · randomized projections · inverse canonical gauge

The pith

Inverse canonical gauge yields near-ideal parallel speedup for MPO-MPO fit contractions across all sizes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes two combined accelerations for the fit algorithm used to contract matrix product operators. First, an MPI parallelization scheme based on the inverse canonical gauge distributes work so that speedup stays close to ideal even for small problems. Second, randomized projections replace the usual two-site local updates with one-site updates while preserving the accuracy of the original two-site procedure. These changes lower the dominant costs in high-dimensional tensor contractions without requiring changes to the overall algorithm structure.

Core claim

We present two strategies for accelerating the fit algorithm, usable in combination: (1) MPI-based distributed-memory parallelization tailored for MPO-MPO contractions, employing the inverse canonical gauge, which yields near-ideal parallelization speedup across all problem sizes; and (2) randomized projections to reduce the cost of local updates from 2-site to 1-site costs while retaining 2-site accuracy, and to speed up contractions of environment tensors with MPO tensors.
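As a rough illustration of the kind of randomized projection that strategy (2) refers to, the sketch below implements the generic randomized range finder from the randomized linear algebra literature. It is not the paper's update scheme: the function names, the oversampling value, and the test matrix are assumptions made only for this example.

```python
import numpy as np

def randomized_range(M, rank, oversample=5, seed=0):
    """Return an orthonormal basis Q that approximately spans the column space of M."""
    rng = np.random.default_rng(seed)
    # Gaussian test matrix: compress M to (rank + oversample) columns.
    omega = rng.standard_normal((M.shape[1], rank + oversample))
    sketch = M @ omega
    # Orthonormalize the sketch; its columns capture the dominant range of M.
    Q, _ = np.linalg.qr(sketch)
    return Q

def relative_projection_error(M, Q):
    """Relative Frobenius error of projecting M onto the span of Q."""
    return np.linalg.norm(M - Q @ (Q.T @ M)) / np.linalg.norm(M)

# Hypothetical test case: a 200 x 150 matrix of exact rank 8.
rng = np.random.default_rng(1)
M = rng.standard_normal((200, 8)) @ rng.standard_normal((8, 150))
Q = randomized_range(M, rank=8)
err = relative_projection_error(M, Q)
print(Q.shape, err)
```

The point of the sketch is that the expensive factorization acts on a matrix with only `rank + oversample` columns instead of the full matrix, which is the general mechanism by which randomized projections lower cost.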

What carries the argument

The inverse canonical gauge choice together with randomized projections inside the fit algorithm for contracting two MPOs.

If this is right

  • Near-ideal parallel speedup holds for any problem size when the inverse canonical gauge is used.
  • Local update cost drops from two-site to one-site scaling while the final contraction accuracy remains the same.
  • Site-canonical gauge still delivers excellent speedup once the problem is large enough to need multiple sweeps.
  • Environment-tensor contractions with MPO tensors become cheaper under the same randomized-projection treatment.
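To make "near-ideal speedup" concrete, the snippet below computes speedup and parallel efficiency under the conventional definitions (serial time divided by parallel time, and speedup divided by process count). The paper's exact definition may differ, and the timings are placeholders, not measurements.

```python
def speedup(t_serial, t_parallel):
    """Conventional speedup S = T_1 / T_p."""
    return t_serial / t_parallel

def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Efficiency E = S / p; E = 1 corresponds to ideal (linear) speedup."""
    return speedup(t_serial, t_parallel) / n_procs

# Placeholder wall times in seconds, keyed by process count (illustrative only).
timings = {1: 8.0, 2: 4.0, 4: 2.5, 8: 2.0}
for n_procs, t in timings.items():
    s = speedup(timings[1], t)
    e = parallel_efficiency(timings[1], t, n_procs)
    print(n_procs, s, e)
```

An efficiency that stays close to 1 as the process count grows is what the first bullet would look like in measured data; an efficiency that recovers only for large inputs corresponds to the behavior described for the site-canonical gauge.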

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same gauge and projection techniques could be tested on contractions involving more than two MPOs or on other tensor-network algorithms that rely on similar sweeps.
  • If the one-site randomized version matches two-site accuracy on a wider set of physical operators, existing MPO-based codes could adopt it with only local changes to the update step.
  • For very large problems the site-canonical gauge may become preferable once communication overhead dominates over the extra consistency work.

Load-bearing premise

Randomized projections preserve the accuracy of the original 2-site fit algorithm when applied to MPO-MPO contractions.

What would settle it

A side-by-side run on a small, exactly solvable MPO-MPO contraction in which the randomized 1-site update yields a larger error, or needs more sweeps to converge, than the standard 2-site fit.
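A test of this kind needs an exact reference to compare against. The sketch below builds two small random MPOs, forms the exact product both densely and site by site, and measures the relative error of a compressed result. The compressor is a plain SVD truncation sweep used as a stand-in, not the fit algorithm or the randomized variant; all function names and sizes are assumptions for illustration.

```python
import numpy as np

def random_mpo(L, d, chi, rng):
    """Random MPO as a list of tensors with shape (left, out, in, right)."""
    dims = [1] + [chi] * (L - 1) + [1]
    return [rng.standard_normal((dims[n], d, d, dims[n + 1])) for n in range(L)]

def mpo_to_dense(mpo):
    """Contract all bonds into a (d^L x d^L) matrix. Feasible only for small L."""
    T = mpo[0]
    for W in mpo[1:]:
        T = np.einsum('lOIm,moir->lOoIir', T, W)
        l, O, o, I, i, r = T.shape
        T = T.reshape(l, O * o, I * i, r)
    return T[0, :, :, 0]

def product_exact(A, B):
    """Site-wise product C = AB; the bond dimensions of A and B multiply."""
    C = []
    for a, b in zip(A, B):
        c = np.einsum('xijy,ujkv->xuikyv', a, b)
        x, u, i, k, y, v = c.shape
        C.append(c.reshape(x * u, i, k, y * v))
    return C

def truncate_bonds(mpo, chi_max):
    """Left-to-right SVD sweep keeping at most chi_max singular values per bond.
    A stand-in compressor, NOT the fit algorithm discussed in the paper."""
    out = [W.copy() for W in mpo]
    for n in range(len(out) - 1):
        l, o, i, r = out[n].shape
        U, s, Vh = np.linalg.svd(out[n].reshape(l * o * i, r), full_matrices=False)
        k = min(chi_max, len(s))
        out[n] = U[:, :k].reshape(l, o, i, k)
        carry = s[:k, None] * Vh[:k]  # singular values absorbed to the right
        out[n + 1] = np.einsum('kr,roiq->koiq', carry, out[n + 1])
    return out

def relative_error(ref, approx):
    return np.linalg.norm(ref - approx) / np.linalg.norm(ref)

rng = np.random.default_rng(0)
A = random_mpo(L=4, d=2, chi=3, rng=rng)
B = random_mpo(L=4, d=2, chi=3, rng=rng)
reference = mpo_to_dense(A) @ mpo_to_dense(B)
exact = product_exact(A, B)
err_exact = relative_error(reference, mpo_to_dense(exact))
errors = {chi: relative_error(reference, mpo_to_dense(truncate_bonds(exact, chi)))
          for chi in (2, 9)}
print(err_exact, errors)
```

Swapping `truncate_bonds` for a 2-site fit and a randomized 1-site fit, and recording the error and sweep count of each against `reference`, would give the side-by-side comparison described above.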

Figures

Figures reproduced from arXiv: 2606.23274 by Hiroshi Shinaoka, Jan von Delft, Marc K. Ritter, Simone Foderà.

Figure 1. Diagrammatic representation of the MPO $A = \prod_{\ell=1}^{L} A_\ell$. An MPO-MPO contraction C = AB is defined as the contraction of the output indices of B with the corresponding input indices of A, as schematically depicted in …

Figure 3. Diagrammatic representation of the zip-up contraction.

Figure 2. Diagrammatic representations of (a) the contraction of two MPOs, …

Figure 5. Diagrammatic representation of an MPO in (a) the Vidal canonical …

Figure 4. Diagrammatic representation of the conditions for (a) left orthogonality …

Figure 6. (a,b) Diagrammatic depictions of the left and right sides of Eq. …

Figure 7. Construction of the left environment in the inverse canonical form.

Figure 8. (a) Visual representation of the sizes of the matrices involved in the …

Figure 10. Computation of environment tensor using two separate randomized …

Figure 11. Illustrating steps (2) to (5) of the parallel fit algorithm, for an odd-even …

Figure 12. Accuracy (averaged out of 5 runs with di …

Figure 14. Logarithmic plot of the runtime of fit algorithm on a of TTs with …

Figure 13. Speedup (averaged out of 5 runs with different seeds) obtained by the parallel fit algorithm. The top-left (a) and top-right (b) plots show runs for different amounts of χ, with a TT with L = 102 and Ns = 1. The bottom-left (c) and bottom-right (d) plots show runs for different amounts of sweeps Ns, with a TT with L = 102 and χ = 100.
Original abstract

Tensor Trains (TT), also known as Matrix Product States (MPS) and Matrix Product Operators (MPO), provide a compact and structured representation for high-dimensional data and operators. One of the most expensive manipulations involving tensor trains is the contraction of two MPOs. A popular and accurate method for mitigating this cost is the fit algorithm. However, it is still comparatively costly since it involves 2-site updates. Moreover, the parallelization of the fit algorithm when used for MPO-MPO contractions has received comparatively little attention. In this work, we present two strategies for accelerating the fit algorithm, usable in combination: (1) We use MPI-based distributed-memory parallelization tailored for MPO-MPO contractions, employing one of two MPO gauge choices: (1a) the inverse canonical gauge, which yields near-ideal parallelization speedup across all problem sizes; and (1b) the site-canonical gauge, which avoids the inversion of singular values but requires extra computations to ensure global consistency, thus yielding excellent parallelization speedup only for large problems requiring several sweeps before convergence. (2) We use randomized projections to reduce the cost of local updates from 2-site to 1-site costs while retaining 2-site accuracy, and to speed up contractions of environment tensors with MPO tensors.
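To show why 2-site updates are the costly part, the sketch below splits a two-site MPO block by truncated SVD, which is the generic final step of a 2-site update. The block here is random rather than built from environment tensors, and the function name, index order, and sizes are assumptions for illustration only.

```python
import numpy as np

def split_two_site(theta, chi_max):
    """Split theta[l, o1, i1, o2, i2, r] into two site tensors by truncated SVD."""
    l, o1, i1, o2, i2, r = theta.shape
    # The matrix is (l * d^2) x (d^2 * r): the physical legs of BOTH sites enter the SVD.
    mat = theta.reshape(l * o1 * i1, o2 * i2 * r)
    U, s, Vh = np.linalg.svd(mat, full_matrices=False)
    k = min(chi_max, len(s))
    left = U[:, :k].reshape(l, o1, i1, k)                 # left-orthogonal tensor
    right = (s[:k, None] * Vh[:k]).reshape(k, o2, i2, r)  # carries the singular values
    discarded = float(np.sum(s[k:] ** 2))                 # squared truncation error
    return left, right, discarded

# Hypothetical sizes: bond dimension 3 on both sides, local dimension d = 2.
rng = np.random.default_rng(0)
theta = rng.standard_normal((3, 2, 2, 2, 2, 3))
left, right, discarded = split_two_site(theta, chi_max=5)
recon = np.einsum('loik,kpjr->loipjr', left, right)
print(left.shape, right.shape, discarded)
```

For bond dimension χ and local dimension d, the matrix passed to the SVD has side length χd² on each side, whereas a 1-site tensor reshapes to χd² by χ. That size gap is what a 1-site update with randomized projections aims to avoid.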

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes two strategies to accelerate the fit algorithm for contracting matrix product operators (MPOs): MPI-based distributed parallelization using either the inverse canonical gauge (near-ideal speedup across sizes) or site-canonical gauge (good speedup for large problems), and randomized projections that reduce local 2-site updates to 1-site costs while retaining 2-site accuracy and accelerating environment contractions.

Significance. If the accuracy-retention claim for randomized projections holds with supporting validation, the combined approach would offer practical speedups for MPO-MPO contractions in tensor-network applications; the parallelization strategy builds directly on established gauge choices without introducing new parameters.

major comments (1)
  1. [Abstract] The central claim that randomized projections 'retain 2-site accuracy' for MPO-MPO contractions is asserted without error bounds, convergence rates, or numerical validation on operator contractions. This is load-bearing because the 1-site speedup is only useful if the accuracy guarantee transfers from the MPS-MPS case to the MPO setting with its distinct environment structure.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and for highlighting the need for stronger support of the randomized-projection claim in the MPO-MPO setting. We address the single major comment below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] The central claim that randomized projections 'retain 2-site accuracy' for MPO-MPO contractions is asserted without error bounds, convergence rates, or numerical validation on operator contractions. This is load-bearing because the 1-site speedup is only useful if the accuracy guarantee transfers from the MPS-MPS case to the MPO setting with its distinct environment structure.

    Authors: We agree that the original manuscript asserted retention of 2-site accuracy for randomized projections without providing explicit error bounds, convergence rates, or numerical validation on MPO-MPO contractions. While the underlying randomized-projection technique is mathematically the same as in the MPS-MPS literature and the environment tensors share the same low-rank structure (differing only by the presence of operator indices), we did not demonstrate transfer of the accuracy guarantee in the operator setting. To address this, the revised manuscript will include (i) a short theoretical paragraph clarifying why the projection error analysis carries over to the MPO environment contractions and (ii) a new numerical subsection with direct comparisons of 2-site versus randomized 1-site updates on representative MPO-MPO problems, reporting relative errors, sweep convergence, and wall-time speedups. These additions will be placed in the results section and referenced from the abstract. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper describes algorithmic improvements (MPI parallelization with specific MPO gauges and randomized projections for 1-site updates) to the standard fit algorithm for MPO-MPO contractions. These build on established tensor network techniques without any claimed first-principles derivation, uniqueness theorem, or prediction that reduces by construction to fitted parameters or self-citations defined within the paper. Performance claims rest on implementation details and numerical tests rather than self-referential logic, satisfying the criteria for a non-circular methods contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no information on free parameters, axioms, or invented entities; all elements are standard in tensor network literature.

Reference graph

Works this paper leans on

21 extracted references · 10 canonical work pages

  1. [1]

    I. V. Oseledets, Tensor-train decomposition, SIAM Journal on Scientific Computing 33 (5) (2011) 2295–2317. doi:10.1137/090752286

  2. [2]

    B. N. Khoromskij, O(d log n)-quantics approximation of n-d tensors in high-dimensional numerical modeling, Springer Nature 34 (2011) 257–280. doi:10.1007/s00365-011-9131-1

  3. [3]

    M. K. Ritter, Y. Núñez Fernández, M. Wallerberger, J. von Delft, H. Shinaoka, X. Waintal, Quantics tensor cross interpolation for high-resolution parsimonious representations of multivariate functions, Phys. Rev. Lett. 132 (2024) 056501. doi:10.1103/PhysRevLett.132.056501

  4. [4]

    U. Schollwöck, The density-matrix renormalization group in the age of matrix product states, Annals of Physics 326 (2011) 96–192. doi:10.1016/j.aop.2010.09.012

  5. [5]

    E. M. Stoudenmire, S. R. White, Minimally entangled typical thermal state algorithms, New Journal of Physics 12 (5) (2010) 055026. doi:10.1088/1367-2630/12/5/055026

  6. [6]

    B.-B. Chen, L. Chen, Z. Chen, W. Li, A. Weichselbaum, Exponential thermal tensor network approach for quantum lattice models, Phys. Rev. X 8 (2018) 031082. doi:10.1103/PhysRevX.8.031082

  7. [7]

    Y. Núñez Fernández, M. K. Ritter, M. Jeannin, J.-W. Li, T. Kloss, T. Louvet, S. Terasaki, O. Parcollet, J. von Delft, H. Shinaoka, X. Waintal, Learning tensor networks with tensor cross interpolation: New algorithms and libraries, SciPost Physics 18 (3) (Mar. 2025). doi:10.21468/scipostphys.18.3.104

  8. [8]

    F. Verstraete, J. I. Cirac, Renormalization algorithms for quantum-many body systems in two and higher dimensions, arXiv: Strongly Correlated Electrons (2004). arXiv:cond-mat/0407066

  9. [9]

    C. Camano, E. N. Epperly, J. A. Tropp, Successive randomized compression: A randomized algorithm for the compressed MPO-MPS product, Quantum 10 (2025) 2022

  10. [10]

    Z. Meng, Y. Khoo, J. Li, E. M. Stoudenmire, Recursive sketched interpolation: Efficient Hadamard products of tensor trains (2026). arXiv:2602.17974

  11. [11]

    M. K. Ritter, Fast elementwise operations on tensor trains with alternating cross interpolation (2026). arXiv:2604.00037

  12. [12]

    E. M. Stoudenmire, S. R. White, Real-space parallel density matrix renormalization group, Phys. Rev. B 87 (2013) 155137. doi:10.1103/PhysRevB.87.155137

  13. [13]

    A. Gleis, J.-W. Li, J. von Delft, Controlled bond expansion for density matrix renormalization group ground state search at single-site costs, Phys. Rev. Lett. 130 (2023) 246402. doi:10.1103/PhysRevLett.130.246402

  14. [14]

    I. P. McCulloch, J. Osborne, Comment on "Controlled bond expansion for density matrix renormalization group ground state search at single-site costs" (extended version) (2024). arXiv:2403.00562

  15. [15]

    G. Vidal, Efficient classical simulation of slightly entangled quantum computations, Phys. Rev. Lett. 91 (2003) 147902. doi:10.1103/PhysRevLett.91.147902

  16. [16]

    M. B. Hastings, An area law for one-dimensional quantum systems, Journal of Statistical Mechanics: Theory and Experiment 2007 (08) (2007) P08024. doi:10.1088/1742-5468/2007/08/p08024

  17. [17]

    P.-G. Martinsson, J. A. Tropp, Randomized numerical linear algebra: Foundations and algorithms, Acta Numerica 29 (2020) 403–572. doi:10.1017/S0962492920000021

  18. [18]

    N. Halko, P. G. Martinsson, J. A. Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Review 53 (2) (2011) 217–288. doi:10.1137/090771806

  19. [19]

    C. Zhang, J. von Delft, Finite-temperature study of the Hubbard model via enhanced exponential tensor renormalization group (2025). arXiv:2510.25022

  20. [20]

    P. Secular, N. Gourianov, M. Lubasch, S. Dolgov, S. R. Clark, D. Jaksch, Parallel time-dependent variational principle algorithm for matrix product states, Phys. Rev. B 101 (2020) 235123. doi:10.1103/PhysRevB.101.235123

  21. [21]

    Tensor4all, https://tensor4all.org.