pith. sign in

arxiv: 2309.06577 · v5 · submitted 2023-09-11 · 💻 cs.LG · quant-ph

Efficient Finite Initialization with Partial Norms for Tensorized Neural Networks and Tensor Networks Algorithms

Pith reviewed 2026-05-24 06:26 UTC · model grok-4.3

classification 💻 cs.LG quant-ph
keywords tensorized neural networkstensor networksinitialization algorithmsFrobenius normsmatrix product statestensor trainsmatrix product operators
0
0 comments X

The pith

Iterative partial norms allow finite initialization of tensorized neural network layers

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces two algorithms for initializing layers in tensorized neural networks and tensor networks. These algorithms compute partial Frobenius norms and entrywise sums on subnetworks iteratively. This approach ensures normalization factors remain finite, avoiding the divergence or zero-norm issues that arise with full-network calculations. The method reuses intermediate results for efficiency and is demonstrated on matrix product state and matrix product operator layers, with analysis of scaling properties.

Core claim

The core discovery is that using norms of subnetworks iteratively normalizes tensor networks by finite values, enabling efficient initialization of tensorized layers where direct full-norm calculations fail due to divergence or vanishing.

What carries the argument

Iterative normalization using partial Frobenius norms of subnetworks, with reuse of intermediate calculations.

If this is right

  • Initialization becomes efficient for MPS/TT layers as the number of nodes increases.
  • Same for MPO/TT-M layers.
  • Scaling is characterized with respect to bond dimension and physical dimension.
  • Intermediate calculations are reused to reduce overall computation time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This method could extend to other tensor network formats not tested in the paper.
  • Improved initialization may lead to more stable training in tensorized models.
  • Partial norm techniques might address similar numerical issues in other high-dimensional representations.

Load-bearing premise

Partial norm calculations on subnetworks will always produce finite values and can be computed efficiently without the problems of the complete network.

What would settle it

Running the initialization on a network where the full Frobenius norm diverges or is zero, and checking if the partial method yields a usable finite scale factor and valid layer initialization.

Figures

Figures reproduced from arXiv: 2309.06577 by Aitor Moreno Fdez. de Leceta, Alejandro Mata Ali, I\~nigo Perez Delgado, Marina Ristol Roura.

Figure 1
Figure 1. Figure 1: FIG. 1. Arbitrary tensor network layer [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: FIG. 2. a) Tensor Train layer with 5 indexes. b) Tensor Train [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: FIG. 3. Square of the Frobenius norm calculated to a) Tensor [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗
Figure 6
Figure 6. Figure 6: FIG. 6. a) PEPS layer with 9 nodes. b) Partial square norm [PITH_FULL_IMAGE:figures/full_fig_p004_6.png] view at source ↗
Figure 4
Figure 4. Figure 4: FIG. 4. a) Tensor Train layer with 5 nodes. b) Partial square [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: FIG. 5. a) Tensor Train Matrix layer with 5 nodes. b) Partial [PITH_FULL_IMAGE:figures/full_fig_p004_5.png] view at source ↗
Figure 7
Figure 7. Figure 7: FIG. 7. Lineal entrywise norm calculated to a) Tensor Train [PITH_FULL_IMAGE:figures/full_fig_p005_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: FIG. 8. a) Tensor Train layer with 5 nodes. b) Partial lineal [PITH_FULL_IMAGE:figures/full_fig_p006_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: FIG. 9. a) Tensor Train Matrix layer with 5 nodes. b) Partial [PITH_FULL_IMAGE:figures/full_fig_p006_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: FIG. 10. a) PEPS layer with 9 nodes. b) Partial lineal norm [PITH_FULL_IMAGE:figures/full_fig_p007_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: FIG. 11. Number of steps vs [PITH_FULL_IMAGE:figures/full_fig_p007_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: FIG. 12. Number of steps vs [PITH_FULL_IMAGE:figures/full_fig_p008_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: FIG. 13. Number of steps vs [PITH_FULL_IMAGE:figures/full_fig_p008_13.png] view at source ↗
Figure 16
Figure 16. Figure 16: FIG. 16. Number of steps vs [PITH_FULL_IMAGE:figures/full_fig_p009_16.png] view at source ↗
read the original abstract

We present two algorithms to initialize layers of tensorized neural networks and general tensor network algorithms using partial computations of their Frobenius norms and positive lineal entrywise sums, depending on the type of tensor network involved. The core of this method is the use of the norm of subnetworks of the tensor network in an iterative way, so that we normalize by the finite values of the norms that led to the divergence or zero norm. In addition, the method benefits from the reuse of intermediate calculations. We have also applied it to the Matrix Product State/Tensor Train (MPS/TT) and Matrix Product Operator/Tensor Train Matrix (MPO/TT-M) layers and have seen its scaling versus the number of nodes, bond dimension, and physical dimension. All code is publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper presents two algorithms for initializing layers in tensorized neural networks and general tensor networks. The methods iteratively compute partial Frobenius norms and positive lineal entrywise sums on subnetworks to obtain finite normalizing factors that avoid the divergence or zero-norm problems of full-network calculations, with reuse of intermediate results. The algorithms are applied to MPS/TT and MPO/TT-M layers, with reported scaling behavior versus number of nodes, bond dimension, and physical dimension; all code is made publicly available.

Significance. If the initialization procedure can be shown to reliably produce finite normalizers across a broader range of tensor-network topologies, it would provide a practical tool for stable initialization of tensorized models, with the reuse of subnetwork calculations offering potential efficiency gains. The public release of code is a clear strength that supports reproducibility.

major comments (2)
  1. [Abstract] Abstract: the claim of applicability to 'general tensor networks' is not supported by any analysis or experiments on topologies containing cycles. Subnetwork norm computations can become intractable in the presence of cycles, directly undermining both the efficiency claim and the guarantee of finite outputs; all reported results are restricted to chain-like MPS/TT and MPO/TT-M structures.
  2. [Abstract] Abstract: no proofs, error bounds, or comparative benchmarks against existing initialization schemes are provided, leaving the central algorithmic claim without formal verification or quantitative assessment of numerical stability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the scope of our claims and the need for additional verification. We address the two major comments below and will make targeted revisions to the abstract and experimental section to better align the manuscript with the presented results.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim of applicability to 'general tensor networks' is not supported by any analysis or experiments on topologies containing cycles. Subnetwork norm computations can become intractable in the presence of cycles, directly undermining both the efficiency claim and the guarantee of finite outputs; all reported results are restricted to chain-like MPS/TT and MPO/TT-M structures.

    Authors: We agree that the abstract overstates the generality. The algorithms rely on efficient subnetwork norm computations that are tractable for acyclic (chain-like) topologies such as MPS/TT and MPO/TT-M, as demonstrated in our scaling experiments. For networks with cycles, the partial-norm approach can indeed become intractable due to the need to handle loops in the contraction order. We will revise the abstract to remove the unqualified reference to 'general tensor networks' and instead state that the methods are developed and tested for Matrix Product State/Tensor Train and Matrix Product Operator structures. A brief discussion of the limitation for cyclic topologies will be added to the introduction. revision: yes

  2. Referee: [Abstract] Abstract: no proofs, error bounds, or comparative benchmarks against existing initialization schemes are provided, leaving the central algorithmic claim without formal verification or quantitative assessment of numerical stability.

    Authors: The current manuscript focuses on the algorithmic procedure for obtaining finite normalizers via iterative partial norms and reports empirical scaling with respect to number of nodes, bond dimension, and physical dimension. No formal proofs or error bounds are derived, as the primary contribution is the reuse of intermediate subnetwork calculations to avoid divergence or zero-norm issues in practice. We acknowledge the absence of direct comparisons. In revision we will add a new experimental subsection that benchmarks our initialization against standard random initialization (and, where applicable, other layer-wise schemes) on the same MPS/TT and MPO/TT-M layers, measuring numerical stability via the frequency of finite-norm outcomes and downstream training convergence. This will provide the requested quantitative assessment without altering the core algorithmic focus. revision: partial

Circularity Check

0 steps flagged

No circularity: algorithmic construction is independent of fitted inputs or self-citations

full rationale

The paper presents two explicit algorithms for finite initialization of tensor networks via iterative partial Frobenius norms and entrywise sums on subnetworks. No equations or claims reduce a result to a fitted parameter, self-defined quantity, or load-bearing self-citation chain; the method is described as a direct computational procedure with reuse of intermediates, applied to MPS/TT and MPO/TT-M structures. The central claim rests on the efficiency and finiteness properties of subnetwork computations, which are presented as independent algorithmic facts rather than derived from the target initialization values themselves. This is a standard case of a self-contained algorithmic contribution with no reduction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on standard algebraic properties of the Frobenius norm and entrywise sums; no new free parameters, axioms beyond basic linear algebra, or invented entities are introduced.

axioms (1)
  • domain assumption Frobenius norms and positive entrywise sums of subnetworks can be computed independently and used for normalization without requiring the full network norm.
    Core of the iterative method described in the abstract.

pith-pipeline@v0.9.0 · 5680 in / 1283 out tokens · 23173 ms · 2026-05-24T06:26:55.576017+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 5 internal anchors

  1. [1]

    We initialize the node tensors with some initializa- tion method. We recommend random initialization with a Gaussian distribution of a constant stan- dard deviation (not greater than 0 .5) and a con- stant mean neither too high nor too low and posi- tive

  2. [2]

    If it is finite and non-zero, we divide each element of each node by ||A||F F 1/N and return A

    We compute ||A||F . If it is finite and non-zero, we divide each element of each node by ||A||F F 1/N and return A. Otherwise, we continue

  3. [3]

    (a) If it is infinite, we divide each element of the nodes of A by (10(1 + ξ))1/2N, being ξ a ran- dom number between 0 and 1, and return to Step 2

    We compute pF ||A||1,N. (a) If it is infinite, we divide each element of the nodes of A by (10(1 + ξ))1/2N, being ξ a ran- dom number between 0 and 1, and return to Step 2. (b) If it is zero, we multiply each element of the nodes of A by (10(1 + ξ))1/2N and return to Step 2. (c) Otherwise, we save this value as pF ||A||1,N and continue

  4. [4]

    (a) If it is infinite or zero, we divide each element of the nodes of A by ( pF ||A||n−1,N) 1 2N , and repeat Steps 2 and 4 (from this value of n)

    For n ∈ [2, N − 1], we compute pF ||A||n,N. (a) If it is infinite or zero, we divide each element of the nodes of A by ( pF ||A||n−1,N) 1 2N , and repeat Steps 2 and 4 (from this value of n). (b) If it is finite, but larger than b or smaller than a, we divide each element of the nodes of A by pF ||A||n,N F 1 2N , and repeat Steps 2 and 4 (from this value ...

  5. [5]

    We repeat the cycle until we obtain a valid A or we reach a stop condition, which entails repeating a certain maximum number of iterations

    If no partial square norm is outside the range, infi- nite or zero, we divide each element of the nodes of A by pF ||A||N −1,N F 1 2N , and repeat steps 2 and 5. We repeat the cycle until we obtain a valid A or we reach a stop condition, which entails repeating a certain maximum number of iterations. If we reach this last point, the protocol will have fai...

  6. [6]

    We initialize the node tensors with some initializa- 7 Node 2 Node 5Node 1 Node 9 Node 4 Node 8 Node 7 Node 3 Node 6 a) b) Node 1 c) Node 2 Node 1 d) Node 2 Node 5Node 1 Node 4 Node 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 FIG. 10. a) PEPS layer with 9 nodes. b) Partial lineal norm at 1 node. c) Partial lineal norm at 2 nodes. d) Partial lineal norm at 5 nodes...

  7. [7]

    If it is finite and non-zero, we divide each element of each node by ||A||L F 1/N and return A

    We compute ||A||L. If it is finite and non-zero, we divide each element of each node by ||A||L F 1/N and return A. Otherwise, we continue

  8. [8]

    (a) If it is infinite, we divide each element of the nodes of A by (10(1+ξ))1/N, being ξ a random number between 0 and 1, and return to Step 2

    We compute pL||A||1,N. (a) If it is infinite, we divide each element of the nodes of A by (10(1+ξ))1/N, being ξ a random number between 0 and 1, and return to Step 2. (b) If it is zero, we multiply each element of the nodes of A by (10(1 + ξ))1/N and return to Step 2. (c) Otherwise, we save this value aspL||A||1,N and continue

  9. [9]

    (a) If it is infinite or zero, we divide each element of the nodes of A by (pL||A||n−1,N) 1 N , and re- peat Steps 2 and 4 (from this value of n)

    For n ∈ [2, N − 1], we compute pL||A||n,N. (a) If it is infinite or zero, we divide each element of the nodes of A by (pL||A||n−1,N) 1 N , and re- peat Steps 2 and 4 (from this value of n). (b) If it is finite, but larger than b or smaller than a, we divide each element of the nodes of A by pL||A||n,N F 1 N , and repeat Steps 2 and 4 (from this value of n...

  10. [10]

    As in the general case, we repeat the cycle until we reach a stop condition, which will be to have repeated a certain maximum number of iterations

    If no partial lineal norm is outside the range, infi- nite or zero, we divide each element of the nodes of A by pL||A||N −1,N F 1 N , and repeat steps 2 and 5. As in the general case, we repeat the cycle until we reach a stop condition, which will be to have repeated a certain maximum number of iterations. If we reach that point, the protocol will have fa...

  11. [11]

    Then, we check the number of steps against p for the same value of N = 25 and b = 10 in Fig. 12. Finally, we check the number of steps against b with the same value of N = 25 and p = 15 in Fig. 13. 5 10 15 20 25 30 Number of Nodes (N) 0 10 20 30 40 50Number of Steps p = 6 TT p = 6 TT-M p = 7 TT p = 7 TT-M p = 8 TT p = 8 TT-M p = 9 TT p = 9 TT-M p = 10 TT ...

  12. [12]

    ACKNOWLEDGMENTS The research leading to this paper has received funding from the Q4Real project (Quantum Computing for Real Industries), HAZITEK 2022, no

    to determine the appropriate decay factor and adapt it to quantum machine learning layers. ACKNOWLEDGMENTS The research leading to this paper has received funding from the Q4Real project (Quantum Computing for Real Industries), HAZITEK 2022, no. ZE-2022/00033

  13. [13]

    M. H. M. Noor and A. O. Ige, A survey on deep learning and state-of-the-art applications (2024), arXiv:2403.17561

  14. [14]

    LLaMA: Open and Efficient Foundation Language Models

    H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozi` ere, N. Goyal, E. Ham- bro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, and G. Lample, Llama: Open and efficient foundation lan- guage models (2023), arXiv:2302.13971 [cs.CL]

  15. [15]

    Belis, P

    V. Belis, P. Odagiu, M. Grossi, F. Reiter, G. Dissertori, and S. Vallecorsa, Guided quantum compression for high dimensional data classification, Machine Learning: Sci- ence and Technology 5, 035010 (2024)

  16. [16]

    Tensor Networks in a Nutshell

    J. Biamonte and V. Bergholm, Tensor networks in a nut- shell (2017), arXiv:1708.00006

  17. [17]

    Or´ us, A practical introduction to tensor networks: Matrix product states and projected entangled pair states, Annals of Physics 349, 117–158 (2014)

    R. Or´ us, A practical introduction to tensor networks: Matrix product states and projected entangled pair states, Annals of Physics 349, 117–158 (2014)

  18. [18]

    Matrix Product State Representations

    D. Perez-Garcia, F. Verstraete, M. M. Wolf, and J. I. Cirac, Matrix product state representations (2007), arXiv:quant-ph/0608197 [quant-ph]

  19. [19]

    V. M. F. Verstraete and J. Cirac, Matrix prod- uct states, projected entangled pair states, and vari- ational renormalization group methods for quantum spin systems, Advances in Physics 57, 143 (2008), https://doi.org/10.1080/14789940801912366

  20. [20]

    Tensorizing Neural Networks

    A. Novikov, D. Podoprikhin, A. Osokin, and D. Vetrov, Tensorizing neural networks (2015), arXiv:1509.06569

  21. [21]

    Z.-F. Gao, S. Cheng, R.-Q. He, Z. Y. Xie, H.-H. Zhao, Z.- Y. Lu, and T. Xiang, Compressing deep neural networks by matrix product operators, Phys. Rev. Res. 2, 023300 (2020)

  22. [22]

    Y. Qing, K. Li, P.-F. Zhou, and S.-J. Ran, Compressing neural networks using tensor networks with exponentially fewer variational parameters, Intelligent Computing 4, 10.34133/icomputing.0123 (2025)

  23. [23]

    Singh, S

    S. Singh, S. S. Jahromi, and R. Orus, Tensor net- work compressibility of convolutional models (2024), arXiv:2403.14379 [cs.CV]

  24. [24]

    H. Li, J. Zhao, H. Huo, S. Fang, J. Chen, L. Yao, and Y. Hua, T3srs: Tensor train transformer for compressing sequential recommender systems, Expert Systems with Applications 238, 122260 (2024)

  25. [25]

    D. Lee, R. Yin, Y. Kim, A. Moitra, Y. Li, and P. Panda, Tt-snn: Tensor train decomposition for efficient spik- ing neural network training (2024), arXiv:2401.08001 [cs.NE]

  26. [26]

    Aizpurua, S

    B. Aizpurua, S. S. Jahromi, S. Singh, and R. Orus, Quan- tum large language models via tensor network disentan- glers (2024), arXiv:2410.17397 [quant-ph]

  27. [27]

    Tomut, S

    A. Tomut, S. S. Jahromi, A. Sarkar, U. Kurt, S. Singh, F. Ishtiaq, C. Mu˜ noz, P. S. Bajaj, A. Elborady, G. del 10 Bimbo, M. Alizadeh, D. Montero, P. Martin-Ramiro, M. Ibrahim, O. T. Alaoui, J. Malcolm, S. Mugel, and R. Orus, Compactifai: Extreme compression of large lan- guage models using quantum-inspired tensor networks (2024), arXiv:2401.14109 [cs.CL]

  28. [28]

    Qi, C.-H

    J. Qi, C.-H. H. Yang, P.-Y. Chen, and J. Tejedor, Exploit- ing low-rank tensor-train deep neural networks based on riemannian gradient descent with illustrations of speech processing (2022), arXiv:2203.06031

  29. [29]

    Blagoveschensky and A

    P. Blagoveschensky and A. H. Phan, Deep convolutional tensor network (2020), arXiv:2005.14506

  30. [30]

    S. K. Vemuri, T. B¨ uchner, J. Niebling, and J. Denzler, Functional tensor decompositions for physics-informed neural networks (2024), arXiv:2408.13101 [cs.LG]

  31. [31]

    Patel, C.-W

    R. Patel, C.-W. Hsing, S. Sahin, S. S. Jahromi, S. Palmer, S. Sharma, C. Michel, V. Porte, M. Abid, S. Aubert, P. Castellani, C.-G. Lee, S. Mugel, and R. Orus, Quantum-inspired tensor neural networks for partial dif- ferential equations (2022), arXiv:2208.02235

  32. [32]

    Barratt, J

    F. Barratt, J. Dborin, and L. Wright, Improvements to gradient descent methods for quantum tensor network machine learning (2022), arXiv:2203.03366

  33. [33]

    Glorot and Y

    X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in Pro- ceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics , Proceedings of Ma- chine Learning Research, Vol. 9, edited by Y. W. Teh and M. Titterington (PMLR, Chia Laguna Resort, Sardinia, Italy, 2010) pp. 249–256

  34. [34]

    X. Tang, Y. Khoo, and L. Ying, Initialization and train- ing of matrix product state probabilistic models (2025), arXiv:2505.06419 [math.NA]

  35. [35]

    Oseledets and E

    I. Oseledets and E. Tyrtyshnikov, Tt-cross approxima- tion for multidimensional arrays, Linear Algebra and its Applications 432, 70 (2010)

  36. [36]

    How to generate random matrices from the classical compact groups

    F. Mezzadri, How to generate random matrices from the classical compact groups (2007), arXiv:math-ph/0609050 [math-ph]

  37. [37]

    Puljak, S

    E. Puljak, S. Sanchez-Ramirez, S. Masot-Llima, J. Vall` es- Muns, A. Garcia-Saez, and M. Pierini, tn4ml: Tensor network training and customization for machine learning (2025), arXiv:2502.13090 [cs.LG]

  38. [38]

    J. Wang, C. Roberts, G. Vidal, and S. Leichenauer, Anomaly detection with tensor networks (2020), arXiv:2006.02516

  39. [39]

    T. Hao, X. Huang, C. Jia, and C. Peng, A quantum- inspired tensor network algorithm for constrained combi- natorial optimization problems, Frontiers in Physics 10, 10.3389/fphy.2022.906590 (2022)