Efficient Finite Initialization with Partial Norms for Tensorized Neural Networks and Tensor Networks Algorithms

Aitor Moreno Fdez. de Leceta; Alejandro Mata Ali; Iñigo Perez Delgado; Marina Ristol Roura

arxiv: 2309.06577 · v5 · submitted 2023-09-11 · 💻 cs.LG · quant-ph

Pith reviewed 2026-05-24 06:26 UTC · model grok-4.3

classification 💻 cs.LG quant-ph

keywords tensorized neural networks · tensor networks · initialization algorithms · Frobenius norms · matrix product states · tensor trains · matrix product operators

The pith

Iterative partial norms allow finite initialization of tensorized neural network layers

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces two algorithms for initializing layers in tensorized neural networks and tensor networks. These algorithms compute partial Frobenius norms and entrywise sums on subnetworks iteratively. This approach ensures normalization factors remain finite, avoiding the divergence or zero-norm issues that arise with full-network calculations. The method reuses intermediate results for efficiency and is demonstrated on matrix product state and matrix product operator layers, with analysis of scaling properties.

Core claim

The core claim is that normalizing iteratively by the norms of subnetworks keeps every normalization factor finite, which enables efficient initialization of tensorized layers where a direct full-norm calculation fails because the norm diverges or vanishes.

What carries the argument

Iterative normalization using partial Frobenius norms of subnetworks, with reuse of intermediate calculations.
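
As a rough illustration of this idea (not the authors' released code), the sketch below computes the squared Frobenius norm of every leading subchain of an MPS/TT layer in a single left-to-right sweep. It assumes real-valued cores of shape `(left_bond, physical, right_bond)` and takes the partial square norm at `n` nodes to be the squared Frobenius norm of the first `n` cores contracted together with the last bond left open. The function name `partial_square_norms` and this convention are assumptions made for the example.

```python
import numpy as np


def partial_square_norms(cores):
    """Squared Frobenius norm of the first n cores, for n = 1..N.

    Each core is a real array of shape (left_bond, physical, right_bond).
    The environment `env` is carried from node to node, so every partial
    norm reuses the contraction already done for the previous one.
    """
    env = np.ones((1, 1))  # trivial environment to the left of node 1
    norms = []
    for core in cores:
        # Contract the environment with the core and a copy of the core
        # over the left bonds and the shared physical index.
        env = np.einsum("ab,apc,bpd->cd", env, core, core)
        # Closing the open right bond gives the squared norm of the subchain.
        norms.append(float(np.trace(env)))
    return norms


# Small hypothetical example: 4 nodes, physical dimension 2, bond dimension 3.
rng = np.random.default_rng(0)
shapes = [(1, 2, 3), (3, 2, 3), (3, 2, 3), (3, 2, 1)]
cores = [rng.normal(0.5, 0.5, size=s) for s in shapes]
print(partial_square_norms(cores))
```

The last entry of the returned list is the full square norm, so a caller gets all partial values for the cost of one sweep over the chain.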

If this is right

Initialization remains efficient for MPS/TT layers as the number of nodes increases.
Initialization remains efficient for MPO/TT-M layers as the number of nodes increases.
Scaling is characterized with respect to bond dimension and physical dimension.
Intermediate calculations are reused to reduce overall computation time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This method could extend to other tensor network formats not tested in the paper.
Improved initialization may lead to more stable training in tensorized models.
Partial norm techniques might address similar numerical issues in other high-dimensional representations.

Load-bearing premise

Partial norm calculations on subnetworks will always produce finite values and can be computed efficiently without the problems of the complete network.

What would settle it

Running the initialization on a network where the full Frobenius norm diverges or is zero, and checking if the partial method yields a usable finite scale factor and valid layer initialization.
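
A test of this kind could look like the sketch below, which is a deliberately simplified stand-in for the paper's procedure rather than a reproduction of it. It assumes real MPS/TT cores of shape `(left_bond, physical, right_bond)`, uses the last finite, non-zero partial square norm as the rescaling value, and omits the range bounds and the random rescaling factor that appear in the paper's steps. The names `last_good_partial` and `finite_initialize`, and the choice of `target` as the desired Frobenius norm, are hypothetical.

```python
import numpy as np


def last_good_partial(cores):
    """Sweep left to right; stop at the first zero or non-finite partial norm.

    Returns (last usable partial square norm or None, whether the sweep
    reached the last node).
    """
    env = np.ones((1, 1))
    last_good = None
    with np.errstate(over="ignore", invalid="ignore"):
        for core in cores:
            env = np.einsum("ab,apc,bpd->cd", env, core, core)
            value = float(np.trace(env))
            if not np.isfinite(value) or value == 0.0:
                return last_good, False
            last_good = value
    return last_good, True


def finite_initialize(cores, target=1.0, max_iter=100):
    """Rescale all cores until the full Frobenius norm equals `target`."""
    n_nodes = len(cores)
    for _ in range(max_iter):
        value, reached_end = last_good_partial(cores)
        if value is None:
            raise RuntimeError("the first partial norm is already unusable")
        if reached_end:
            # Full square norm is finite: one exact rescaling finishes the job.
            scale = (value / target**2) ** (1.0 / (2 * n_nodes))
            return [core / scale for core in cores]
        # Otherwise normalize by the largest subchain whose square norm
        # is still finite and non-zero.
        scale = value ** (1.0 / (2 * n_nodes))
        cores = [core / scale for core in cores]
    raise RuntimeError("no finite norm reached within max_iter")


# Hypothetical stress case: 400 all-ones cores with bond and physical
# dimension 4, whose full square norm (4**1198) overflows float64.
cores = [np.ones((1, 4, 4))] + [np.ones((4, 4, 4))] * 398 + [np.ones((4, 4, 1))]
print(last_good_partial(cores)[1])   # False: the full norm is not finite
fixed = finite_initialize(cores, target=1.0)
print(last_good_partial(fixed))      # full square norm close to 1.0
```

Each pass moves the overflow point further down the chain, because dividing every node by the same factor lowers all partial norms at once.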

Figures

Figures reproduced from arXiv: 2309.06577 by Aitor Moreno Fdez. de Leceta, Alejandro Mata Ali, Iñigo Perez Delgado, Marina Ristol Roura.

**Figure 2.** a) Tensor Train layer with 5 indexes. b) Tensor Train …

**Figure 3.** Square of the Frobenius norm calculated to a) Tensor …

**Figure 6.** a) PEPS layer with 9 nodes. b) Partial square norm …

**Figure 4.** a) Tensor Train layer with 5 nodes. b) Partial square …

**Figure 5.** a) Tensor Train Matrix layer with 5 nodes. b) Partial …

**Figure 7.** Lineal entrywise norm calculated to a) Tensor Train …

**Figure 8.** a) Tensor Train layer with 5 nodes. b) Partial lineal …

**Figure 9.** a) Tensor Train Matrix layer with 5 nodes. b) Partial …

**Figure 10.** a) PEPS layer with 9 nodes. b) Partial lineal norm …

**Figure 11.** Number of steps vs …

**Figure 12.** Number of steps vs …

**Figure 13.** Number of steps vs …

**Figure 16.** Number of steps vs …

Original abstract

We present two algorithms to initialize layers of tensorized neural networks and general tensor network algorithms using partial computations of their Frobenius norms and positive lineal entrywise sums, depending on the type of tensor network involved. The core of this method is the use of the norm of subnetworks of the tensor network in an iterative way, so that we normalize by the finite values of the norms that led to the divergence or zero norm. In addition, the method benefits from the reuse of intermediate calculations. We have also applied it to the Matrix Product State/Tensor Train (MPS/TT) and Matrix Product Operator/Tensor Train Matrix (MPO/TT-M) layers and have seen its scaling versus the number of nodes, bond dimension, and physical dimension. All code is publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Two concrete iterative algorithms for finite initialization of MPS/TT and MPO/TT-M layers via partial subnetwork norms, with public code and scaling data, but no general proof or tests beyond chains.

The main takeaway is that the authors worked out two specific iterative procedures that normalize tensorized layers by computing Frobenius norms and entrywise sums only on chosen subnetworks. This keeps the scaling factors finite when a full-network norm would hit zero or infinity. They apply the method to MPS/TT and MPO/TT-M, reuse intermediate results, plot the cost against number of nodes, bond dimension, and physical dimension, and release the code publicly.
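
For the entrywise-sum variant, a comparable sketch for an MPO/TT-M chain is given below. It assumes cores of shape `(left_bond, input, output, right_bond)` with non-negative entries, and takes the partial sum at `n` nodes to be the sum of all entries of the first `n` cores contracted together, with the open bond summed as well. The name `partial_entrywise_sums` and this convention are assumptions made for illustration, not the paper's code.

```python
import numpy as np


def partial_entrywise_sums(cores):
    """Sum of all entries of the first n cores contracted, for n = 1..N."""
    env = np.ones(1)  # trivial vector to the left of node 1
    sums = []
    for core in cores:
        # Summing over both physical indices is the same as contracting
        # them with all-ones vectors; what remains is a bond-to-bond matrix.
        transfer = core.sum(axis=(1, 2))
        # Reuse the running vector instead of recontracting from node 1.
        env = env @ transfer
        # Summing over the open right bond closes the subchain.
        sums.append(float(env.sum()))
    return sums


# Small hypothetical example: 3 nodes, input/output dimension 2, bond dimension 3.
rng = np.random.default_rng(1)
shapes = [(1, 2, 2, 3), (3, 2, 2, 3), (3, 2, 2, 1)]
cores = [rng.uniform(0.0, 1.0, size=s) for s in shapes]
print(partial_entrywise_sums(cores))
```

Because the sum is linear in each node, the running object here is a vector rather than the matrix needed for a Frobenius norm, which makes each step cheaper.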

Referee Report

2 major / 0 minor

Summary. The paper presents two algorithms for initializing layers in tensorized neural networks and general tensor networks. The methods iteratively compute partial Frobenius norms and positive lineal entrywise sums on subnetworks to obtain finite normalizing factors that avoid the divergence or zero-norm problems of full-network calculations, with reuse of intermediate results. The algorithms are applied to MPS/TT and MPO/TT-M layers, with reported scaling behavior versus number of nodes, bond dimension, and physical dimension; all code is made publicly available.

Significance. If the initialization procedure can be shown to reliably produce finite normalizers across a broader range of tensor-network topologies, it would provide a practical tool for stable initialization of tensorized models, with the reuse of subnetwork calculations offering potential efficiency gains. The public release of code is a clear strength that supports reproducibility.

major comments (2)

[Abstract] The claim of applicability to 'general tensor networks' is not supported by any analysis or experiments on topologies containing cycles. Subnetwork norm computations can become intractable in the presence of cycles, directly undermining both the efficiency claim and the guarantee of finite outputs; all reported results are restricted to chain-like MPS/TT and MPO/TT-M structures.
[Abstract] No proofs, error bounds, or comparative benchmarks against existing initialization schemes are provided, leaving the central algorithmic claim without formal verification or quantitative assessment of numerical stability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the scope of our claims and the need for additional verification. We address the two major comments below and will make targeted revisions to the abstract and experimental section to better align the manuscript with the presented results.

Referee: [Abstract] The claim of applicability to 'general tensor networks' is not supported by any analysis or experiments on topologies containing cycles. Subnetwork norm computations can become intractable in the presence of cycles, directly undermining both the efficiency claim and the guarantee of finite outputs; all reported results are restricted to chain-like MPS/TT and MPO/TT-M structures.

Authors: We agree that the abstract overstates the generality. The algorithms rely on efficient subnetwork norm computations that are tractable for acyclic (chain-like) topologies such as MPS/TT and MPO/TT-M, as demonstrated in our scaling experiments. For networks with cycles, the partial-norm approach can indeed become intractable due to the need to handle loops in the contraction order. We will revise the abstract to remove the unqualified reference to 'general tensor networks' and instead state that the methods are developed and tested for Matrix Product State/Tensor Train and Matrix Product Operator structures. A brief discussion of the limitation for cyclic topologies will be added to the introduction.

revision: yes

Referee: [Abstract] No proofs, error bounds, or comparative benchmarks against existing initialization schemes are provided, leaving the central algorithmic claim without formal verification or quantitative assessment of numerical stability.

Authors: The current manuscript focuses on the algorithmic procedure for obtaining finite normalizers via iterative partial norms and reports empirical scaling with respect to number of nodes, bond dimension, and physical dimension. No formal proofs or error bounds are derived, as the primary contribution is the reuse of intermediate subnetwork calculations to avoid divergence or zero-norm issues in practice. We acknowledge the absence of direct comparisons. In revision we will add a new experimental subsection that benchmarks our initialization against standard random initialization (and, where applicable, other layer-wise schemes) on the same MPS/TT and MPO/TT-M layers, measuring numerical stability via the frequency of finite-norm outcomes and downstream training convergence. This will provide the requested quantitative assessment without altering the core algorithmic focus.

revision: partial

Circularity Check

0 steps flagged

No circularity: algorithmic construction is independent of fitted inputs or self-citations

The paper presents two explicit algorithms for finite initialization of tensor networks via iterative partial Frobenius norms and entrywise sums on subnetworks. No equations or claims reduce a result to a fitted parameter, self-defined quantity, or load-bearing self-citation chain; the method is described as a direct computational procedure with reuse of intermediates, applied to MPS/TT and MPO/TT-M structures. The central claim rests on the efficiency and finiteness properties of subnetwork computations, which are presented as independent algorithmic facts rather than derived from the target initialization values themselves. This is a standard case of a self-contained algorithmic contribution with no reduction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on standard algebraic properties of the Frobenius norm and entrywise sums; no new free parameters, axioms beyond basic linear algebra, or invented entities are introduced.

axioms (1)

Domain assumption: Frobenius norms and positive entrywise sums of subnetworks can be computed independently and used for normalization without requiring the full network norm.
Core of the iterative method described in the abstract.
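
The reason a subnetwork value can be used to rescale the whole network is a homogeneity property, written out here as a short sketch. It assumes that $\|A\|_F$ denotes the Frobenius norm of the fully contracted network of $N$ nodes, that $F$ is the target value, and that every node is divided by the same constant $c > 0$. The symbols $A'$ and $c$ are introduced only for this illustration.

```latex
\begin{align*}
A' = \frac{A}{c^{N}}
  \quad &\Longrightarrow \quad
  \|A'\|_F = \frac{\|A\|_F}{c^{N}},
  \qquad
  \|A'\|_F^{2} = \frac{\|A\|_F^{2}}{c^{2N}}, \\
c = \left(\frac{\|A\|_F}{F}\right)^{1/N}
  \quad &\Longrightarrow \quad
  \|A'\|_F = F .
\end{align*}
```

Applied to a squared norm, the same reasoning gives an exponent of $1/(2N)$, and applied to an entrywise sum it gives $1/N$.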

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 5 internal anchors

[1] We initialize the node tensors with some initialization method. We recommend random initialization with a Gaussian distribution of a constant standard deviation (not greater than 0.5) and a constant mean neither too high nor too low and positive.

[2] We compute $\|A\|_F$. If it is finite and non-zero, we divide each element of each node by $\left(\|A\|_F / F\right)^{1/N}$ and return $A$. Otherwise, we continue.

[3] We compute ${}^{pF}\|A\|_{1,N}$. (a) If it is infinite, we divide each element of the nodes of $A$ by $(10(1+\xi))^{1/(2N)}$, where $\xi$ is a random number between 0 and 1, and return to Step 2. (b) If it is zero, we multiply each element of the nodes of $A$ by $(10(1+\xi))^{1/(2N)}$ and return to Step 2. (c) Otherwise, we save this value as ${}^{pF}\|A\|_{1,N}$ and continue.

[4] For $n \in [2, N-1]$, we compute ${}^{pF}\|A\|_{n,N}$. (a) If it is infinite or zero, we divide each element of the nodes of $A$ by $\left({}^{pF}\|A\|_{n-1,N}\right)^{1/(2N)}$, and repeat Steps 2 and 4 (from this value of $n$). (b) If it is finite, but larger than $b$ or smaller than $a$, we divide each element of the nodes of $A$ by $\left({}^{pF}\|A\|_{n,N} / F\right)^{1/(2N)}$, and repeat Steps 2 and 4 (from this value ...

[5] If no partial square norm is outside the range, infinite or zero, we divide each element of the nodes of $A$ by $\left({}^{pF}\|A\|_{N-1,N} / F\right)^{1/(2N)}$, and repeat Steps 2 and 5. We repeat the cycle until we obtain a valid $A$ or we reach a stop condition, which entails repeating a certain maximum number of iterations. If we reach this last point, the protocol will have fai...

[6] We initialize the node tensors with some initialization ... [text interrupted in extraction by FIG. 10: a) PEPS layer with 9 nodes. b) Partial lineal norm at 1 node. c) Partial lineal norm at 2 nodes. d) Partial lineal norm at 5 nodes...]

[7] We compute $\|A\|_L$. If it is finite and non-zero, we divide each element of each node by $\left(\|A\|_L / F\right)^{1/N}$ and return $A$. Otherwise, we continue.

[8] We compute ${}^{pL}\|A\|_{1,N}$. (a) If it is infinite, we divide each element of the nodes of $A$ by $(10(1+\xi))^{1/N}$, where $\xi$ is a random number between 0 and 1, and return to Step 2. (b) If it is zero, we multiply each element of the nodes of $A$ by $(10(1+\xi))^{1/N}$ and return to Step 2. (c) Otherwise, we save this value as ${}^{pL}\|A\|_{1,N}$ and continue.

[9] For $n \in [2, N-1]$, we compute ${}^{pL}\|A\|_{n,N}$. (a) If it is infinite or zero, we divide each element of the nodes of $A$ by $\left({}^{pL}\|A\|_{n-1,N}\right)^{1/N}$, and repeat Steps 2 and 4 (from this value of $n$). (b) If it is finite, but larger than $b$ or smaller than $a$, we divide each element of the nodes of $A$ by $\left({}^{pL}\|A\|_{n,N} / F\right)^{1/N}$, and repeat Steps 2 and 4 (from this value of n...

[10] If no partial lineal norm is outside the range, infinite or zero, we divide each element of the nodes of $A$ by $\left({}^{pL}\|A\|_{N-1,N} / F\right)^{1/N}$, and repeat Steps 2 and 5. As in the general case, we repeat the cycle until we reach a stop condition, which will be to have repeated a certain maximum number of iterations. If we reach that point, the protocol will have fa...

[11] Then, we check the number of steps against p for the same value of N = 25 and b = 10 in Fig. 12. Finally, we check the number of steps against b with the same value of N = 25 and p = 15 in Fig. 13. [Plot residue: Number of Steps vs. Number of Nodes (N), with TT and TT-M curves for p = 6, 7, 8, 9, 10 ...]

[12] ... to determine the appropriate decay factor and adapt it to quantum machine learning layers. ACKNOWLEDGMENTS: The research leading to this paper has received funding from the Q4Real project (Quantum Computing for Real Industries), HAZITEK 2022, no. ZE-2022/00033

[13] M. H. M. Noor and A. O. Ige, A survey on deep learning and state-of-the-art applications (2024), arXiv:2403.17561

[14] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, and G. Lample, Llama: Open and efficient foundation language models (2023), arXiv:2302.13971 [cs.CL]

[15] V. Belis, P. Odagiu, M. Grossi, F. Reiter, G. Dissertori, and S. Vallecorsa, Guided quantum compression for high dimensional data classification, Machine Learning: Science and Technology 5, 035010 (2024)

[16] J. Biamonte and V. Bergholm, Tensor networks in a nutshell (2017), arXiv:1708.00006

[17] R. Orús, A practical introduction to tensor networks: Matrix product states and projected entangled pair states, Annals of Physics 349, 117–158 (2014)

[18] D. Perez-Garcia, F. Verstraete, M. M. Wolf, and J. I. Cirac, Matrix product state representations (2007), arXiv:quant-ph/0608197 [quant-ph]

[19] F. Verstraete, V. Murg, and J. I. Cirac, Matrix product states, projected entangled pair states, and variational renormalization group methods for quantum spin systems, Advances in Physics 57, 143 (2008), https://doi.org/10.1080/14789940801912366

[20] A. Novikov, D. Podoprikhin, A. Osokin, and D. Vetrov, Tensorizing neural networks (2015), arXiv:1509.06569

[21] Z.-F. Gao, S. Cheng, R.-Q. He, Z. Y. Xie, H.-H. Zhao, Z.-Y. Lu, and T. Xiang, Compressing deep neural networks by matrix product operators, Phys. Rev. Res. 2, 023300 (2020)

[22] Y. Qing, K. Li, P.-F. Zhou, and S.-J. Ran, Compressing neural networks using tensor networks with exponentially fewer variational parameters, Intelligent Computing 4, 10.34133/icomputing.0123 (2025)

[23] S. Singh, S. S. Jahromi, and R. Orus, Tensor network compressibility of convolutional models (2024), arXiv:2403.14379 [cs.CV]

[24] H. Li, J. Zhao, H. Huo, S. Fang, J. Chen, L. Yao, and Y. Hua, T3srs: Tensor train transformer for compressing sequential recommender systems, Expert Systems with Applications 238, 122260 (2024)

[25] D. Lee, R. Yin, Y. Kim, A. Moitra, Y. Li, and P. Panda, Tt-snn: Tensor train decomposition for efficient spiking neural network training (2024), arXiv:2401.08001 [cs.NE]

[26] B. Aizpurua, S. S. Jahromi, S. Singh, and R. Orus, Quantum large language models via tensor network disentanglers (2024), arXiv:2410.17397 [quant-ph]

[27] A. Tomut, S. S. Jahromi, A. Sarkar, U. Kurt, S. Singh, F. Ishtiaq, C. Muñoz, P. S. Bajaj, A. Elborady, G. del Bimbo, M. Alizadeh, D. Montero, P. Martin-Ramiro, M. Ibrahim, O. T. Alaoui, J. Malcolm, S. Mugel, and R. Orus, Compactifai: Extreme compression of large language models using quantum-inspired tensor networks (2024), arXiv:2401.14109 [cs.CL]

[28] J. Qi, C.-H. H. Yang, P.-Y. Chen, and J. Tejedor, Exploiting low-rank tensor-train deep neural networks based on riemannian gradient descent with illustrations of speech processing (2022), arXiv:2203.06031

[29] P. Blagoveschensky and A. H. Phan, Deep convolutional tensor network (2020), arXiv:2005.14506

[30] S. K. Vemuri, T. Büchner, J. Niebling, and J. Denzler, Functional tensor decompositions for physics-informed neural networks (2024), arXiv:2408.13101 [cs.LG]

[31] R. Patel, C.-W. Hsing, S. Sahin, S. S. Jahromi, S. Palmer, S. Sharma, C. Michel, V. Porte, M. Abid, S. Aubert, P. Castellani, C.-G. Lee, S. Mugel, and R. Orus, Quantum-inspired tensor neural networks for partial differential equations (2022), arXiv:2208.02235

[32] F. Barratt, J. Dborin, and L. Wright, Improvements to gradient descent methods for quantum tensor network machine learning (2022), arXiv:2203.03366

[33] X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, Vol. 9, edited by Y. W. Teh and M. Titterington (PMLR, Chia Laguna Resort, Sardinia, Italy, 2010) pp. 249–256

[34] X. Tang, Y. Khoo, and L. Ying, Initialization and training of matrix product state probabilistic models (2025), arXiv:2505.06419 [math.NA]

[35] I. Oseledets and E. Tyrtyshnikov, Tt-cross approximation for multidimensional arrays, Linear Algebra and its Applications 432, 70 (2010)

[36] F. Mezzadri, How to generate random matrices from the classical compact groups (2007), arXiv:math-ph/0609050 [math-ph]

[37] E. Puljak, S. Sanchez-Ramirez, S. Masot-Llima, J. Vallès-Muns, A. Garcia-Saez, and M. Pierini, tn4ml: Tensor network training and customization for machine learning (2025), arXiv:2502.13090 [cs.LG]

[38] J. Wang, C. Roberts, G. Vidal, and S. Leichenauer, Anomaly detection with tensor networks (2020), arXiv:2006.02516

[39] T. Hao, X. Huang, C. Jia, and C. Peng, A quantum-inspired tensor network algorithm for constrained combinatorial optimization problems, Frontiers in Physics 10, 10.3389/fphy.2022.906590 (2022)
