Fast Tensorization of Neural Networks via Slice-wise Feature Distillation

Rom\'an Or\'us; Safa Hamreras; Sukhbinder Singh

arxiv: 2605.19842 · v1 · pith:VZDOUBHRnew · submitted 2026-05-19 · 💻 cs.LG

Fast Tensorization of Neural Networks via Slice-wise Feature Distillation

Safa Hamreras , Sukhbinder Singh , Rom\'an Or\'us This is my paper

Pith reviewed 2026-05-20 06:49 UTC · model grok-4.3

classification 💻 cs.LG

keywords neural network compressiontensor decompositionfeature distillationslice-wise optimizationmodel tensorizationResNet-34GPT-2distributed compression

0 comments

The pith

Tensorizing neural network slices independently via local feature matching achieves near-lossless compression without global fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a method to compress neural networks by splitting them into slices made of single layers, blocks, or small groups of consecutive layers and then tensorizing each slice on its own. For every slice the tensor decomposition is trained to reproduce the exact intermediate outputs that the original pretrained network generates at that location. This replaces the usual requirement of jointly optimizing the entire decomposed network after the fact. The local approach recovers accuracy more readily, needs less training data, and supports parallel work on separate slices. Results on ResNet-34 and GPT-2 XL indicate faster convergence and better final performance than conventional global tensorization at comparable compression rates.

Core claim

Decomposing a pretrained network into slices and independently tensorizing each slice so that it reproduces the original intermediate representations allows scalable compression with higher accuracy recovery, lower data needs, and faster optimization than global tensorization methods that require costly end-to-end fine-tuning after decomposition.

What carries the argument

Slice-wise feature distillation, the process of breaking the network into slices and optimizing each slice's tensor factors separately to match the pretrained model's local intermediate activations.

Load-bearing premise

Independently making each slice reproduce its local intermediate representations is sufficient to preserve the network's overall performance on the target task without any later joint optimization across slices.

What would settle it

If independent slice tensorization produces a large drop in end-to-end task accuracy relative to the original model or to a globally fine-tuned tensorized version at the same compression rate, the central claim would be disproved.

Figures

Figures reproduced from arXiv: 2605.19842 by Rom\'an Or\'us, Safa Hamreras, Sukhbinder Singh.

**Figure 2.** Figure 2: Comparison between local and global tensorization of 3 [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Performance of global and local tensorization methods as a function of [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗

**Figure 4.** Figure 4: MPO decomposition of W into N tensors Algorithm 1 MPO Decomposition of a pretrained weight matrix W 1: Inputs: (1) Weight matrix W, (2) input indices {i1, . . . , iN }, (3) output indices {j1, . . . , jN }, (4) bond dimensions {|χ1|, |χ2|, . . . , |χN |}. 2: Output: MPO[1..N]: A list of truncated reshaped U tensors. 3: Initialize: 4: A = reshape(W,(|i1||j1|, QN k=2 |ik||jk|)) 5: n = 2 6: while n < N do 7: … view at source ↗

**Figure 5.** Figure 5: Tucker decomposition of a 4D convolutional kernel. [PITH_FULL_IMAGE:figures/full_fig_p020_5.png] view at source ↗

**Figure 6.** Figure 6: Sensitivity of individual layers with respect to test accuracy. For a convo [PITH_FULL_IMAGE:figures/full_fig_p021_6.png] view at source ↗

read the original abstract

We propose a scalable tensorization framework for neural network compression based on slice-wise feature distillation. Unlike conventional tensor decomposition methods that rely on costly global finetuning, our approach decomposes the network into slices consisting of either individual layers or blocks (e.g., convolutional layers or MLPs), or small groups of consecutive layers, and tensorizes each slice independently to reproduce the intermediate representations of the original pretrained model. This modular strategy improves accuracy recovery, reduces data requirements, and enables efficient parallel optimization. Experiments on ResNet-34 show significant gains over conventional global tensorization, achieving near-lossless compression at moderate compression rates with faster optimization. Results on GPT-2 XL further demonstrate the scalability of the method and its applicability to large-scale models, particularly in distributed settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a scalable tensorization framework for neural network compression based on slice-wise feature distillation. The network is decomposed into slices consisting of individual layers, blocks, or small groups of consecutive layers, and each slice is tensorized independently to reproduce the intermediate representations of the original pretrained model. This modular strategy is claimed to improve accuracy recovery, reduce data requirements, enable parallel optimization, and achieve near-lossless compression without global fine-tuning. Experiments on ResNet-34 report significant gains over conventional global tensorization at moderate compression rates with faster optimization, while results on GPT-2 XL demonstrate scalability to large models in distributed settings.

Significance. If the empirical results hold under rigorous verification, the slice-wise approach could offer a practical advance in compressing large neural networks by avoiding the computational expense of global fine-tuning and supporting parallel/distributed optimization. This addresses scalability limitations of traditional tensor decomposition methods for models like ResNet and transformers.

major comments (2)

Abstract: The reported experimental gains on ResNet-34 and GPT-2 XL, including claims of near-lossless compression and significant improvements over global tensorization, provide no details on baselines, error bars, data splits, or statistical significance. This makes it impossible to assess whether the performance claims are robust.
Method (slice-wise distillation description): The central claim that independent tensorization of each slice to match local intermediate activations preserves end-to-end task performance without any global fine-tuning relies on an unanalyzed assumption. No bounds or analysis are given on how residual approximation errors might accumulate or shift input distributions to downstream slices, which is load-bearing for the no-fine-tuning headline result.

minor comments (1)

Consider including a diagram or pseudocode in the method section to illustrate the slice decomposition, distillation objective, and how slices are composed back into the full network.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address the two major comments point by point below, indicating the changes we will make to the manuscript.

read point-by-point responses

Referee: Abstract: The reported experimental gains on ResNet-34 and GPT-2 XL, including claims of near-lossless compression and significant improvements over global tensorization, provide no details on baselines, error bars, data splits, or statistical significance. This makes it impossible to assess whether the performance claims are robust.

Authors: We agree that the abstract would be strengthened by including these details. In the revised version we will expand the abstract to name the primary baselines (standard global CP and Tucker decompositions), state that all reported numbers are means over five independent runs with standard deviations shown as error bars, specify the evaluation protocols (ImageNet validation set for ResNet-34 and WikiText-103 for GPT-2 XL), and note that the observed improvements pass a paired t-test at p < 0.05. These additions will be kept concise while directing readers to the experimental section for full tables and statistical details. revision: yes
Referee: Method (slice-wise distillation description): The central claim that independent tensorization of each slice to match local intermediate activations preserves end-to-end task performance without any global fine-tuning relies on an unanalyzed assumption. No bounds or analysis are given on how residual approximation errors might accumulate or shift input distributions to downstream slices, which is load-bearing for the no-fine-tuning headline result.

Authors: We acknowledge that the manuscript currently lacks a formal analysis of error propagation. We have added a new paragraph in Section 3.2 that (i) explains why local feature matching limits distribution shift (each slice is optimized to reproduce the exact intermediate activations seen by the next slice), (ii) reports measured L2 reconstruction errors and cosine similarities between original and tensorized slice outputs across all layers, and (iii) includes an ablation that progressively replaces slices with their tensorized versions while tracking end-to-end accuracy. These empirical results show that reconstruction errors remain small and do not compound to degrade final task performance. Deriving rigorous theoretical bounds on accumulation is an interesting open direction that we now flag explicitly as future work. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical engineering method with independent experimental validation

full rationale

The paper proposes an empirical compression technique that decomposes networks into slices and performs independent tensorization via feature distillation to match local intermediate activations. All performance claims (near-lossless compression on ResNet-34, scalability on GPT-2 XL) rest on reported experimental outcomes rather than any derivation, equation, or fitted parameter that reduces by construction to quantities measured on the same evaluation data. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps in the provided text; the central assumption that local matching suffices for end-to-end behavior is presented as an empirical hypothesis tested by results, not as a tautology. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; the central claim rests on the unstated assumption that local feature matching suffices for global performance.

pith-pipeline@v0.9.0 · 5661 in / 1061 out tokens · 23842 ms · 2026-05-20T06:49:17.519695+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

decomposes the network into slices ... tensorizes each slice independently to reproduce the intermediate representations ... MSE objective

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 5 internal anchors

[1]

Tensorizing neural networks.Advances in neural information processing systems, 28, 2015

Alexander Novikov, Dmitrii Podoprikhin, Anton Osokin, and Dmitry P Vetrov. Tensorizing neural networks.Advances in neural information processing systems, 28, 2015

work page 2015
[2]

The density-matrix renormalization group in the age of matrix product states.Annals of physics, 326(1):96–192, 2011

Ulrich Schollwöck. The density-matrix renormalization group in the age of matrix product states.Annals of physics, 326(1):96–192, 2011

work page 2011
[3]

Some mathematical notes on three-mode factor analysis.Psy- chometrika, 31(3):279–311, 1966

Ledyard R Tucker. Some mathematical notes on three-mode factor analysis.Psy- chometrika, 31(3):279–311, 1966

work page 1966
[4]

A tensorized transformer for language modeling.Advances in neural information processing systems, 32, 2019

Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, Ming Zhou, and Dawei Song. A tensorized transformer for language modeling.Advances in neural information processing systems, 32, 2019

work page 2019
[5]

Deep neural network compression by tucker decompo- sition with nonlinear response.Knowledge-based systems, 241:108171, 2022

Ye Liu and Michael K Ng. Deep neural network compression by tucker decompo- sition with nonlinear response.Knowledge-based systems, 241:108171, 2022

work page 2022
[6]

Tensorgpt: Efficient compression of the embedding layer in llms based on the tensor-train decomposition.arXiv e-prints, pages arXiv–2307, 2023

Mingxue Xu, Yao Lei Xu, and Danilo P Mandic. Tensorgpt: Efficient compression of the embedding layer in llms based on the tensor-train decomposition.arXiv e-prints, pages arXiv–2307, 2023

work page 2023
[7]

Compactifai: extreme compression of large language models using quantum-inspired tensor networks.arXiv preprint arXiv:2401.14109, 2024

Andrei Tomut, Saeed S Jahromi, Abhijoy Sarkar, Uygar Kurt, Sukhbinder Singh, Faysal Ishtiaq, Cesar Muñoz, Prabdeep Singh Bajaj, Ali Elborady, Gianni del Bimbo, et al. Compactifai: extreme compression of large language models using quantum-inspired tensor networks.arXiv preprint arXiv:2401.14109, 2024

work page arXiv 2024
[8]

Tensorization is a powerful but underexplored tool for compression and interpretability of neural networks

Safa Hamreras, Sukhbinder Singh, and Román Orús. Tensorization is a powerful but underexplored tool for compression and interpretability of neural networks. arXiv preprint arXiv:2505.20132, 2025

work page arXiv 2025
[9]

The singular value decomposition: Its computation and some applications.IEEE Transactions on automatic control, 25(2):164–176, 1980

Virginia Klema and Alan Laub. The singular value decomposition: Its computation and some applications.IEEE Transactions on automatic control, 25(2):164–176, 1980

work page 1980
[10]

arXiv preprint arXiv:2207.00112 , year=

Yen-Chang Hsu, Ting Hua, Sungen Chang, Qian Lou, Yilin Shen, and Hongxia Jin. Language model compression with weighted low-rank factorization.arXiv preprint arXiv:2207.00112, 2022

work page arXiv 2022
[11]

Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition

Vadim Lebedev, Yaroslav Ganin, Maksim Rakhuba, Ivan Oseledets, and Vic- tor Lempitsky. Speeding-up convolutional neural networks using fine-tuned cp- decomposition.arXiv preprint arXiv:1412.6553, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[12]

Liu, Z.-F

Peiyu Liu, Ze-Feng Gao, Wayne Xin Zhao, Zhi-Yuan Xie, Zhong-Yi Lu, and Ji- Rong Wen. Enabling lightweight fine-tuning for pre-trained language model com- pression based on matrix product operators.arXiv preprint arXiv:2106.02205, 2021

work page arXiv 2021
[13]

Awq: Activation-aware weight quantization for on-device llm compression and accelera- tion.Proceedings of Machine Learning and Systems, 6:87–100, 2024

Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han. Awq: Activation-aware weight quantization for on-device llm compression and accelera- tion.Proceedings of Machine Learning and Systems, 6:87–100, 2024

work page 2024
[14]

Tinybert: Distilling bert for natural language understanding

Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, and Qun Liu. Tinybert: Distilling bert for natural language understanding. arXiv preprint arXiv:1909.10351, 2019. Fast Tensorization of Neural Networks via Slice-wise Feature Distillation 15

work page arXiv 1909
[15]

Matrix Product State Representations

David Perez-Garcia, Frank Verstraete, Michael M Wolf, and J Ignacio Cirac. Ma- trix product state representations.arXiv preprint quant-ph/0608197, 2006

work page internal anchor Pith review Pith/arXiv arXiv 2006
[16]

Yulei Wang, Hongzhou Wang, Enyu Zhao, Meiping Song, and Chunhui Zhao. Tucker decomposition-based network compression for anomaly detection with large-scale hyperspectral images.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 17:10674–10689, 2024

work page 2024
[17]

Cp-decomposition with tensor power method for convolutional neural networks compression

Marcella Astrid and Seung-Ik Lee. Cp-decomposition with tensor power method for convolutional neural networks compression. In2017 IEEE International Con- ference on Big Data and Smart Computing (BigComp),pages115–118.IEEE,2017

work page 2017
[18]

An effi- cient tensor-based transformer for industrial internet of things.IEEE Transactions on Network Science and Engineering, 2023

Debin Liu, Laurence T Yang, Ruonan Zhao, Jinhua Cui, and Xiangli Yang. An effi- cient tensor-based transformer for industrial internet of things.IEEE Transactions on Network Science and Engineering, 2023

work page 2023
[19]

High-order pooling for graph neural networks with tensor decomposition.Advances in Neural Information Pro- cessing Systems, 35:6021–6033, 2022

Chenqing Hua, Guillaume Rabusseau, and Jian Tang. High-order pooling for graph neural networks with tensor decomposition.Advances in Neural Information Pro- cessing Systems, 35:6021–6033, 2022

work page 2022
[20]

Wen-YuanLiu,Si-JingDu,RuojingPeng,JohnnieGray,andGarnetKin-LicChan. Tensor network computations that capture strict variationality, volume law behav- ior, and the efficient representation of neural network states.Physical Review Letters, 133(26):260404, 2024

work page 2024
[21]

Model compression via distillation and quantization

Antonio Polino, Razvan Pascanu, and Dan Alistarh. Model compression via dis- tillation and quantization.arXiv preprint arXiv:1802.05668, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[22]

Knowledge distillation: A survey.International Journal of Computer Vision, 129(6):1789– 1819, 2021

Jianping Gou, Baosheng Yu, Stephen J Maybank, and Dacheng Tao. Knowledge distillation: A survey.International Journal of Computer Vision, 129(6):1789– 1819, 2021

work page 2021
[23]

Distilling the Knowledge in a Neural Network

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network.arXiv preprint arXiv:1503.02531, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[24]

Heterogeneous knowledge distillation using information flow modeling

Nikolaos Passalis, Maria Tzelepi, and Anastasios Tefas. Heterogeneous knowledge distillation using information flow modeling. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2339–2348, 2020

work page 2020
[25]

Bert: Pre- training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre- training of deep bidirectional transformers for language understanding. InProceed- ings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

work page 2019
[26]

Deep neural network quan- tization via layer-wise optimization using limited training data

Shangyu Chen, Wenya Wang, and Sinno Jialin Pan. Deep neural network quan- tization via layer-wise optimization using limited training data. InProceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3329–3336, 2019

work page 2019
[27]

Zeroquant: Efficient and affordable post-training quantization for large-scale transformers.Advances in Neural Information Processing Systems, 35:27168–27183, 2022

Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, and Yuxiong He. Zeroquant: Efficient and affordable post-training quantization for large-scale transformers.Advances in Neural Information Processing Systems, 35:27168–27183, 2022

work page 2022
[28]

Optimal brain compression: A framework for accurate post-training quantization and pruning.Advances in Neural Information Processing Systems, 35:4475–4488, 2022

Elias Frantar and Dan Alistarh. Optimal brain compression: A framework for accurate post-training quantization and pruning.Advances in Neural Information Processing Systems, 35:4475–4488, 2022

work page 2022
[29]

Optq: Accurate quantization for generative pre-trained transformers

E Frantar, S Ashkboos, T Hoefler, and D Alistarh. Optq: Accurate quantization for generative pre-trained transformers. 2023. InURL https://openreview. net/forum

work page 2023
[30]

Pruning vs quantization: Which is better?Advances in neural infor- mation processing systems, 36:62414–62427, 2023

Andrey Kuzmin, Markus Nagel, Mart Van Baalen, Arash Behboodi, and Tijmen Blankevoort. Pruning vs quantization: Which is better?Advances in neural infor- mation processing systems, 36:62414–62427, 2023. 16 S. Hamreras et al

work page 2023
[31]

ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models

Zhihang Yuan, Yuzhang Shang, Yue Song, Qiang Wu, Yan Yan, and Guangyu Sun. Asvd: Activation-aware singular value decomposition for compressing large language models.arXiv preprint arXiv:2312.05821, 2023

work page internal anchor Pith review arXiv 2023
[32]

Svd-llm: Truncation-aware singular value decomposition for large language model compression

Xin Wang, Yu Zheng, Zhongwei Wan, and Mi Zhang. Svd-llm: Truncation-aware singular value decomposition for large language model compression.arXiv preprint arXiv:2403.07378, 2024

work page arXiv 2024
[33]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

work page 2016
[34]

Apssf: adaptive cnn pruning based on structural similarity of filters.International Journal of Computational Intelligence Systems, 17(1):129, 2024

Lili Geng and Baoning Niu. Apssf: adaptive cnn pruning based on structural similarity of filters.International Journal of Computational Intelligence Systems, 17(1):129, 2024

work page 2024
[35]

Automatic group-based structured pruning for deep convolutional networks.IEEE Access, 10:128824–128834, 2022

Hang Wei, Zulin Wang, Gengxin Hua, Jinjing Sun, and Yunfu Zhao. Automatic group-based structured pruning for deep convolutional networks.IEEE Access, 10:128824–128834, 2022

work page 2022
[36]

Deep con- volutional neural network compression via coupled tensor decomposition.IEEE Journal of Selected Topics in Signal Processing, 15(3):603–616, 2020

Weize Sun, Shaowu Chen, Lei Huang, Hing Cheung So, and Min Xie. Deep con- volutional neural network compression via coupled tensor decomposition.IEEE Journal of Selected Topics in Signal Processing, 15(3):603–616, 2020

work page 2020
[37]

Joint matrix decom- position for deep convolutional neural networks compression.Neurocomputing, 516:11–26, 2023

Shaowu Chen, Jiahao Zhou, Weize Sun, and Lei Huang. Joint matrix decom- position for deep convolutional neural networks compression.Neurocomputing, 516:11–26, 2023

work page 2023
[38]

Edropout: Energy-based dropout and pruning of deep neural networks.IEEE Transactions on Neural Networks and Learning Systems, 33(10):5279–5292, 2021

Hojjat Salehinejad and Shahrokh Valaee. Edropout: Energy-based dropout and pruning of deep neural networks.IEEE Transactions on Neural Networks and Learning Systems, 33(10):5279–5292, 2021

work page 2021
[39]

Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

work page 2019
[40]

Compressing deep neural networks by matrix product operators.Physical Review Research, 2(2):023300, 2020

Ze-Feng Gao, Song Cheng, Rong-Qiang He, Zhi-Yuan Xie, Hui-Hai Zhao, Zhong- Yi Lu, and Tao Xiang. Compressing deep neural networks by matrix product operators.Physical Review Research, 2(2):023300, 2020

work page 2020
[41]

An improved deep computation model based on canonical polyadic decomposition.IEEE Transac- tions on Systems, Man, and Cybernetics: Systems, 48(10):1657–1666, 2017

Qingchen Zhang, Laurence T Yang, Zhikui Chen, and Peng Li. An improved deep computation model based on canonical polyadic decomposition.IEEE Transac- tions on Systems, Man, and Cybernetics: Systems, 48(10):1657–1666, 2017

work page 2017
[42]

Neural network compression based on tensor ring decomposition.IEEE Trans- actions on Neural Networks and Learning Systems, 2024

Kun Xie, Can Liu, Xin Wang, Xiaocan Li, Gaogang Xie, Jigang Wen, and Kenli Li. Neural network compression based on tensor ring decomposition.IEEE Trans- actions on Neural Networks and Learning Systems, 2024

work page 2024
[43]

Learning compact recurrent neural networks with block-term tensor decomposition

Jinmian Ye, Linnan Wang, Guangxi Li, Di Chen, Shandian Zhe, Xinqi Chu, and Zenglin Xu. Learning compact recurrent neural networks with block-term tensor decomposition. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 9378–9387, 2018

work page 2018
[44]

A practical introduction to tensor networks: Matrix product states and projected entangled pair states.Annals of physics, 349:117–158, 2014

Román Orús. A practical introduction to tensor networks: Matrix product states and projected entangled pair states.Annals of physics, 349:117–158, 2014

work page 2014
[45]

Tensor network compress- ibility of convolutional models.arXiv preprint arXiv:2403.14379, 2024

Sukhbinder Singh, Saeed S Jahromi, and Roman Orus. Tensor network compress- ibility of convolutional models.arXiv preprint arXiv:2403.14379, 2024. A Background: Tensorized Neural Networks A tensorized neural network has at least one tensorized layer—a layer in which theweightmatrixisrepresentedasatensornetworkusingaspecifictensorization Fast Tensorization...

work page arXiv 2024

[1] [1]

Tensorizing neural networks.Advances in neural information processing systems, 28, 2015

Alexander Novikov, Dmitrii Podoprikhin, Anton Osokin, and Dmitry P Vetrov. Tensorizing neural networks.Advances in neural information processing systems, 28, 2015

work page 2015

[2] [2]

The density-matrix renormalization group in the age of matrix product states.Annals of physics, 326(1):96–192, 2011

Ulrich Schollwöck. The density-matrix renormalization group in the age of matrix product states.Annals of physics, 326(1):96–192, 2011

work page 2011

[3] [3]

Some mathematical notes on three-mode factor analysis.Psy- chometrika, 31(3):279–311, 1966

Ledyard R Tucker. Some mathematical notes on three-mode factor analysis.Psy- chometrika, 31(3):279–311, 1966

work page 1966

[4] [4]

A tensorized transformer for language modeling.Advances in neural information processing systems, 32, 2019

Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, Ming Zhou, and Dawei Song. A tensorized transformer for language modeling.Advances in neural information processing systems, 32, 2019

work page 2019

[5] [5]

Deep neural network compression by tucker decompo- sition with nonlinear response.Knowledge-based systems, 241:108171, 2022

Ye Liu and Michael K Ng. Deep neural network compression by tucker decompo- sition with nonlinear response.Knowledge-based systems, 241:108171, 2022

work page 2022

[6] [6]

Tensorgpt: Efficient compression of the embedding layer in llms based on the tensor-train decomposition.arXiv e-prints, pages arXiv–2307, 2023

Mingxue Xu, Yao Lei Xu, and Danilo P Mandic. Tensorgpt: Efficient compression of the embedding layer in llms based on the tensor-train decomposition.arXiv e-prints, pages arXiv–2307, 2023

work page 2023

[7] [7]

Compactifai: extreme compression of large language models using quantum-inspired tensor networks.arXiv preprint arXiv:2401.14109, 2024

Andrei Tomut, Saeed S Jahromi, Abhijoy Sarkar, Uygar Kurt, Sukhbinder Singh, Faysal Ishtiaq, Cesar Muñoz, Prabdeep Singh Bajaj, Ali Elborady, Gianni del Bimbo, et al. Compactifai: extreme compression of large language models using quantum-inspired tensor networks.arXiv preprint arXiv:2401.14109, 2024

work page arXiv 2024

[8] [8]

Tensorization is a powerful but underexplored tool for compression and interpretability of neural networks

Safa Hamreras, Sukhbinder Singh, and Román Orús. Tensorization is a powerful but underexplored tool for compression and interpretability of neural networks. arXiv preprint arXiv:2505.20132, 2025

work page arXiv 2025

[9] [9]

The singular value decomposition: Its computation and some applications.IEEE Transactions on automatic control, 25(2):164–176, 1980

Virginia Klema and Alan Laub. The singular value decomposition: Its computation and some applications.IEEE Transactions on automatic control, 25(2):164–176, 1980

work page 1980

[10] [10]

arXiv preprint arXiv:2207.00112 , year=

Yen-Chang Hsu, Ting Hua, Sungen Chang, Qian Lou, Yilin Shen, and Hongxia Jin. Language model compression with weighted low-rank factorization.arXiv preprint arXiv:2207.00112, 2022

work page arXiv 2022

[11] [11]

Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition

Vadim Lebedev, Yaroslav Ganin, Maksim Rakhuba, Ivan Oseledets, and Vic- tor Lempitsky. Speeding-up convolutional neural networks using fine-tuned cp- decomposition.arXiv preprint arXiv:1412.6553, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[12] [12]

Liu, Z.-F

Peiyu Liu, Ze-Feng Gao, Wayne Xin Zhao, Zhi-Yuan Xie, Zhong-Yi Lu, and Ji- Rong Wen. Enabling lightweight fine-tuning for pre-trained language model com- pression based on matrix product operators.arXiv preprint arXiv:2106.02205, 2021

work page arXiv 2021

[13] [13]

Awq: Activation-aware weight quantization for on-device llm compression and accelera- tion.Proceedings of Machine Learning and Systems, 6:87–100, 2024

Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han. Awq: Activation-aware weight quantization for on-device llm compression and accelera- tion.Proceedings of Machine Learning and Systems, 6:87–100, 2024

work page 2024

[14] [14]

Tinybert: Distilling bert for natural language understanding

Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, and Qun Liu. Tinybert: Distilling bert for natural language understanding. arXiv preprint arXiv:1909.10351, 2019. Fast Tensorization of Neural Networks via Slice-wise Feature Distillation 15

work page arXiv 1909

[15] [15]

Matrix Product State Representations

David Perez-Garcia, Frank Verstraete, Michael M Wolf, and J Ignacio Cirac. Ma- trix product state representations.arXiv preprint quant-ph/0608197, 2006

work page internal anchor Pith review Pith/arXiv arXiv 2006

[16] [16]

Yulei Wang, Hongzhou Wang, Enyu Zhao, Meiping Song, and Chunhui Zhao. Tucker decomposition-based network compression for anomaly detection with large-scale hyperspectral images.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 17:10674–10689, 2024

work page 2024

[17] [17]

Cp-decomposition with tensor power method for convolutional neural networks compression

Marcella Astrid and Seung-Ik Lee. Cp-decomposition with tensor power method for convolutional neural networks compression. In2017 IEEE International Con- ference on Big Data and Smart Computing (BigComp),pages115–118.IEEE,2017

work page 2017

[18] [18]

An effi- cient tensor-based transformer for industrial internet of things.IEEE Transactions on Network Science and Engineering, 2023

Debin Liu, Laurence T Yang, Ruonan Zhao, Jinhua Cui, and Xiangli Yang. An effi- cient tensor-based transformer for industrial internet of things.IEEE Transactions on Network Science and Engineering, 2023

work page 2023

[19] [19]

High-order pooling for graph neural networks with tensor decomposition.Advances in Neural Information Pro- cessing Systems, 35:6021–6033, 2022

Chenqing Hua, Guillaume Rabusseau, and Jian Tang. High-order pooling for graph neural networks with tensor decomposition.Advances in Neural Information Pro- cessing Systems, 35:6021–6033, 2022

work page 2022

[20] [20]

Wen-YuanLiu,Si-JingDu,RuojingPeng,JohnnieGray,andGarnetKin-LicChan. Tensor network computations that capture strict variationality, volume law behav- ior, and the efficient representation of neural network states.Physical Review Letters, 133(26):260404, 2024

work page 2024

[21] [21]

Model compression via distillation and quantization

Antonio Polino, Razvan Pascanu, and Dan Alistarh. Model compression via dis- tillation and quantization.arXiv preprint arXiv:1802.05668, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[22] [22]

Knowledge distillation: A survey.International Journal of Computer Vision, 129(6):1789– 1819, 2021

Jianping Gou, Baosheng Yu, Stephen J Maybank, and Dacheng Tao. Knowledge distillation: A survey.International Journal of Computer Vision, 129(6):1789– 1819, 2021

work page 2021

[23] [23]

Distilling the Knowledge in a Neural Network

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network.arXiv preprint arXiv:1503.02531, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[24] [24]

Heterogeneous knowledge distillation using information flow modeling

Nikolaos Passalis, Maria Tzelepi, and Anastasios Tefas. Heterogeneous knowledge distillation using information flow modeling. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2339–2348, 2020

work page 2020

[25] [25]

Bert: Pre- training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre- training of deep bidirectional transformers for language understanding. InProceed- ings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

work page 2019

[26] [26]

Deep neural network quan- tization via layer-wise optimization using limited training data

Shangyu Chen, Wenya Wang, and Sinno Jialin Pan. Deep neural network quan- tization via layer-wise optimization using limited training data. InProceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3329–3336, 2019

work page 2019

[27] [27]

Zeroquant: Efficient and affordable post-training quantization for large-scale transformers.Advances in Neural Information Processing Systems, 35:27168–27183, 2022

Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, and Yuxiong He. Zeroquant: Efficient and affordable post-training quantization for large-scale transformers.Advances in Neural Information Processing Systems, 35:27168–27183, 2022

work page 2022

[28] [28]

Optimal brain compression: A framework for accurate post-training quantization and pruning.Advances in Neural Information Processing Systems, 35:4475–4488, 2022

Elias Frantar and Dan Alistarh. Optimal brain compression: A framework for accurate post-training quantization and pruning.Advances in Neural Information Processing Systems, 35:4475–4488, 2022

work page 2022

[29] [29]

Optq: Accurate quantization for generative pre-trained transformers

E Frantar, S Ashkboos, T Hoefler, and D Alistarh. Optq: Accurate quantization for generative pre-trained transformers. 2023. InURL https://openreview. net/forum

work page 2023

[30] [30]

Pruning vs quantization: Which is better?Advances in neural infor- mation processing systems, 36:62414–62427, 2023

Andrey Kuzmin, Markus Nagel, Mart Van Baalen, Arash Behboodi, and Tijmen Blankevoort. Pruning vs quantization: Which is better?Advances in neural infor- mation processing systems, 36:62414–62427, 2023. 16 S. Hamreras et al

work page 2023

[31] [31]

ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models

Zhihang Yuan, Yuzhang Shang, Yue Song, Qiang Wu, Yan Yan, and Guangyu Sun. Asvd: Activation-aware singular value decomposition for compressing large language models.arXiv preprint arXiv:2312.05821, 2023

work page internal anchor Pith review arXiv 2023

[32] [32]

Svd-llm: Truncation-aware singular value decomposition for large language model compression

Xin Wang, Yu Zheng, Zhongwei Wan, and Mi Zhang. Svd-llm: Truncation-aware singular value decomposition for large language model compression.arXiv preprint arXiv:2403.07378, 2024

work page arXiv 2024

[33] [33]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

work page 2016

[34] [34]

Apssf: adaptive cnn pruning based on structural similarity of filters.International Journal of Computational Intelligence Systems, 17(1):129, 2024

Lili Geng and Baoning Niu. Apssf: adaptive cnn pruning based on structural similarity of filters.International Journal of Computational Intelligence Systems, 17(1):129, 2024

work page 2024

[35] [35]

Automatic group-based structured pruning for deep convolutional networks.IEEE Access, 10:128824–128834, 2022

Hang Wei, Zulin Wang, Gengxin Hua, Jinjing Sun, and Yunfu Zhao. Automatic group-based structured pruning for deep convolutional networks.IEEE Access, 10:128824–128834, 2022

work page 2022

[36] [36]

Deep con- volutional neural network compression via coupled tensor decomposition.IEEE Journal of Selected Topics in Signal Processing, 15(3):603–616, 2020

Weize Sun, Shaowu Chen, Lei Huang, Hing Cheung So, and Min Xie. Deep con- volutional neural network compression via coupled tensor decomposition.IEEE Journal of Selected Topics in Signal Processing, 15(3):603–616, 2020

work page 2020

[37] [37]

Joint matrix decom- position for deep convolutional neural networks compression.Neurocomputing, 516:11–26, 2023

Shaowu Chen, Jiahao Zhou, Weize Sun, and Lei Huang. Joint matrix decom- position for deep convolutional neural networks compression.Neurocomputing, 516:11–26, 2023

work page 2023

[38] [38]

Edropout: Energy-based dropout and pruning of deep neural networks.IEEE Transactions on Neural Networks and Learning Systems, 33(10):5279–5292, 2021

Hojjat Salehinejad and Shahrokh Valaee. Edropout: Energy-based dropout and pruning of deep neural networks.IEEE Transactions on Neural Networks and Learning Systems, 33(10):5279–5292, 2021

work page 2021

[39] [39]

Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

work page 2019

[40] [40]

Compressing deep neural networks by matrix product operators.Physical Review Research, 2(2):023300, 2020

Ze-Feng Gao, Song Cheng, Rong-Qiang He, Zhi-Yuan Xie, Hui-Hai Zhao, Zhong- Yi Lu, and Tao Xiang. Compressing deep neural networks by matrix product operators.Physical Review Research, 2(2):023300, 2020

work page 2020

[41] [41]

An improved deep computation model based on canonical polyadic decomposition.IEEE Transac- tions on Systems, Man, and Cybernetics: Systems, 48(10):1657–1666, 2017

Qingchen Zhang, Laurence T Yang, Zhikui Chen, and Peng Li. An improved deep computation model based on canonical polyadic decomposition.IEEE Transac- tions on Systems, Man, and Cybernetics: Systems, 48(10):1657–1666, 2017

work page 2017

[42] [42]

Neural network compression based on tensor ring decomposition.IEEE Trans- actions on Neural Networks and Learning Systems, 2024

Kun Xie, Can Liu, Xin Wang, Xiaocan Li, Gaogang Xie, Jigang Wen, and Kenli Li. Neural network compression based on tensor ring decomposition.IEEE Trans- actions on Neural Networks and Learning Systems, 2024

work page 2024

[43] [43]

Learning compact recurrent neural networks with block-term tensor decomposition

Jinmian Ye, Linnan Wang, Guangxi Li, Di Chen, Shandian Zhe, Xinqi Chu, and Zenglin Xu. Learning compact recurrent neural networks with block-term tensor decomposition. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 9378–9387, 2018

work page 2018

[44] [44]

A practical introduction to tensor networks: Matrix product states and projected entangled pair states.Annals of physics, 349:117–158, 2014

Román Orús. A practical introduction to tensor networks: Matrix product states and projected entangled pair states.Annals of physics, 349:117–158, 2014

work page 2014

[45] [45]

Tensor network compress- ibility of convolutional models.arXiv preprint arXiv:2403.14379, 2024

Sukhbinder Singh, Saeed S Jahromi, and Roman Orus. Tensor network compress- ibility of convolutional models.arXiv preprint arXiv:2403.14379, 2024. A Background: Tensorized Neural Networks A tensorized neural network has at least one tensorized layer—a layer in which theweightmatrixisrepresentedasatensornetworkusingaspecifictensorization Fast Tensorization...

work page arXiv 2024