EinSort: Sorting is All We Need for Tensorizing LLM

Jing Liu; Toshiaki Koike-Akino; Ye Wang

arxiv: 2606.08565 · v1 · pith:3Z4DCTGCnew · submitted 2026-06-07 · 💻 cs.LG · cs.AI

Pith reviewed 2026-06-27 18:29 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords: tensorization · LLM compression · low-rank structure · index ordering · sorting · KV-cache compression · weight compression · tensor networks

The pith

Sorting the indices of a tensor uncovers its hidden low-rank structure for better LLM compression.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes that reordering the indices of a target tensor through sorting can expose implicit low-rank structure that standard fixed-order tensorization misses. This adaptive method is tested on compressing the weights and KV-caches of large language models, where it yields lower reconstruction error than baseline approaches. A sympathetic reader would care because it offers a lightweight way to reduce the memory footprint of foundation models by leveraging structure already present in their parameters. The approach avoids the need for hand-crafted tensor shapes or topologies by letting the data ordering itself guide the decomposition.

Core claim

The central claim is that an adaptive tensorization procedure based on index ordering by sorting reliably discovers inherent low-rank structure inside the otherwise unstructured weight tensors of large language models, leading to higher-fidelity reconstructions when the resulting tensor networks are used for weight and KV-cache compression.

What carries the argument

EinSort, the sorting-based index reordering step that adaptively permutes tensor modes to expose low-rank structure before tensor-network decomposition.
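A minimal sketch of what a reversible sorting step could look like, assuming plain row-wise ascending sorting of a small matrix. The paper's actual operator (sliced, grouped, or applied along several axes) may differ, and the names `sort_rows` and `unsort_rows` are hypothetical, not taken from the paper. The point is only that the sort is lossless as long as the permutation is stored alongside the sorted values.

```python
import random

def sort_rows(W):
    """Sort each row ascending; return the sorted rows and the per-row permutation."""
    sorted_rows, perms = [], []
    for row in W:
        perm = sorted(range(len(row)), key=row.__getitem__)  # argsort of this row
        perms.append(perm)
        sorted_rows.append([row[j] for j in perm])
    return sorted_rows, perms

def unsort_rows(sorted_rows, perms):
    """Invert sort_rows: scatter each sorted value back to its original column."""
    W = []
    for srow, perm in zip(sorted_rows, perms):
        row = [0.0] * len(srow)
        for k, j in enumerate(perm):
            row[j] = srow[k]
        W.append(row)
    return W

# Toy data standing in for a weight matrix (an assumption, not real LLM weights).
random.seed(0)
W = [[random.gauss(0.0, 1.0) for _ in range(6)] for _ in range(4)]

W_sorted, perms = sort_rows(W)
W_back = unsort_rows(W_sorted, perms)
print("round trip exact:", W_back == W)
```

In a compression setting the decomposition would be applied to `W_sorted`, and `perms` would be kept so that the original index order can be restored after reconstruction. Storing those permutations is an overhead that has to be counted against any rank savings.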

If this is right

Tensor networks can represent LLM weights at lower error for a given compression ratio once indices are sorted.
KV-cache tensors in transformer inference also admit better compression under the same reordering.
The method works without requiring changes to model architecture or additional training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The success of sorting suggests that index order in weight tensors is not arbitrary and may reflect training dynamics or architectural symmetries.
Similar permutation-based preprocessing could be tested on other tensor formats such as attention matrices or embedding tables.
If the reordering effect holds across scales, it may reduce the search space needed when designing tensor topologies for new models.

Load-bearing premise

Reordering a tensor's indices by sorting will consistently make its low-rank structure visible even when the original weight values appear unstructured.

What would settle it

Running the same tensor decomposition on LLM weight tensors with and without the sorting step and finding that reconstruction error does not decrease, or increases, for multiple models and decomposition ranks.
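A toy version of this check can be sketched under loud assumptions: a random Gaussian matrix stands in for a weight tensor, the "decomposition" is a rank-1 fit estimated by power iteration, and the sort is a plain row-wise sort. None of this reproduces the paper's models, ranks, or tensor networks; the helper `rank1_relative_error` is hypothetical. It only illustrates the shape of the with/without comparison.

```python
import math
import random

def rank1_relative_error(A, iters=200):
    """Estimate the relative Frobenius error of a rank-1 fit via power iteration."""
    m, n = len(A), len(A[0])
    v = [1.0 / math.sqrt(n)] * n          # unit-norm starting vector
    sigma = 0.0
    for _ in range(iters):
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]   # u = A v
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        w = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]   # w = A^T u
        sigma = math.sqrt(sum(x * x for x in w))   # estimate of the top singular value
        v = [x / sigma for x in w]
    total = sum(x * x for row in A for x in row)   # squared Frobenius norm
    return math.sqrt(max(total - sigma * sigma, 0.0) / total)

# Toy stand-in for a weight matrix (assumption: i.i.d. Gaussian entries).
random.seed(0)
W = [[random.gauss(0.0, 1.0) for _ in range(32)] for _ in range(32)]
W_sorted = [sorted(row) for row in W]   # permutations would be stored separately

err_plain = rank1_relative_error(W)
err_sorted = rank1_relative_error(W_sorted)
print(f"rank-1 relative error, original order: {err_plain:.3f}")
print(f"rank-1 relative error, rows sorted:    {err_sorted:.3f}")
```

A real test of the premise would replace the toy matrix with actual LLM weight or KV-cache tensors, sweep several ranks and models, and account for the memory needed to store the permutations.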

Figures

Figures reproduced from arXiv: 2606.08565 by Jing Liu, Toshiaki Koike-Akino, Ye Wang.

**Figure 1.** Various tensor networks in tensor diagrams.

**Figure 2.** Tensorization error vs. rank for v_proj weights at the first layer of Qwen3-0.6B model. Left: linear; right: log-log plots.

**Figure 3.** Sliced sorting with nonlinear mapping and reduction. Non-negative tensor decomposition keeps one-bit sign information.

**Figure 4.** PPL for Qwen3-0.6B and Gemma3-4B models.

**Figure 5.** PPL vs. exponent for Qwen3-1.7B at 50% compression. Gauge fixing and non-negative tensorization improve the accuracy.

**Figure 6.** Tensorization error vs. rank for v_proj weights at the first layer of Qwen3-0.6B model. Sorting can significantly reduce the required rank. Left: linear; right: log-log plots.

**Figure 7.** Different nonlinear mapping for Qwen3-1.7B at 50% compression: additive logarithm; multiplicative logarithm; and exponentiation.

**Figure 8.** Different reduction and nonlinear mapping for Phi3-mini at 50% and 80% compression.

**Figure 9.** Gauge freedom: Tensor cores can be arbitrary up to any full-rank identity rotation injection at each cut.

**Figure 10.** Gauge fixing with SVD and LU canonicalization. SVD: Tensor cores can be left- or right-unitary around the orthogonality center. Orthogonal tensors can be represented by lower number of parameters with Stiefel manifold (Koike-Akino et al., 2025b). LU: Tensor cores can be block identity. Block identity tensors can be represented with lower number of parameters (Koike-Akino et al., 2026a).

**Figure 11.** Test-time tensor train adaptation based on online activation preconditioner C.

**Figure 12.** CNOT chain to realize the row-wise sorting $\pi$ for $W_0$ in (89). The sorted matrix $W_\pi$ can then be decomposed in low-rank tensors. The input X is contracted with the decomposed $W_\pi$ through reverse permutation $\pi^{-1}$ with conjugate CNOT chain.

**Figure 13.** EinSort pipeline with generalized folding/unfolding, ordering, slicing, sorting, and mapping.

**Figure 14.** PPL over compression rate for Qwen3-1.7B and Phi3-mini (3.8B) models.

**Figure 15.** GSM8K accuracy for Phi-4-mini as a function of KV cache compression ratio. Shaded zone is Wilson's 99% confidence interval.

**Figure 16.** Accuracy for compressed Qwen3-VL models on TextVQA benchmark. (a) Qwen3-VL-4B-Instruct; (b) Qwen3-VL-8B-Instruct.

**Figure 17.** Robot manipulation video snapshots for the compressed VLA-JEPA model on LIBERO benchmarks.
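The Figure 15 caption refers to a Wilson 99% confidence interval on benchmark accuracy. As a hedged illustration of how such a band is typically computed, here is the standard Wilson score interval for a binomial proportion. The function name `wilson_interval` and the counts in the usage lines are hypothetical; the paper's exact evaluation counts are not given in this page.

```python
import math

def wilson_interval(successes, trials, z=2.5758293035489004):
    """Wilson score interval for a binomial proportion.

    The default z is the two-sided 99% standard normal quantile.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials
                         + z * z / (4 * trials * trials)) / denom
    return center - half, center + half

# Hypothetical counts: 800 correct answers out of 1000 benchmark items.
low, high = wilson_interval(800, 1000)
print(f"accuracy 0.800, 99% Wilson interval: [{low:.3f}, {high:.3f}]")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly when accuracy is close to 0 or 1, which matters at aggressive compression ratios where accuracy can collapse.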

Original abstract

Tensor networks provide efficient representations for compressing large neural networks. By carefully designing shapes and topologies, they can significantly reduce memory and computational costs. However, identifying implicit low-rank structures in large foundation models remains challenging due to their enormous scale and unstructured weight distributions. We propose an adaptive tensorization method that discovers inherent low-rank structure in a target tensor by index ordering. Experiments on weight and KV-cache compression demonstrate improved reconstruction quality compared to baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper claims that sorting tensor indices is enough to expose low-rank structure for LLM compression, but the abstract alone gives no way to check if the results hold up.

The main takeaway is that this work proposes using simple index sorting as an adaptive way to tensorize LLM weights and KV caches, claiming better reconstruction than baselines. If the full paper backs this with clear experiments, it could offer a low-effort compression trick.

What stands out as new is the specific choice of sorting for discovering structure instead of more involved search or hand-tuned shapes. The method stays automatic and lightweight, which aligns with practical needs for large models.

It does well at framing the problem of unstructured weights in foundation models and showing tensor networks as a compression route. The emphasis on KV-cache compression is also timely.

The soft spots are clear from the abstract: no equations, no algorithm steps for the sorting, no model sizes, no baseline descriptions, and no numbers or error bars on the gains. The key assumption—that sorting will reliably reveal implicit low-rank patterns in typical LLM weights—remains untested in the provided text, so we cannot judge if the experiments actually support the claim or if other factors are at play.

This is for researchers already working on tensor methods or LLM efficiency. A reader in that niche might pick up the sorting idea to try, but most others will find it too thin without the full details.

I would bring it to a reading group only after seeing the full manuscript with reproducible results. I would not cite it yet. It deserves peer review if the complete version supplies the missing method description and solid evidence, since the core idea is straightforward to evaluate.

Referee Report

0 major / 1 minor

Summary. The paper proposes EinSort, an adaptive tensorization method for LLMs that discovers inherent low-rank structure in target tensors via index ordering (sorting). Experiments on weight and KV-cache compression are claimed to show improved reconstruction quality relative to baselines.

Significance. If the central claim holds under detailed scrutiny, the approach would supply a notably simple mechanism (index sorting) for exposing usable low-rank structure in the unstructured weights typical of foundation models, thereby easing the design burden for tensor-network compression and potentially lowering memory and compute costs.

minor comments (1)

The provided text consists solely of the abstract; without access to the full manuscript, algorithms, experimental protocols, quantitative results, or error analysis, no technical evaluation of the method or its empirical support is possible.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their review and for acknowledging the potential significance of a simple index-sorting mechanism for exposing low-rank structure in foundation-model weights. The recommendation is listed as uncertain with no specific major comments provided in the report. We therefore stand by the manuscript as submitted while remaining available to address any additional questions or clarifications the referee may have.

Circularity Check

0 steps flagged

No significant circularity identified

The abstract and available context present a methodological proposal for adaptive tensorization via index ordering to discover low-rank structure, without any derivations, equations, fitted parameters, or self-citations that reduce claims to inputs by construction. No load-bearing steps are described that would qualify under the enumerated circularity patterns, making the derivation self-contained as an empirical suggestion rather than a tautological re-expression of prior results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review is abstract-only; ledger therefore records the minimal domain assumptions required for any tensor-network compression claim. No free parameters or invented entities are visible in the abstract.

axioms (1)

Domain assumption: weight tensors of large foundation models contain exploitable low-rank structure once indices are suitably reordered.
This premise is required for the adaptive tensorization claim to be meaningful and is stated directly in the abstract.

Reference graph

Works this paper leans on

103 extracted references · 50 canonical work pages · 25 internal anchors

[1] Abdin, M., Aneja, J., Behl, H., Bubeck, S., Eldan, R., Gunasekar, S., Harrison, M., Hewett, R. J., Javaheripi, M., Kauffmann, P., et al. Phi-4 technical report. arXiv preprint arXiv:2412.08905, 2024.

[2] Abouelenin, A., Ashfaq, A., Atkinson, A., Awadalla, H., Bach, N., Bao, J., Benhaim, A., Cai, M., Chaudhary, V., Chen, C., et al. Phi-4-mini technical report: Compact yet powerful multimodal language models via mixture-of-LoRAs. arXiv preprint arXiv:2503.01743, 2025.

[3] Abronin, V., Naumov, A., Mazur, D., Bystrov, D., Tsarova, K., Melnikov, A., Dolgov, S., Brasher, R., and Perelshein, M. TQCompressor: improving tensor decomposition methods in neural networks via permutations. In 2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR), pp. 503–506. IEEE, 2024.

[4] Aneja, J., Harrison, M., Joshi, N., LaBonte, T., Langford, J., and Salinas, E. Phi-4-reasoning-vision-15B technical report. arXiv preprint arXiv:2603.03975, 2026.

[5] Arai, Y. and Ichikawa, Y. Quantization error propagation: Revisiting layer-wise post-training quantization. arXiv preprint arXiv:2504.09629, 2025.

[6] Assran, M., Bardes, A., Fan, D., Garrido, Q., Howes, R., Muckley, M., Rizvi, A., Roberts, C., Sinha, K., Zholus, A., et al. V-JEPA 2: Self-supervised video models enable understanding, prediction and planning. arXiv preprint arXiv:2506.09985, 2025.

[7] Bai, G., Chai, Z., Ling, C., Wang, S., Lu, J., Zhang, N., Shi, T., Yu, Z., Zhu, M., Zhang, Y., et al. Beyond efficiency: A systematic survey of resource-efficient large language models. arXiv preprint arXiv:2401.00625, 2024a.

[8] Bai, G., Li, Y., Ling, C., Kim, K., and Zhao, L. SparseLLM: Towards global pruning for pre-trained language models. arXiv preprint arXiv:2402.17946, 2024b.

[9] Bai, S., Cai, Y., Chen, R., Chen, K., Chen, X., Cheng, Z., Deng, L., Ding, W., Gao, C., Ge, C., et al. Qwen3-VL technical report. arXiv preprint arXiv:2511.21631, 2025.

[10] Bershatsky, D., Cherniuk, D., Daulbaev, T., Mikhalev, A., and Oseledets, I. LoTR: Low tensor rank weight adaptation. arXiv preprint arXiv:2402.01376, 2024.

[11] Beyer, L., Steiner, A., Pinto, A. S., Kolesnikov, A., Wang, X., Salz, D., Neumann, M., Alabdulmohsin, I., Tschannen, M., Bugliarello, E., et al. PaliGemma: A versatile 3B VLM for transfer. arXiv preprint arXiv:2407.07726, 2024.

[12] Brayton, R. K., Hachtel, G. D., McMullen, C., and Sangiovanni-Vincentelli, A. Logic minimization algorithms for VLSI synthesis, volume 2. Springer Science & Business Media, 1984.

[13] Cai, Z., Zhang, Y., Gao, B., Liu, Y., Li, Y., Liu, T., Lu, K., Xiong, W., Dong, Y., Hu, J., and Xiao, W. PyramidKV: Dynamic KV cache compression based on pyramidal information funneling. In Second Conference on Language Modeling, 2025. URL https://openreview.net/forum?id=ayi7qezU87

[14] Chang, C.-C., Lin, W.-C., Lin, C.-Y., Chen, C.-Y., Hu, Y.-F., Wang, P.-S., Huang, N.-C., Ceze, L., Abdelfattah, M. S., and Wu, K.-C. Palu: Compressing KV-cache with low-rank projection. arXiv preprint arXiv:2407.21118, 2024.

[15] Chavan, A., Liu, Z., Gupta, D., Xing, E., and Shen, Z. One-for-all: Generalized LoRA for parameter-efficient fine-tuning. arXiv preprint arXiv:2306.07967, 2023.

[16] Chen, X., Liu, J., Wang, Y., Wang, P., Brand, M., Wang, G., and Koike-Akino, T. SuperLoRA: Parameter-efficient unified adaptation for large vision models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8050–8055, 2024a.

[17] Chen, Z., Dangovski, R., Loh, C., Dugan, O., Luo, D., and Soljačić, M. QuanTA: Efficient high-rank fine-tuning of LLMs with quantum-informed tensor adaptation. Advances in Neural Information Processing Systems, 37:92210–92245, 2024b.

[18] Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., et al. Training verifiers to solve math word problems. URL https://arxiv.org/abs/2110.14168, 2021.

[19] David, H. A. and Nagaraja, H. N. Order statistics. John Wiley & Sons, 2004.

[20] Denil, M., Shakibi, B., Dinh, L., Ranzato, M., and De Freitas, N. Predicting parameters in deep learning. Advances in Neural Information Processing Systems, 26, 2013.

[21] Denton, E. L., Zaremba, W., Bruna, J., LeCun, Y., and Fergus, R. Exploiting linear structure within convolutional networks for efficient evaluation. Advances in Neural Information Processing Systems, 27, 2014.

[22] Ding, M., Xiao, B., Codella, N., Luo, P., Wang, J., and Yuan, L. DaViT: Dual attention vision transformers. In European Conference on Computer Vision, pp. 74–92. Springer, 2022.

[23] Dong, X., Chen, S., and Pan, S. Learning to prune deep neural networks via layer-wise optimal brain surgeon. Advances in Neural Information Processing Systems, 30, 2017.

[24] Dong, Z., Yao, Z., Gholami, A., Mahoney, M. W., and Keutzer, K. HAWQ: Hessian aware quantization of neural networks with mixed-precision. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 293–302, 2019.

[25] Edalati, A., Tahaei, M., Kobyzev, I., Nia, V. P., Clark, J. J., and Rezagholizadeh, M. KronA: Parameter efficient tuning with Kronecker adapter. arXiv preprint arXiv:2212.10650, 2022.

[26] Evenbly, G. Gauge fixing, canonical forms, and optimal truncations in tensor networks with closed loops. Physical Review B, 98(8):085155, 2018.

[27] Frantar, E. and Alistarh, D. SparseGPT: Massive language models can be accurately pruned in one-shot. In International Conference on Machine Learning, pp. 10323–10337. PMLR, 2023.

[28] Frantar, E., Ashkboos, S., Hoefler, T., and Alistarh, D. GPTQ: Accurate post-training quantization for generative pre-trained transformers. arXiv preprint arXiv:2210.17323, 2022.

[29] Fu, D., Arora, S., Grogan, J., Johnson, I., Eyuboglu, E. S., Thomas, A., Spector, B., Poli, M., Rudra, A., and Ré, C. Monarch mixer: A simple sub-quadratic GEMM-based architecture. Advances in Neural Information Processing Systems, 36:77546–77603, 2023.

[30] Gemma Team, Kamath, A., Ferret, J., Pathak, S., Vieillard, N., Merhej, R., Perrin, S., Matejovicova, T., Ramé, A., Rivière, M., et al. Gemma 3 technical report. arXiv preprint arXiv:2503.19786, 2025.

[31] Grover, A., Wang, E., Zweig, A., and Ermon, S. Stochastic optimization of sorting networks via continuous relaxations. arXiv preprint arXiv:1903.08850, 2019.

[32] Haider, E., Perez-Becker, D., Portet, T., Madan, P., Garg, A., Ashfaq, A., Majercak, D., Wen, W., Kim, D., Yang, Z., et al. Phi-3 safety post-training: Aligning language models with a "break-fix" cycle. arXiv preprint arXiv:2407.13833, 2024.

[33] Hassibi, B., Stork, D., and Wolff, G. Optimal brain surgeon: Extensions and performance comparisons. Advances in Neural Information Processing Systems, 6, 1993.

[34] Hayou, S., Ghosh, N., and Yu, B. LoRA+: Efficient low rank adaptation of large models. In International Conference on Machine Learning, 2024.

[35] Hsieh, C.-Y., Li, C.-L., Yeh, C.-K., Nakhost, H., Fujii, Y., Ratner, A., Krishna, R., Lee, C.-Y., and Pfister, T. Distilling step-by-step! Outperforming larger language models with less training data and smaller model sizes. arXiv preprint arXiv:2305.02301, 2023.

[36] Hu, E. J., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2021.

[37] Huggins, W., Patil, P., Mitchell, B., Whaley, K. B., and Stoudenmire, E. M. Towards quantum machine learning with tensor networks. Quantum Science and Technology, 4(2):024001, 2019.

[38] Hwang, I., Park, H., Lee, Y., Yang, J., and Maeng, S. PC-LoRA: Low-rank adaptation for progressive model compression with knowledge distillation. arXiv preprint arXiv:2406.09117, 2024.

[39] Physical Intelligence, Black, K., Brown, N., Darpinian, J., Dhabalia, K., Driess, D., Esmail, A., Equi, M., Finn, C., Fusai, N., et al. $\pi_{0.5}$: a vision-language-action model with open-world generalization. arXiv preprint arXiv:2504.16054, 2025.

[40] Jaderberg, M., Vedaldi, A., and Zisserman, A. Speeding up convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866, 2014.

[41] Katz, D. M., Bommarito, M. J., Gao, S., and Arredondo, P. GPT-4 passes the bar exam. Philosophical Transactions of the Royal Society A, 382(2270):20230254, 2024.

[42] Koike-Akino, T., Liu, J., and Wang, Y. μ-MoE: Test-time pruning as micro-grained mixture-of-experts. arXiv preprint arXiv:2505.18451, 2025a.

[43] Koike-Akino, T., Tonin, F., Wu, Y., Wu, F. Z., Candogan, L. N., and Cevher, V. Quantum-PEFT: Ultra parameter-efficient fine-tuning. arXiv preprint arXiv:2503.05431, 2025b.

[44] Koike-Akino, T., Chen, X., Liu, J., Wang, Y., Wang, P. P., and Brand, M. LatentLLM: Activation-aware transform to multi-head latent attention. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pp. 22644–22652, 2026a.

[45] Koike-Akino, T., Liu, J., and Wang, Y. TTQ: Activation-aware test-time quantization to accelerate LLM inference on the fly. arXiv preprint arXiv:2603.19296, 2026b.

[46] Lebedev, V., Ganin, Y., Rakhuba, M., Oseledets, I., and Lempitsky, V. Speeding-up convolutional neural networks using fine-tuned CP-decomposition. arXiv preprint arXiv:1412.6553, 2014.

[47] LeCun, Y., Denker, J., and Solla, S. Optimal brain damage. Advances in Neural Information Processing Systems, 2, 1989.

[48] Lehmer, D. H. Teaching combinatorial tricks to a computer. In Proceedings of Symposia in Applied Mathematics, pp. 179–193. American Mathematical Society, 1960.

[49] Li, M., Tromp, J., and Vitányi, P. Reversible simulation of irreversible computation. Physica D: Nonlinear Phenomena, 120(1-2):168–176, 1998.

[50] Li, Y., Lee, D., Yin, R., and Panda, P. Optimal brain decomposition for accurate LLM low-rank approximation. arXiv preprint arXiv:2604.00821, 2026.

[51] Lin, B., Tang, Z., Ye, Y., Cui, J., Zhu, B., Jin, P., Zhang, J., Ning, M., and Yuan, L. MoE-LLaVA: Mixture of experts for large vision-language models. arXiv preprint arXiv:2401.15947, 2024a.

[52] Lin, J., Tang, J., Tang, H., Yang, S., Dang, X., and Han, S. AWQ: Activation-aware weight quantization for LLM compression and acceleration. arXiv preprint arXiv:2306.00978, 2023.

[53] Lin, J., Tang, J., Tang, H., Yang, S., Chen, W.-M., Wang, W.-C., Xiao, G., Dang, X., Gan, C., and Han, S. AWQ: Activation-aware weight quantization for on-device LLM compression and acceleration. Proceedings of Machine Learning and Systems, 6:87–100, 2024b.

[54] Liu, A., Feng, B., Xue, B., Wang, B., Wu, B., Lu, C., Zhao, C., Deng, C., Zhang, C., Ruan, C., et al. DeepSeek-V3 technical report. arXiv preprint arXiv:2412.19437, 2024a.

[55] Liu, B., Zhu, Y., Gao, C., Feng, Y., Liu, Q., Zhu, Y., and Stone, P. LIBERO: Benchmarking knowledge transfer for lifelong robot learning. Advances in Neural Information Processing Systems, 36:44776–44791, 2023a.

[56] Liu, J., Koike-Akino, T., Wang, P., Brand, M., Wang, Y., and Parsons, K. LoDA: Low-dimensional adaptation of large language models. In NeurIPS'23 Workshop on Efficient Natural Language and Speech Processing, 2023b.

[57] Liu, J., Koike-Akino, T., Wang, Y., Mansour, H., and Brand, M. AWP: Activation-aware weight pruning and quantization with projected gradient descent. arXiv preprint arXiv:2506.10205, 2025.

[58] Liu, Z., Yuan, J., Jin, H., Zhong, S., Xu, Z., Braverman, V., Chen, B., and Hu, X. KIVI: A tuning-free asymmetric 2bit quantization for KV cache. In Salakhutdinov, R., Kolter, Z., Heller, K., Weller, A., Oliver, N., Scarlett, J., and Berkenkamp, F. (eds.), Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of ..., 2024.

[59] Luo, S., Liu, M., Yu, Y., Ren, S., and Bai, Y. An adaptive tensor-train decomposition approach for efficient deep neural network compression. arXiv preprint arXiv:2408.01534, 2024.

[60] McCluskey, E. J. Minimization of Boolean functions. The Bell System Technical Journal, 35(6):1417–1444, 1956.

1956
[61]

Learning Latent Permutations with Gumbel-Sinkhorn Networks

Mena, G., Belanger, D., Linderman, S., and Snoek, J. Learning latent permutations with Gumbel-Sinkhorn networks. arXiv preprint arXiv:1802.08665, 2018

2018
[62]

Pointer Sentinel Mixture Models

Merity, S., Xiong, C., Bradbury, J., and Socher, R. Pointer sentinel mixture models. arXiv preprint arXiv:1609.07843, 2016

2016
[63]

Novikov, A., Podoprikhin, D., Osokin, A., and Vetrov, D. P. Tensorizing neural networks. Advances in neural information processing systems, 28, 2015

2015
[64]

A practical introduction to tensor networks: Matrix product states and projected entangled pair states

Orús, R. A practical introduction to tensor networks: Matrix product states and projected entangled pair states. Annals of Physics, 349:117-158, 2014

2014
[65]

PyTorch: An imperative style, high-performance deep learning library

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. PyTorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019

2019
[66]

Scalable diffusion models with transformers

Peebles, W. and Xie, S. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 4195-4205, 2023

2023
[67]

Stable low-rank tensor decomposition for compression of convolutional neural network

Phan, A.-H., Sobolev, K., Sozykin, K., Ermilov, D., Gusak, J., Tichavský, P., Glukhov, V., Oseledets, I., and Cichocki, A. Stable low-rank tensor decomposition for compression of convolutional neural network. In European Conference on Computer Vision, pp. 522-539. Springer, 2020

2020
[68]

SoftSort: A continuous relaxation for the argsort operator

Prillo, S. and Eisenschlos, J. SoftSort: A continuous relaxation for the argsort operator. In International Conference on Machine Learning, pp. 7793-7802. PMLR, 2020

2020
[69]

TensorNetwork: A Library for Physics and Machine Learning

Roberts, C., Milsted, A., Ganahl, M., Zalcman, A., Fontaine, B., Zou, Y., Hidary, J., Vidal, G., and Leichenauer, S. TensorNetwork: A library for physics and machine learning. arXiv preprint arXiv:1905.01330, 2019

2019
[70]

Compressing large language models using low rank and low precision decomposition

Saha, R., Sagan, N., Srivastava, V., Goldsmith, A., and Pilanci, M. Compressing large language models using low rank and low precision decomposition. Advances in Neural Information Processing Systems, 37:88981-89018, 2024

2024
[71]

Low-rank matrix factorization for deep neural network training with high-dimensional output targets

Sainath, T. N., Kingsbury, B., Sindhwani, V., Arisoy, E., and Ramabhadran, B. Low-rank matrix factorization for deep neural network training with high-dimensional output targets. In 2013 IEEE international conference on acoustics, speech and signal processing, pp. 6655-6659. IEEE, 2013

2013
[72]

Eigen attention: Attention in low-rank space for KV cache compression

Saxena, U., Saha, G., Choudhary, S., and Roy, K. Eigen attention: Attention in low-rank space for KV cache compression. arXiv preprint arXiv:2408.05646, 2024

2024
[73]

The density-matrix renormalization group in the age of matrix product states

Schollwöck, U. The density-matrix renormalization group in the age of matrix product states. Annals of Physics, 326(1):96-192, 2011

2011
[74]

Green AI

Schwartz, R., Dodge, J., Smith, N. A., and Etzioni, O. Green AI. Communications of the ACM, 63(12):54-63, 2020

2020
[75]

Tensor decomposition for signal processing and machine learning

Sidiropoulos, N. D., De Lathauwer, L., Fu, X., Huang, K., Papalexakis, E. E., and Faloutsos, C. Tensor decomposition for signal processing and machine learning. IEEE Transactions on Signal Processing, 65(13):3551-3582, 2017

2017
[76]

Towards VQA models that can read

Singh, A., Natarajan, V., Shah, M., Jiang, Y., Chen, X., Parikh, D., and Rohrbach, M. Towards VQA models that can read. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8317-8326, 2019

2019
[77]

Sinha, A. K. and Fleuret, F. AA-SVD: Anchored and adaptive SVD for large language model compression. arXiv preprint arXiv:2604.02119, 2026

2026
[78]

VLA-JEPA: Enhancing vision-language-action model with latent world model

Sun, J., Zhang, W., Qi, Z., Ren, S., Liu, Z., Zhu, H., Sun, G., Jin, X., and Chen, Z. VLA-JEPA: Enhancing vision-language-action model with latent world model. arXiv preprint arXiv:2602.10098, 2026

2026
[79]

Sun, M., Liu, Z., Bair, A., and Kolter, J. Z. A simple and effective pruning approach for large language models. arXiv preprint arXiv:2306.11695, 2023

2023
[80]

Takeshita, O. Y. Permutation polynomial interleavers: An algebraic-geometric perspective. IEEE Transactions on Information Theory, 53(6):2116-2132, 2007

2007

Showing first 80 references.

[1] [1]

Phi-4 Technical Report

Abdin, M., Aneja, J., Behl, H., Bubeck, S., Eldan, R., Gunasekar, S., Harrison, M., Hewett, R. J., Javaheripi, M., Kauffmann, P., et al. Phi-4 technical report. arXiv preprint arXiv:2412.08905, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[2] [2]

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Abouelenin, A., Ashfaq, A., Atkinson, A., Awadalla, H., Bach, N., Bao, J., Benhaim, A., Cai, M., Chaudhary, V., Chen, C., et al. Phi-4-mini technical report: Compact yet powerful multimodal language models via mixture-of- LoRAs . arXiv preprint arXiv:2503.01743, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[3] [3]

TQCompressor : improving tensor decomposition methods in neural networks via permutations

Abronin, V., Naumov, A., Mazur, D., Bystrov, D., Tsarova, K., Melnikov, A., Dolgov, S., Brasher, R., and Perelshein, M. TQCompressor : improving tensor decomposition methods in neural networks via permutations. In 2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR), pp.\ 503--506. IEEE, 2024

2024

[4] [4]

Phi-4-reasoning-vision- 15B technical report

Aneja, J., Harrison, M., Joshi, N., LaBonte, T., Langford, J., and Salinas, E. Phi-4-reasoning-vision- 15B technical report. arXiv preprint arXiv:2603.03975, 2026

work page arXiv 2026

[5] [5]

and Ichikawa, Y

Arai, Y. and Ichikawa, Y. Quantization error propagation: Revisiting layer-wise post-training quantization. arXiv preprint arXiv:2504.09629, 2025

work page arXiv 2025

[6] [6]

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

Assran, M., Bardes, A., Fan, D., Garrido, Q., Howes, R., Muckley, M., Rizvi, A., Roberts, C., Sinha, K., Zholus, A., et al. V-JEPA 2 : Self-supervised video models enable understanding, prediction and planning. arXiv preprint arXiv:2506.09985, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[7] [7]

Available: https://arxiv.org/abs/2401.00625

Bai, G., Chai, Z., Ling, C., Wang, S., Lu, J., Zhang, N., Shi, T., Yu, Z., Zhu, M., Zhang, Y., et al. Beyond efficiency: A systematic survey of resource-efficient large language models. arXiv preprint arXiv:2401.00625, 2024 a

work page arXiv 2024

[8] [8]

SparseLLM : Towards global pruning for pre-trained language models

Bai, G., Li, Y., Ling, C., Kim, K., and Zhao, L. SparseLLM : Towards global pruning for pre-trained language models. arXiv preprint arXiv:2402.17946, 2024 b

work page arXiv 2024

[9] [9]

Qwen3-VL Technical Report

Bai, S., Cai, Y., Chen, R., Chen, K., Chen, X., Cheng, Z., Deng, L., Ding, W., Gao, C., Ge, C., et al. Qwen3-VL technical report. arXiv preprint arXiv:2511.21631, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[10] [10]

LoTR : Low tensor rank weight adaptation

Bershatsky, D., Cherniuk, D., Daulbaev, T., Mikhalev, A., and Oseledets, I. LoTR : Low tensor rank weight adaptation. arXiv preprint arXiv:2402.01376, 2024

work page arXiv 2024

[11] [11]

PaliGemma: A versatile 3B VLM for transfer

Beyer, L., Steiner, A., Pinto, A. S., Kolesnikov, A., Wang, X., Salz, D., Neumann, M., Alabdulmohsin, I., Tschannen, M., Bugliarello, E., et al. PaliGemma : A versatile 3B VLM for transfer. arXiv preprint arXiv:2407.07726, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[12] [12]

K., Hachtel, G

Brayton, R. K., Hachtel, G. D., McMullen, C., and Sangiovanni-Vincentelli, A. Logic minimization algorithms for VLSI synthesis , volume 2. Springer Science & Business Media, 1984

1984

[13] [13]

Pyramid KV : Dynamic KV cache compression based on pyramidal information funneling

Cai, Z., Zhang, Y., Gao, B., Liu, Y., Li, Y., Liu, T., Lu, K., Xiong, W., Dong, Y., Hu, J., and Xiao, W. Pyramid KV : Dynamic KV cache compression based on pyramidal information funneling. In Second Conference on Language Modeling, 2025. URL https://openreview.net/forum?id=ayi7qezU87

2025

[14] [14]

arXiv preprint arXiv:2407.21118 , year=

Chang, C.-C., Lin, W.-C., Lin, C.-Y., Chen, C.-Y., Hu, Y.-F., Wang, P.-S., Huang, N.-C., Ceze, L., Abdelfattah, M. S., and Wu, K.-C. Palu: Compressing KV -cache with low-rank projection. arXiv preprint arXiv:2407.21118, 2024

work page arXiv 2024

[15] [15]

One-for-all: Generalized LoRA for parameter-efficient fine-tuning

Chavan, A., Liu, Z., Gupta, D., Xing, E., and Shen, Z. One-for-all: Generalized LoRA for parameter-efficient fine-tuning. arXiv preprint arXiv:2306.07967, 2023

work page arXiv 2023

[16] [16]

SuperLoRA : Parameter-efficient unified adaptation for large vision models

Chen, X., Liu, J., Wang, Y., Wang, P., Brand, M., Wang, G., and Koike-Akino, T. SuperLoRA : Parameter-efficient unified adaptation for large vision models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 8050--8055, 2024 a

2024

[17] [17]

QuanTA : Efficient high-rank fine-tuning of LLMs with quantum-informed tensor adaptation

Chen, Z., Dangovski, R., Loh, C., Dugan, O., Luo, D., and Solja c i \'c , M. QuanTA : Efficient high-rank fine-tuning of LLMs with quantum-informed tensor adaptation. Advances in Neural Information Processing Systems, 37: 0 92210--92245, 2024 b

2024

[18] [18]

Training Verifiers to Solve Math Word Problems

Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., et al. Training verifiers to solve math word problems. URL https://arxiv. org/abs/2110.14168, 9, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[19] [19]

David, H. A. and Nagaraja, H. N. Order statistics. John Wiley & Sons, 2004

2004

[20] [20]

Predicting parameters in deep learning

Denil, M., Shakibi, B., Dinh, L., Ranzato, M., and De Freitas, N. Predicting parameters in deep learning. Advances in neural information processing systems, 26, 2013

2013

[21] [21]

L., Zaremba, W., Bruna, J., LeCun, Y., and Fergus, R

Denton, E. L., Zaremba, W., Bruna, J., LeCun, Y., and Fergus, R. Exploiting linear structure within convolutional networks for efficient evaluation. Advances in neural information processing systems, 27, 2014

2014

[22] [22]

DaViT : Dual attention vision transformers

Ding, M., Xiao, B., Codella, N., Luo, P., Wang, J., and Yuan, L. DaViT : Dual attention vision transformers. In European conference on computer vision, pp.\ 74--92. Springer, 2022

2022

[23] [23]

Learning to prune deep neural networks via layer-wise optimal brain surgeon

Dong, X., Chen, S., and Pan, S. Learning to prune deep neural networks via layer-wise optimal brain surgeon. Advances in neural information processing systems, 30, 2017

2017

[24] [24]

W., and Keutzer, K

Dong, Z., Yao, Z., Gholami, A., Mahoney, M. W., and Keutzer, K. HAWQ : Hessian aware quantization of neural networks with mixed-precision. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.\ 293--302, 2019

2019

[25] [25]

P., Clark, J

Edalati, A., Tahaei, M., Kobyzev, I., Nia, V. P., Clark, J. J., and Rezagholizadeh, M. KronA : Parameter efficient tuning with kronecker adapter. arXiv preprint arXiv:2212.10650, 2022

work page arXiv 2022

[26] [26]

Gauge fixing, canonical forms, and optimal truncations in tensor networks with closed loops

Evenbly, G. Gauge fixing, canonical forms, and optimal truncations in tensor networks with closed loops. Physical Review B, 98 0 (8): 0 085155, 2018

2018

[27] [27]

and Alistarh, D

Frantar, E. and Alistarh, D. SparseGPT : Massive language models can be accurately pruned in one-shot. In International Conference on Machine Learning, pp.\ 10323--10337. PMLR, 2023

2023

[28] [28]

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

Frantar, E., Ashkboos, S., Hoefler, T., and Alistarh, D. GPTQ : Accurate post-training quantization for generative pre-trained transformers. arXiv preprint arXiv:2210.17323, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[29] [29]

S., Thomas, A., Spector, B., Poli, M., Rudra, A., and R \'e , C

Fu, D., Arora, S., Grogan, J., Johnson, I., Eyuboglu, E. S., Thomas, A., Spector, B., Poli, M., Rudra, A., and R \'e , C. Monarch mixer: A simple sub-quadratic GEMM -based architecture. Advances in Neural Information Processing Systems, 36: 0 77546--77603, 2023

2023

[30] [30]

Gemma 3 Technical Report

Gemma Team , Kamath, A., Ferret, J., Pathak, S., Vieillard, N., Merhej, R., Perrin, S., Matejovicova, T., Ram \'e , A., Rivi \`e re, M., et al. Gemma 3 technical report. arXiv preprint arXiv:2503.19786, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[31] [31]

Stochastic Optimization of Sorting Networks via Continuous Relaxations

Grover, A., Wang, E., Zweig, A., and Ermon, S. Stochastic optimization of sorting networks via continuous relaxations. arXiv preprint arXiv:1903.08850, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1903

[32] [32]

Phi-3 safety post-training: Aligning language models with a ``break-fix'' cycle

Haider, E., Perez-Becker, D., Portet, T., Madan, P., Garg, A., Ashfaq, A., Majercak, D., Wen, W., Kim, D., Yang, Z., et al. Phi-3 safety post-training: Aligning language models with a ``break-fix'' cycle. arXiv preprint arXiv:2407.13833, 2024

work page arXiv 2024

[33] [33]

Optimal brain surgeon: Extensions and performance comparisons

Hassibi, B., Stork, D., and Wolff, G. Optimal brain surgeon: Extensions and performance comparisons. Advances in neural information processing systems, 6, 1993

1993

[34] [34]

LoRA +: Efficient low rank adaptation of large models

Hayou, S., Ghosh, N., and Yu, B. LoRA +: Efficient low rank adaptation of large models. In International Conference on Machine Learning, 2024

2024

[35] [35]

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

Hsieh, C.-Y., Li, C.-L., Yeh, C.-K., Nakhost, H., Fujii, Y., Ratner, A., Krishna, R., Lee, C.-Y., and Pfister, T. Distilling step-by-step! outperforming larger language models with less training data and smaller model sizes. arXiv preprint arXiv:2305.02301, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[36] [36]

J., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al

Hu, E. J., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al. LoRA : Low-rank adaptation of large language models. In International Conference on Learning Representations, 2021

2021

[37] [37]

B., and Stoudenmire, E

Huggins, W., Patil, P., Mitchell, B., Whaley, K. B., and Stoudenmire, E. M. Towards quantum machine learning with tensor networks. Quantum Science and technology, 4 0 (2): 0 024001, 2019

2019

[38] [38]

PC-LoRA : Low-rank adaptation for progressive model compression with knowledge distillation

Hwang, I., Park, H., Lee, Y., Yang, J., and Maeng, S. PC-LoRA : Low-rank adaptation for progressive model compression with knowledge distillation. arXiv preprint arXiv:2406.09117, 2024

work page arXiv 2024

[39] [39]

$\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization

Intelligence, P., Black, K., Brown, N., Darpinian, J., Dhabalia, K., Driess, D., Esmail, A., Equi, M., Finn, C., Fusai, N., et al. _ 0.5 : a vision-language-action model with open-world generalization. arXiv preprint arXiv:2504.16054, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[40] [40]

Speeding up Convolutional Neural Networks with Low Rank Expansions

Jaderberg, M., Vedaldi, A., and Zisserman, A. Speeding up convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[41] [41]

M., Bommarito, M

Katz, D. M., Bommarito, M. J., Gao, S., and Arredondo, P. GPT -4 passes the bar exam. Philosophical Transactions of the Royal Society A, 382 0 (2270): 0 20230254, 2024

2024

[42] [42]

-moe: Test-time pruning as micro-grained mixture-of-experts

Koike-Akino, T., Liu, J., and Wang, Y. -moe: Test-time pruning as micro-grained mixture-of-experts. arXiv preprint arXiv:2505.18451, 2025 a

work page arXiv 2025

[43] [43]

Z., Candogan, L

Koike-Akino, T., Tonin, F., Wu, Y., Wu, F. Z., Candogan, L. N., and Cevher, V. Quantum-PEFT : Ultra parameter-efficient fine-tuning. arXiv preprint arXiv:2503.05431, 2025 b

work page arXiv 2025

[44] [44]

P., and Brand, M

Koike-Akino, T., Chen, X., Liu, J., Wang, Y., Wang, P. P., and Brand, M. LatentLLM : Activation-aware transform to multi-head latent attention. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pp.\ 22644--22652, 2026 a

2026

[45] [45]

TTQ : Activation-aware test-time quantization to accelerate LLM inference on the fly

Koike-Akino, T., Liu, J., and Wang, Y. TTQ : Activation-aware test-time quantization to accelerate LLM inference on the fly. arXiv preprint arXiv:2603.19296, 2026 b

work page arXiv 2026

[46] [46]

Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition

Lebedev, V., Ganin, Y., Rakhuba, M., Oseledets, I., and Lempitsky, V. Speeding-up convolutional neural networks using fine-tuned CP -decomposition. arXiv preprint arXiv:1412.6553, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[47] [47]

Optimal brain damage

LeCun, Y., Denker, J., and Solla, S. Optimal brain damage. Advances in neural information processing systems, 2, 1989

1989

[48] [48]

Lehmer, D. H. Teaching combinatorial tricks to a computer. In Proceedings of Symposia in Applied Mathematics, pp.\ 179--193. American Mathematical Society, 1960

1960

[49] [49]

Reversible simulation of irreversible computation

Li, M., Tromp, J., and Vit \'a nyi, P. Reversible simulation of irreversible computation. Physica D: Nonlinear Phenomena, 120 0 (1-2): 0 168--176, 1998

1998

[50] [50]

Optimal brain decomposition for accurate LLM low-rank approximation

Li, Y., Lee, D., Yin, R., and Panda, P. Optimal brain decomposition for accurate LLM low-rank approximation. arXiv preprint arXiv:2604.00821, 2026

work page arXiv 2026

[51] [51]

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

Lin, B., Tang, Z., Ye, Y., Cui, J., Zhu, B., Jin, P., Zhang, J., Ning, M., and Yuan, L. MoE-LlaVa : Mixture of experts for large vision-language models. arXiv preprint arXiv:2401.15947, 2024 a

work page internal anchor Pith review Pith/arXiv arXiv 2024

[52] [52]

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Lin, J., Tang, J., Tang, H., Yang, S., Dang, X., and Han, S. AWQ : Activation-aware weight quantization for LLM compression and acceleration. arXiv preprint arXiv:2306.00978, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[53] [53]

AWQ : Activation-aware weight quantization for on-device LLM compression and acceleration

Lin, J., Tang, J., Tang, H., Yang, S., Chen, W.-M., Wang, W.-C., Xiao, G., Dang, X., Gan, C., and Han, S. AWQ : Activation-aware weight quantization for on-device LLM compression and acceleration. Proceedings of Machine Learning and Systems, 6: 0 87--100, 2024 b

2024
