pith. machine review for the scientific record.

arxiv: 2604.21905 · v1 · submitted 2026-04-23 · 💻 cs.LG · eess.SP

Recognition: unknown

Low-Rank Adaptation Redux for Large Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 21:45 UTC · model grok-4.3

classification 💻 cs.LG · eess.SP

keywords low-rank adaptation · LoRA · parameter-efficient fine-tuning · signal processing · large language models · adapter design · inverse problems · fine-tuning

The pith

Re-examining low-rank adaptation through signal processing connects adapter designs to classical low-rank tools and inverse problems for more principled fine-tuning of large models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper argues that signal processing principles provide a unifying framework for understanding and advancing low-rank adaptation methods in fine-tuning large models. It links modern parameter-efficient techniques to classical low-rank modeling and inverse problems, then categorizes progress along three axes of architectural design, efficient optimization, and applications across the model lifecycle. A sympathetic reader would care because this framing supplies a vocabulary from established SP tools for choosing among adapter variants and reducing overhead in billion-parameter networks. Emphasis falls on technical mechanisms like SVD factorization and alternating solvers rather than new head-to-head experiments.
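For readers new to the setting, the LoRA mechanism the paper builds on is small: a frozen pretrained weight W is augmented with a trainable low-rank product BA, scaled by α/r [71]. A minimal sketch of that mechanism (shapes, initialization, and names are illustrative, not the paper's code):

```python
import torch

class LoRALinear(torch.nn.Module):
    """Frozen dense layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.W = torch.nn.Linear(d_in, d_out, bias=False)
        self.W.weight.requires_grad_(False)                        # pretrained weight stays frozen
        self.A = torch.nn.Parameter(torch.randn(r, d_in) * 0.01)   # down-projection
        self.B = torch.nn.Parameter(torch.zeros(d_out, r))         # up-projection; zero init => update starts at 0
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.W(x) + self.scale * (x @ self.A.T @ self.B.T)
```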

Core claim

The paper claims that viewing LoRA through the lens of signal processing bridges modern adapter designs with classical low-rank factorization and inverse problem tools, allowing SP principles to guide advances in architectural design, efficient optimization, and applications throughout the lifecycle of large models.

What carries the argument

The signal processing lens applied to LoRA, which organizes advances into SVD-based factorization with rank-augmentation and cross-layer tensorization for architecture, initialization and gauge-invariant solvers for optimization, and extensions from pre-training through deployment.
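The "classical low-rank tools" in question start at the Eckart–Young theorem: truncating the SVD at rank r gives the best rank-r approximation in Frobenius norm, which is what SVD-based adapter parameterizations lean on. A minimal NumPy illustration, not taken from the paper:

```python
import numpy as np

def best_rank_r(W: np.ndarray, r: int):
    """Best rank-r approximation of W in Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    X = U[:, :r] * s[:r]   # m x r factor, singular values absorbed
    Y = Vt[:r, :].T        # n x r factor
    return X, Y            # X @ Y.T is the closest rank-r matrix to W

W = np.random.randn(64, 48)
X, Y = best_rank_r(W, r=4)
print(np.linalg.norm(W - X @ Y.T))  # minimal over all rank-4 matrices
```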

If this is right

  • Architectural choices such as cross-layer tensorization and rank augmentation follow directly from low-rank modeling principles and can be selected systematically.
  • Optimization routines that use alternating solvers or parameterization-aware updates reduce training cost while preserving adaptation quality (a minimal alternating-solver sketch follows this list).
  • LoRA-style adapters apply beyond fine-tuning to pre-training, post-training, and serving stages of large models.
  • Open research at the SP-deep learning intersection can produce new methods that treat adaptation as an inverse problem.
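The alternating-solver sketch referenced in the second bullet: the prototypical alternating scheme for min over (X, Y) of ||A − XYᵀ||²_F fixes one factor and solves a linear least-squares problem for the other. A hypothetical, self-contained illustration; the solvers the paper surveys are more elaborate:

```python
import numpy as np

def als_lowrank(A: np.ndarray, r: int, iters: int = 50, seed: int = 0):
    """Alternating least squares for min ||A - X @ Y.T||_F^2 over X (m x r), Y (n x r)."""
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((A.shape[1], r))
    for _ in range(iters):
        X = np.linalg.lstsq(Y, A.T, rcond=None)[0].T  # fix Y: solve Y @ X.T ~= A.T
        Y = np.linalg.lstsq(X, A, rcond=None)[0].T    # fix X: solve X @ Y.T ~= A
    return X, Y
```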

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This framing may encourage direct translation of classical SP algorithms, such as iterative solvers for inverse problems, into adapter update rules.
  • Practitioners could test whether SP-inspired initializations improve convergence speed when fine-tuning on new domains (a sketch of one such initialization follows this list).
  • The approach suggests treating deployment constraints as additional regularization terms in the underlying low-rank formulation.
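The initialization sketch referenced in the second bullet above: the paper's own Figure 8 applies a Nyström-type sketch, setting X₀ = AΦ with a Gaussian Φ and Y₀ = 0 in place of a plain random start. A minimal rendering of that idea; the shape of Φ is an assumption here, chosen so that X₀ is m × r:

```python
import numpy as np

def nystrom_init(A: np.ndarray, r: int, seed: int = 0):
    """Nystrom-style initialization for factorizing A (m x n): X0 = A @ Phi, Y0 = 0."""
    rng = np.random.default_rng(seed)
    Phi = rng.standard_normal((A.shape[1], r))  # Gaussian sketch matrix
    X0 = A @ Phi                                # m x r, spans a sketched column space of A
    Y0 = np.zeros((A.shape[1], r))              # n x r, zeroed so the initial product is 0
    return X0, Y0
```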

Load-bearing premise

That categorizing existing methods by technical mechanisms and highlighting connections to signal processing is sufficient to justify their effectiveness without new empirical comparisons.

What would settle it

A side-by-side experiment that applies SP-derived initialization or alternating optimization to a standard LoRA baseline and measures whether the resulting adapter achieves lower memory use or higher task accuracy on the same large-model benchmarks.

Figures

Figures reproduced from arXiv: 2604.21905 by Bingcong Li, Georgios B. Giannakis, Yilang Zhang.

Figure 1. LoRA fine-tuning of a GPT-3 model. (Caption truncated in extraction: "Grey and orange boxes are …")
Figure 2. Example code snippet of using LoRA through PEFT.
Figure 3. SVD factorization in AdaLoRA [236] (top) and PoLAR [114] (bottom), where the Stiefel manifold is St(m, r) := {U ∈ R^{m×r} | UᵀU = I_r}.
Figure 4. (a) PoLAR parameterization facilitates better utilization of rank. (b)(c) Fine-tuning a LLaMA2-7B model on the HellaSwag dataset.
Figure 5. Rank-augmented parameterization. From top to bottom: FedPara [74], …
Figure 6. Visualization of correlated layers in Llama 3.1-8B-Instruct using linear … (caption truncated in extraction).
Figure 7. LoRA with [69], Tucker [11], and TT [220] parameterizations.
Figure 8. Convergence of (12) with random Gaussian and Nyström initializations.
Figure 9. Minimizing f(x, y) = (1/2)(xy − 1)². The manifold shown on the right is for visualization purposes only.
Figure 10. Convergence comparison of LoRA [71], ScaledGD [187], and RefLoRA [238] for low-rank matrix factorization (12).
Figure 11. A comparison of LoRA (left) and QLoRA (right).
Figure 12. Mix of multiple (two here) concepts. Figure taken from Zi… (citation truncated in extraction).
Figure 13. LoRA for fine-tuning multi-modal models.
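Figure 2 reproduces a code snippet for applying LoRA via the PEFT library; for orientation, a minimal present-day equivalent looks like the following (model name and hyperparameters are illustrative, not read from the figure):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
config = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=16,                        # scaling; effective step is alpha / r
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
)
model = get_peft_model(base, config)      # wraps targets with frozen W + trainable BA
model.print_trainable_parameters()        # only the low-rank factors require grad
```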
read the original abstract

Low-rank adaptation (LoRA) has emerged as the de facto standard for parameter-efficient fine-tuning (PEFT) of foundation models, enabling the adaptation of billion-parameter networks with minimal computational and memory overhead. Despite its empirical success and rapid proliferation of variants, it remains elusive which architectural choices, optimization techniques, and deployment constraints should guide practical method selection. This overview revisits LoRA through the lens of signal processing (SP), bridging modern adapter designs with classical low-rank modeling tools and inverse problems, as well as highlighting how SP principles can inform principled advances of fine-tuning approaches. Rather than providing a comprehensive enumeration and empirical comparisons of LoRA variants, emphasis is placed on the technical mechanisms underpinning these approaches to justify their effectiveness. These advances are categorized into three complementary axes: architectural design, efficient optimization, and pertinent applications. The first axis builds on singular value decomposition (SVD)-based factorization, rank-augmentation constructions, and cross-layer tensorization, while the second axis deals with initialization, alternating solvers, gauge-invariant optimization, and parameterization-aware methods. Beyond fine-tuning, emerging applications of LoRA are accounted for across the entire lifecycle of large models, ranging from pre- and post-training to serving/deployment. Finally, open research directions are outlined at the confluence of SP and deep learning to catalyze a bidirectional frontier: classical SP tools provide a principled vocabulary for designing PEFT methods, while the unique challenges facing modern deep learning, especially the overwhelming scale and prohibitive overhead, also offer new research lines benefiting the SP community in return.
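A note on the abstract's "gauge-invariant optimization": the training loss depends on the factor pair (X, Y) only through the product XYᵀ, which is unchanged under X → XG, Y → YG⁻ᵀ for any invertible G; this is the invariance Figures 9 and 10 are about. A purely illustrative numerical check:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))
Y = rng.standard_normal((5, 3))
G = rng.standard_normal((3, 3)) + 3 * np.eye(3)  # a generic invertible "gauge"

X2, Y2 = X @ G, Y @ np.linalg.inv(G).T           # transform the factors
print(np.allclose(X @ Y.T, X2 @ Y2.T))           # True: the product is gauge-invariant
```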

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper is an overview of low-rank adaptation (LoRA) for parameter-efficient fine-tuning of large models. It revisits existing LoRA variants through signal processing (SP) principles, categorizing advances along three axes—architectural design (SVD-based factorization, rank augmentation, cross-layer tensorization), efficient optimization (initialization, alternating solvers, gauge-invariant parameterization), and applications across pre-training, fine-tuning, and deployment—while outlining open directions for bidirectional SP-DL research. The manuscript explicitly forgoes new empirical comparisons or experiments, instead emphasizing technical mechanisms to justify effectiveness and suggest principled future designs.

Significance. If the SP connections (e.g., inverse-problem formulations and classical low-rank tools) prove insightful, the synthesis could bridge communities and provide a vocabulary for designing more effective PEFT methods. The paper's value lies in its structured re-categorization of prior literature and identification of open problems; however, without new validation, its ability to demonstrate that the SP lens produces superior designs remains interpretive rather than demonstrated.

major comments (2)
  1. [Abstract] The assertion that 'emphasis is placed on the technical mechanisms underpinning these approaches to justify their effectiveness' lacks load-bearing support: the manuscript provides no new quantitative analysis, derivations, or head-to-head experiments showing that SP lenses (SVD, alternating solvers, gauge invariance) predict better rank selection, convergence, or generalization than mechanisms already studied in the LoRA literature.
  2. [Abstract / Conclusion] The central claim that SP principles 'can inform principled advances' reduces to re-categorization of published results whose empirical support is taken as given; no concrete test or example is supplied demonstrating that the SP framing yields falsifiable improvements in design choices.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. Our manuscript is explicitly positioned as an overview that synthesizes and re-categorizes existing LoRA literature through signal-processing principles, without new experiments or quantitative comparisons. We address each major comment below and indicate where revisions to wording will be made for clarity.

read point-by-point responses
  1. Referee: [Abstract] The assertion that 'emphasis is placed on the technical mechanisms underpinning these approaches to justify their effectiveness' lacks load-bearing support: the manuscript provides no new quantitative analysis, derivations, or head-to-head experiments showing that SP lenses (SVD, alternating solvers, gauge invariance) predict better rank selection, convergence, or generalization than mechanisms already studied in the LoRA literature.

    Authors: We agree that the manuscript contains no new quantitative analysis, derivations, or experiments; this is stated explicitly in the abstract and introduction. The phrase 'justify their effectiveness' refers to re-interpreting the technical mechanisms already validated in the cited original works (e.g., how SVD-based low-rank updates align with classical matrix approximation guarantees) using SP concepts. We do not claim that the SP lens has been shown here to yield superior rank selection or convergence. We will revise the abstract to state that the SP perspective provides a unifying vocabulary for existing mechanisms and suggests directions for future principled designs, removing any implication of new validation.
    revision: partial

  2. Referee: [Abstract / Conclusion] The central claim that SP principles 'can inform principled advances' reduces to re-categorization of published results whose empirical support is taken as given; no concrete test or example is supplied demonstrating that the SP framing yields falsifiable improvements in design choices.

    Authors: The claim is forward-looking rather than demonstrative: the open-research-directions section outlines how SP tools (inverse-problem formulations, gauge invariance, tensor decompositions) could guide future PEFT designs. The manuscript takes the empirical results of prior work as given and uses them to illustrate SP connections. We acknowledge that no concrete falsifiable example of an SP-derived improvement is provided. We will expand the conclusion with one brief hypothetical example (e.g., using alternating minimization ideas to motivate a new initialization scheme) to illustrate the intended use of the framing, while preserving the overview character of the paper.
    revision: partial

Circularity Check

0 steps flagged

No significant circularity: synthesis of cited literature without self-referential derivations or fitted predictions.

full rationale

The paper is an overview that categorizes existing LoRA methods along architectural, optimization, and application axes by drawing on classical SP tools (SVD, inverse problems, alternating solvers) from the broader literature. No new equations are derived, no parameters are fitted to data within the manuscript, and no predictions or uniqueness claims reduce to the paper's own inputs or self-citations. The central thesis—that SP principles can inform PEFT advances—rests on re-interpretation and bridging of prior published results rather than any load-bearing self-referential step. This matches the default expectation for non-circular survey papers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

As a review paper the claims rest on the accuracy of summarizing prior LoRA literature and classical SP results; no new free parameters, axioms, or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: LoRA has emerged as the de facto standard for parameter-efficient fine-tuning of foundation models
    Stated directly in the opening sentence of the abstract as background.

pith-pipeline@v0.9.0 · 5577 in / 1135 out tokens · 37532 ms · 2026-05-09T21:45:24.142353+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

245 extracted references · 49 canonical work pages · 13 internal anchors

  1. [1] P. Ablin and G. Peyré, "Fast and accurate optimization on the orthogonal manifold without retraction," in Proc. Int. Conf. on Artificial Intelligence and Statistics, 2022.
  2. [2] A. Abouelenin, A. Ashfaq, A. Atkinson, H. Awadalla, N. Bach, J. Bao, A. Benhaim, M. Cai, V. Chaudhary, C. Chen et al., "Phi-4-Mini technical report: Compact yet powerful multimodal language models via mixture-of-LoRAs," arXiv preprint arXiv:2503.01743, 2025.
  3. [3] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008.
  4. [4] A. Anandkumar, R. Ge, D. J. Hsu, S. M. Kakade, M. Telgarsky et al., "Tensor decompositions for learning latent variable models," J. Mach. Learn. Res., vol. 15, no. 1, pp. 2773–2832, 2014.
  5. [5] G. Andrew, R. Arora, J. Bilmes, and K. Livescu, "Deep canonical correlation analysis," in Proc. Int. Conf. on Machine Learn., 2013.
  6. [6] S. Arora, N. Cohen, N. Golowich, and W. Hu, "A convergence analysis of gradient descent for deep linear neural networks," in Proc. Int. Conf. on Learn. Representations, 2019.
  7. [7] G. B. Arous, S. Mei, A. Montanari, and M. Nica, "The landscape of the spiked tensor model," Comm. on Pure and Applied Mathematics, vol. 72, no. 11, pp. 2282–2330, 2019.
  8. [8] J. Bai, S. Bai, Y. Chu, Z. Cui, K. Dang, X. Deng, Y. Fan, W. Ge, Y. Han, F. Huang et al., "Qwen technical report," arXiv preprint arXiv:2309.16609, 2023.
  9. [9] Y. Bai, X. Lv, J. Zhang, H. Lyu, J. Tang, Z. Huang, Z. Du, X. Liu, A. Zeng, L. Hou et al., "LongBench: A bilingual, multitask benchmark for long context understanding," in Proc. Conf. Assoc. Comput. Linguist. Meet., 2024, pp. 3119–3137.
  10. [10] B. Baingana and G. B. Giannakis, "Embedding graphs under centrality constraints for network visualization," arXiv preprint arXiv:1401.4408, 2014.
  11. [11] D. Bershatsky, D. Cherniuk, T. Daulbaev, A. Mikhalev, and I. Oseledets, "LoTR: Low tensor rank weight adaptation," arXiv preprint arXiv:2402.01376, 2024.
  12. [12] K. Bhardwaj, N. P. Pandey, S. Priyadarshi, V. Ganapathy, M. Nagel, R. Esteves, S. Kadambi, S. Borse, P. Whatmough, R. Garrepalli, M. V. Baalen, and H. Teague, "Rapid switching and multi-adapter fusion via sparse high rank adapters," in ICML Workshop on Foundation Models in the Wild, 2024.
  13. [13] M. Bini, K. Roth, Z. Akata, and A. Khoreva, "ETHER: Efficient finetuning of large-scale models with hyperplane reflections," arXiv preprint arXiv:2405.20271, 2024.
  14. [14] N. Boumal, An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, 2023.
  15. [15] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amode…, "Language models are few-shot learners" (citation truncated in extraction).
  16. [16] S. Burer and R. D. Monteiro, "A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization," Mathematical Programming, vol. 95, no. 2, pp. 329–357, 2003.
  17. [17] K. Büyükakyüz, "OLoRA: Orthonormal low-rank adaptation of large language models," arXiv preprint arXiv:2406.01775, 2024.
  18. [18] J.-F. Cai, J. Li, and D. Xia, "Provable tensor-train format tensor completion by Riemannian optimization," Journal of Machine Learning Research, vol. 23, no. 123, pp. 1–77, 2022.
  19. [19] E. J. Candès, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" Journal of the ACM, vol. 58, no. 3, pp. 1–37, 2011.
  20. [20] ——, "Robust principal component analysis?" Journal of the ACM (JACM), vol. 58, no. 3, pp. 1–37, 2011.
  21. [21] E. J. Candès, T. Strohmer, and V. Voroninski, "PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming," Comm. on Pure and Applied Mathematics, vol. 66, no. 8, pp. 1241–1274, 2013.
  22. [22] E. J. Candès and T. Tao, "Near-optimal signal recovery from random projections: Universal encoding strategies?" IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, 2006.
  23. [23] U. Cappellazzo, M. Kim, H. Chen, P. Ma, S. Petridis, D. Falavigna, A. Brutti, and M. Pantic, "Large language models are strong audio-visual speech recognition learners," in Proc. IEEE Int. Conf. Acoust., Speech, Sig. Process., 2025, pp. 1–5.
  24. [24] J. Chen, G. Wang, and G. B. Giannakis, "Graph multiview canonical correlation analysis," IEEE Trans. Signal Processing, vol. 67, no. 11, pp. 2826–2838, 2019.
  25. [25] Y. Chen and M. J. Wainwright, "Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees," arXiv preprint arXiv:1509.03025, 2015.
  26. [26] Y. Chen, S. Qian, H. Tang, X. Lai, Z. Liu, S. Han, and J. Jia, "LongLoRA: Efficient fine-tuning of long-context large language models," in Proc. Int. Conf. on Learn. Representations, 2024.
  27. [27] Y. Chi, Y. M. Lu, and Y. Chen, "Nonconvex optimization meets low-rank matrix factorization: An overview," IEEE Trans. Signal Processing, vol. 67, no. 20, pp. 5239–5269, 2019.
  28. [28] M. Ciaperoni, A. Gionis, and H. Mannila, "The Hadamard decomposition problem," Data Mining and Knowledge Discovery, vol. 38, no. 4, pp. 2306–2347, 2024.
  29. [29] A. Cichocki, D. Mandic, L. De Lathauwer, G. Zhou, Q. Zhao, C. Caiafa, and H. A. Phan, "Tensor decompositions for signal processing applications: From two-way to multiway component analysis," IEEE Sig. Process. Mag., vol. 32, no. 2, pp. 145–163, 2015.
  30. [30] P. Comon, "Independent component analysis, a new concept?" Signal Processing, vol. 36, no. 3, pp. 287–314, 1994.
  31. [31] R. Csordás, C. D. Manning, and C. Potts, "Do language models use their depth efficiently?" in Proc. Neural Info. Processing Syst., 2025.
  32. [32] L. De Lathauwer, B. De Moor, and J. Vandewalle, "A multilinear singular value decomposition," SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, pp. 1253–1278, 2000.
  33. [33] T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, "QLoRA: Efficient finetuning of quantized LLMs," in Proc. Neural Info. Processing Syst., vol. 36, 2023.
  34. [34] L. Ding, D. Drusvyatskiy, M. Fazel, and Z. Harchaoui, "Flat minima generalize for low-rank matrix recovery," Information and Inference: A Journal of the IMA, vol. 13, no. 2, p. iaae009, 2024.
  35. [35] N. Ding, X. Lv, Q. Wang, Y. Chen, B. Zhou, Z. Liu, and M. Sun, "Sparse low-rank adaptation of pre-trained language models," in Proc. Conf. on Empirical Methods in Natural Language Process., 2023, pp. 4133–4145.
  36. [36] Y. Ding, N. Krislock, J. Qian, and H. Wolkowicz, "Sensor network localization, Euclidean distance matrix completions, and graph realization," in Proc. of the First ACM International Workshop on Mobile Entity Localization and Tracking in GPS-less Environments, 2008, pp. 129–134.
  37. [37] H. Dong, T. Tong, C. Ma, and Y. Chi, "Fast and provable tensor robust principal component analysis via scaled gradient descent," Information and Inference: A Journal of the IMA, vol. 12, no. 3, pp. 1716–1758, 2023.
  38. [38] S. S. Du, W. Hu, and J. D. Lee, "Algorithmic regularization in learning deep homogeneous models: Layers are automatically balanced," in Proc. Neural Info. Processing Syst., vol. 31, 2018.
  39. [39] A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Yang, A. Fan et al., "The Llama 3 herd of models," arXiv preprint arXiv:2407.21783, 2024.
  40. [40] A. Edalati, M. Tahaei, I. Kobyzev, V. P. Nia, J. J. Clark, and M. Rezagholizadeh, "KronA: Parameter-efficient tuning with Kronecker adapter," in Enhancing LLM Performance: Efficacy, Fine-Tuning, and Inference Techniques, 2025, pp. 49–65.
  41. [41] J. Fang, H. Jiang, K. Wang, Y. Ma, S. Jie, X. Wang, X. He, and T.-S. Chua, "AlphaEdit: Null-space constrained knowledge editing for language models," arXiv preprint arXiv:2410.02355, 2024.
  42. [42] P. Foret, A. Kleiner, H. Mobahi, and B. Neyshabur, "Sharpness-aware minimization for efficiently improving generalization," in Proc. Int. Conf. on Learn. Representations, 2021.
  43. [43] Z. Frangella, J. A. Tropp, and M. Udell, "Randomized Nyström preconditioning," SIAM Journal on Matrix Analysis and Applications, vol. 44, no. 2, pp. 718–752, 2023.
  44. [44] E. Frantar, S. Ashkboos, T. Hoefler, and D. Alistarh, "OPTQ: Accurate quantization for generative pre-trained transformers," in Proc. Int. Conf. on Learn. Representations, 2023.
  45. [45] S. Gandy, B. Recht, and I. Yamada, "Tensor completion and low-n-rank tensor recovery via convex optimization," Inverse Problems, vol. 27, no. 2, p. 025010, 2011.
  46. [46] B. Gao, S. Vary, P. Ablin, and P.-A. Absil, "Optimization flows landing on the Stiefel manifold," International Symposium on Mathematical Theory of Networks and Systems (MTNS), vol. 55, no. 30, pp. 25–30, 2022.
  47. [47] Z. Gao, Q. Wang, A. Chen, Z. Liu, B. Wu, L. Chen, and J. Li, "Parameter-efficient fine-tuning with discrete Fourier transform," in Proc. Int. Conf. on Machine Learn., 2024.
  48. [48] R. Ge, C. Jin, and Y. Zheng, "No spurious local minima in nonconvex low rank problems: A unified geometric analysis," in Proc. Int. Conf. on Machine Learn., 2017, pp. 1233–1242.
  49. [49] R. Ge, J. D. Lee, and T. Ma, "Matrix completion has no spurious local minimum," in Proc. Neural Info. Processing Syst., vol. 29, 2016.
  50. [50] R. Ge, Y. Ren, X. Wang, and M. Zhou, "Understanding deflation process in over-parametrized tensor decomposition," in Proc. Neural Info. Processing Syst., vol. 34, 2021, pp. 1299–1311.
  51. [51] Gemma Team, T. Mesnard, C. Hardin, R. Dadashi, S. Bhupatiraju, S. Pathak, L. Sifre, M. Rivière, M. S. Kale, J. Love et al., "Gemma: Open models based on Gemini research and technology," arXiv preprint arXiv:2403.08295, 2024.
  52. [52] G. B. Giannakis, Q. Ling, G. Mateos, I. D. Schizas, and H. Zhu, Decentralized Learning for Wireless Communications and Networking. Springer International Publishing, 2016, pp. 461–497.
  53. [53] D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, and J. Eisert, "Quantum state tomography via compressed sensing," Physical Review Letters, vol. 105, no. 15, p. 150401, 2010.
  54. [54] Y. Gu, X. Wang, J. Z. Wu, Y. Shi, Y. Chen, Z. Fan, W. Xiao, R. Zhao, S. Chang, W. Wu et al., "Mix-of-Show: Decentralized low-rank adaptation for multi-concept customization of diffusion models," in Proc. Neural Info. Processing Syst., vol. 36, 2023.
  55. [55] W. Hackbusch and S. Kühn, "A new scheme for the tensor representation," Journal of Fourier Analysis and Applications, vol. 15, no. 5, pp. 706–722, 2009.
  56. [56] A. Han, J. Li, W. Huang, M. Hong, A. Takeda, P. K. Jawanpuria, and B. Mishra, "SLTrain: A sparse plus low rank approach for parameter and memory efficient pretraining," in Proc. Neural Info. Processing Syst., vol. 37, 2024, pp. 118267–118295.
  57. [57] L. Han, Y. Li, H. Zhang, P. Milanfar, D. Metaxas, and F. Yang, "SVDiff: Compact parameter space for diffusion fine-tuning," in Proc. Int. Conf. on Computer Vision, 2023, pp. 7323–7334.
  58. [58] Z. Han, C. Gao, J. Liu, J. Zhang, and S. Q. Zhang, "Parameter-efficient fine-tuning for large models: A comprehensive survey," arXiv preprint arXiv:2403.14608, 2024.
  59. [59] M. Hardt and M. Wootters, "Fast matrix completion without the condition number," in Conf. on Learning Theory, 2014, pp. 638–678.
  60. [60] J. Haupt, Q. Lu, Y. Shen, J. Chen, Y. Dong, D. McCreary, M. Akçakaya, and G. B. Giannakis, "Deploying AI for signal processing education: Selected challenges and intriguing opportunities," IEEE Sig. Process. Mag., Special Issue on Artificial Intelligence for Education: A Signal Processing Perspective, vol. 43, no. 1, pp. 32–46, 2026.
  61. [61] S. Hayou, N. Ghosh, and B. Yu, "The impact of initialization on LoRA finetuning dynamics," in Proc. Neural Info. Processing Syst., vol. 37, 2024, pp. 117015–117040.
  62. [62] ——, "LoRA+: Efficient low rank adaptation of large models," in Proc. Int. Conf. on Machine Learn., 2024.
  63. [63] D. Hendrycks, C. Burns, S. Basart, A. Zou, M. Mazeika, D. Song, and J. Steinhardt, "Measuring massive multitask language understanding," in Proc. Int. Conf. on Learn. Representations, 2021.
  64. [64] C. J. Hillar and L.-H. Lim, "Most tensor problems are NP-hard," Journal of the ACM (JACM), vol. 60, no. 6, pp. 1–39, 2013.
  65. [65] F. L. Hitchcock, "The expression of a tensor or a polyadic as a sum of products," Journal of Mathematics and Physics, vol. 6, no. 1-4, pp. 164–189, 1927.
  66. [66] ——, "Multiple invariants and generalized rank of a p-way matrix or tensor," Journal of Mathematics and Physics, vol. 7, no. 1-4, pp. 39–79, 1928.
  67. [67] H. Hotelling, "Relations between two sets of variates," in Breakthroughs in Statistics: Methodology and Distribution, 1992, pp. 162–190.
  68. [68] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly, "Parameter-efficient transfer learning for NLP," in Proc. Int. Conf. on Machine Learn., 2019, pp. 2790–2799.
  69. [69] I. Hounie, C. Kanatsoulis, A. Tandon, and A. Ribeiro, "LoRTA: Low rank tensor adaptation of large language models," arXiv preprint arXiv:2410.04060, 2024.
  70. [70] C.-Y. Hsu, Y.-L. Tsai, C.-H. Lin, P.-Y. Chen, C.-M. Yu, and C.-Y. Huang, "Safe LoRA: The silver lining of reducing safety risks when finetuning large language models," in Proc. Neural Info. Processing Syst., vol. 37, 2024, pp. 65072–65094.
  71. [71] E. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-rank adaptation of large language models," in Proc. Int. Conf. on Learn. Representations, 2022.
  72. [72] C. Huang, Q. Liu, B. Y. Lin, T. Pang, C. Du, and M. Lin, "LoraHub: Efficient cross-task generalization via dynamic LoRA composition," arXiv preprint arXiv:2307.13269, 2023.
  73. [73] Q. Huang, T. Ko, Z. Zhuang, L. Tang, and Y. Zhang, "HiRA: Parameter-efficient Hadamard high-rank adaptation for large language models," in Proc. Int. Conf. on Learn. Representations, 2025.
  74. [74] N. Hyeon-Woo, M. Ye-Bin, and T.-H. Oh, "FedPara: Low-rank Hadamard product for communication-efficient federated learning," in Proc. Int. Conf. on Learn. Representations, 2022.
  75. [75] A. Hyvärinen and E. Oja, "Independent component analysis: Algorithms and applications," Neural Networks, vol. 13, no. 4-5, pp. 411–430, 2000.
  76. [76] A. Jacot, F. Ged, B. Şimşek, C. Hongler, and F. Gabriel, "Saddle-to-saddle dynamics in deep linear networks: Small initialization training, symmetry, and sparsity," arXiv preprint arXiv:2106.15933, 2021.
  77. [77] M. Jaggi, "Revisiting Frank-Wolfe: Projection-free sparse convex optimization," in Proc. Int. Conf. on Machine Learn., 2013, pp. 427–435.
  78. [78] P. Jain, P. Netrapalli, and S. Sanghavi, "Low-rank matrix completion using alternating minimization," in Proc. of the Forty-Fifth Annual ACM Symposium on Theory of Computing, 2013, pp. 665–674.
  79. [79] U. Jang, J. D. Lee, and E. K. Ryu, "LoRA training in the NTK regime has no spurious local minima," in Proc. Int. Conf. on Machine Learn., 2024.
  80. [80] X. Jia, H. Wang, J. Peng, X. Feng, and D. Meng, "Preconditioning matters: Fast global convergence of non-convex matrix factorization via scaled gradient descent," in Proc. Neural Info. Processing Syst., 2023.
Showing first 80 references.