Recognition: unknown
Low-Rank Adaptation Redux for Large Models
Pith reviewed 2026-05-09 21:45 UTC · model grok-4.3
The pith
Re-examining low-rank adaptation through a signal-processing lens connects adapter designs to classical low-rank tools and inverse problems, pointing toward more principled fine-tuning of large models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that viewing LoRA through the lens of signal processing bridges modern adapter designs with classical low-rank factorization and inverse problem tools, allowing SP principles to guide advances in architectural design, efficient optimization, and applications throughout the lifecycle of large models.
What carries the argument
The signal-processing lens applied to LoRA, which organizes advances into SVD-based factorization, rank augmentation, and cross-layer tensorization for architecture; initialization schemes, alternating solvers, and gauge-invariant optimization for training; and extensions spanning pre-training through deployment.
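To make the lens concrete, the sketch below shows the basic LoRA update W0 + BA together with one SVD-based initialization of the adapter factors. This is a minimal NumPy illustration assembled for this review, not code from the paper; in particular, splitting the square-rooted singular values across both factors is an assumed convention.

```python
# Minimal sketch (not from the paper): the LoRA update and an SVD-based
# initialization of the adapter factors, in the spirit of SVD-derived variants.
import numpy as np

def svd_init_adapter(W0, r):
    """Build rank-r factors (B, A) from the top-r singular directions of the
    frozen weight W0; splitting sqrt(S) across both factors is one common choice."""
    U, S, Vt = np.linalg.svd(W0, full_matrices=False)
    B = U[:, :r] * np.sqrt(S[:r])            # shape (d_out, r)
    A = np.sqrt(S[:r])[:, None] * Vt[:r]     # shape (r, d_in)
    return B, A

d_out, d_in, r = 64, 128, 4
W0 = np.random.randn(d_out, d_in) / np.sqrt(d_in)   # frozen pretrained weight
B, A = svd_init_adapter(W0, r)
W_adapted = W0 + B @ A     # LoRA: W0 stays frozen, only B and A are trained
print(W_adapted.shape, np.linalg.matrix_rank(B @ A))
```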
If this is right
- Architectural choices such as cross-layer tensorization and rank augmentation follow directly from low-rank modeling principles and can be selected systematically.
- Optimization routines that use alternating solvers or parameterization-aware updates reduce training cost while preserving adaptation quality (see the alternating least-squares sketch after this list).
- LoRA-style adapters apply beyond fine-tuning to pre-training, post-training, and serving stages of large models.
- Open research at the SP-deep learning intersection can produce new methods that treat adaptation as an inverse problem.
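One reading of the alternating-solver item above, sketched under the simplifying assumption that adaptation reduces to approximating a known target update dW with a rank-r product; the actual methods operate on a downstream loss rather than a given dW, so this is illustrative only.

```python
# Illustrative sketch only: alternating least squares for the adapter factors,
# treating a desired weight update dW as the signal approximated by B @ A.
import numpy as np

def als_adapter(dW, r, iters=50, ridge=1e-6):
    d_out, d_in = dW.shape
    rng = np.random.default_rng(0)
    A = rng.standard_normal((r, d_in)) / np.sqrt(d_in)
    B = np.zeros((d_out, r))
    I = np.eye(r)
    for _ in range(iters):
        # fix A, minimize ||dW - B A||_F^2 over B (ridge keeps the solve well posed)
        B = dW @ A.T @ np.linalg.inv(A @ A.T + ridge * I)
        # fix B, minimize the same objective over A
        A = np.linalg.inv(B.T @ B + ridge * I) @ B.T @ dW
    return B, A

dW = np.random.default_rng(1).standard_normal((64, 128))
B, A = als_adapter(dW, r=4)
print(np.linalg.norm(dW - B @ A) / np.linalg.norm(dW))  # residual of the rank-4 fit
```

Each half-step is a closed-form ridge regression, which is why alternating schemes can be cheap relative to joint gradient descent on both factors.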
Where Pith is reading between the lines
- This framing may encourage direct translation of classical SP algorithms, such as iterative solvers for inverse problems, into adapter update rules.
- Practitioners could test whether SP-inspired initializations improve convergence speed when fine-tuning on new domains.
- The approach suggests treating deployment constraints as additional regularization terms in the underlying low-rank formulation, as written out in the display after this list.
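Read this way, the bullets above amount to a regularized inverse-problem formulation. The display below is one possible way to write it, assembled for this review; the paper does not commit to a specific penalty Omega.

```latex
% One possible formalization (assumed for illustration): adaptation as a
% regularized inverse problem, with deployment constraints entering as a penalty.
\min_{B \in \mathbb{R}^{d_{\mathrm{out}} \times r},\; A \in \mathbb{R}^{r \times d_{\mathrm{in}}}}
  \; \mathcal{L}\bigl(W_0 + BA\bigr) \;+\; \lambda\, \Omega_{\mathrm{deploy}}(BA)
```

Here L is the downstream adaptation loss and Omega_deploy could encode, for example, a sparsity or nuclear-norm budget reflecting memory or latency limits at serving time.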
Load-bearing premise
That categorizing existing methods by technical mechanisms and highlighting connections to signal processing is sufficient to justify their effectiveness without new empirical comparisons.
What would settle it
A side-by-side experiment that applies SP-derived initialization or alternating optimization to a standard LoRA baseline and measures whether the resulting adapter achieves lower memory use or higher task accuracy on the same large-model benchmarks.
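A minimal version of that experiment, written as a self-contained synthetic illustration rather than the benchmark protocol described above; the task, learning rate, and step count here are arbitrary assumptions.

```python
# Hedged illustration: compare a default LoRA-style initialization (A random,
# B zero) against an SVD-derived one on a synthetic adaptation task, reporting
# the fitting error after the same number of gradient steps on (B, A) only.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, steps, lr = 32, 64, 4, 200, 1e-2
W0 = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)      # frozen weight
W_star = W0 + rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in)) / r
X = rng.standard_normal((d_in, 256))                          # adaptation inputs
Y = W_star @ X                                                # adaptation targets

def train(B, A):
    for _ in range(steps):
        R = (W0 + B @ A) @ X - Y                              # residual
        B, A = (B - lr * (R @ X.T @ A.T) / X.shape[1],
                A - lr * (B.T @ R @ X.T) / X.shape[1])        # gradient step on factors
    return np.linalg.norm((W0 + B @ A) @ X - Y) / np.linalg.norm(Y)

err_default = train(np.zeros((d_out, r)), rng.standard_normal((r, d_in)) / np.sqrt(d_in))
U, S, Vt = np.linalg.svd(W0, full_matrices=False)             # SP-inspired alternative
err_svd = train(U[:, :r] * np.sqrt(S[:r]), np.sqrt(S[:r])[:, None] * Vt[:r])
print(f"relative error  default init: {err_default:.3f}   svd init: {err_svd:.3f}")
```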
Original abstract
Low-rank adaptation (LoRA) has emerged as the de facto standard for parameter-efficient fine-tuning (PEFT) of foundation models, enabling the adaptation of billion-parameter networks with minimal computational and memory overhead. Despite its empirical success and rapid proliferation of variants, it remains elusive which architectural choices, optimization techniques, and deployment constraints should guide practical method selection. This overview revisits LoRA through the lens of signal processing (SP), bridging modern adapter designs with classical low-rank modeling tools and inverse problems, as well as highlighting how SP principles can inform principled advances of fine-tuning approaches. Rather than providing a comprehensive enumeration and empirical comparisons of LoRA variants, emphasis is placed on the technical mechanisms underpinning these approaches to justify their effectiveness. These advances are categorized into three complementary axes: architectural design, efficient optimization, and pertinent applications. The first axis builds on singular value decomposition (SVD)-based factorization, rank-augmentation constructions, and cross-layer tensorization, while the second axis deals with initialization, alternating solvers, gauge-invariant optimization, and parameterization-aware methods. Beyond fine-tuning, emerging applications of LoRA are accounted across the entire lifecycle of large models, ranging from pre- and post-training to serving/deployment. Finally, open research directions are outlined at the confluence of SP and deep learning to catalyze a bidirectional frontier: classical SP tools provide a principled vocabulary for designing principled PEFT methods, while the unique challenges facing modern deep learning, especially the overwhelming scale and prohibitive overhead, also offer new research lines benefiting the SP community in return.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper is an overview of low-rank adaptation (LoRA) for parameter-efficient fine-tuning of large models. It revisits existing LoRA variants through signal processing (SP) principles, categorizing advances along three axes—architectural design (SVD-based factorization, rank augmentation, cross-layer tensorization), efficient optimization (initialization, alternating solvers, gauge-invariant parameterization), and applications across pre-training, fine-tuning, and deployment—while outlining open directions for bidirectional SP-DL research. The manuscript explicitly forgoes new empirical comparisons or experiments, instead emphasizing technical mechanisms to justify effectiveness and suggest principled future designs.
Significance. If the SP connections (e.g., inverse-problem formulations and classical low-rank tools) prove insightful, the synthesis could bridge communities and provide a vocabulary for designing more effective PEFT methods. The paper's value lies in its structured re-categorization of prior literature and identification of open problems; without new validation, however, the case that the SP lens produces superior designs remains interpretive rather than demonstrated.
Major comments (2)
- [Abstract] The assertion that 'emphasis is placed on the technical mechanisms underpinning these approaches to justify their effectiveness' is not adequately supported: the manuscript provides no new quantitative analysis, derivations, or head-to-head experiments showing that SP lenses (SVD, alternating solvers, gauge invariance) predict better rank selection, convergence, or generalization than mechanisms already studied in the LoRA literature.
- [Abstract / Conclusion] The central claim that SP principles 'can inform principled advances' reduces to re-categorization of published results whose empirical support is taken as given; no concrete test or example is supplied demonstrating that the SP framing yields falsifiable improvements in design choices.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. Our manuscript is explicitly positioned as an overview that synthesizes and re-categorizes existing LoRA literature through signal-processing principles, without new experiments or quantitative comparisons. We address each major comment below and indicate where revisions to wording will be made for clarity.
Point-by-point responses
-
Referee: [Abstract] The assertion that 'emphasis is placed on the technical mechanisms underpinning these approaches to justify their effectiveness' is not adequately supported: the manuscript provides no new quantitative analysis, derivations, or head-to-head experiments showing that SP lenses (SVD, alternating solvers, gauge invariance) predict better rank selection, convergence, or generalization than mechanisms already studied in the LoRA literature.
Authors: We agree that the manuscript contains no new quantitative analysis, derivations, or experiments; this is stated explicitly in the abstract and introduction. The phrase 'justify their effectiveness' refers to re-interpreting the technical mechanisms already validated in the cited original works (e.g., how SVD-based low-rank updates align with classical matrix approximation guarantees) using SP concepts. We do not claim that the SP lens has been shown here to yield superior rank selection or convergence. We will revise the abstract to state that the SP perspective provides a unifying vocabulary for existing mechanisms and suggests directions for future principled designs, removing any implication of new validation. revision: partial
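For context, the classical matrix-approximation guarantee this response alludes to is the Eckart-Young-Mirsky theorem, stated below; the notation (sigma_i, u_i, v_i denoting the singular values and vectors of W) is standard and not taken from the manuscript.

```latex
% Eckart--Young--Mirsky: the rank-r truncated SVD is the best rank-r
% approximation of W in the Frobenius norm.
\min_{\operatorname{rank}(\Delta) \le r} \|W - \Delta\|_F
   = \Bigl(\sum_{i > r} \sigma_i^2(W)\Bigr)^{1/2},
\qquad
\Delta^\star = \sum_{i=1}^{r} \sigma_i\, u_i v_i^{\top}.
```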
-
Referee: [Abstract / Conclusion] The central claim that SP principles 'can inform principled advances' reduces to re-categorization of published results whose empirical support is taken as given; no concrete test or example is supplied demonstrating that the SP framing yields falsifiable improvements in design choices.
Authors: The claim is forward-looking rather than demonstrative: the open-research-directions section outlines how SP tools (inverse-problem formulations, gauge invariance, tensor decompositions) could guide future PEFT designs. The manuscript takes the empirical results of prior work as given and uses them to illustrate SP connections. We acknowledge that no concrete falsifiable example of an SP-derived improvement is provided. We will expand the conclusion with one brief hypothetical example (e.g., using alternating minimization ideas to motivate a new initialization scheme) to illustrate the intended use of the framing, while preserving the overview character of the paper. revision: partial
Circularity Check
No significant circularity: synthesis of cited literature without self-referential derivations or fitted predictions.
Full rationale
The paper is an overview that categorizes existing LoRA methods along architectural, optimization, and application axes by drawing on classical SP tools (SVD, inverse problems, alternating solvers) from the broader literature. No new equations are derived, no parameters are fitted to data within the manuscript, and no predictions or uniqueness claims reduce to the paper's own inputs or self-citations. The central thesis—that SP principles can inform PEFT advances—rests on re-interpretation and bridging of prior published results rather than any load-bearing self-referential step. This matches the default expectation for non-circular survey papers.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: LoRA has emerged as the de facto standard for parameter-efficient fine-tuning of foundation models.