
arxiv: 2605.21147 · v1 · pith:TMZOORUZnew · submitted 2026-05-20 · 💻 cs.LG · cs.CL

SMoA: Spectrum Modulation Adapter for Parameter-Efficient Fine-Tuning

Pith reviewed 2026-05-21 05:56 UTC · model grok-4.3

classification 💻 cs.LG cs.CL
keywords parameter-efficient fine-tuning · LoRA · spectrum modulation · Hadamard modulation · spectral blocks · low-rank adaptation · large language models

The pith

SMoA improves fine-tuning performance over LoRA at lower parameter budgets by partitioning layers into aligned spectral blocks with Hadamard modulation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SMoA to address the representational limits that low-rank methods such as LoRA run into when the rank is reduced to control parameter count. It partitions each layer into multiple aligned spectral blocks and applies a Hadamard-modulated low-rank branch to each diagonal block. This construction is meant to reach a wider set of pretrained spectral directions without raising the parameter budget. If the result holds, it offers a way to adapt large models more effectively when compute and memory are constrained.

Core claim

SMoA partitions the layer into multiple aligned spectral blocks and applies one in-block Hadamard-modulated low-rank branch to each diagonal block, yielding broader coverage of pretrained spectral directions under a smaller parameter budget than a comparable-rank LoRA update.

What carries the argument

Partitioning the weight matrix into aligned spectral blocks and applying per-block Hadamard-modulated low-rank branches.
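
One way to read this construction at the matrix level is sketched here. The symbols P_out, P_in, M_k, B_k, A_k, K and ρ are notation assumed for illustration; this page does not reproduce the paper's own displayed equation, so the block is a hedged reading rather than the authors' definition.

```latex
% Assumed notation (illustrative, not copied from the paper):
%   K            number of aligned spectral blocks
%   \rho         rank of each in-block low-rank branch
%   M_k          fixed, non-trainable modulation block derived from pretrained weights
%   P_out, P_in  orthogonal maps that align the layer with its spectral blocks
\Delta W \;=\; P_{\mathrm{out}}\,
\operatorname{blkdiag}\!\bigl(M_1 \odot (B_1 A_1),\,\dots,\,M_K \odot (B_K A_K)\bigr)\,
P_{\mathrm{in}}^{\top},
\qquad
B_k \in \mathbb{R}^{(d_{\mathrm{out}}/K)\times\rho},\quad
A_k \in \mathbb{R}^{\rho\times(d_{\mathrm{in}}/K)}.

% The rank of a Hadamard product is at most the product of the ranks, hence
\operatorname{rank}(\Delta W) \;\le\; \sum_{k=1}^{K} \operatorname{rank}(M_k)\,\rho ,
\qquad\text{whereas}\qquad
\operatorname{rank}(B A) \;\le\; r \ \text{ for a rank-}r\text{ LoRA update.}
```

Under this reading, the block-diagonal layout and the fixed modulation are the two places where the update can gain rank without adding trainable parameters. Whether the extra rank lands on useful pretrained directions is a separate, empirical question.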

If this is right

  • Broader coverage of pretrained spectral directions is obtained at fixed parameter cost.
  • Average performance rises relative to LoRA in the lower-budget fine-tuning regime the paper evaluates.
  • Competitive results are retained against other LoRA-style baselines while using fewer parameters.
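
The budget argument behind these points can be checked with simple arithmetic. The sketch below is a minimal illustration, assuming each of K diagonal blocks carries one rank-ρ factor pair and that the modulation adds no trainable parameters; the function names and the specific K, ρ and r values are hypothetical and not taken from the paper's tables.

```python
def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters of a rank-r LoRA update B @ A."""
    return r * (d_out + d_in)


def block_params(d_out: int, d_in: int, num_blocks: int, rho: int) -> int:
    """Trainable parameters when each diagonal block has one rank-rho branch.

    Assumption: the modulation is frozen, so only the factors are counted.
    """
    if d_out % num_blocks or d_in % num_blocks:
        raise ValueError("dimensions must be divisible by the number of blocks")
    per_block = rho * (d_out // num_blocks + d_in // num_blocks)
    return num_blocks * per_block


def block_max_rank(d_out: int, d_in: int, num_blocks: int, rho: int) -> int:
    """Upper bound on the update rank before any Hadamard modulation."""
    return num_blocks * min(rho, d_out // num_blocks, d_in // num_blocks)


# Illustrative setting: a 4096 x 4096 projection, as in Llama-2-7B attention.
d = 4096
print(lora_params(d, d, r=8))                     # 65536
print(block_params(d, d, num_blocks=4, rho=4))    # 32768
print(block_max_rank(d, d, num_blocks=4, rho=4))  # 16
```

In this hypothetical setting the block layout uses half the trainable parameters of rank-8 LoRA while allowing an update of rank up to 16. The count says nothing about accuracy; it only makes the parameter-budget comparison explicit.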

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The block-wise modulation idea could be tested in other parameter-efficient methods to check whether spectral coverage gains appear more generally.
  • Measuring the actual singular-value coverage achieved by SMoA versus LoRA on real weight matrices would provide a direct test of the proposed mechanism.
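
One possible way to make "spectral coverage" measurable is an energy ratio against the pretrained singular basis. The metric c_r below is a hypothetical choice introduced here for illustration; the paper may use a different quantity.

```latex
% Pretrained weight and its SVD, with U_r, V_r the top-r singular vectors.
W_0 = U \Sigma V^{\top}, \qquad
U_r = [u_1,\dots,u_r], \quad V_r = [v_1,\dots,v_r].

% Fraction of the update's energy that lies inside the top-r subspace.
c_r(\Delta W) \;=\;
\frac{\lVert U_r^{\top}\,\Delta W\,V_r \rVert_F^{2}}{\lVert \Delta W \rVert_F^{2}},
\qquad 0 \le c_r(\Delta W) \le 1 .
```

A value of c_r close to 1 would indicate an update confined to the top-r pretrained directions, while a clearly smaller value would indicate energy placed on directions outside that subspace. Comparing c_r for SMoA and LoRA updates at matched parameter counts is one way to test the proposed mechanism directly.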

Load-bearing premise

That spectral block partitioning together with per-block Hadamard modulation enlarges the family of spectrum-aware updates while keeping the total trainable parameters below those of a comparable-rank LoRA.

What would settle it

A head-to-head experiment on standard benchmarks where SMoA shows no average performance gain over LoRA at the same low parameter budget would disprove the central empirical claim.

Figures

Figures reproduced from arXiv: 2605.21147 by Daling Wang, Feiliang Ren, Hinrich Schütze, Mengjie Zhao, Qian Li, Shanru Zhang, Shi Feng, Xing Li, Yongkang Liu, Zijing Wang.

Figure 1: Pretrained weights have informative spectral tails. Top: Distribution of normalized singular values ν for Llama-2-7B’s layer 5, 15, and 25. The black dashed curve shows Marchenko–Pastur prediction for a random matrix of the same shape; values beyond the bulk edge (vertical line) are tail outliers encoding structured and task-relevant information. Bottom: Maximum overlap O_k between right singular vectors a…

Figure 2: Overview of SMoA pipeline (when K = 4). We illustrate the informative spectral tails of pretrained model weights in …

Figure 3: Original Llama2-7B vs. SMoA finetuned spectra and overlaps on BoolQ for key weight matrix of attention. We compare all layers before and after finetuning. The figure shows the 5th, 15th, and 25th layers, results for other layers are in the Appendix F. Top: empirical distributions of normalized singular values ν; light brown denotes the pretrained model and orange denotes the finetuned model. The dashed cur…

Figure 4: Left: comparison of the measured update rank of …

Figure 5: Original Llama2-7B vs. finetuned spectra and overlaps for key weight matrix (layer 0 to 15). Top: empirical distributions of normalized singular values ν; light brown denotes the pretrained model and orange denotes the finetuned model. The dashed curve is the theoretical random-matrix bulk. Bottom: the corresponding maximum overlap O_k between right singular vectors and activation-covariance eigenvectors, w…

Figure 6: Pretrained vs. finetuned spectra and overlaps on Llama2-7B key weight matrix (layer 16 to 31). Top: empirical distributions of normalized singular values ν; light brown denotes the pretrained model and orange denotes the finetuned model. The dashed curve is the theoretical random-matrix bulk. Bottom: the corresponding maximum overlap O_k between right singular vectors and activation-covariance eigenvectors,…

Figure 7: Attention key singular-value histograms of the Llama-2-7B based on SMoA fine-tuning in …
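
The Figure 1 caption refers to a Marchenko–Pastur bulk edge. For orientation, the standard result for a random matrix with independent entries is stated here; the symbols p, n, γ and σ are introduced for this note, and the exact normalization behind the paper's ν is not given on this page, so the correspondence is an assumption.

```latex
% X is p x n with i.i.d. zero-mean entries of variance \sigma^2,
% aspect ratio \gamma = p/n \in (0, 1], in the limit p, n \to \infty.
% Eigenvalues \lambda of (1/n) X X^{\top} follow the Marchenko--Pastur density
\rho_{\mathrm{MP}}(\lambda) \;=\;
\frac{\sqrt{(\lambda_{+}-\lambda)(\lambda-\lambda_{-})}}{2\pi\,\sigma^{2}\,\gamma\,\lambda},
\qquad
\lambda_{\pm} = \sigma^{2}\bigl(1 \pm \sqrt{\gamma}\bigr)^{2},
\quad \lambda \in [\lambda_{-}, \lambda_{+}].

% Equivalently, the singular values s of X / \sqrt{n} concentrate in
s \in \bigl[\sigma(1-\sqrt{\gamma}),\; \sigma(1+\sqrt{\gamma})\bigr],
\qquad
s_{+} = \sigma(1+\sqrt{\gamma}) \ \text{ is the bulk edge.}
```

Singular values of a trained weight matrix that sit beyond s_+ are the ones a purely random matrix of the same shape would not produce, which is the sense in which the caption calls them tail outliers.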
Original abstract

As the number of model parameters increases, parameter-efficient fine-tuning (PEFT) has become the go-to choice for tailoring pre-trained large language models. Low-Rank Adaptation (LoRA) uses a low-rank update method to simulate full parameter fine-tuning, which is widely used to reduce resource requirements. However, decreasing the rank encounters challenges with limited representational capacity. Theory suggests that LoRA fine-tuning with rank r converges toward the top r singular values of the pre-trained weight matrix. As the rank increases, more principal singular directions are preserved, which generally improves the model's performance. However, a larger rank also introduces more trainable parameters, leading to higher computational cost. To overcome this dilemma, we propose SMoA, a Spectrum Modulation Adapter that enlarges the accessible family of spectrum-aware updates under a smaller parameter budget. SMoA partitions the layer into multiple aligned spectral blocks and applies one in-block Hadamard-modulated low-rank branch to each diagonal block, yielding broader coverage of pretrained spectral directions. We provide theoretical analysis and empirical results on multiple tasks. In our experiments, SMoA improves average performance in the current lower-budget setting over LoRA and competitive LoRA-style baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper proposes SMoA (Spectrum Modulation Adapter), a PEFT method for large language models. It partitions each pretrained weight matrix into multiple aligned spectral blocks, then applies a single Hadamard-modulated low-rank branch per diagonal block. The central claim is that this construction enlarges the family of spectrum-aware updates relative to standard LoRA while using a strictly smaller parameter budget, yielding broader coverage of pretrained singular directions; the claim is supported by a theoretical analysis and by empirical results showing higher average performance than LoRA and other LoRA-style baselines in low-budget regimes.

Significance. If the theoretical argument is made rigorous and the empirical gains prove robust to rank and block-size choices, SMoA would constitute a meaningful incremental advance in the design of spectrum-aware low-rank adapters. The explicit use of the pretrained singular spectrum to guide both partitioning and modulation is a concrete idea that could be adopted by other PEFT variants; reproducible code and clear ablation tables would further increase its utility to the community.

major comments (3)
  1. [Method] Method section (definition of SMoA): the paper must supply an explicit matrix-level equation showing how the per-block Hadamard modulation vector is constructed from the singular values of the full weight matrix and how the sum of the per-block ranks is constrained to remain below the rank that would be required for a comparable LoRA update. Without this, it is impossible to verify that the construction is not algebraically equivalent to a single low-rank factor of the same total parameter count.
  2. [Theoretical analysis] Theoretical analysis section: the claim that the modulation injects non-zero components into singular vectors outside the top-r subspace of the full matrix must be accompanied by a short proof sketch or a concrete low-dimensional counter-example demonstrating that the effective column space is strictly larger than that of a standard LoRA update with identical total trainable parameters. The current argument appears to rest on the assumption that the blocks and modulation are independent of the low-rank factors; this independence needs to be shown formally.
  3. [Experiments] Experiments section (Table X and Figure Y): the reported performance advantage must be accompanied by the exact rank and block-size settings used for SMoA versus each LoRA baseline so that the parameter-budget comparison is transparent. In addition, standard deviations or confidence intervals over at least three random seeds should be provided; without them the average improvement cannot be assessed for statistical reliability.
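
The seed-level reporting requested in the third comment can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names are hypothetical and the scores are placeholder inputs chosen for the example, not results from the paper.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation) over per-seed scores."""
    if len(scores) < 2:
        raise ValueError("need at least two seeds for a standard deviation")
    return mean(scores), stdev(scores)


def paired_gain(method_scores, baseline_scores):
    """Mean and std of per-seed differences (method minus baseline)."""
    if len(method_scores) != len(baseline_scores):
        raise ValueError("both runs must use the same seeds")
    diffs = [m - b for m, b in zip(method_scores, baseline_scores)]
    return summarize(diffs)


# PLACEHOLDER numbers for illustration only; they are not results from the paper.
adapter_runs = [71.0, 72.0, 73.0]
lora_runs = [70.0, 70.5, 71.0]

adapter_mean, adapter_std = summarize(adapter_runs)
gain_mean, gain_std = paired_gain(adapter_runs, lora_runs)
print(f"adapter: {adapter_mean:.2f} +/- {adapter_std:.2f}")
print(f"paired gain over baseline: {gain_mean:.2f} +/- {gain_std:.2f}")
```

Pairing runs by seed removes variance shared by both methods, so the spread of the differences is usually the more informative number. With only three seeds the estimate remains rough, which is why the comment asks for at least that many.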
minor comments (3)
  1. [Abstract] The abstract states that SMoA “improves average performance … over LoRA and competitive LoRA-style baselines” but does not quantify the improvement or name the tasks; adding one sentence with the magnitude and the main benchmarks would improve clarity.
  2. [Method] Notation for the Hadamard product and the modulation vector should be introduced once in the method section and used consistently thereafter; several passages currently reuse the same symbol for different quantities.
  3. [Related work] The related-work section should explicitly contrast SMoA with prior spectral or block-wise LoRA variants (e.g., those that also exploit singular-value information) to clarify the incremental contribution.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight opportunities to improve clarity in the method definition, rigor in the theoretical analysis, and transparency in the experimental reporting. We address each major comment below and outline the planned revisions.

Point-by-point responses
  1. Referee: [Method] Method section (definition of SMoA): the paper must supply an explicit matrix-level equation showing how the per-block Hadamard modulation vector is constructed from the singular values of the full weight matrix and how the sum of the per-block ranks is constrained to remain below the rank that would be required for a comparable LoRA update. Without this, it is impossible to verify that the construction is not algebraically equivalent to a single low-rank factor of the same total parameter count.

    Authors: We agree that an explicit matrix-level formulation would strengthen verifiability. In the revised manuscript we will insert a new displayed equation in Section 3 that defines the per-block Hadamard modulation vector directly from the singular values of the corresponding spectral partition of the pretrained weight matrix and states the global constraint that the sum of the per-block ranks is strictly smaller than the rank that would be needed for a LoRA update achieving the same total parameter count. This addition will make the algebraic distinction from a single low-rank factor transparent. revision: yes

  2. Referee: [Theoretical analysis] Theoretical analysis section: the claim that the modulation injects non-zero components into singular vectors outside the top-r subspace of the full matrix must be accompanied by a short proof sketch or a concrete low-dimensional counter-example demonstrating that the effective column space is strictly larger than that of a standard LoRA update with identical total trainable parameters. The current argument appears to rest on the assumption that the blocks and modulation are independent of the low-rank factors; this independence needs to be shown formally.

    Authors: We acknowledge that a concise formal demonstration would be helpful. The revised theoretical section will contain a short proof sketch establishing that the combination of spectral partitioning and Hadamard modulation (derived solely from pretrained singular values) produces an effective column space that properly contains directions outside the top-r subspace of the full matrix, even under a smaller total parameter budget. We will also supply a concrete 4-by-4 low-dimensional counter-example that isolates the contribution of the block-wise modulation and clarifies the independence between the modulation vectors and the trainable low-rank factors. revision: yes

  3. Referee: [Experiments] Experiments section (Table X and Figure Y): the reported performance advantage must be accompanied by the exact rank and block-size settings used for SMoA versus each LoRA baseline so that the parameter-budget comparison is transparent. In addition, standard deviations or confidence intervals over at least three random seeds should be provided; without them the average improvement cannot be assessed for statistical reliability.

    Authors: We agree that explicit hyper-parameter disclosure and statistical reporting are necessary for reproducibility and fair comparison. In the revised version we will augment Table X and Figure Y with the precise rank and block-size values employed for SMoA and every baseline, together with the resulting parameter counts. We will additionally report mean performance plus standard deviation over three independent random seeds for all tasks, allowing readers to evaluate the reliability of the observed gains. revision: yes
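
To make the 4-by-4 discussion in the second response concrete, here is a small worked example. It is not the authors' counter-example: the matrices, the modulation block `M`, and the helper names are all hypothetical, and `M` is an arbitrary fixed full-rank matrix standing in for a modulation derived from pretrained weights. Exact rational arithmetic keeps the rank computation free of rounding issues.

```python
from fractions import Fraction


def matmul(a, b):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def hadamard(a, b):
    """Elementwise (Hadamard) product of two equally shaped matrices."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block_diag(blocks):
    """Place the given blocks on the diagonal of a larger zero matrix."""
    total_cols = sum(len(b[0]) for b in blocks)
    out, offset = [], 0
    for b in blocks:
        width = len(b[0])
        for row in b:
            out.append([0] * offset + list(row) + [0] * (total_cols - offset - width))
        offset += width
    return out


def rank(matrix):
    """Matrix rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    found = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(found, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for i in range(found + 1, len(rows)):
            factor = rows[i][col] / rows[found][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[found])]
        found += 1
    return found


# Rank-1 LoRA on a 4 x 4 layer: B is 4 x 1, A is 1 x 4, so 8 trainable numbers.
lora = matmul([[1], [2], [3], [4]], [[1, 1, 1, 1]])

# Two 2 x 2 diagonal blocks, each a rank-1 product: also 8 trainable numbers.
block1 = matmul([[1], [2]], [[1, 1]])
block2 = matmul([[1], [3]], [[1, 2]])
plain = block_diag([block1, block2])

# Hypothetical fixed modulation block (assumption: frozen, full rank).
M = [[2, 1], [1, 2]]
modulated = block_diag([hadamard(M, block1), hadamard(M, block2)])

print(rank(lora), rank(plain), rank(modulated))  # 1 2 4
```

All three updates use eight trainable numbers, yet their ranks are 1, 2 and 4. The example only shows that the block-diagonal, modulated family is not contained in the rank-1 LoRA family at the same budget; it does not show that the additional directions are the useful ones, which is what the requested proof sketch and experiments would need to establish.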

Circularity Check

0 steps flagged

No significant circularity: SMoA construction and claims remain independent of fitted inputs or self-citation chains.

full rationale

The paper defines SMoA via an explicit architectural choice—partitioning into aligned spectral blocks with per-block Hadamard-modulated low-rank branches—and supports its claim of broader spectral coverage under reduced parameter count through theoretical analysis plus direct empirical comparison against external LoRA baselines. No equation or claim in the provided text reduces the reported gains or the 'enlarged family of spectrum-aware updates' to a quantity defined by the method's own fitted parameters, nor does any load-bearing premise rest on a self-citation whose content is itself unverified. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract invokes the existing theoretical claim that LoRA rank-r updates converge to the top-r singular directions; the new design elements (spectral blocks, Hadamard modulation) are introduced without explicit free parameters or new invented entities listed.

axioms (1)
  • domain assumption: LoRA fine-tuning with rank r converges toward the top r singular values of the pre-trained weight matrix.
    Stated directly in the abstract as background theory that motivates the need for broader spectral coverage.



