SMoA: Spectrum Modulation Adapter for Parameter-Efficient Fine-Tuning
Pith reviewed 2026-05-21 05:56 UTC · model grok-4.3
The pith
SMoA improves fine-tuning performance over LoRA at lower parameter budgets by partitioning layers into aligned spectral blocks with Hadamard modulation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SMoA partitions the layer into multiple aligned spectral blocks and applies one in-block Hadamard-modulated low-rank branch to each diagonal block, yielding broader coverage of pretrained spectral directions under a smaller parameter budget than a comparable-rank LoRA update.
What carries the argument
Partitioning the weight matrix into aligned spectral blocks and applying per-block Hadamard-modulated low-rank branches.
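The mechanism can be sketched numerically. The snippet below is a minimal illustration, not the authors' implementation: it assumes each diagonal block of the update has the form M_k ⊙ (B_k A_k) with a frozen modulation matrix M_k, it skips the spectral alignment step and works directly in block coordinates, and the names `block_diag_update`, `mods`, `B_list`, and `A_list` are hypothetical. Random matrices stand in for the spectrum-derived modulation. It compares the rank of the update with and without modulation.

```python
import numpy as np

def block_diag_update(mods, B_list, A_list):
    """Place M_k * (B_k @ A_k) on the k-th diagonal block of an otherwise zero matrix."""
    # '*' on NumPy arrays is the element-wise (Hadamard) product.
    blocks = [M * (B @ A) for M, B, A in zip(mods, B_list, A_list)]
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    delta = np.zeros((rows, cols))
    r0 = c0 = 0
    for b in blocks:
        delta[r0:r0 + b.shape[0], c0:c0 + b.shape[1]] = b
        r0, c0 = r0 + b.shape[0], c0 + b.shape[1]
    return delta

rng = np.random.default_rng(0)
d, K, rho = 8, 4, 1            # layer width, number of blocks, per-block rank (toy values)
s = d // K                     # side length of each diagonal block

# Per-block low-rank factors (these would be the trainable part).
B_list = [rng.standard_normal((s, rho)) for _ in range(K)]
A_list = [rng.standard_normal((rho, s)) for _ in range(K)]

# All-ones modulation leaves each block as the plain low-rank product.
ones = [np.ones((s, s)) for _ in range(K)]
# Assumption: random full-rank matrices stand in for the frozen modulation.
mods = [rng.standard_normal((s, s)) for _ in range(K)]

rank_plain = np.linalg.matrix_rank(block_diag_update(ones, B_list, A_list))
rank_modulated = np.linalg.matrix_rank(block_diag_update(mods, B_list, A_list))
print("rank without modulation:", rank_plain)
print("rank with modulation:   ", rank_modulated)
```

Both updates use the same factors, so any rank difference in this toy setting comes from the modulation alone.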
If this is right
- Broader coverage of pretrained spectral directions is obtained at fixed parameter cost.
- Average performance rises relative to LoRA in the lower-budget fine-tuning regime the paper evaluates.
- Competitive results are retained against other LoRA-style baselines while using fewer parameters.
Where Pith is reading between the lines
- The block-wise modulation idea could be tested in other parameter-efficient methods to check whether spectral coverage gains appear more generally.
- Measuring the actual singular-value coverage achieved by SMoA versus LoRA on real weight matrices would provide a direct test of the proposed mechanism.
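One way such a coverage measurement could be set up is sketched below. The diagnostic is an assumption of this note, not a metric defined in the paper: the hypothetical function `top_k_energy_share` expresses an update in the singular bases of the pretrained matrix and reports how much of its squared Frobenius norm falls inside the top-k directions. A random matrix stands in for a real pretrained weight.

```python
import numpy as np

def top_k_energy_share(W0, delta, k):
    """Share of ||delta||_F^2 inside the span of W0's top-k left and right singular vectors."""
    U, _, Vt = np.linalg.svd(W0)      # full SVD, singular values in descending order
    coeffs = U.T @ delta @ Vt.T       # delta expressed in W0's singular bases
    return float((coeffs[:k, :k] ** 2).sum() / (coeffs ** 2).sum())

rng = np.random.default_rng(0)
n, k = 6, 2
W0 = rng.standard_normal((n, n))      # stand-in for a pretrained weight matrix
U, _, Vt = np.linalg.svd(W0)

# An update built only from the top-k singular directions of W0.
inside = U[:, :k] @ rng.standard_normal((k, k)) @ Vt[:k, :]
# An update built only from the remaining directions.
outside = U[:, k:] @ rng.standard_normal((n - k, n - k)) @ Vt[k:, :]

share_inside = top_k_energy_share(W0, inside, k)     # close to 1
share_outside = top_k_energy_share(W0, outside, k)   # close to 0
print(share_inside, share_outside)
```

Applied to trained LoRA and SMoA updates for the same layer, a lower share at a fixed k would indicate more energy outside the top-k pretrained directions; whether that tracks task performance is an open question here.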
Load-bearing premise
That spectral block partitioning together with per-block Hadamard modulation enlarges the family of spectrum-aware updates while keeping the total trainable parameters below those of a comparable-rank LoRA.
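The budget side of this premise can be checked with simple arithmetic. The sketch below assumes per-block factors of shape (d_out/K, ρ) and (ρ, d_in/K), as in the factorization quoted from the paper's appendix, and assumes the modulation matrices are frozen and contribute no trainable parameters. The function names and the layer size 4096 with K = 8 are hypothetical, and the rank ceiling follows the form quoted for Proposition 1 with s_k read as the block side length.

```python
def lora_params(d_out, d_in, r):
    """Trainable parameters of a rank-r LoRA update B @ A."""
    return r * (d_out + d_in)

def block_branch_params(d_out, d_in, K, rho):
    """Trainable parameters of K per-block rank-rho branches.

    Assumes block k has factors of shape (d_out/K, rho) and (rho, d_in/K)
    and that the modulation matrices are frozen.
    """
    assert d_out % K == 0 and d_in % K == 0, "dimensions must split evenly into K blocks"
    return K * rho * (d_out // K + d_in // K)

def rank_ceiling(block_sides, rho, mod_ranks):
    """Sum over blocks of min(s_k, rho * rank(M_k))."""
    return sum(min(s, rho * m) for s, m in zip(block_sides, mod_ranks))

d_out = d_in = 4096            # hypothetical layer size
K, rho, r = 8, 2, 8            # hypothetical block count, per-block rank, LoRA rank
side = d_out // K

print(lora_params(d_out, d_in, r))                # 65536
print(block_branch_params(d_out, d_in, K, rho))   # 16384
# Best case for the ceiling: every modulation matrix has full rank.
print(rank_ceiling([side] * K, rho, [side] * K))  # 4096
```

Under these assumptions the block-branch count simplifies to ρ(d_out + d_in), so it does not grow with K. Whether the paper's construction adds further trainable terms is not stated on this page.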
What would settle it
A head-to-head experiment on standard benchmarks where SMoA shows no average performance gain over LoRA at the same low parameter budget would disprove the central empirical claim.
Original abstract
As the number of model parameters increases, parameter-efficient fine-tuning (PEFT) has become the go-to choice for tailoring pre-trained large language models. Low-Rank Adaptation (LoRA) uses a low-rank update method to simulate full parameter fine-tuning, which is widely used to reduce resource requirements. However, decreasing the rank encounters challenges with limited representational capacity. Theory suggests that LoRA fine-tuning with rank r converges toward the top r singular values of the pre-trained weight matrix. As the rank increases, more principal singular directions are preserved, which generally improves the model's performance. However, a larger rank also introduces more trainable parameters, leading to higher computational cost. To overcome this dilemma, we propose SMoA, a Spectrum Modulation Adapter that enlarges the accessible family of spectrum-aware updates under a smaller parameter budget. SMoA partitions the layer into multiple aligned spectral blocks and applies one in-block Hadamard-modulated low-rank branch to each diagonal block, yielding broader coverage of pretrained spectral directions. We provide theoretical analysis and empirical results on multiple tasks. In our experiments, SMoA improves average performance in the current lower-budget setting over LoRA and competitive LoRA-style baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SMoA (Spectrum Modulation Adapter), a PEFT method for large language models. It partitions each pretrained weight matrix into multiple aligned spectral blocks, then applies a single Hadamard-modulated low-rank branch per diagonal block. The central claim is that this construction enlarges the family of spectrum-aware updates relative to standard LoRA while using a strictly smaller parameter budget, yielding broader coverage of pretrained singular directions; the claim is supported by a theoretical analysis and by empirical results showing higher average performance than LoRA and other LoRA-style baselines in low-budget regimes.
Significance. If the theoretical argument is made rigorous and the empirical gains prove robust to rank and block-size choices, SMoA would constitute a meaningful incremental advance in the design of spectrum-aware low-rank adapters. The explicit use of the pretrained singular spectrum to guide both partitioning and modulation is a concrete idea that could be adopted by other PEFT variants; reproducible code and clear ablation tables would further increase its utility to the community.
major comments (3)
- [Method] Method section (definition of SMoA): the paper must supply an explicit matrix-level equation showing how the per-block Hadamard modulation vector is constructed from the singular values of the full weight matrix and how the sum of the per-block ranks is constrained to remain below the rank that would be required for a comparable LoRA update. Without this, it is impossible to verify that the construction is not algebraically equivalent to a single low-rank factor of the same total parameter count.
- [Theoretical analysis] Theoretical analysis section: the claim that the modulation injects non-zero components into singular vectors outside the top-r subspace of the full matrix must be accompanied by a short proof sketch or a concrete low-dimensional counter-example demonstrating that the effective column space is strictly larger than that of a standard LoRA update with identical total trainable parameters. The current argument appears to rest on the assumption that the blocks and modulation are independent of the low-rank factors; this independence needs to be shown formally.
- [Experiments] Experiments section (Table X and Figure Y): the reported performance advantage must be accompanied by the exact rank and block-size settings used for SMoA versus each LoRA baseline so that the parameter-budget comparison is transparent. In addition, standard deviations or confidence intervals over at least three random seeds should be provided; without them the average improvement cannot be assessed for statistical reliability.
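The seed-level reporting requested in the last major comment amounts to a small computation. The sketch below uses only the Python standard library; the method labels and the scores are placeholders invented for illustration and are not results from the paper.

```python
from statistics import mean, stdev

def summarize(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    return mean(scores), stdev(scores)

# Placeholder accuracies for three seeds per method; NOT results from the paper.
runs = {
    "method_a": [70.0, 71.0, 69.0],
    "method_b": [71.0, 72.0, 73.0],
}

for name, scores in runs.items():
    m, s = summarize(scores)
    print(f"{name}: {m:.2f} +/- {s:.2f} over {len(scores)} seeds")
```

With only three seeds the standard deviation is itself a rough estimate, so overlapping intervals would leave the comparison inconclusive rather than refute it.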
minor comments (3)
- [Abstract] The abstract states that SMoA “improves average performance … over LoRA and competitive LoRA-style baselines” but does not quantify the improvement or name the tasks; adding one sentence with the magnitude and the main benchmarks would improve clarity.
- [Method] Notation for the Hadamard product and the modulation vector should be introduced once in the method section and used consistently thereafter; several passages currently reuse the same symbol for different quantities.
- [Related work] The related-work section should explicitly contrast SMoA with prior spectral or block-wise LoRA variants (e.g., those that also exploit singular-value information) to clarify the incremental contribution.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments highlight opportunities to improve clarity in the method definition, rigor in the theoretical analysis, and transparency in the experimental reporting. We address each major comment below and outline the planned revisions.
Point-by-point responses
- Referee: [Method] Method section (definition of SMoA): the paper must supply an explicit matrix-level equation showing how the per-block Hadamard modulation vector is constructed from the singular values of the full weight matrix and how the sum of the per-block ranks is constrained to remain below the rank that would be required for a comparable LoRA update. Without this, it is impossible to verify that the construction is not algebraically equivalent to a single low-rank factor of the same total parameter count.
  Authors: We agree that an explicit matrix-level formulation would strengthen verifiability. In the revised manuscript we will insert a new displayed equation in Section 3 that defines the per-block Hadamard modulation vector directly from the singular values of the corresponding spectral partition of the pretrained weight matrix and states the global constraint that the sum of the per-block ranks is strictly smaller than the rank that would be needed for a LoRA update achieving the same total parameter count. This addition will make the algebraic distinction from a single low-rank factor transparent.
  Revision: yes
- Referee: [Theoretical analysis] Theoretical analysis section: the claim that the modulation injects non-zero components into singular vectors outside the top-r subspace of the full matrix must be accompanied by a short proof sketch or a concrete low-dimensional counter-example demonstrating that the effective column space is strictly larger than that of a standard LoRA update with identical total trainable parameters. The current argument appears to rest on the assumption that the blocks and modulation are independent of the low-rank factors; this independence needs to be shown formally.
  Authors: We acknowledge that a concise formal demonstration would be helpful. The revised theoretical section will contain a short proof sketch establishing that the combination of spectral partitioning and Hadamard modulation (derived solely from pretrained singular values) produces an effective column space that properly contains directions outside the top-r subspace of the full matrix, even under a smaller total parameter budget. We will also supply a concrete 4-by-4 low-dimensional counter-example that isolates the contribution of the block-wise modulation and clarifies the independence between the modulation vectors and the trainable low-rank factors.
  Revision: yes
- Referee: [Experiments] Experiments section (Table X and Figure Y): the reported performance advantage must be accompanied by the exact rank and block-size settings used for SMoA versus each LoRA baseline so that the parameter-budget comparison is transparent. In addition, standard deviations or confidence intervals over at least three random seeds should be provided; without them the average improvement cannot be assessed for statistical reliability.
  Authors: We agree that explicit hyper-parameter disclosure and statistical reporting are necessary for reproducibility and fair comparison. In the revised version we will augment Table X and Figure Y with the precise rank and block-size values employed for SMoA and every baseline, together with the resulting parameter counts. We will additionally report mean performance plus standard deviation over three independent random seeds for all tasks, allowing readers to evaluate the reliability of the observed gains.
  Revision: yes
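The 4-by-4 example promised in the response to the theoretical-analysis comment is not reproduced on this page. As a hedged illustration only, the worked instance below assumes each diagonal block has the form $M_k \odot (B_k A_k)$ with a fixed modulation matrix $M_k$. The choice $M_k = I_2$ is hypothetical and is not the spectrum-derived modulation the paper describes. It relies on the standard rank bound for Hadamard products.

```latex
% Hypothetical instance: d = 4, K = 2 diagonal blocks of size 2 x 2, per-block rank \rho = 1.
% Standard rank bound for the Hadamard product:
\[
\operatorname{rank}(M \odot C) \le \operatorname{rank}(M)\,\operatorname{rank}(C).
\]
% Per-block factors and the assumed modulation, for k = 1, 2:
\[
B_k = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad
A_k = \begin{pmatrix} 1 & 1 \end{pmatrix}, \quad
C_k = B_k A_k = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \quad
M_k = I_2 .
\]
% Each modulated block has rank 2 although C_k has rank 1:
\[
M_k \odot C_k = I_2, \qquad
\Delta W = \operatorname{diag}\bigl(M_1 \odot C_1,\; M_2 \odot C_2\bigr) = I_4, \qquad
\operatorname{rank}(\Delta W) = 4 .
\]
% Trainable entries: K \rho (2 + 2) = 8, the same count as a rank-1 LoRA update
% on a 4 x 4 layer, whose rank is at most 1.
```

In this instance the ceiling $\sum_k \min(s_k, \rho\,\operatorname{rank}(M_k)) = 2 \cdot \min(2, 2) = 4$ quoted from Proposition 1 is attained, reading $s_k$ as the block side length. It shows only that the block form can exceed the rank of an equally sized LoRA update, not that trained updates do so.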
Circularity Check
No significant circularity: SMoA construction and claims remain independent of fitted inputs or self-citation chains.
full rationale
The paper defines SMoA via an explicit architectural choice—partitioning into aligned spectral blocks with per-block Hadamard-modulated low-rank branches—and supports its claim of broader spectral coverage under reduced parameter count through theoretical analysis plus direct empirical comparison against external LoRA baselines. No equation or claim in the provided text reduces the reported gains or the 'enlarged family of spectrum-aware updates' to a quantity defined by the method's own fitted parameters, nor does any load-bearing premise rest on a self-citation whose content is itself unverified. The derivation is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: LoRA fine-tuning with rank r converges toward the top r singular values of the pre-trained weight matrix.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: SMoA partitions the layer into multiple aligned spectral blocks and applies one in-block Hadamard-modulated low-rank branch to each diagonal block
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: Proposition 1 (Spectrum-Aware Rank Ceiling) ... $\operatorname{rank}(\Delta W) \le U := \sum_k \min(s_k, \rho\,\operatorname{rank}(M_k))$
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
A Datasets
We evaluate three categories of tasks: commonsense reasoning...
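If
$$\operatorname{rank}(\widetilde{\Delta W}^{\star}) > r, \tag{58}$$
then
$$P_{\mathrm{out}}^{\top}\, \widetilde{\Delta W}^{\star} P_{\mathrm{in}} \notin \mathcal{F}_{\mathrm{LoRA}}(r). \tag{59}$$
Therefore, any block-aligned anchor-modulated target whose rank exceeds $r$ serves as a witness that
$$\mathcal{F}_{\mathrm{SMoA}}(W_0; r, K) \setminus \mathcal{F}_{\mathrm{LoRA}}(r) \neq \emptyset. \tag{60}$$
Proof. Since
$$\operatorname{rank}(C_k) \le \rho, \tag{61}$$
each matrix $C_k$ admits a rank-$\rho$ factorization
$$C_k = B_k A_k, \quad A_k \in \mathbb{R}^{\rho \times d_{\mathrm{in}}/K}, \quad B_k \in \mathbb{R}^{d_{\mathrm{out}}/K \times \rho}. \tag{62}$$
Substituting these factorizations into th...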