ScaLoRA: Optimally Scaled Low-Rank Adaptation for Efficient High-Rank Fine-Tuning
Pith reviewed 2026-05-18 03:41 UTC · model grok-4.3
The pith
Optimally scaling the columns of each low-rank update lets successive increments accumulate into a high-rank weight change that approximates full fine-tuning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The per-update optimal low-rank matrix is formed by scaling the columns of the base low-rank factors so that the loss decrease is maximized at every step; this scaling admits an analytical expression, and the resulting sequence of increments can be summed without resetting the optimizer while still approximating the loss landscape of full-rank fine-tuning.
What carries the argument
Analytical column-wise scaling of the low-rank matrix at each update step, chosen to minimize the immediate loss and enable seamless accumulation toward a high-rank update.
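One way to write the idea down, as a sketch rather than the paper's own notation. The symbols $W_t$, $A_t$, $B_t$, $s_t$, and the rank $r$ are assumptions introduced here for illustration; placing $\mathrm{diag}(s_t)$ between the factors is equivalent to rescaling the columns of one of them.

```latex
% Illustrative notation (assumed, not taken from the paper):
%   W_t \in R^{m x n}                       weights after t updates
%   A_t \in R^{m x r}, B_t \in R^{n x r}    low-rank factors, r << min(m, n)
%   s_t \in R^r                             column scaling chosen at step t
\Delta W_t = A_t \,\mathrm{diag}(s_t)\, B_t^\top
           = \sum_{j=1}^{r} s_{t,j}\, a_{t,j}\, b_{t,j}^\top ,
\qquad
W_T = W_0 + \sum_{t=0}^{T-1} \Delta W_t ,
\qquad
\operatorname{rank}\!\Big(\sum_{t=0}^{T-1} \Delta W_t\Big) \le \min(m,\, n,\, T r).
```

The rank bound is the standard subadditivity of matrix rank: each increment has rank at most $r$, so only the accumulated sum can reach high rank.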
If this is right
- The method delivers measurable accuracy improvements over existing LoRA variants on natural language understanding, commonsense reasoning, and mathematical problem solving.
- Convergence takes fewer steps than with existing LoRA variants on models of up to 12 billion parameters.
- No optimizer restart is required when switching to the optimally scaled low-rank increments.
- The closed-form scaling removes the need for extra hyper-parameter search at each update.
Where Pith is reading between the lines
- The same scaling logic could be applied to other parameter-efficient fine-tuning families such as prefix tuning or adapter modules.
- If the analytical form generalizes, training-time compute for very large models could be further reduced by skipping full-matrix gradient steps entirely.
- Longer training runs on downstream tasks might reveal whether the accumulated high-rank updates improve generalization beyond what standard LoRA achieves.
Load-bearing premise
Successive optimally scaled low-rank increments can be accumulated without restarting the optimizer and still stay close to the loss surface of full-rank fine-tuning.
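The rank-growth half of this premise is easy to check on a toy case. The sketch below is a minimal illustration, not the paper's procedure: the three rank-1 increments and their scaling factors are hypothetical values, and the rank is computed exactly over the rationals so no numerical tolerance is involved.

```python
from fractions import Fraction


def outer(u, v):
    """Rank-1 matrix u v^T as a list of rows."""
    return [[Fraction(a) * Fraction(b) for b in v] for a in u]


def add(X, Y):
    """Entrywise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def rank(M):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [list(r) for r in M]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            f = rows[i][col] / rows[rk][col]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk


# Hypothetical increments: (scaling factor, left vector a, right vector b).
increments = [
    (2, [1, 0, 0, 1], [1, 2, 0, 0]),
    (-1, [0, 1, 0, 1], [0, 1, 1, 0]),
    (3, [0, 0, 1, 1], [1, 0, 0, 1]),
]

total = [[Fraction(0)] * 4 for _ in range(4)]
ranks = []
for s, a, b in increments:
    scaled = [[s * x for x in row] for row in outer(a, b)]
    total = add(total, scaled)       # accumulate the scaled rank-1 increment
    ranks.append(rank(total))        # rank of the running sum
```

The rank of the running sum grows only when the new increment brings directions not already spanned. Whether the accumulated update also stays close to the full fine-tuning loss surface is the part this toy example cannot show.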
What would settle it
If replacing the analytical scaling with an arbitrary fixed or learned factor reproduces the reported gains in convergence speed and final accuracy on the same models (up to 12 billion parameters) and tasks, the central claim would be falsified.
Original abstract
As large language models (LLMs) continue to scale in size, the computational overhead has become a major bottleneck for task-specific fine-tuning. While low-rank adaptation (LoRA) effectively curtails this cost by confining the weight updates to a low-dimensional subspace, such a restriction can hinder effectiveness and slow convergence. This contribution deals with these limitations by accumulating progressively a high-rank weight update from consecutive low-rank increments. Specifically, the per update optimal low-rank matrix is identified to minimize the loss function and closely approximate full fine-tuning. To endow efficient and seamless optimization without restarting, this optimal choice is formed by appropriately scaling the columns of the original low-rank matrix. Rigorous performance guarantees reveal that the optimal scaling can be found analytically. Extensive numerical tests with popular LLMs scaling up to 12 billion parameters demonstrate a consistent performance gain and fast convergence relative to state-of-the-art LoRA variants on diverse tasks including natural language understanding, commonsense reasoning, and mathematical problem solving.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes ScaLoRA, a method to accumulate progressively higher-rank weight updates during LLM fine-tuning by identifying an analytically optimal scaling vector for the columns of each low-rank increment. This scaling is derived to minimize a local loss approximation at each step, allowing seamless optimizer continuation without restart while approximating full fine-tuning trajectories. The paper asserts rigorous performance guarantees for the closed-form scaling and reports consistent gains in convergence speed and task performance versus prior LoRA variants on models up to 12B parameters across NLU, commonsense reasoning, and mathematical tasks.
Significance. If the analytical optimality derivation is correct and the local quadratic approximation remains sufficiently accurate across successive updates, ScaLoRA would provide a principled, low-overhead route to effective high-rank adaptation. This could meaningfully narrow the performance gap between parameter-efficient methods and full fine-tuning while preserving the computational advantages of low-rank updates.
major comments (2)
- [Abstract] Abstract (paragraph on per-update optimal low-rank matrix): The central claim that successive optimally scaled increments can be accumulated without optimizer restart while closely approximating the full fine-tuning loss surface rests on an unverified assumption that the local quadratic model (or equivalent stationarity condition) remains valid after optimizer state updates; no Hessian tracking, curvature monitoring, or multi-step deviation analysis is described to confirm this.
- [Abstract] Abstract: The assertion of 'rigorous performance guarantees' and an 'analytical' solution for optimal scaling lacks any derivation steps, explicit assumptions, or error bounds, which is load-bearing for the optimality claim and prevents verification that the scaling does not reduce to a post-hoc fit.
minor comments (1)
- [Abstract] Numerical results summary would benefit from error bars, ablation details on scaling vector computation, and explicit comparison of effective rank achieved versus baseline LoRA variants.
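The deviation analysis requested in the first major comment could take a form like the sketch below. It is not taken from the manuscript: it borrows the actual-versus-predicted reduction ratio used in trust-region methods, and the function names, toy losses, and step are hypothetical. A ratio near 1 means the diagonal-Hessian quadratic model tracked the true loss on that step.

```python
def quadratic_model_change(grad, hess_diag, step):
    """Loss change predicted by a second-order model with a diagonal Hessian."""
    linear = sum(g * d for g, d in zip(grad, step))
    curvature = 0.5 * sum(h * d * d for h, d in zip(hess_diag, step))
    return linear + curvature


def agreement_ratio(loss, x, grad, hess_diag, step):
    """Observed loss change divided by the model's predicted change.

    Undefined when the predicted change is exactly zero.
    """
    x_new = [xi + di for xi, di in zip(x, step)]
    observed = loss(x_new) - loss(x)
    predicted = quadratic_model_change(grad, hess_diag, step)
    return observed / predicted


# Toy losses (hypothetical): one exactly quadratic, one with a quartic term.
def quad_loss(x):
    return x[0] ** 2 + 2.0 * x[1] ** 2


def quartic_loss(x):
    return quad_loss(x) + x[0] ** 4


x = [1.0, 1.0]
step = [-0.5, -0.25]

# Gradient and Hessian diagonal of each loss at x, derived by hand.
ratio_quad = agreement_ratio(quad_loss, x, [2.0, 4.0], [2.0, 4.0], step)
ratio_quartic = agreement_ratio(quartic_loss, x, [6.0, 4.0], [14.0, 4.0], step)
```

Logging such a ratio at every update would show directly whether the local model degrades as increments accumulate, which is the gap the comment identifies.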
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, providing clarifications on the theoretical claims while indicating revisions that will be incorporated to strengthen the manuscript.
- Referee: [Abstract] Abstract (paragraph on per-update optimal low-rank matrix): The central claim that successive optimally scaled increments can be accumulated without optimizer restart while closely approximating the full fine-tuning loss surface rests on an unverified assumption that the local quadratic model (or equivalent stationarity condition) remains valid after optimizer state updates; no Hessian tracking, curvature monitoring, or multi-step deviation analysis is described to confirm this.
  Authors: The derivation in the manuscript establishes per-update optimality under a local quadratic approximation of the loss, with the scaling chosen to minimize that local model while allowing the optimizer state (momentum and second-moment estimates) to continue uninterrupted. The abstract summarizes the outcome rather than the multi-step justification. We agree that explicit verification of the approximation's validity over successive steps would strengthen the presentation. In the revised manuscript we will add a dedicated subsection with empirical curvature monitoring (via gradient-norm ratios and local Hessian diagonal estimates) and quantitative deviation analysis between the quadratic model and observed loss changes across fine-tuning trajectories.
  revision: yes
- Referee: [Abstract] Abstract: The assertion of 'rigorous performance guarantees' and an 'analytical' solution for optimal scaling lacks any derivation steps, explicit assumptions, or error bounds, which is load-bearing for the optimality claim and prevents verification that the scaling does not reduce to a post-hoc fit.
  Authors: The abstract is intentionally concise, but Section 3 of the manuscript contains the full analytical derivation: the scaling vector is obtained in closed form by setting the gradient of the local quadratic loss approximation to zero, under the explicit assumptions of twice-differentiability of the loss and a diagonal Hessian approximation for computational tractability. Error bounds are stated in terms of the Taylor remainder. We acknowledge that these elements are not visible from the abstract alone. We will revise the abstract to include a brief outline of the key derivation steps, the main assumptions, and a reference to the detailed proof and bounds in the main text.
  revision: yes
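The kind of derivation described in the second response can be sketched generically. The block below is an illustration under the stated assumptions (local quadratic model, diagonal Hessian approximation), with notation introduced here; it is not the result from the manuscript's Section 3, which this page does not reproduce.

```latex
% Assumed notation (illustrative only):
%   G          gradient of the loss with respect to the weight matrix W
%   H          entrywise (diagonal) Hessian approximation, same shape as W
%   a_j, b_j   j-th columns of the low-rank factors A and B
%   s \in R^r  column scaling to be chosen
\Delta W(s) = \sum_{j=1}^{r} s_j\, a_j b_j^\top ,
\qquad
m(s) = \langle G, \Delta W(s) \rangle
     + \tfrac{1}{2} \sum_{i,k} H_{ik}\, \Delta W(s)_{ik}^{2} .

% Stationarity: \partial m / \partial s_j = 0 for every j
c_j + \sum_{l=1}^{r} M_{jl}\, s_l = 0 ,
\qquad
c_j = a_j^\top G\, b_j ,
\qquad
M_{jl} = \sum_{i,k} H_{ik}\, a_{ij} b_{kj}\, a_{il} b_{kl} .

% Closed form when M is invertible
s^\star = -M^{-1} c .
```

Because $m$ is quadratic in $s$, the stationarity condition is an $r \times r$ linear system. When the entries of $H$ are positive and the rank-1 terms $a_j b_j^\top$ are linearly independent, $M$ is positive definite and $s^\star$ is the unique minimizer of the model.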
Circularity Check
Analytical derivation of column scaling is self-contained and independent of fitted inputs or self-citation chains
The paper's core step identifies an optimal scaling vector for low-rank factors by minimizing a local loss approximation (via second-order Taylor expansion or stationarity condition) and then accumulates these increments. This is a direct mathematical derivation from the stated quadratic model rather than a post-hoc fit renamed as prediction or a self-referential definition. No load-bearing uniqueness theorem, ansatz smuggled via prior self-citation, or renaming of known empirical patterns is invoked; the guarantees follow from the closed-form stationarity condition under the local model. The successive-update validity is an empirical modeling assumption, not a circularity in the derivation itself. The result remains falsifiable against full fine-tuning trajectories and does not reduce to its inputs by construction.
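A small numerical check of the stationarity argument, as a sketch under assumptions rather than the paper's algorithm. It takes a diagonal-Hessian quadratic model m(s) = <G, dW(s)> + 0.5 * sum_ik H_ik * dW(s)_ik^2 with dW(s) = sum_j s_j a_j b_j^T, restricts to rank r = 2, and solves the resulting 2x2 linear system M s = -c by Cramer's rule. All names and the toy matrices are hypothetical.

```python
def model_value(s, A, B, G, H):
    """Quadratic model m(s) for the update sum_j s[j] * a_j b_j^T."""
    m, n, r = len(A), len(B), len(s)
    value = 0.0
    for i in range(m):
        for k in range(n):
            d = sum(s[j] * A[i][j] * B[k][j] for j in range(r))
            value += G[i][k] * d + 0.5 * H[i][k] * d * d
    return value


def optimal_scaling_2(A, B, G, H):
    """Solve the 2x2 stationarity system M s = -c by Cramer's rule."""
    m, n = len(A), len(B)
    c = [0.0, 0.0]
    M = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(m):
        for k in range(n):
            # Entry (i, k) of each rank-1 term a_j b_j^T.
            basis = [A[i][j] * B[k][j] for j in range(2)]
            for j in range(2):
                c[j] += G[i][k] * basis[j]
                for l in range(2):
                    M[j][l] += H[i][k] * basis[j] * basis[l]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    s0 = (-c[0] * M[1][1] + c[1] * M[0][1]) / det
    s1 = (-c[1] * M[0][0] + c[0] * M[1][0]) / det
    return [s0, s1]


# Toy problem (hypothetical values): A is 3x2, B is 2x2, G and H are 3x2.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B = [[1.0, 0.0], [1.0, 1.0]]
G = [[1.0, -2.0], [0.5, 1.0], [-1.0, 0.5]]
H = [[1.0, 2.0], [1.0, 1.0], [2.0, 1.0]]   # positive diagonal-Hessian entries

s_star = optimal_scaling_2(A, B, G, H)
best = model_value(s_star, A, B, G, H)
unscaled = model_value([1.0, 1.0], A, B, G, H)   # columns left unscaled
```

On this toy problem the scaled update attains a lower model value than the unscaled one, and small perturbations of `s_star` only increase it. This verifies the stationarity condition within the local model only; it says nothing about how well that model matches the true loss over many steps.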
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: successive low-rank updates can be scaled to approximate the loss-minimizing high-rank direction at each step.
Forward citations
Cited by 1 Pith paper
- Low-Rank Adaptation Redux for Large Models: An overview revisits LoRA variants by categorizing advances in architectural design, efficient optimization, and applications while linking them to classical signal processing tools for principled fine-tuning.