ScaLoRA: Optimally Scaled Low-Rank Adaptation for Efficient High-Rank Fine-Tuning

Georgios B. Giannakis; Xiaodong Yang; Yilang Zhang; Yiwei Cai

arxiv: 2510.23818 · v2 · pith:DQ4PIA4Vnew · submitted 2025-10-27 · 💻 cs.LG

Yilang Zhang, Xiaodong Yang, Yiwei Cai, Georgios B. Giannakis

Pith reviewed 2026-05-18 03:41 UTC · model grok-4.3

classification 💻 cs.LG

keywords: low-rank adaptation · LoRA · fine-tuning · large language models · parameter-efficient tuning · optimal scaling · high-rank approximation · convergence

The pith

Optimally scaling the columns of each low-rank update lets successive increments accumulate into a high-rank weight change that approximates full fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a high-rank weight update by summing many low-rank increments, each optimally scaled. Instead of fixing the scale once, the method recomputes the column-wise scaling at every update so that the loss at that step is minimized. Because the scaling has a closed-form solution, the optimizer keeps running without restarts while the accumulated change tracks full-rank fine-tuning more closely. Tests on language models with up to 12 billion parameters show faster convergence and higher accuracy than prior LoRA variants on natural language understanding, commonsense reasoning, and math tasks.

Core claim

The per-update optimal low-rank matrix is formed by scaling the columns of the base low-rank factors so that the loss decrease is maximized at every step; this scaling admits an analytical expression, and the resulting sequence of increments can be summed without resetting the optimizer while still approximating the loss landscape of full-rank fine-tuning.

What carries the argument

Analytical column-wise scaling of the low-rank matrix at each update step, chosen to minimize the immediate loss and enable seamless accumulation toward a high-rank update.
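
To make the idea of a closed-form column scaling concrete, here is a minimal NumPy sketch. It is not the paper's derivation: the local objective (a linear term plus a Frobenius-norm penalty), the step size `eta`, and the helper names `optimal_column_scaling` and `local_model` are all assumptions chosen for illustration. The increment is written as A diag(s) B^T, so `s` scales the columns of `A`.

```python
import numpy as np

def optimal_column_scaling(G, A, B, eta=1.0):
    """Closed-form s minimizing <G, dW> + ||dW||_F^2 / (2 * eta), with dW = A diag(s) B^T.

    Illustrative local model only (an assumption); the paper's objective may differ.
    """
    # Linear term: <G, a_i b_i^T> = a_i^T G b_i for each column pair i.
    g = np.einsum("mi,mn,ni->i", A, G, B)
    # Quadratic term: ||A diag(s) B^T||_F^2 = s^T M s, with M the
    # elementwise product of the two Gram matrices.
    M = (A.T @ A) * (B.T @ B)
    # Stationarity condition: g + M s / eta = 0.
    return -eta * np.linalg.solve(M, g)

def local_model(G, A, B, s, eta=1.0):
    """Value of the assumed local objective at scaling vector s."""
    dW = (A * s) @ B.T  # broadcasting scales column i of A by s[i]
    return np.sum(G * dW) + np.sum(dW ** 2) / (2.0 * eta)

rng = np.random.default_rng(0)
m, n, r = 8, 6, 3
G = rng.standard_normal((m, n))   # stand-in for the full-weight gradient
A = rng.standard_normal((m, r))   # stand-in low-rank factors
B = rng.standard_normal((n, r))

s_star = optimal_column_scaling(G, A, B)
best = local_model(G, A, B, s_star)
```

Under this assumed model the scaling costs one small r-by-r linear solve per update, which is the kind of low overhead a closed-form rule would be expected to have.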

If this is right

The method delivers measurable accuracy improvements over existing LoRA variants on natural language understanding, commonsense reasoning, and mathematical problem solving.
Convergence occurs in fewer steps for models ranging from small to 12 billion parameters.
No optimizer restart is required when switching to the optimally scaled low-rank increments.
The closed-form scaling removes the need for extra hyper-parameter search at each update.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same scaling logic could be applied to other low-rank families such as prefix tuning or adapter modules.
If the analytical form generalizes, training-time compute for very large models could be further reduced by skipping full-matrix gradient steps entirely.
Longer training runs on downstream tasks might reveal whether the accumulated high-rank updates improve generalization beyond what standard LoRA achieves.

Load-bearing premise

Successive optimally scaled low-rank increments can be accumulated without restarting the optimizer and still stay close to the loss surface of full-rank fine-tuning.
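
The accumulation half of this premise can be illustrated numerically. The sketch below sums random rank-`r` increments and records the rank of the running total. The dimensions and the use of random factors are assumptions for illustration; in real fine-tuning the rank grows only to the extent that successive increments span new directions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, steps = 16, 2, 5

delta_W = np.zeros((d, d))
ranks = []
for _ in range(steps):
    A = rng.standard_normal((d, r))
    B = rng.standard_normal((d, r))
    delta_W += A @ B.T  # add one rank-r increment to the running update
    ranks.append(int(np.linalg.matrix_rank(delta_W)))

print(ranks)
```

With generic factors each increment contributes `r` new directions, so the accumulated rank rises by `r` per step until it reaches `d`.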

What would settle it

If replacing the analytical scaling with any other fixed or learned factor erases the reported gains in convergence speed or final accuracy on the same 12-billion-parameter models and tasks, the central claim would be falsified.

Figures

Figures reproduced from arXiv: 2510.23818 by Georgios B. Giannakis, Xiaodong Yang, Yilang Zhang, Yiwei Cai.

**Figure 2.** Visualization on the RTE dataset with DeBERTaV3-base. [PITH_FULL_IMAGE:figures/full_fig_p008_2.png]

**Figure 3.** Overhead comparison using LLaMA3-8B. [PITH_FULL_IMAGE:figures/full_fig_p009_3.png]

Original abstract

As large language models (LLMs) continue to scale in size, the computational overhead has become a major bottleneck for task-specific fine-tuning. While low-rank adaptation (LoRA) effectively curtails this cost by confining the weight updates to a low-dimensional subspace, such a restriction can hinder effectiveness and slow convergence. This contribution deals with these limitations by accumulating progressively a high-rank weight update from consecutive low-rank increments. Specifically, the per update optimal low-rank matrix is identified to minimize the loss function and closely approximate full fine-tuning. To endow efficient and seamless optimization without restarting, this optimal choice is formed by appropriately scaling the columns of the original low-rank matrix. Rigorous performance guarantees reveal that the optimal scaling can be found analytically. Extensive numerical tests with popular LLMs scaling up to 12 billion parameters demonstrate a consistent performance gain and fast convergence relative to state-of-the-art LoRA variants on diverse tasks including natural language understanding, commonsense reasoning, and mathematical problem solving.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes ScaLoRA, a method to accumulate progressively higher-rank weight updates during LLM fine-tuning by identifying an analytically optimal scaling vector for the columns of each low-rank increment. This scaling is derived to minimize a local loss approximation at each step, allowing seamless optimizer continuation without restart while approximating full fine-tuning trajectories. The paper asserts rigorous performance guarantees for the closed-form scaling and reports consistent gains in convergence speed and task performance versus prior LoRA variants on models up to 12B parameters across NLU, commonsense reasoning, and mathematical tasks.

Significance. If the analytical optimality derivation is correct and the local quadratic approximation remains sufficiently accurate across successive updates, ScaLoRA would provide a principled, low-overhead route to effective high-rank adaptation. This could meaningfully narrow the performance gap between parameter-efficient methods and full fine-tuning while preserving the computational advantages of low-rank updates.

major comments (2)

[Abstract] Abstract (paragraph on per-update optimal low-rank matrix): The central claim that successive optimally scaled increments can be accumulated without optimizer restart while closely approximating the full fine-tuning loss surface rests on an unverified assumption that the local quadratic model (or equivalent stationarity condition) remains valid after optimizer state updates; no Hessian tracking, curvature monitoring, or multi-step deviation analysis is described to confirm this.
[Abstract] Abstract: The assertion of 'rigorous performance guarantees' and an 'analytical' solution for optimal scaling lacks any derivation steps, explicit assumptions, or error bounds, which is load-bearing for the optimality claim and prevents verification that the scaling does not reduce to a post-hoc fit.
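
One conventional way to run the kind of deviation check the first major comment asks for is a trust-region style ratio: the observed loss change divided by the change the local quadratic model predicts. The sketch below is an assumption about how such a check could look, not something the paper describes. The helper `model_agreement_ratio` and the toy loss are hypothetical.

```python
import numpy as np

def model_agreement_ratio(loss_fn, w, step, grad, hess_diag):
    """Observed loss change divided by the change predicted by a
    diagonal-Hessian quadratic model; values near 1 mean the model fits."""
    predicted = grad @ step + 0.5 * step @ (hess_diag * step)
    observed = loss_fn(w + step) - loss_fn(w)
    return observed / predicted

# Toy loss whose Hessian is exactly diagonal, so the model is exact here.
c = np.array([1.0, 4.0, 9.0])

def loss(w):
    return 0.5 * np.sum(c * w ** 2)

w0 = np.array([1.0, -1.0, 0.5])
g0 = c * w0          # gradient of the toy loss at w0
step = -g0 / c       # Newton step under the diagonal model
rho = model_agreement_ratio(loss, w0, step, g0, c)
```

Logging this ratio across successive updates would show whether the local model degrades once optimizer state is carried over; a ratio drifting far from 1 would support the referee's concern.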

minor comments (1)

[Abstract] Numerical results summary would benefit from error bars, ablation details on scaling vector computation, and explicit comparison of effective rank achieved versus baseline LoRA variants.
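
The effective-rank comparison requested here could be computed as follows. This sketch uses the entropy-based definition (the exponential of the Shannon entropy of the normalized singular values), which is one common choice and not necessarily the measure the authors would report. The helper name `effective_rank` is hypothetical.

```python
import numpy as np

def effective_rank(W, eps=1e-12):
    """Entropy-based effective rank of a matrix.

    Returns exp(H), where H is the Shannon entropy of the singular
    values normalized to sum to one.
    """
    sv = np.linalg.svd(W, compute_uv=False)
    p = sv / (sv.sum() + eps)
    p = p[p > eps]  # drop numerically zero singular values
    return float(np.exp(-(p * np.log(p)).sum()))

er_identity = effective_rank(np.eye(4))                 # flat spectrum
er_rank_one = effective_rank(np.outer(np.arange(1.0, 5.0),
                                      np.arange(1.0, 4.0)))  # single direction
```

Applied to the accumulated weight update of each method, this gives a single number per layer that can be tabulated against baseline LoRA variants.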

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, providing clarifications on the theoretical claims while indicating revisions that will be incorporated to strengthen the manuscript.

Referee: [Abstract] Abstract (paragraph on per-update optimal low-rank matrix): The central claim that successive optimally scaled increments can be accumulated without optimizer restart while closely approximating the full fine-tuning loss surface rests on an unverified assumption that the local quadratic model (or equivalent stationarity condition) remains valid after optimizer state updates; no Hessian tracking, curvature monitoring, or multi-step deviation analysis is described to confirm this.

Authors: The derivation in the manuscript establishes per-update optimality under a local quadratic approximation of the loss, with the scaling chosen to minimize that local model while allowing the optimizer state (momentum and second-moment estimates) to continue uninterrupted. The abstract summarizes the outcome rather than the multi-step justification. We agree that explicit verification of the approximation's validity over successive steps would strengthen the presentation. In the revised manuscript we will add a dedicated subsection with empirical curvature monitoring (via gradient-norm ratios and local Hessian diagonal estimates) and quantitative deviation analysis between the quadratic model and observed loss changes across fine-tuning trajectories.

revision: yes

Referee: [Abstract] Abstract: The assertion of 'rigorous performance guarantees' and an 'analytical' solution for optimal scaling lacks any derivation steps, explicit assumptions, or error bounds, which is load-bearing for the optimality claim and prevents verification that the scaling does not reduce to a post-hoc fit.

Authors: The abstract is intentionally concise, but Section 3 of the manuscript contains the full analytical derivation: the scaling vector is obtained in closed form by setting the gradient of the local quadratic loss approximation to zero, under the explicit assumptions of twice-differentiability of the loss and a diagonal Hessian approximation for computational tractability. Error bounds are stated in terms of the Taylor remainder. We acknowledge that these elements are not visible from the abstract alone. We will revise the abstract to include a brief outline of the key derivation steps, the main assumptions, and a reference to the detailed proof and bounds in the main text.

revision: yes
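
The stationarity step described in this response can be written out for a generic quadratic model with a diagonal Hessian approximation. The notation below (scaling vector $s$, gradient $g$, diagonal entries $d_i$, reference loss $L_0$) is assumed for illustration and is not taken from the paper.

```latex
% Assumed notation: s is the scaling vector, g the gradient of the loss
% with respect to s, and D = diag(d_1, ..., d_r) a diagonal Hessian
% approximation with positive entries.
\begin{align}
  q(s) &= L_0 + g^\top s + \tfrac{1}{2}\, s^\top D\, s, \qquad d_i > 0, \\
  \nabla q(s^\star) &= g + D s^\star = 0
  \;\Longrightarrow\; s_i^\star = -\frac{g_i}{d_i}, \\
  q(s^\star) - L_0 &= -\frac{1}{2} \sum_{i=1}^{r} \frac{g_i^2}{d_i} \le 0 .
\end{align}
```

Under these assumptions the predicted decrease is non-positive by construction; how closely the true loss follows it depends on the Taylor remainder the authors mention.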

Circularity Check

0 steps flagged

Analytical derivation of column scaling is self-contained and independent of fitted inputs or self-citation chains

full rationale

The paper's core step identifies an optimal scaling vector for low-rank factors by minimizing a local loss approximation (via second-order Taylor expansion or stationarity condition) and then accumulates these increments. This is a direct mathematical derivation from the stated quadratic model rather than a post-hoc fit renamed as prediction or a self-referential definition. No load-bearing uniqueness theorem, ansatz smuggled via prior self-citation, or renaming of known empirical patterns is invoked; the guarantees follow from the closed-form stationarity condition under the local model. The successive-update validity is an empirical modeling assumption, not a circularity in the derivation itself. The result remains falsifiable against full fine-tuning trajectories and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The method rests on the existence of an analytical per-step scaling that minimizes loss without restart; no explicit free parameters or new entities are introduced in the abstract.

axioms (1)

Domain assumption: Successive low-rank updates can be scaled to approximate the loss-minimizing high-rank direction at each step.
Basis: The abstract states that the optimal low-rank matrix is identified to minimize the loss and approximate full fine-tuning.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Low-Rank Adaptation Redux for Large Models
cs.LG · 2026-04 · unverdicted · novelty 3.0

An overview revisits LoRA variants by categorizing advances in architectural design, efficient optimization, and applications while linking them to classical signal processing tools for principled fine-tuning.



work page 2024

[14] [14]

Parameter-efficient fine-tuning with discrete Fourier transform

Ziqi Gao, Qichao Wang, Aochuan Chen, Zijing Liu, Bingzhe Wu, Liang Chen, and Jia Li. Parameter-efficient fine-tuning with discrete Fourier transform. InProc. Int. Conf. on Machine Learning (ICML), volume 235, pp. 14884–14901. PMLR, 21–27 Jul 2024

work page 2024

[15] [15]

MIT press Cambridge, 2016

Ian Goodfellow, Yoshua Bengio, Aaron Courville, and Yoshua Bengio.Deep learning, volume 1. MIT press Cambridge, 2016

work page 2016

[16] [16]

The Llama 3 Herd of Models

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ah- mad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The Llama 3 herd of models.arXiv preprint arXiv:2407.21783, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[17] [17]

Flora: Low-rank adapters are secretly gradient compressors

Yongchang Hao, Yanshuai Cao, and Lili Mou. Flora: Low-rank adapters are secretly gradient compressors. InProc. Int. Conf. on Machine Learning (ICML), volume 235, pp. 17554–17571. PMLR, 21–27 Jul 2024

work page 2024

[18] [18]

DeBERTav3: Improving deBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing

Pengcheng He, Jianfeng Gao, and Weizhu Chen. DeBERTav3: Improving deBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing. InProc. Int. Conf. on Learning Representations (ICLR), 2023

work page 2023

[19] [19]

Measuring mathematical problem solving with the MATH dataset

Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring mathematical problem solving with the MATH dataset. InThirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021

work page 2021

[20] [20]

Cambridge university press, 2012

Roger A Horn and Charles R Johnson.Matrix analysis. Cambridge university press, 2012

work page 2012

[21] [21]

Parameter-efficient transfer learning for NLP

Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for NLP. InProc. Int. Conf. on Machine Learning (ICML), volume 97, pp. 2790–2799. PMLR, 09–15 Jun 2019

work page 2019

[22] [22]

LoRA: Low-rank adaptation of large language models

Edward J Hu, yelong shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. InProc. Int. Conf. on Learning Representations (ICLR), 2022

work page 2022

[23] [23]

LLM-adapters: An adapter family for parameter-efficient fine-tuning of large language models

Zhiqiang Hu, Lei Wang, Yihuai Lan, Wanyu Xu, Ee-Peng Lim, Lidong Bing, Xing Xu, Sou- janya Poria, and Roy Ka-Wei Lee. LLM-adapters: An adapter family for parameter-efficient fine-tuning of large language models. InProc. Conf. on Empirical Methods in Natural Language Processing (EMNLP), 2023

work page 2023

[24] [24]

Hira: Parameter-efficient hadamard high-rank adaptation for large language models

Qiushi Huang, Tom Ko, Zhan Zhuang, Lilian Tang, and Yu Zhang. Hira: Parameter-efficient hadamard high-rank adaptation for large language models. InProc. Int. Conf. on Learning Rep- resentations (ICLR), 2025

work page 2025

[25] [25]

FedPara: Low-rank hadamard product for communication-efficient federated learning

Nam Hyeon-Woo, Moon Ye-Bin, and Tae-Hyun Oh. FedPara: Low-rank hadamard product for communication-efficient federated learning. InProc. Int. Conf. on Learning Representations (ICLR), 2022

work page 2022

[26] [26]

Mora: High-rank updating for parameter- efficient fine-tuning.arXiv preprint arXiv:2405.12130, 2024

Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, et al. Mora: High-rank updating for parameter- efficient fine-tuning.arXiv preprint arXiv:2405.12130, 2024

work page arXiv 2024

[27] [27]

Adam: A method for stochastic optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. InProc. Int. Conf. on Learning Representations (ICLR), 2015

work page 2015

[28] [28]

Prefix-tuning: Optimizing continuous prompts for generation

Xiang Lisa Li and Percy Liang. Prefix-tuning: Optimizing continuous prompts for generation. InProc. Conf. Assoc. Comput. Linguist. Meet. (ACL), pp. 4582–4597, August 2021. 11 Optimally Scaled Low-Rank Adaptation (ScaLoRA)

work page 2021

[29] [29]

LoftQ: LoRA-fine-tuning-aware quantization for large language models

Yixiao Li, Yifan Yu, Chen Liang, Nikos Karampatziakis, Pengcheng He, Weizhu Chen, and Tuo Zhao. LoftQ: LoRA-fine-tuning-aware quantization for large language models. InProc. Int. Conf. on Learning Representations (ICLR), 2024

work page 2024

[30] [30]

ReloRA: High- rank training through low-rank updates

Vladislav Lialin, Sherin Muckatira, Namrata Shivagunde, and Anna Rumshisky. ReloRA: High- rank training through low-rank updates. InProc. Int. Conf. on Learning Representations (ICLR), 2024

work page 2024

[31] [31]

Polar: Polar-decomposed low-rank adapter representation.arXiv preprint arXiv:2506.03133, 2025

Kai Lion, Liang Zhang, Bingcong Li, and Niao He. Polar: Polar-decomposed low-rank adapter representation.arXiv preprint arXiv:2506.03133, 2025

work page arXiv 2025

[32] [32]

Decoupled weight decay regularization

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. InProc. Int. Conf. on Learning Representations (ICLR), 2019

work page 2019

[33] [33]

Pissa: Principal singular values and singu- lar vectors adaptation of large language models

Fanxu Meng, Zhaohui Wang, and Muhan Zhang. Pissa: Principal singular values and singu- lar vectors adaptation of large language models. InProc. Neural Information Processing Systems (NeurIPS), volume 37, pp. 121038–121072, 2024

work page 2024

[34] [34]

Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering

Todor Mihaylov, Peter Clark, Tushar Khot, and Ashish Sabharwal. Can a suit of armor conduct electricity? A new dataset for open book question answering.arXiv:1809.02789, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[35] [35]

Pytorch: An imperative style, high- performance deep learning library

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high- perf...

work page 2019

[36] [36]

Know what you don’t know: Unanswerable questions for SQuAD

Pranav Rajpurkar, Robin Jia, and Percy Liang. Know what you don’t know: Unanswerable questions for SQuAD. InProc. Conf. Assoc. Comput. Linguist. Meet. (ACL), pp. 784–789, 2018

work page 2018

[37] [37]

Winogrande: An adversarial winograd schema challenge at scale.Communications of the ACM, 64(9):99–106, 2021

Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. Winogrande: An adversarial winograd schema challenge at scale.Communications of the ACM, 64(9):99–106, 2021

work page 2021

[38] [38]

SocialIQA: Commonsense Reasoning about Social Interactions

Maarten Sap, Hannah Rashkin, Derek Chen, Ronan LeBras, and Yejin Choi. Socialiqa: Com- monsense reasoning about social interactions.arXiv:1904.09728, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1904

[39] [39]

J. Schur. Bemerkungen zur theorie der beschr ¨ankten bilinearformen mit unendlich vielen ver¨anderlichen.Journal f ¨ur die reine und angewandte Mathematik, 1911(140):1–28, 1911. doi: doi:10.1515/crll.1911.140.1

work page doi:10.1515/crll.1911.140.1 1911

[40] [40]

Cambridge university press, 2014

Shai Shalev-Shwartz and Shai Ben-David.Understanding machine learning: From theory to algo- rithms. Cambridge university press, 2014

work page 2014

[41] [41]

Recursive deep models for semantic compositionality over a sen- timent treebank

Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D Manning, Andrew Y Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sen- timent treebank. InProc. Conf. on Empirical Methods in Natural Language Processing (EMNLP), pp. 1631–1642, 2013

work page 2013

[42] [42]

Training neural networks with fixed sparse masks

Yi-Lin Sung, Varun Nair, and Colin A Raffel. Training neural networks with fixed sparse masks. InProc. Neural Information Processing Systems (NeurIPS), volume 34, pp. 24193–24205, 2021

work page 2021

[43] [43]

Gemma 3 Technical Report

Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Mer- hej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ram´e, Morgane Rivi`ere, et al. Gemma 3 technical report.arXiv preprint arXiv:2503.19786, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[44] [44]

Llama 2: Open Foundation and Fine-Tuned Chat Models

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models.arXiv preprint arXiv:2307.09288, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[45] [45]

GLUE: A multi-task benchmark and analysis platform for natural language understanding

Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R Bowman. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proc. Int. Conf. on Learning Representations (ICLR), 2019. 12 Optimally Scaled Low-Rank Adaptation (ScaLoRA)

work page 2019

[46] [46]

Lora-ga: Low-rank adaptation with gradient approxi- mation

Shaowen Wang, Linxi Yu, and Jian Li. Lora-ga: Low-rank adaptation with gradient approxi- mation. InProc. Neural Information Processing Systems (NeurIPS), volume 37, pp. 54905–54931, 2024

work page 2024

[47] [47]

LoRA-pro: Are low-rank adapters properly optimized? InProc

Zhengbo Wang, Jian Liang, Ran He, Zilei Wang, and Tieniu Tan. LoRA-pro: Are low-rank adapters properly optimized? InProc. Int. Conf. on Learning Representations (ICLR), 2025

work page 2025

[48] [48]

Neural network acceptability judg- ments.Trans

Alex Warstadt, Amanpreet Singh, and Samuel R Bowman. Neural network acceptability judg- ments.Trans. Assoc. Comput. Linguist., 7:625–641, 2019

work page 2019

[49] [49]

A broad-coverage challenge corpus for sentence understanding through inference

Adina Williams, Nikita Nangia, and Samuel R Bowman. A broad-coverage challenge corpus for sentence understanding through inference. InProc. Conf. North Am. Chapter Assoc. Comput. Linguist., pp. 1112–1122, 2018

work page 2018

[50] [50]

DoRA: Weight-decomposed low-rank adaptation

Shih yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang- Ting Cheng, and Min-Hung Chen. DoRA: Weight-decomposed low-rank adaptation. InProc. Int. Conf. on Machine Learning (ICML), 2024

work page 2024

[51] [51]

Navigating text-to-image customization: From LyCORIS fine-tuning to model evalua- tion

Shih-Ying Yeh, Yu-Guan Hsieh, Zhidong Gao, Bernard B W Yang, Giyeong Oh, and Yanmin Gong. Navigating text-to-image customization: From LyCORIS fine-tuning to model evalua- tion. InProc. Int. Conf. on Learning Representations (ICLR), 2024

work page 2024

[52] [52]

LoRA done RITE: Robust invariant transformation equilibration for loRA optimization

Jui-Nan Yen, Si Si, Zhao Meng, Felix Yu, Sai Surya Duvvuri, Inderjit S Dhillon, Cho-Jui Hsieh, and Sanjiv Kumar. LoRA done RITE: Robust invariant transformation equilibration for loRA optimization. InProc. Int. Conf. on Learning Representations (ICLR), 2025

work page 2025

[53] [53]

Metamath: Bootstrap your own mathematical questions for large language models

Longhui Yu, Weisen Jiang, Han Shi, Jincheng YU, Zhengying Liu, Yu Zhang, James Kwok, Zhenguo Li, Adrian Weller, and Weiyang Liu. Metamath: Bootstrap your own mathematical questions for large language models. InProc. Int. Conf. on Learning Representations (ICLR), 2024

work page 2024

[54] [54]

HellaSwag: Can a Machine Really Finish Your Sentence?

Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. Hellaswag: Can a machine really finish your sentence?arXiv:1905.07830, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1905

[55] [55]

Adaptive budget allocation for parameter-efficient fine-tuning

Qingru Zhang, Minshuo Chen, Alexander Bukharin, Pengcheng He, Yu Cheng, Weizhu Chen, and Tuo Zhao. Adaptive budget allocation for parameter-efficient fine-tuning. InProc. Int. Conf. on Learning Representations (ICLR), 2023

work page 2023

[56] [56]

arXiv preprint arXiv:2403.02901 , year=

Yang Zhang, Hanlei Jin, Dan Meng, Jun Wang, and Jinghua Tan. A comprehensive survey on process-oriented automatic text summarization with exploration of llm-based methods.arXiv preprint arXiv:2403.02901, 2024

work page arXiv 2024

[57] [57]

Giannakis

Yilang Zhang, Bingcong Li, and Georgios B. Giannakis. Reflora: Refactored low-rank adap- tation for efficient fine-tuning of large models. InAdvances in Neural Information Processing Systems, 2025

work page 2025

[58] [58]

Simulating classroom education with llm- empowered agents.arXiv preprint arXiv:2406.19226, 2024

Zheyuan Zhang, Daniel Zhang-Li, Jifan Yu, Linlu Gong, Jinchang Zhou, Zhanxin Hao, Jianx- iao Jiang, Jie Cao, Huiqin Liu, Zhiyuan Liu, et al. Simulating classroom education with llm- empowered agents.arXiv preprint arXiv:2406.19226, 2024. 13 Optimally Scaled Low-Rank Adaptation (ScaLoRA) A Missing proofs This section provides the proofs omitted in the main...

work page arXiv 2024