A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA

Damjan Kalajdzievski · 2023 · cs.CL · arXiv 2312.03732

Canonical reference. 100% of citing Pith papers cite this work as background.

22 Pith papers citing it

Background 100% of classified citations

abstract

As large language models (LLMs) have become increasingly compute and memory intensive, parameter-efficient fine-tuning (PEFT) methods are now a common strategy to fine-tune LLMs. A popular PEFT method is Low-Rank Adapters (LoRA), which adds trainable low-rank "adapters" to selected layers. Each adapter consists of a low-rank matrix product, multiplicatively scaled by a rank-dependent factor. This scaling factor, which divides adapters by a factor of the rank, results in slowed learning and stunted performance for LoRA with higher-rank adapters. Consequently, the use of LoRA in practice has generally been limited to very low ranks. In this work, we study the impact of the scaling factor on the learning process and prove that LoRA adapters should be divided by a factor of the square root of the rank. Modifying LoRA with the appropriate scaling factor, which we call the rank-stabilized LoRA (rsLoRA) method, easily provides for a fine-tuning compute/performance trade-off, where larger ranks can be used to trade off increased computational resources during training for better fine-tuning performance, with no change in inference computing cost.
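
The scaling change described in the abstract can be illustrated with a minimal Python sketch. It assumes the common LoRA convention in which the adapter product is multiplied by alpha / r, with alpha a fixed hyperparameter; the abstract itself only says that the adapter is divided by a factor of the rank. The function names and the alpha value are hypothetical, chosen for illustration, and this is not the paper's code.

```python
import math


def lora_scale(alpha: float, rank: int) -> float:
    """Standard LoRA scaling under the assumed convention: alpha / rank."""
    return alpha / rank


def rslora_scale(alpha: float, rank: int) -> float:
    """Rank-stabilized scaling: alpha / sqrt(rank)."""
    return alpha / math.sqrt(rank)


if __name__ == "__main__":
    alpha = 16.0  # hypothetical hyperparameter value, not taken from the paper
    for rank in (4, 16, 64, 256):
        print(
            f"r={rank:4d}  "
            f"LoRA={lora_scale(alpha, rank):.4f}  "
            f"rsLoRA={rslora_scale(alpha, rank):.4f}"
        )
```

With alpha held fixed, the LoRA factor shrinks 64-fold between r = 4 and r = 256, while the rsLoRA factor shrinks only 8-fold, so higher-rank adapters are damped far less.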

citation-role summary

background 5

citation-polarity summary

background 5

representative citing papers

MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

MatryoshkaLoRA inserts a crafted diagonal matrix P into LoRA to learn accurate nested low-rank adapters that support dynamic rank selection with minimal performance drop.

HAC: Parameter-Efficient Hyperbolic Adaptation of CLIP for Zero-Shot VQA

cs.CV · 2026-04-26 · unverdicted · novelty 7.0

HAC provides a parameter-efficient way to move CLIP into hyperbolic geometry, yielding consistent gains on zero-shot VQA benchmarks without any VQA training data overlap.

DifFoundMAD: Foundation Models meet Differential Morphing Attack Detection

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

DifFoundMAD improves differential morphing attack detection by replacing traditional embeddings with those from vision foundation models and applying class-balanced lightweight fine-tuning, cutting high-security error rates from 6.16% to 2.17%.

PreFT: Prefill-only finetuning for efficient inference

cs.LG · 2026-05-14 · accept · novelty 6.0

Prefill-only adaptation of LLMs yields 1.9x higher throughput for 512 adapters on Llama 3.1 70B with near-parity performance on RL tasks and recoverable loss on SFT.

Not How Many, But Which: Parameter Placement in Low-Rank Adaptation

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Gradient-informed placement of LoRA parameters recovers full performance under GRPO while random placement does not, due to differences in gradient rank and stability across training regimes.

Beyond LoRA vs. Full Fine-Tuning: Gradient-Guided Optimizer Routing for LLM Adaptation

cs.CL · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

MoLF routes updates between full fine-tuning and LoRA at the optimizer level to match or exceed the better of the two static methods on SQL, medical QA, and counterfactual tasks while an efficient variant outperforms prior adaptive LoRA by up to 20%.

TLoRA: Task-aware Low Rank Adaptation of Large Language Models

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer trainable parameters.

A Little Rank Goes a Long Way: Random Scaffolds with LoRA Adapters Are All You Need

cs.LG · 2026-04-09 · unverdicted · novelty 6.0

Frozen random backbones with low-rank LoRA adapters recover 96-100% of fully trained performance on diverse architectures while training only 0.5-40% of parameters.

InCoM: Intent-Driven Perception and Structured Coordination for Mobile Manipulation

cs.RO · 2026-02-26 · unverdicted · novelty 6.0

InCoM achieves 23-28% higher success rates in mobile manipulation tasks by inferring motion intent for adaptive perception and decoupling base-arm action generation.

ALL-FEM: Agentic Large Language models Fine-tuned for Finite Element Methods

cs.CE · 2026-01-08 · unverdicted · novelty 6.0

ALL-FEM fine-tunes LLMs on a corpus of verified FEniCS scripts and uses multi-agent workflows to automate finite element code generation, achieving 71.79% success on 39 benchmarks across elasticity, flow, and coupled problems.

MLorc: Momentum Low-rank Compression for Memory Efficient Large Language Model Adaptation

cs.LG · 2025-06-02 · conditional · novelty 6.0

MLorc compresses optimizer momentum with low-rank methods to enable memory-efficient full fine-tuning of LLMs, outperforming LoRA and GaLore while matching full-parameter performance at small ranks.

LoCO: Low-rank Compositional Rotation Fine-tuning

cs.LG · 2026-05-15 · unverdicted · novelty 5.0

LoCO is a PEFT technique that constructs orthogonal transformations via low-rank skew-symmetric matrices and compositional rotation chains with a parallelizable approximation, validated on transformer adaptations.

Strategic Over-Parameterization for Generalizable Low-Rank Adaptation

cs.LG · 2026-05-15 · unverdicted · novelty 5.0

LoRA-Over injects auxiliary parameters into low-rank adapters during training and decomposes them back into standard LoRA at inference, with static or dynamic scheduling to allocate extra capacity where needed, yielding better generalization than vanilla LoRA on GLUE, MT-Bench, GSM8K and HumanEval.

VLA-GSE: Boosting Parameter-Efficient Fine-Tuning in VLA with Generalized and Specialized Experts

cs.RO · 2026-05-07 · unverdicted · novelty 5.0 · 2 refs

VLA-GSE uses spectral decomposition of the VLA backbone to create generalized and specialized experts, enabling effective robot task adaptation while updating only 2.51% of parameters and achieving 81.2% zero-shot success on LIBERO-Plus.

SplitFT: An Adaptive Federated Split Learning System For LLMs Fine-Tuning

cs.DC · 2026-04-29 · unverdicted · novelty 5.0

SplitFT adapts cut-layer selection and reduces LoRA rank per client in federated split learning to improve efficiency and performance when fine-tuning LLMs on heterogeneous devices and data.

Benchmarking Linguistic Adaptation in Comparable-Sized LLMs: A Study of Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B on Romanized Nepali

cs.CL · 2026-03-25 · unverdicted · novelty 5.0

Fine-tuning Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B on Romanized Nepali data enables effective generation where zero-shot fails, with Qwen3-8B performing best overall and Llama-3.1-8B showing the largest gains.

Can Muon Fine-tune Adam-Pretrained Models?

cs.LG · 2026-05-11 · unverdicted · novelty 4.0

Constraining fine-tuning updates with LoRA mitigates performance degradation when switching from Adam to Muon on pretrained models.

LLiMba: Sardinian on a Single GPU -- Adapting a 3B Language Model to a Vanishing Romance Language

cs.CL · 2026-05-09 · conditional · novelty 4.0

Qwen2.5-3B was continued-pretrained and then fine-tuned with rsLoRA r256 on Sardinian data to reach 28.5 BLEU into the language, outperforming full fine-tuning and other LoRA variants.

One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech

eess.AS · 2026-04-28 · unverdicted · novelty 4.0

A system based on OmniVoice with multi-model ensemble distillation for fine-tuning shows consistent gains in intelligibility metrics while keeping speaker similarity for cross-lingual scientific speech.

LLMs and Speech: Integration vs. Combination

eess.AS · 2026-03-16 · unverdicted · novelty 4.0

Tight integration of acoustic models with LLMs for ASR is ablated against shallow fusion across label units, fine-tuning strategies, LLM sizes, and joint CTC decoding to mitigate hallucinations.

Efficient Task Adaptation in Large Language Models via Selective Parameter Optimization

cs.CL · 2026-04-18 · unverdicted · novelty 3.0

The paper claims a selective fine-tuning method that identifies and freezes core parameters to mitigate catastrophic forgetting in LLMs while improving domain adaptation, shown in experiments with GPT-J and LLaMA-3.

When PCOS Meets Eating Disorders: An Explainable AI Approach to Detecting the Hidden Triple Burden

cs.CL · 2026-04-15

citing papers explorer

MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning
cs.CL · 2026-05-08 · unverdicted · none · ref 11 · internal anchor
MatryoshkaLoRA inserts a crafted diagonal matrix P into LoRA to learn accurate nested low-rank adapters that support dynamic rank selection with minimal performance drop.

HAC: Parameter-Efficient Hyperbolic Adaptation of CLIP for Zero-Shot VQA
cs.CV · 2026-04-26 · unverdicted · none · ref 15 · internal anchor
HAC provides a parameter-efficient way to move CLIP into hyperbolic geometry, yielding consistent gains on zero-shot VQA benchmarks without any VQA training data overlap.

DifFoundMAD: Foundation Models meet Differential Morphing Attack Detection
cs.CV · 2026-04-20 · unverdicted · none · ref 28 · internal anchor
DifFoundMAD improves differential morphing attack detection by replacing traditional embeddings with those from vision foundation models and applying class-balanced lightweight fine-tuning, cutting high-security error rates from 6.16% to 2.17%.

PreFT: Prefill-only finetuning for efficient inference
cs.LG · 2026-05-14 · accept · none · ref 20 · internal anchor
Prefill-only adaptation of LLMs yields 1.9x higher throughput for 512 adapters on Llama 3.1 70B with near-parity performance on RL tasks and recoverable loss on SFT.

Not How Many, But Which: Parameter Placement in Low-Rank Adaptation
cs.LG · 2026-05-12 · unverdicted · none · ref 43 · internal anchor
Gradient-informed placement of LoRA parameters recovers full performance under GRPO while random placement does not, due to differences in gradient rank and stability across training regimes.

Beyond LoRA vs. Full Fine-Tuning: Gradient-Guided Optimizer Routing for LLM Adaptation
cs.CL · 2026-05-08 · unverdicted · none · ref 45 · 2 links · internal anchor
MoLF routes updates between full fine-tuning and LoRA at the optimizer level to match or exceed the better of the two static methods on SQL, medical QA, and counterfactual tasks while an efficient variant outperforms prior adaptive LoRA by up to 20%.

TLoRA: Task-aware Low Rank Adaptation of Large Language Models
cs.CL · 2026-04-20 · unverdicted · none · ref 39 · internal anchor
TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer trainable parameters.

A Little Rank Goes a Long Way: Random Scaffolds with LoRA Adapters Are All You Need
cs.LG · 2026-04-09 · unverdicted · none · ref 34 · internal anchor
Frozen random backbones with low-rank LoRA adapters recover 96-100% of fully trained performance on diverse architectures while training only 0.5-40% of parameters.

InCoM: Intent-Driven Perception and Structured Coordination for Mobile Manipulation
cs.RO · 2026-02-26 · unverdicted · none · ref 43 · internal anchor
InCoM achieves 23-28% higher success rates in mobile manipulation tasks by inferring motion intent for adaptive perception and decoupling base-arm action generation.

ALL-FEM: Agentic Large Language models Fine-tuned for Finite Element Methods
cs.CE · 2026-01-08 · unverdicted · none · ref 70 · internal anchor
ALL-FEM fine-tunes LLMs on a corpus of verified FEniCS scripts and uses multi-agent workflows to automate finite element code generation, achieving 71.79% success on 39 benchmarks across elasticity, flow, and coupled problems.

MLorc: Momentum Low-rank Compression for Memory Efficient Large Language Model Adaptation
cs.LG · 2025-06-02 · conditional · none · ref 6 · internal anchor
MLorc compresses optimizer momentum with low-rank methods to enable memory-efficient full fine-tuning of LLMs, outperforming LoRA and GaLore while matching full-parameter performance at small ranks.

LoCO: Low-rank Compositional Rotation Fine-tuning
cs.LG · 2026-05-15 · unverdicted · none · ref 24 · internal anchor
LoCO is a PEFT technique that constructs orthogonal transformations via low-rank skew-symmetric matrices and compositional rotation chains with a parallelizable approximation, validated on transformer adaptations.

Strategic Over-Parameterization for Generalizable Low-Rank Adaptation
cs.LG · 2026-05-15 · unverdicted · none · ref 17 · internal anchor
LoRA-Over injects auxiliary parameters into low-rank adapters during training and decomposes them back into standard LoRA at inference, with static or dynamic scheduling to allocate extra capacity where needed, yielding better generalization than vanilla LoRA on GLUE, MT-Bench, GSM8K and HumanEval.

VLA-GSE: Boosting Parameter-Efficient Fine-Tuning in VLA with Generalized and Specialized Experts
cs.RO · 2026-05-07 · unverdicted · none · ref 16 · 2 links · internal anchor
VLA-GSE uses spectral decomposition of the VLA backbone to create generalized and specialized experts, enabling effective robot task adaptation while updating only 2.51% of parameters and achieving 81.2% zero-shot success on LIBERO-Plus.

SplitFT: An Adaptive Federated Split Learning System For LLMs Fine-Tuning
cs.DC · 2026-04-29 · unverdicted · none · ref 52 · internal anchor
SplitFT adapts cut-layer selection and reduces LoRA rank per client in federated split learning to improve efficiency and performance when fine-tuning LLMs on heterogeneous devices and data.

Benchmarking Linguistic Adaptation in Comparable-Sized LLMs: A Study of Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B on Romanized Nepali
cs.CL · 2026-03-25 · unverdicted · none · ref 2 · internal anchor
Fine-tuning Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B on Romanized Nepali data enables effective generation where zero-shot fails, with Qwen3-8B performing best overall and Llama-3.1-8B showing the largest gains.

Can Muon Fine-tune Adam-Pretrained Models?
cs.LG · 2026-05-11 · unverdicted · none · ref 13 · internal anchor
Constraining fine-tuning updates with LoRA mitigates performance degradation when switching from Adam to Muon on pretrained models.

LLiMba: Sardinian on a Single GPU -- Adapting a 3B Language Model to a Vanishing Romance Language
cs.CL · 2026-05-09 · conditional · none · ref 8 · internal anchor
Qwen2.5-3B was continued-pretrained and then fine-tuned with rsLoRA r256 on Sardinian data to reach 28.5 BLEU into the language, outperforming full fine-tuning and other LoRA variants.

One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech
eess.AS · 2026-04-28 · unverdicted · none · ref 4 · internal anchor
A system based on OmniVoice with multi-model ensemble distillation for fine-tuning shows consistent gains in intelligibility metrics while keeping speaker similarity for cross-lingual scientific speech.

LLMs and Speech: Integration vs. Combination
eess.AS · 2026-03-16 · unverdicted · none · ref 49 · internal anchor
Tight integration of acoustic models with LLMs for ASR is ablated against shallow fusion across label units, fine-tuning strategies, LLM sizes, and joint CTC decoding to mitigate hallucinations.

Efficient Task Adaptation in Large Language Models via Selective Parameter Optimization
cs.CL · 2026-04-18 · unverdicted · none · ref 18 · internal anchor
The paper claims a selective fine-tuning method that identifies and freezes core parameters to mitigate catastrophic forgetting in LLMs while improving domain adaptation, shown in experiments with GPT-J and LLaMA-3.

When PCOS Meets Eating Disorders: An Explainable AI Approach to Detecting the Hidden Triple Burden
cs.CL · 2026-04-15 · unreviewed · ref 2 · internal anchor
