pith. machine review for the scientific record.

arxiv: 2308.03303 · v3 · submitted 2023-08-07 · 💻 cs.CL

Recognition: unknown

LoRA-FA: Efficient and Effective Low Rank Representation Fine-tuning

Authors on Pith: no claims yet
classification 💻 cs.CL
keywords: fine-tuning, full-ft, lora, lora-fa, low-rank, gradient, peft, experiments
read the original abstract

Fine-tuning large language models (LLMs) is crucial for improving their performance on downstream tasks, but full-parameter fine-tuning (Full-FT) is computationally expensive and memory-intensive. Parameter-efficient fine-tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), address this by optimizing only a small subset of parameters. However, LoRA may underperform Full-FT in certain scenarios due to the intrinsic limitations of its low-rank gradients. In this work, we reveal an asymmetric, collapsible structure in LoRA's update: the low-rank modification to W can be reformulated as a single-layer linear regression, implying that one of the LoRA factors can be frozen without sacrificing expressivity. Leveraging this insight, we introduce LoRA-FA, which freezes the projection-down matrix A and trains only the projection-up matrix B. We further close the gap to Full-FT by deriving closed-form gradient corrections that minimize the discrepancy between the induced low-rank gradient and the full gradient. Through extensive experiments on diverse benchmarks, including GLUE, GSM8K, MT-Bench, and HumanEval, we demonstrate that LoRA-FA consistently achieves comparable performance to existing PEFT methods and Full-FT. Experiments on system efficiency show that LoRA-FA significantly reduces activation memory consumption and computational workload in fine-tuning. Our code is available at https://github.com/huggingface/peft.
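The freeze-A/train-B idea from the abstract can be illustrated with a minimal numpy sketch. This is an illustrative stand-in, not the authors' implementation: the layer sizes, rank, learning rate, and the squared-loss objective are all assumptions chosen for clarity, and the closed-form gradient corrections mentioned in the abstract are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes and LoRA rank (assumptions for illustration).
d_out, d_in, r = 8, 8, 2
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight

# LoRA writes the update as B @ A. LoRA-FA freezes the projection-down
# matrix A at initialization and trains only the projection-up matrix B.
A = rng.standard_normal((r, d_in)) / np.sqrt(d_in)  # frozen
B = np.zeros((d_out, r))                            # trainable

def forward(x):
    """Adapted forward pass: base weight plus low-rank correction."""
    return x @ (W + B @ A).T

# For a squared loss L = 0.5 * ||forward(x) - y||^2, the gradient with
# respect to the effective weight is err.T @ x, so the gradient that
# reaches B is (err.T @ x) @ A.T -- only the (d_out x r) matrix B is
# ever updated, which is where the activation-memory savings come from.
x = rng.standard_normal((4, d_in))
y = rng.standard_normal((4, d_out))
err = forward(x) - y
grad_B = err.T @ (x @ A.T)

# One SGD step on B alone; A and W stay fixed throughout fine-tuning.
B -= 0.01 * grad_B
```

Because A is frozen, the input projection `x @ A.T` needs no stored full-size activations for backpropagation through A, matching the memory argument made in the abstract.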

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read papers and Pith reviews without signing in.

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Crowded in B-Space: Calibrating Shared Directions for LoRA Merging

    cs.CL 2026-04 unverdicted novelty 7.0

    Pico reduces LoRA merge interference by calibrating over-shared directions in the B matrix before merging, yielding 3.4-8.3 point accuracy gains and sometimes beating joint training.

  2. FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On

    cs.CV 2026-04 unverdicted novelty 7.0

    FIT is a large-scale dataset of 1.13M try-on triplets with exact size data plus a synthetic generation pipeline that enables training of virtual try-on models capable of depicting realistic garment fit including ill-f...

  3. S2FT: Parameter-Efficient Fine-Tuning in Sparse Spectrum Domain

    cs.CV 2026-05 unverdicted novelty 6.0

    S2FT replaces the sparse-spectrum assumption of prior Fourier PEFT with a learned rearrangement that maps a pre-estimated weight change into a domain where few spectral coefficients suffice.

  4. Dr. Post-Training: A Data Regularization Perspective on LLM Post-Training

    cs.LG 2026-05 unverdicted novelty 6.0

    Dr. Post-Training reframes general data as a data-induced regularizer for LLM post-training updates, yielding a family of methods that outperform data-selection baselines on SFT, RLHF, and RLVR tasks.

  5. Foundation models for discovering robust biomarkers of neurological disorders from dynamic functional connectivity

    q-bio.NC 2026-04 conditional novelty 6.0

    RE-CONFIRM shows that standard fine-tuning of foundation models fails to recover known regional hubs in neurological disorders, while Hub-LoRA recovers them and outperforms custom models.

  6. TLoRA: Task-aware Low Rank Adaptation of Large Language Models

    cs.CL 2026-04 unverdicted novelty 6.0

    TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer ...

  7. DP-FlogTinyLLM: Differentially private federated log anomaly detection using Tiny LLMs

    cs.CR 2026-04 unverdicted novelty 4.0

    DP-FLogTinyLLM combines federated learning, differential privacy, and LoRA-tuned tiny LLMs to match centralized log anomaly detection performance on Thunderbird and BGL datasets while preserving privacy.

  8. Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

    cs.LG 2024-03 accept novelty 4.0

    A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.