Pico reduces LoRA merge interference by calibrating over-shared directions in the B matrix before merging, yielding 3.4-8.3 point accuracy gains and sometimes beating joint training.
hub
LoRA-FA: Efficient and Effective Low Rank Representation Fine-tuning
15 Pith papers cite this work. Polarity classification is still indexing.
abstract
Fine-tuning large language models (LLMs) is crucial for improving their performance on downstream tasks, but full-parameter fine-tuning (Full-FT) is computationally expensive and memory-intensive. Parameter-efficient fine-tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), address this by optimizing only a small subset of parameters. However, LoRA may underperform Full-FT in certain scenarios due to the intrinsic limitations of its low-rank gradients. In this work, we reveal an asymmetric, collapsible structure in LoRA's update: the low-rank modification to W can be reformulated as a single-layer linear regression, implying that one of the LoRA factors can be frozen without sacrificing expressivity. Leveraging this insight, we introduce LoRA-FA, which freezes the projection-down matrix A and trains only the projection-up matrix B. We further close the gap to Full-FT by deriving closed-form gradient corrections that minimize the discrepancy between the induced low-rank gradient and the full gradient. Through extensive experiments on diverse benchmarks, including GLUE, GSM8K, MT-Bench, and HumanEval, we demonstrate that LoRA-FA consistently achieves comparable performance to existing PEFT methods and Full-FT. Experiments on system efficiency show that LoRA-FA significantly reduces activation memory consumption and computational workload in fine-tuning. Our code is available at https://github.com/huggingface/peft.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
FIT is a large-scale dataset of 1.13M try-on triplets with exact size data plus a synthetic generation pipeline that enables training of virtual try-on models capable of depicting realistic garment fit including ill-fit cases.
LoRA-DA derives an optimal data-aware LoRA initialization by solving an optimization problem from asymptotic analysis of parameter discrepancy using Fisher-gradient bias and Fisher-information variance terms.
UniR is a composable reasoning module trained with verifiable rewards and added to frozen LLMs via logit summation, enabling modular composition and weak-to-strong generalization across tasks and model sizes.
GaLore performs full-parameter LLM training with up to 65.5% less optimizer memory by projecting gradients onto a low-rank subspace at each step, matching full-rank performance on LLaMA pre-training and RoBERTa fine-tuning.
HELLoRA selectively applies LoRA adapters to hot experts in MoE layers, using as little as 15.7% of standard LoRA parameters while improving accuracy by 9.2% on OlMoE across math, code, and alignment tasks.
S2FT replaces the sparse-spectrum assumption of prior Fourier PEFT with a learned rearrangement that maps a pre-estimated weight change into a domain where few spectral coefficients suffice.
Dr. Post-Training reframes general data as a data-induced regularizer for LLM post-training updates, yielding a family of methods that outperform data-selection baselines on SFT, RLHF, and RLVR tasks.
RE-CONFIRM shows that standard fine-tuning of foundation models fails to recover known regional hubs in neurological disorders, while Hub-LoRA recovers them and outperforms custom models.
TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer trainable parameters.
NP-LoRA fuses subject and style LoRAs via null-space projection of the content update onto the orthogonal complement of the style subspace, with a soft variant controlled by one parameter.
MLorc compresses optimizer momentum with low-rank methods to enable memory-efficient full fine-tuning of LLMs, outperforming LoRA and GaLore while matching full-parameter performance at small ranks.
GWT projects gradients into wavelet subspaces to compress optimizer states for memory-efficient LLM training while claiming performance parity with full-rank updates.
DP-FLogTinyLLM combines federated learning, differential privacy, and LoRA-tuned tiny LLMs to match centralized log anomaly detection performance on Thunderbird and BGL datasets while preserving privacy.
A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.
citing papers explorer
-
Crowded in B-Space: Calibrating Shared Directions for LoRA Merging
Pico reduces LoRA merge interference by calibrating over-shared directions in the B matrix before merging, yielding 3.4-8.3 point accuracy gains and sometimes beating joint training.
-
FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On
FIT is a large-scale dataset of 1.13M try-on triplets with exact size data plus a synthetic generation pipeline that enables training of virtual try-on models capable of depicting realistic garment fit including ill-fit cases.
-
LoRA-DA: Data-Aware Initialization for Low-Rank Adaptation via Asymptotic Analysis
LoRA-DA derives an optimal data-aware LoRA initialization by solving an optimization problem from asymptotic analysis of parameter discrepancy using Fisher-gradient bias and Fisher-information variance terms.
-
Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs
UniR is a composable reasoning module trained with verifiable rewards and added to frozen LLMs via logit summation, enabling modular composition and weak-to-strong generalization across tasks and model sizes.
-
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
GaLore performs full-parameter LLM training with up to 65.5% less optimizer memory by projecting gradients onto a low-rank subspace at each step, matching full-rank performance on LLaMA pre-training and RoBERTa fine-tuning.
-
HELLoRA: Hot Experts Layer-Level Low-Rank Adaptation for Mixture-of-Experts Models
HELLoRA selectively applies LoRA adapters to hot experts in MoE layers, using as little as 15.7% of standard LoRA parameters while improving accuracy by 9.2% on OlMoE across math, code, and alignment tasks.
-
S2FT: Parameter-Efficient Fine-Tuning in Sparse Spectrum Domain
S2FT replaces the sparse-spectrum assumption of prior Fourier PEFT with a learned rearrangement that maps a pre-estimated weight change into a domain where few spectral coefficients suffice.
-
Dr. Post-Training: A Data Regularization Perspective on LLM Post-Training
Dr. Post-Training reframes general data as a data-induced regularizer for LLM post-training updates, yielding a family of methods that outperform data-selection baselines on SFT, RLHF, and RLVR tasks.
-
Foundation models for discovering robust biomarkers of neurological disorders from dynamic functional connectivity
RE-CONFIRM shows that standard fine-tuning of foundation models fails to recover known regional hubs in neurological disorders, while Hub-LoRA recovers them and outperforms custom models.
-
TLoRA: Task-aware Low Rank Adaptation of Large Language Models
TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer trainable parameters.
-
NP-LoRA: Null Space Projection for Subject-Style LoRA Fusion
NP-LoRA fuses subject and style LoRAs via null-space projection of the content update onto the orthogonal complement of the style subspace, with a soft variant controlled by one parameter.
-
MLorc: Momentum Low-rank Compression for Memory Efficient Large Language Model Adaptation
MLorc compresses optimizer momentum with low-rank methods to enable memory-efficient full fine-tuning of LLMs, outperforming LoRA and GaLore while matching full-parameter performance at small ranks.
-
GWT: Scalable Optimizer State Compression for Large Language Model Training
GWT projects gradients into wavelet subspaces to compress optimizer states for memory-efficient LLM training while claiming performance parity with full-rank updates.
-
DP-FlogTinyLLM: Differentially private federated log anomaly detection using Tiny LLMs
DP-FLogTinyLLM combines federated learning, differential privacy, and LoRA-tuned tiny LLMs to match centralized log anomaly detection performance on Thunderbird and BGL datasets while preserving privacy.
-
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.