When pre-training hurts LoRA fine-tuning: a dynamical analysis via single-index models
1 Pith paper cites this work.
abstract
Pre-training on a source task is usually expected to facilitate fine-tuning on similar downstream problems. In this work, we mathematically show that this naive intuition is not always true: excessive pre-training can computationally slow down fine-tuning optimization. We study this phenomenon for low-rank adaptation (LoRA) fine-tuning on single-index models trained under one-pass SGD. Leveraging a summary statistics description of the fine-tuning dynamics, we precisely characterize how the convergence rate depends on the initial fine-tuning alignment and the degree of non-linearity of the target task. The key takeaway is that even when the pre-training and downstream tasks are well aligned, strong pre-training can induce a prolonged search phase and hinder convergence. Our theory thus provides a unified picture of how pre-training strength and task difficulty jointly shape the dynamics and limitations of LoRA fine-tuning in a nontrivial tractable model. On the practical side, we empirically show that our theoretical findings extend beyond our toy model and remain relevant in the context of a vision-transformer model trained on real data.
fields: cs.LG (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
High-Dimensional Theory of LoRA Fine-Tuning in a Solvable Attention Model
In a solvable attention model, pre-training followed by rank-one LoRA admits sharp asymptotic predictions for test errors and representation alignment via an effective noise term.