When pre-training hurts LoRA fine-tuning: a dynamical analysis via single-index models

2026 · cs.LG · arXiv 2602.02855

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Pre-training on a source task is usually expected to facilitate fine-tuning on similar downstream problems. In this work, we show mathematically that this naive intuition does not always hold: excessive pre-training can computationally slow down fine-tuning optimization. We study this phenomenon for low-rank adaptation (LoRA) fine-tuning on single-index models trained under one-pass SGD. Leveraging a summary-statistics description of the fine-tuning dynamics, we precisely characterize how the convergence rate depends on the initial fine-tuning alignment and the degree of non-linearity of the target task. The key takeaway is that even when the pre-training and downstream tasks are well aligned, strong pre-training can induce a prolonged search phase and hinder convergence. Our theory thus provides a unified picture of how pre-training strength and task difficulty jointly shape the dynamics and limitations of LoRA fine-tuning in a nontrivial yet tractable model. On the practical side, we show empirically that our theoretical findings extend beyond our toy model and remain relevant for a vision-transformer model trained on real data.
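
To make the setting in the abstract concrete, here is a minimal sketch of one-pass SGD on a single-index model in which the only quantity tracked is the overlap between the learned direction and the target direction. It is an illustration under stated assumptions and not the paper's method: the link function (`z**2 - 1`), the spherical normalization, the `lr / d` step scaling, and all names (`one_pass_sgd`, `init_with_overlap`, `m0`) are hypothetical choices made here. It also updates the full weight vector rather than a low-rank adapter, so the pre-trained model enters only through the initial overlap `m0`.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]


def init_with_overlap(w_star, m0, rng):
    """Return a unit vector whose overlap with the unit vector w_star is m0."""
    g = [rng.gauss(0.0, 1.0) for _ in w_star]
    proj = dot(g, w_star)
    # Component of g orthogonal to w_star, rescaled to unit norm.
    perp = normalize([a - proj * b for a, b in zip(g, w_star)])
    s = math.sqrt(1.0 - m0 * m0)
    return [m0 * b + s * p for b, p in zip(w_star, perp)]


def link(z):
    # Assumed link function (second Hermite polynomial); not taken from the paper.
    return z * z - 1.0


def link_grad(z):
    return 2.0 * z


def one_pass_sgd(d=50, steps=2000, lr=0.05, m0=0.1, seed=0):
    """Run one-pass SGD and return the overlap <w_t, w_star> at every step."""
    rng = random.Random(seed)
    w_star = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
    w = init_with_overlap(w_star, m0, rng)
    overlaps = [dot(w, w_star)]
    for _ in range(steps):
        # One-pass: every step draws a fresh Gaussian sample, used exactly once.
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = link(dot(w_star, x))
        z = dot(w, x)
        # Gradient of the squared loss 0.5 * (link(z) - y)**2 with respect to w.
        coef = (link(z) - y) * link_grad(z)
        w = normalize([wi - (lr / d) * coef * xi for wi, xi in zip(w, x)])
        overlaps.append(dot(w, w_star))
    return overlaps


if __name__ == "__main__":
    for m0 in (0.05, 0.5):
        trace = one_pass_sgd(m0=m0)
        print(f"m0={m0}: final overlap {trace[-1]:.3f}")
```

The returned list is the kind of summary statistic the abstract refers to: a low-dimensional trajectory that can be inspected in place of the full weight vector. Comparing traces for different values of `m0` gives a rough sense of how the initial alignment affects the length of the search phase in this toy setting.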

representative citing papers

High-Dimensional Theory of LoRA Fine-Tuning in a Solvable Attention Model

cs.LG · 2026-06-04 · unverdicted · novelty 7.0

In a solvable attention model, pre-training followed by rank-one LoRA admits sharp asymptotic predictions for test errors and representation alignment via an effective noise term.

citing papers explorer

High-Dimensional Theory of LoRA Fine-Tuning in a Solvable Attention Model

cs.LG · 2026-06-04 · unverdicted · none · ref 12 · internal anchor

In a solvable attention model, pre-training followed by rank-one LoRA admits sharp asymptotic predictions for test errors and representation alignment via an effective noise term.
