Understanding Catastrophic Forgetting In LoRA via Mean-Field Attention Dynamics

2024 · cs.LG · arXiv 2402.15415

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

Low-Rank Adaptation (LoRA) is the dominant parameter-efficient fine-tuning method due to its favorable compute-performance trade-off, yet it suffers from catastrophic forgetting. We study forgetting through a tractable _mean-field self-attention_ toy model, where tokens evolve as an interacting particle system and LoRA acts as a low-rank perturbation. Using tools from partial differential equations and dynamical systems, we characterize regimes suggesting phase transitions between forgetting and non-forgetting behavior. We show that one transition appears with respect to the norm of the perturbation and the other with respect to the depth of the Transformer. We further bound the time-to-deviation in terms of the perturbation size and spectral quantities, and corroborate the predicted trends with experiments and exploratory analyses on real models under LoRA fine-tuning.
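
To make the setup concrete, here is a minimal sketch of tokens evolving as interacting particles under self-attention, with a low-rank perturbation standing in for LoRA. Everything specific here is an assumption for illustration and is not taken from the paper: tokens live on the unit sphere, the query and key matrices are the identity, the base value matrix `V` is the identity, the perturbation is applied to `V` only, and depth is modeled by explicit Euler steps. The names `attention_velocity`, `simulate`, and `lora_perturbation` are hypothetical.

```python
import numpy as np

def attention_velocity(X, V, beta=1.0):
    """Softmax-attention drift for each token, projected onto the sphere's tangent space.

    X: (n, d) array of unit-norm tokens. V: (d, d) value matrix.
    Assumption: query and key matrices are the identity.
    """
    logits = beta * X @ X.T
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    W = np.exp(logits)
    W /= W.sum(axis=1, keepdims=True)             # row-stochastic attention weights
    drift = W @ X @ V.T                           # row i: sum_j W_ij * (V x_j)
    radial = np.sum(drift * X, axis=1, keepdims=True) * X
    return drift - radial                         # remove the component along x_i

def simulate(X0, V, steps=200, dt=0.05, beta=1.0):
    """Explicit Euler steps (a stand-in for layers), renormalizing after each step."""
    X = X0.copy()
    for _ in range(steps):
        X = X + dt * attention_velocity(X, V, beta)
        X /= np.linalg.norm(X, axis=1, keepdims=True)
    return X

def lora_perturbation(d, rank, eps, rng):
    """Random rank-`rank` matrix A @ B.T rescaled to spectral norm eps."""
    A = rng.standard_normal((d, rank))
    B = rng.standard_normal((d, rank))
    delta = A @ B.T
    return eps * delta / np.linalg.norm(delta, 2)

rng = np.random.default_rng(0)
n, d = 32, 8
X0 = rng.standard_normal((n, d))
X0 /= np.linalg.norm(X0, axis=1, keepdims=True)

V = np.eye(d)
base = simulate(X0, V)

# Compare the perturbed trajectory endpoint with the unperturbed one
# for several perturbation norms.
deviations = {}
for eps in (0.0, 0.01, 0.1, 1.0):
    delta = lora_perturbation(d, 2, eps, np.random.default_rng(1))
    perturbed = simulate(X0, V + delta)
    deviations[eps] = np.linalg.norm(perturbed - base) / np.sqrt(n)
    print(f"eps={eps}: deviation={deviations[eps]:.4f}")
```

Sweeping `eps` and `steps` in this sketch gives a rough way to explore how the deviation from the unperturbed dynamics depends on perturbation norm and depth. It is only a toy and does not reproduce the paper's analysis or experiments.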

representative citing papers

Perceptrons and localization of attention's mean-field landscape

cs.LG · 2026-01-29 · unverdicted · novelty 7.0

In the mean-field limit of attention with perceptron blocks, critical points of the energy landscape are generically atomic and localized on subsets of the unit sphere.

Quantitative Clustering in Mean-Field Transformer Models

cs.LG · 2025-04-20 · unverdicted · novelty 5.0

Mean-field transformer models synchronize to a Dirac point mass exponentially fast with explicit quantitative rates under suitable parameter assumptions.
