DISeL augments standard LoRA with per-input gates over rank-one updates to reduce catastrophic forgetting during fine-tuning while adding few parameters.
CommonsenseQA: A question answering challenge targeting commonsense knowledge
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2years
2026 2representative citing papers
In linear regression, LoRA can achieve lower excess risk than full fine-tuning when the pretraining-downstream difference is low-rank, and small LoRA ranks can improve generalization by acting as regularization.
citing papers explorer
-
Learning When to Adapt
DISeL augments standard LoRA with per-input gates over rank-one updates to reduce catastrophic forgetting during fine-tuning while adding few parameters.
-
LoRA vs. Full Fine-Tuning: A Theoretical Perspective
In linear regression, LoRA can achieve lower excess risk than full fine-tuning when the pretraining-downstream difference is low-rank, and small LoRA ranks can improve generalization by acting as regularization.