Full finetuning with the pretraining optimizer reduces forgetting compared to other optimizers or LoRA while achieving comparable new-task performance.
Adagrad under anisotropic smoothness
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
Proving stability of Leon's preconditioner enables the first tuning-free Nesterov-accelerated projection-free adaptive SGD variant with improved non-smooth non-convex rates.
citing papers explorer
-
Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less
Full finetuning with the pretraining optimizer reduces forgetting compared to other optimizers or LoRA while achieving comparable new-task performance.
-
Optimal Projection-Free Adaptive SGD for Matrix Optimization
Proving stability of Leon's preconditioner enables the first tuning-free Nesterov-accelerated projection-free adaptive SGD variant with improved non-smooth non-convex rates.