An empirical study of catastrophic forgetting in large language models during continual fine-tuning.IEEE Transactions on Audio, Speech and Language Processing

Yun Luo, Zhen Yang, Fandong Meng, Yafu Li, Jie Zhou, Yue Zhang · 2025

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

browse 6 citing papers

citation-role summary

background 2

citation-polarity summary

background 1 support 1

representative citing papers

Characterizing and Correcting Effective Target Shift in Online Learning

stat.ML · 2026-05-08 · unverdicted · novelty 7.0

Online kernel regression equals offline regression with shifted targets; correcting the targets lets online learning match offline performance and outperform true targets in continual image classification.

Learning When to Adapt

cs.LG · 2026-05-18 · conditional · novelty 6.0

DISeL augments standard LoRA with per-input gates over rank-one updates to reduce catastrophic forgetting during fine-tuning while adding few parameters.

Fine-Tuning Without Forgetting via Loss-Adaptive Learning Rates

cs.LG · 2026-05-19 · unverdicted · novelty 5.0

FINCH is a loss-adaptive learning-rate schedule that reduces forgetting by 93% on average during LLM fine-tuning while matching standard task performance across several benchmarks.

Reconciling Contradictory Views on the Effectiveness of SFT in LLMs: An Interaction Perspective

cs.AI · 2026-05-18 · unverdicted · novelty 5.0

SFT on LLMs removes noise-like token interactions in a brief early phase before introducing overfitted ones, explaining inconsistent effectiveness across model scales.

On-Policy Distillation with Best-of-N Teacher Rollout Selection

cs.CV · 2026-05-10 · unverdicted · novelty 5.0 · 2 refs

BRTS improves on-policy distillation by sampling multiple teacher rollouts and selecting the best one via a correctness-first then alignment priority rule, yielding gains on AIME and AMC math benchmarks.

Preserving Foundational Capabilities in Flow-Matching VLAs through Conservative SFT

cs.RO · 2026-05-09 · unverdicted · novelty 5.0 · 2 refs

ConSFT is a gradient-scaling fine-tuning objective for flow-matching VLAs that bounds parameter disruption via model-confidence weighting, yielding over 20% better capability retention than vanilla SFT on LIBERO and RoboTwin.

citing papers explorer

Showing 6 of 6 citing papers.

Characterizing and Correcting Effective Target Shift in Online Learning stat.ML · 2026-05-08 · unverdicted · none · ref 1
Online kernel regression equals offline regression with shifted targets; correcting the targets lets online learning match offline performance and outperform true targets in continual image classification.
Learning When to Adapt cs.LG · 2026-05-18 · conditional · none · ref 30
DISeL augments standard LoRA with per-input gates over rank-one updates to reduce catastrophic forgetting during fine-tuning while adding few parameters.
Fine-Tuning Without Forgetting via Loss-Adaptive Learning Rates cs.LG · 2026-05-19 · unverdicted · none · ref 42
FINCH is a loss-adaptive learning-rate schedule that reduces forgetting by 93% on average during LLM fine-tuning while matching standard task performance across several benchmarks.
Reconciling Contradictory Views on the Effectiveness of SFT in LLMs: An Interaction Perspective cs.AI · 2026-05-18 · unverdicted · none · ref 21
SFT on LLMs removes noise-like token interactions in a brief early phase before introducing overfitted ones, explaining inconsistent effectiveness across model scales.
On-Policy Distillation with Best-of-N Teacher Rollout Selection cs.CV · 2026-05-10 · unverdicted · none · ref 30 · 2 links
BRTS improves on-policy distillation by sampling multiple teacher rollouts and selecting the best one via a correctness-first then alignment priority rule, yielding gains on AIME and AMC math benchmarks.
Preserving Foundational Capabilities in Flow-Matching VLAs through Conservative SFT cs.RO · 2026-05-09 · unverdicted · none · ref 17 · 2 links
ConSFT is a gradient-scaling fine-tuning objective for flow-matching VLAs that bounds parameter disruption via model-confidence weighting, yielding over 20% better capability retention than vanilla SFT on LIBERO and RoboTwin.

An empirical study of catastrophic forgetting in large language models during continual fine-tuning.IEEE Transactions on Audio, Speech and Language Processing

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer