DG-Hard uses Donoho-Gavish hard thresholding on the fine-tuning weight delta to separate task-aligned signal from noise-like residual, recovering damaged capabilities while preserving target-task gains.
Guoxiong Gao, Haocheng Ju, Jiedong Jiang, Zihan Qin, and Bin Dong
5 Pith papers cite this work. Polarity classification is still indexing.
years
2026: 5
verdicts
UNVERDICTED: 5
representative citing papers
Sharpness-aware pretraining and related flat-minima interventions reduce catastrophic forgetting by up to 80% after post-training across 20M-150M models and by 31-40% at 1B scale.
Heavy supervised fine-tuning on formal math suppresses tool-calling in Goedel-Prover-V2 from 89.4% to near 0%, but 100 Lean agentic traces restore it to 83.8% on the Berkeley Function Calling Leaderboard with in-domain gains on ProofNet.
CAPS provides an iterative differentially private synthesis method that outperforms one-shot baselines on authentic educational real-world data.
ARROW adds a distribution-matching long-term replay buffer to DreamerV3 and shows reduced forgetting versus same-size baselines on Atari and Procgen continual RL benchmarks.
citing papers explorer
- Spectral Unforgetting: Post-Hoc Recovery of Damaged Capabilities Without Retraining
  DG-Hard uses Donoho-Gavish hard thresholding on the fine-tuning weight delta to separate task-aligned signal from noise-like residual, recovering damaged capabilities while preserving target-task gains.
- Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting
  Sharpness-aware pretraining and related flat-minima interventions reduce catastrophic forgetting by up to 80% after post-training across 20M-150M models and by 31-40% at 1B scale.
- Awakening the Sleeping Agent: Lean-Specific Agentic Data Reactivates General Tool Use in Goedel Prover
  Heavy supervised fine-tuning on formal math suppresses tool-calling in Goedel-Prover-V2 from 89.4% to near 0%, but 100 Lean agentic traces restore it to 83.8% on the Berkeley Function Calling Leaderboard with in-domain gains on ProofNet.
- Cyclic Adaptive Private Synthesis for Sharing Real-World Data in Education
  CAPS provides an iterative differentially private synthesis method that outperforms one-shot baselines on authentic educational real-world data.
- ARROW: Augmented Replay for RObust World models
  ARROW adds a distribution-matching long-term replay buffer to DreamerV3 and shows reduced forgetting versus same-size baselines on Atari and Procgen continual RL benchmarks.