Gradient episodic memory for continual learning
5 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 5
citing papers explorer
- Reasoning Portability: Guiding Continual Learning for MLLMs in the RLVR Era
  Formalizes Reasoning Portability (RP) and proposes RDB-CL to modulate per-sample KL regularization in RLVR for MLLM continual learning, achieving +12.0% Last accuracy over the vanilla RLVR baseline by preserving reusable reasoning on high-RP samples.
- DRIFT: A Benchmark for Task-Free Continual Graph Learning with Continuous Distribution Shifts
  The DRIFT benchmark shows substantial performance degradation for continual graph learning methods under task-free continuous distribution shifts modeled via Gaussian mixtures.
- Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning
  Muon-OGD introduces a spectral-norm-constrained orthogonal projection method, solved via dual iterations and Newton-Schulz approximations, to improve the stability-plasticity trade-off in sequential LLM adaptation.
- Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation
  A state distribution view of post-training shows that on-policy supervision from the learner itself can outperform fixed-dataset SFT and preserve retention better than aggressive supervised updates.
- Machine Unlearning: A Comprehensive Survey
  A survey classifying machine unlearning into centralized (exact and approximate), distributed/irregular data, verification, and privacy/security categories, with technique overviews.