In high-dimensional continual linear regression, optimal fixed L2 regularization strength scales as T/ln T with the number of tasks and mitigates label noise for arbitrary linear teachers.
hub
Re-evaluating Continual Learning Scenarios: A Categorization and Case for Strong Baselines
11 Pith papers cite this work. Polarity classification is still indexing.
abstract
Continual learning has received a great deal of attention recently with several approaches being proposed. However, evaluations involve a diverse set of scenarios making meaningful comparison difficult. This work provides a systematic categorization of the scenarios and evaluates them within a consistent framework including strong baselines and state-of-the-art methods. The results provide an understanding of the relative difficulty of the scenarios and that simple baselines (Adagrad, L2 regularization, and naive rehearsal strategies) can surprisingly achieve similar performance to current mainstream methods. We conclude with several suggestions for creating harder evaluation scenarios and future research directions. The code is available at https://github.com/GT-RIPL/Continual-Learning-Benchmark
roles
background 2
polarities
background 2
representative citing papers
Inf-SSM constrains the infinite-horizon evolution of SSMs via Grassmannian geometry and an efficient O(n^2) Sylvester solver to enable exemplar-free continual learning with reduced forgetting.
REMIX uses Laplace kernel parameterization to enable scalable full-covariance modeling in model inversion, improving synthetic sample quality and performance in data-free continual learning.
FRPO applies a max-min robust optimization over KL-bounded policy neighborhoods during RLHF to reduce catastrophic forgetting of safety and accuracy under subsequent SFT or RL fine-tuning.
The paper surveys CRL literature, proposes a taxonomy of methods into four categories based on knowledge storage and transfer, reviews metrics and benchmarks, and outlines challenges and future research directions.
Different valid temporal partitions of the same streaming dataset can produce materially different rankings and performance numbers for continual learning methods.
The relative rankings of continual learning methods are not preserved across different fine-tuning regimes defined by trainable parameter depth.
SE2D stabilizes continual distillation across heterogeneous teachers by preserving logits on external unlabeled data to mitigate unseen knowledge forgetting.
Three complementary metrics are introduced to distinguish model adaptation from intrinsic data difficulty under temporal distribution shift.
Autoencoder extracts class prototypes whose means enable metric classification in incremental learning, matching SOTA accuracy with lower memory overhead on CIFAR-100 and CUB-200-2011 via regularization to avoid forgetting.
DualOpt decouples optimization by using real-time layer-wise weight decay for scratch training and weight rollback for fine-tuning to improve convergence, generalization, and reduce knowledge forgetting.
citing papers explorer
- Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability
Different valid temporal partitions of the same streaming dataset can produce materially different rankings and performance numbers for continual learning methods.
- Continual Distillation of Teachers from Different Domains
SE2D stabilizes continual distillation across heterogeneous teachers by preserving logits on external unlabeled data to mitigate unseen knowledge forgetting.