Re-evaluating Continual Learning Scenarios: A Categorization and Case for Strong Baselines

Yen-Chang Hsu, Yen-Cheng Liu, Anita Ramasamy, Zsolt Kira · 2018 · cs.LG · arXiv 1810.12488

11 Pith papers cite this work. Polarity classification is still indexing.

abstract

Continual learning has received a great deal of attention recently with several approaches being proposed. However, evaluations involve a diverse set of scenarios making meaningful comparison difficult. This work provides a systematic categorization of the scenarios and evaluates them within a consistent framework including strong baselines and state-of-the-art methods. The results provide an understanding of the relative difficulty of the scenarios and that simple baselines (Adagrad, L2 regularization, and naive rehearsal strategies) can surprisingly achieve similar performance to current mainstream methods. We conclude with several suggestions for creating harder evaluation scenarios and future research directions. The code is available at https://github.com/GT-RIPL/Continual-Learning-Benchmark

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Optimal L2 Regularization in High-dimensional Continual Linear Regression

cs.LG · 2026-01-20 · unverdicted · novelty 8.0

In high-dimensional continual linear regression, optimal fixed L2 regularization strength scales as T/ln T with the number of tasks and mitigates label noise for arbitrary linear teachers.

Exemplar-Free Continual Learning for State Space Models

cs.LG · 2025-05-24 · unverdicted · novelty 7.0

Inf-SSM constrains the infinite-horizon evolution of SSMs via Grassmannian geometry and an efficient O(n^2) Sylvester solver to enable exemplar-free continual learning with reduced forgetting.

Stop Marginalizing My Dreams: Model Inversion via Laplace Kernel for Continual Learning

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

REMIX uses Laplace kernel parameterization to enable scalable full-covariance modeling in model inversion, improving synthetic sample quality and performance in data-free continual learning.

Robust Policy Optimization to Prevent Catastrophic Forgetting

cs.LG · 2026-02-09 · unverdicted · novelty 6.0

FRPO applies a max-min robust optimization over KL-bounded policy neighborhoods during RLHF to reduce catastrophic forgetting of safety and accuracy under subsequent SFT or RL fine-tuning.

A Survey of Continual Reinforcement Learning

cs.LG · 2025-06-27 · accept · novelty 6.0

The paper surveys CRL literature, proposes a taxonomy of methods into four categories based on knowledge storage and transfer, reviews metrics and benchmarks, and outlines challenges and future research directions.

Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability

cs.LG · 2026-04-23 · conditional · novelty 6.0

Different valid temporal partitions of the same streaming dataset can produce materially different rankings and performance numbers for continual learning methods.

Fine-Tuning Regimes Define Distinct Continual Learning Problems

cs.LG · 2026-04-23 · unverdicted · novelty 6.0

The relative rankings of continual learning methods are not preserved across different fine-tuning regimes defined by trainable parameter depth.

Continual Distillation of Teachers from Different Domains

cs.LG · 2026-04-10 · conditional · novelty 6.0

SE2D stabilizes continual distillation across heterogeneous teachers by preserving logits on external unlabeled data to mitigate unseen knowledge forgetting.

Tracking Adaptation Time: Metrics for Temporal Distribution Shift

cs.LG · 2026-04-08 · unverdicted · novelty 6.0

Three complementary metrics are introduced to distinguish model adaptation from intrinsic data difficulty under temporal distribution shift.

Autoencoder-Based Incremental Class Learning without Retraining on Old Data

cs.LG · 2019-07-18 · unverdicted · novelty 4.0

Autoencoder extracts class prototypes whose means enable metric classification in incremental learning, matching SOTA accuracy with lower memory overhead on CIFAR-100 and CUB-200-2011 via regularization to avoid forgetting.

Neural Network Optimization Reimagined: Decoupled Techniques for Scratch and Fine-Tuning

cs.CV · 2026-04-21 · unverdicted · novelty 3.0

DualOpt decouples optimization by using real-time layer-wise weight decay for scratch training and weight rollback for fine-tuning to improve convergence, generalization, and reduce knowledge forgetting.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability

cs.LG · 2026-04-23 · conditional · none · ref 31

Different valid temporal partitions of the same streaming dataset can produce materially different rankings and performance numbers for continual learning methods.

Continual Distillation of Teachers from Different Domains

cs.LG · 2026-04-10 · conditional · none · ref 16

SE2D stabilizes continual distillation across heterogeneous teachers by preserving logits on external unlabeled data to mitigate unseen knowledge forgetting.
