Learning without Forgetting

Derek Hoiem; Zhizhong Li

arXiv: 1606.09282 · v3 · pith:M5DSELQUnew · submitted 2016-06-29 · cs.CV · cs.LG · stat.ML

classification: cs.CV, cs.LG, stat.ML

keywords: data, capabilities, learning, task, forgetting, without, fine-tuning, method

When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises where we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities are unavailable. We propose our Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities. Our method performs favorably compared to commonly used feature extraction and fine-tuning adaption techniques and performs similarly to multitask learning that uses original task data we assume unavailable. A more surprising observation is that Learning without Forgetting may be able to replace fine-tuning with similar old and new task datasets for improved new task performance.
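
The abstract does not spell out the training objective, so the following is only a minimal sketch of the idea, assuming the distillation-style formulation we understand the full paper to use: before training, the original network's old-task outputs are recorded on the new-task images, and training then combines a standard cross-entropy term on the new task with a temperature-softened cross-entropy that keeps the old-task outputs close to the recorded ones. The names `lwf_loss`, `lambda_old`, and `temperature` are hypothetical, the defaults are assumptions, and weight decay is omitted.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def lwf_loss(new_logits, new_label, old_logits, recorded_old_logits,
             lambda_old=1.0, temperature=2.0):
    """Sketch of a Learning-without-Forgetting style loss for one sample.

    new_logits          -- current outputs of the new-task head
    new_label           -- ground-truth class index for the new task
    old_logits          -- current outputs of the old-task head
    recorded_old_logits -- old-task outputs of the ORIGINAL network on the
                           same new-task image, recorded before training
    lambda_old, temperature -- assumed hyperparameters (illustrative values)
    """
    # New-task term: ordinary cross-entropy against the true label.
    ce_new = -log_softmax(new_logits)[new_label]

    # Old-task term: cross-entropy between softened recorded outputs
    # (the targets) and softened current outputs (distillation).
    target = [math.exp(v) for v in log_softmax(recorded_old_logits, temperature)]
    current_log = log_softmax(old_logits, temperature)
    kd_old = -sum(t * c for t, c in zip(target, current_log))

    return ce_new + lambda_old * kd_old


# Usage: the old-task term is smallest when the old head still reproduces
# the recorded outputs, so drifting away from them raises the loss.
recorded = [2.0, 0.5, -1.0]
unchanged = lwf_loss([0.2, 1.5], 1, recorded, recorded)
drifted = lwf_loss([0.2, 1.5], 1, [0.0, 2.0, -1.0], recorded)
```

Note that no old-task images or labels appear anywhere in the sketch: the only supervision for the old task is the recorded response of the original network to the new-task data, which is what lets the method work when the original training data is unavailable.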

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Reasoning Portability: Guiding Continual Learning for MLLMs in the RLVR Era
cs.LG · 2026-05 · unverdicted · novelty 7.0
Formalizes Reasoning Portability (RP) and proposes RDB-CL to modulate per-sample KL regularization in RLVR for MLLM continual learning, achieving +12.0% Last accuracy over vanilla RLVR baseline by preserving reusable ...

Bridging Data Trials and Task Barriers: A Unified Framework for Sketch Biometric Identification
cs.CV · 2026-05 · unverdicted · novelty 6.0
The paper introduces a continual learning framework combining synthetic sketch generation and trusted sample replay to enable a single model to perform multiple sketch biometric identification tasks.

Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models
cs.CL · 2026-05 · unverdicted · novelty 6.0
MELT decouples reasoning depth from memory in looped LLMs by sharing a single gated KV cache per layer and using two-phase chunk-wise distillation from Ouro, delivering constant memory use while matching or beating st...

Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models
cs.CL · 2026-05 · unverdicted · novelty 6.0
MELT decouples reasoning depth from memory in looped language models by sharing a single gated KV cache per layer and training it via chunk-wise distillation from Ouro starting models.

Emergent Slow Thinking in LLMs as Inverse Tree Freezing
cs.AI · 2025-09 · unverdicted · novelty 6.0
RLVR drives a concept network in LLMs through nucleation and freezing into inverse trees that support slow thinking, and intervening with brief SFT at peak frustration outperforms standard RLVR while post-freeze SFT c...

Layered Mutability: Continuity and Governance in Persistent Self-Modifying Agents
cs.AI · 2026-04 · unverdicted · novelty 5.0
Layered mutability framework claims governance difficulty in persistent self-modifying agents rises with rapid mutation, strong downstream coupling, weak reversibility, and low observability, producing compositional d...

Layered Mutability: Continuity and Governance in Persistent Self-Modifying Agents
cs.AI · 2026-04 · unverdicted · novelty 5.0
Persistent self-modifying AI agents exhibit compositional drift from mismatches across five mutability layers, with governance difficulty rising under rapid mutation, strong coupling, weak reversibility, and low obser...