Lost or Hidden? A Concept-Level Forgetting in Supervised Continual Learning

Christopher Kanan; Kamil Faber; Katarzyna Filus; Roberto Corizzo

arxiv: 2605.16374 · v1 · pith:R3PCM6GRnew · submitted 2026-05-10 · 💻 cs.LG · cs.AI


Pith reviewed 2026-05-20 22:03 UTC · model grok-4.3


keywords: continual learning, catastrophic forgetting, sparse autoencoders, concept-level forgetting, representational accessibility, vision models, supervised learning, forgetting decomposition


The pith

Much of concept forgetting in continual learning reflects reduced accessibility rather than outright erasure of information.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a diagnostic framework that uses sparse autoencoders to create a task-anchored latent feature space for examining forgetting at the level of individual visual concepts. It breaks forgetting down into apparent deletion, recoverability under linear assumptions, and decodability, demonstrating that a large share of information that seems lost can still be recovered. This reframes the problem because performance drops may not mean the knowledge is gone but only harder to reach. A sympathetic reader would care because it suggests new ways to diagnose and potentially restore prior knowledge without treating every drop as permanent loss.

Core claim

We propose a diagnostic framework leveraging Sparse Autoencoders to define a task-anchored latent feature space. This enables decomposing forgetting into apparent concept deletion, recoverability, and decodability. We show that a large portion of seemingly lost concept-level information can often be recovered under linearity assumption, with concept decodability degrading as more tasks are introduced. Overall, our findings suggest that a significant part of concept-level forgetting can be attributed to changes in the representational accessibility rather than complete information erasure.
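The first two parts of the decomposition can be made concrete with a small sketch. It assumes each quantity is computed from sets of active concept (latent) indices; the function names, the set-based definitions, and the example indices are hypothetical, since the paper's exact formulas are not given in the text above.

```python
def deletion_ratio(active_before, active_after):
    """Fraction of initially active concepts that are no longer active."""
    if not active_before:
        return 0.0
    deleted = active_before - active_after
    return len(deleted) / len(active_before)


def regained_ratio(active_before, active_after, active_after_translation):
    """Fraction of apparently deleted concepts that become active again
    once a linear translation has been applied to the drifted features."""
    deleted = active_before - active_after
    if not deleted:
        return 0.0
    regained = deleted & active_after_translation
    return len(regained) / len(deleted)


# Hypothetical concept indices for one old task (illustrative only).
before = {0, 1, 2, 3, 4, 5, 6, 7}      # active right after learning the task
after = {0, 1, 2}                       # active after learning later tasks
translated = {0, 1, 2, 3, 4, 5, 9}      # active after the linear translation

apparent_deletion = deletion_ratio(before, after)
recovered = regained_ratio(before, after, translated)
```

Under these assumed definitions, a high deletion ratio together with a high regained ratio is the pattern the core claim describes: concepts look deleted but are largely reachable again through a linear map.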

What carries the argument

The task-anchored latent feature space from Sparse Autoencoders, treating individual latents as proxies for recurring and relatively disentangled visual patterns to decompose forgetting into deletion, recoverability, and decodability.
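As a rough illustration of what a sparse autoencoder latent space looks like, the sketch below implements a generic ReLU encoder and linear decoder in plain Python. The weights, biases, and the activity threshold are made up for the example, and the architecture is an assumption; the paper's actual SAE configuration is not described in the text above.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sae_encode(x, W_enc, b_enc):
    """Latent activations z = ReLU(W_enc x + b_enc)."""
    pre = matvec(W_enc, x)
    return [max(0.0, p + b) for p, b in zip(pre, b_enc)]


def sae_decode(z, W_dec, b_dec):
    """Reconstruction x_hat = W_dec z + b_dec."""
    return [v + b for v, b in zip(matvec(W_dec, z), b_dec)]


def active_latents(z, threshold=0.0):
    """Indices of latents whose activation exceeds the threshold."""
    return {i for i, a in enumerate(z) if a > threshold}


# Hypothetical 2-dimensional feature vector and a 3-latent SAE.
x = [1.0, -2.0]
W_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_enc = [0.0, 0.0, 0.5]
W_dec = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b_dec = [0.0, 0.0]

z = sae_encode(x, W_enc, b_enc)
x_hat = sae_decode(z, W_dec, b_dec)
active = active_latents(z)
```

In this reading, keeping the SAE fixed while the vision model continues training is what makes the latent space "task-anchored": the same latent index can be compared before and after later tasks.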

If this is right

Linear recovery methods could restore performance on old tasks without retraining the full model.
Decodability of concepts declines as the number of tasks grows, indicating a practical limit on how many tasks can be learned sequentially.
Forgetting can be split into recoverable and truly deleted categories, allowing targeted fixes instead of blanket prevention of drift.
Standard accuracy metrics underestimate retained knowledge because they do not test internal accessibility.
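The first implication can be pictured as fitting a linear map that sends drifted features back to the features the old task originally produced. The sketch below does this with ordinary least squares via the normal equations, in plain Python. The function names and the toy matrices are hypothetical, and this is one possible form of a linear translation, not necessarily the procedure used in the paper.

```python
def solve_linear_system(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system; add samples or regularize")
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def fit_linear_translation(H_new, H_old):
    """Least-squares W such that H_new W approximates H_old (rows are samples)."""
    d, k = len(H_new[0]), len(H_old[0])
    gram = [[sum(h[i] * h[j] for h in H_new) for j in range(d)] for i in range(d)]
    cross = [[sum(hn[i] * ho[j] for hn, ho in zip(H_new, H_old))
              for j in range(k)] for i in range(d)]
    return solve_linear_system(gram, cross)


def apply_translation(h, W):
    """Map one drifted feature vector h back into the old feature space."""
    return [sum(h[i] * W[i][j] for i in range(len(h))) for j in range(len(W[0]))]


# Toy data: the "new" features are an invertible linear distortion of the old ones.
H_old = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
H_new = [[0.0, 2.0], [1.0, 0.0], [1.0, 2.0]]

W = fit_linear_translation(H_new, H_old)
restored = apply_translation(H_new[0], W)
```

In the toy case the drift is exactly linear, so the old features are restored without retraining the model. With real features the fit would only be approximate, and the residual is the part a linear map cannot recover.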

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar diagnostic frameworks could be tested on language models to see whether forgetting there also stems more from access issues than erasure.
Continual learning algorithms might add explicit mechanisms to maintain linear decodability of prior concepts.
Benchmarks for continual learning could incorporate recoverability tests alongside final task accuracy to better measure true retention.

Load-bearing premise

The recoverability analysis rests on treating individual SAE latents as reliable proxies for distinct visual patterns combined with the assumption that linear combinations can recover seemingly lost information.

What would settle it

An experiment showing that linear probes on 'recovered' latents fail to improve accuracy on prior tasks, or that the latents do not consistently map to the same visual patterns across tasks, would falsify the central claim.
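One way to test the second condition is to compare each latent's activation profile on a fixed set of probe images before and after later tasks are learned. The sketch below uses cosine similarity for that comparison; the function names and activation values are hypothetical, and the choice of cosine similarity as the consistency score is an assumption rather than a description of the paper's protocol.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def latent_consistency(acts_before, acts_after):
    """Per-latent cosine similarity of activations over the same probe images.

    acts_before and acts_after are lists of activation vectors, one per
    probe image, with latents in the same order in both.
    """
    n_latents = len(acts_before[0])
    scores = []
    for j in range(n_latents):
        profile_before = [row[j] for row in acts_before]
        profile_after = [row[j] for row in acts_after]
        scores.append(cosine_similarity(profile_before, profile_after))
    return scores


# Hypothetical activations of 2 latents on 3 probe images.
acts_before = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
acts_after = [[2.0, 0.0], [4.0, 1.0], [0.0, 0.0]]

scores = latent_consistency(acts_before, acts_after)
```

A latent that fires on the same images at a different scale scores near 1, while a latent that now fires on different images scores near 0. Consistently low scores would undercut the use of latents as stable concept proxies.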

Figures

Figures reproduced from arXiv: 2605.16374 by Christopher Kanan, Kamil Faber, Katarzyna Filus, Roberto Corizzo.

**Figure 2.** Concept activation analysis: Active concept count, (seeming) deletion ratio and regained concept statistics for 2seq-CIFAR10 and 2seq-tiny-ImageNet. … mass is similar across settings, the regained count ratio is the highest for LwF, followed by EWC. It suggests that these strategies may preserve a more linearly recoverable fine-grained space. Overall, these results indicate that many apparently forgotten con…

**Figure 3.** Deletion ratio for all tasks of 10seq-tiny-ImageNet throughout the continual training. (a) SGD (b) LwF (c) EWC (d) DER++

**Figure 4.** Regained concept count ratio for all tasks of 10seq-tiny-ImageNet throughout the continual training after the linear translation. We use the same legend as in …

**Figure 5.** Mean balanced accuracy and F1 score along with their distributions for the concept prediction probe on 2seq-CIFAR10 and 2seq-tiny-ImageNet. (a) SGD - Mean balanced accuracy (b) SGD - Mean F1 score (c) LwF - Mean balanced accuracy (d) LwF - Mean F1 score (e) EWC - Mean balanced accuracy (f) EWC - Mean F1 score (g) DER++ - Mean balanced accuracy (h) DER++ - Mean F1 score

**Figure 6.** Mean balanced accuracy and mean F1 score for the concept prediction probe on 10seq-tiny-ImageNet for all tasks throughout continual training. We use the same legend as in …

**Figure 7.** Accuracy of probes predicting 2seq-CIFAR10 and 2seq-tiny-ImageNet task 0 classes from the continual model features (representation-) and SAE latent activations (concept-level). 6 Conclusion In this paper, we proposed an SAE-based framework that defines a fixed, more disentangled concept space to analyze forgetting in continual learning through task-specific knowledge preservation and recoverability under a…

Original abstract

Continual learning studies how models can adapt to new tasks while retaining previously acquired knowledge. Although a broad spectrum of methods has been proposed to mitigate catastrophic forgetting, the field remains predominantly performance-driven, with limited insight into what forgetting actually corresponds to within the vision model's representation space. Prior work has primarily analyzed forgetting through task-level performance or coarse measures of representational drift, without disentangling output-level accessibility from changes in finer-grained internal structure. To this end, we propose a diagnostic framework that leverages Sparse Autoencoders (SAEs) to define a task-anchored latent feature space, enabling analysis of how task-specific information evolves at a finer granularity, where individual SAE latents are treated as concept proxies for recurring and relatively disentangled visual patterns in the model's internal computations. Within this framework, we decompose forgetting into apparent concept deletion, recoverability, and decodability. We show that a large portion of seemingly lost concept-level information can often be recovered under linearity assumption, with concept decodability degrading as more tasks are introduced. Overall, our findings suggest that a significant part of concept-level forgetting can be attributed to changes in the representational accessibility rather than complete information erasure.
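The figure captions report decodability as the balanced accuracy and F1 score of a concept prediction probe. For a single binary concept (active or not), these two standard metrics can be computed as in the sketch below. The labels and predictions are invented for illustration, and treating each concept as a binary target is an assumption about the probe setup.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels encoded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of recall on the positive class and recall on the negative class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return (tpr + tnr) / 2


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical probe output for one concept on six images.
y_true = [1, 1, 0, 0, 0, 0]   # whether the concept was originally active
y_pred = [1, 0, 0, 0, 0, 1]   # probe prediction from the current features

ba = balanced_accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
```

Balanced accuracy is a natural fit here because sparse latents are inactive on most inputs, so plain accuracy would reward a probe that always predicts "inactive".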

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper uses SAEs to decompose concept-level forgetting into deletion versus recoverability under a linearity assumption, claiming much apparent loss is an accessibility issue rather than erasure.


The main point is that this work tries to move past coarse performance metrics in continual learning by using sparse autoencoders to build a task-anchored latent space. Individual SAE latents serve as proxies for recurring visual patterns, and the authors decompose forgetting into apparent concept deletion, recoverability via linear probes, and decodability. They conclude that a sizable share of what looks like lost information can be recovered, so the problem is often one of access rather than total erasure from the representation space. That decomposition is the clearest new element compared with earlier analyses that stayed at task accuracy or broad drift measures.

The framework itself is a reasonable attempt to get finer-grained insight into internal changes across tasks. It does separate output-level effects from representational ones in a way that could eventually support more precise mitigation ideas. The approach is internally consistent on its own terms and shows honest engagement with the limits of existing forgetting diagnostics.

The soft spots sit mainly in the untested premises. The recoverability numbers depend on the linearity assumption for recovery and on SAE latents actually indexing stable, disentangled concepts that persist across tasks. Neither is validated in the abstract, and if the latents instead latch onto task-specific artifacts the attribution to accessibility would not hold. There are also no quantitative results, datasets, or error bars shown yet, which makes it difficult to judge effect sizes or robustness.

This is the kind of paper that would interest people already working on representation-level analysis in continual learning rather than benchmark chasers. A reader who wants diagnostic tools beyond accuracy curves could extract useful framing from the setup, even before the experiments are fully convincing.

It deserves peer review because the core idea is distinct enough to merit referee input on the experimental validation and assumption checks, though it will need substantial strengthening on the empirical side to stand on its own.

Referee Report

3 major / 2 minor

Summary. The paper proposes a diagnostic framework for concept-level forgetting in supervised continual learning. It uses Sparse Autoencoders (SAEs) to construct a task-anchored latent feature space in which individual latents act as proxies for recurring, relatively disentangled visual patterns. Forgetting is decomposed into apparent concept deletion, recoverability (under a linearity assumption), and decodability. The central empirical claim is that a large fraction of seemingly lost concept information remains recoverable via linear probes, implying that much of the observed forgetting reflects reduced representational accessibility rather than outright erasure; decodability is reported to degrade as the number of tasks increases.

Significance. If the recoverability decomposition and accessibility interpretation hold after validation, the work supplies a finer-grained mechanistic account of forgetting that goes beyond task-level accuracy or coarse representational drift. This could guide the development of continual-learning methods that explicitly target accessibility rather than assuming erasure. The SAE-based proxy approach is a concrete methodological contribution that may be reusable in other representation analyses.

major comments (3)

[§3.2] (definition of task-anchored latent feature space): the claim that SAE latents reliably index recurring and relatively disentangled visual patterns that persist across tasks is introduced without independent verification (no ablation on SAE training, no human or automated concept-consistency checks across task boundaries). Because the entire deletion/recoverability/decodability decomposition rests on these latents, the quantitative attribution to accessibility is not yet load-bearing.

[§5] (recoverability analysis): recoverability percentages are obtained under an explicit linearity assumption for linear probes on the SAE latents. No experiment tests whether non-linear recovery would materially change the reported fractions; if non-linear structure is required, the accessibility-versus-erasure distinction does not follow from the current metrics.

[Results] (table reporting recoverability): the manuscript presents aggregate recoverability figures but does not report per-run variance, confidence intervals, or sensitivity to SAE hyperparameters and task ordering. This makes it difficult to judge whether the “large portion” claim is robust or sensitive to modeling choices.

minor comments (2)

[Abstract] Quantitative recoverability percentages, dataset names, and number of tasks are omitted; adding one or two concrete numbers would improve readability.
[§4] Notation: the precise formulas for the deletion, recoverability, and decodability metrics should be stated in a single early subsection rather than scattered across the decomposition paragraph.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which helps clarify the validation needs for our SAE-based diagnostic framework. We respond to each major comment below and have revised the manuscript accordingly to include additional verification, non-linear experiments, and statistical reporting.


Referee: [§3.2] (definition of task-anchored latent feature space): the claim that SAE latents reliably index recurring and relatively disentangled visual patterns that persist across tasks is introduced without independent verification (no ablation on SAE training, no human or automated concept-consistency checks across task boundaries). Because the entire deletion/recoverability/decodability decomposition rests on these latents, the quantitative attribution to accessibility is not yet load-bearing.

Authors: We agree that stronger verification of the SAE latents as proxies strengthens the foundation of the decomposition. In the revision we add an ablation varying SAE sparsity and dictionary size, plus automated consistency checks via activation cosine similarity and reconstruction overlap across task boundaries. These support that many latents track recurring patterns. We also expand the discussion to note the proxy nature of the approach and the practical limits of exhaustive human validation.

revision: yes

Referee: [§5] (recoverability analysis): recoverability percentages are obtained under an explicit linearity assumption for linear probes on the SAE latents. No experiment tests whether non-linear recovery would materially change the reported fractions; if non-linear structure is required, the accessibility-versus-erasure distinction does not follow from the current metrics.

Authors: The linearity assumption is intentional, as linear probes provide a direct, standard measure of representational accessibility. To address the concern we have added non-linear probe experiments (two-layer MLPs) in the revised §5. Non-linear recovery yields modestly higher fractions, yet the core observation of substantial recoverability persists and the accessibility interpretation remains supported. We have updated the text to present both linear and non-linear results side-by-side.

revision: yes

Referee: [Results] (table reporting recoverability): the manuscript presents aggregate recoverability figures but does not report per-run variance, confidence intervals, or sensitivity to SAE hyperparameters and task ordering. This makes it difficult to judge whether the “large portion” claim is robust or sensitive to modeling choices.

Authors: We accept that aggregate figures alone limit assessment of robustness. The revised results section and tables now report per-run standard deviations, 95% bootstrap confidence intervals, and sensitivity sweeps over SAE hyperparameters (sparsity, width) and task permutations. These analyses confirm that the reported recoverability levels remain high and stable across the tested variations.

revision: yes
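
The kind of statistical reporting discussed in the last exchange can be sketched with a percentile bootstrap over per-run recoverability values. The run values below are invented placeholders, not results from the paper, and the function name, resample count, and seed are arbitrary choices for the example.

```python
import random


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample the runs with replacement and record the mean.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Placeholder regained-ratio values from five hypothetical runs.
runs = [0.62, 0.58, 0.65, 0.60, 0.63]

mean = sum(runs) / len(runs)
lower, upper = bootstrap_ci(runs)
```

With only a handful of runs the interval is coarse, which is one reason to pair it with per-run standard deviations and sweeps over SAE hyperparameters and task orderings.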

Circularity Check

0 steps flagged

No significant circularity; decomposition defined independently of fitted outputs


The paper introduces a diagnostic framework that defines a task-anchored SAE latent space and then decomposes concept-level forgetting into deletion, recoverability, and decodability as distinct analytical categories. Recoverability is measured by applying a linearity assumption to probe whether information can be recovered from the latents; this is an empirical measurement step rather than a quantity that reduces by construction to parameters fitted on the target metrics. No equations or definitions are shown where a prediction is statistically forced by prior fitting on the same data, nor is the central attribution to accessibility justified solely by self-citation chains. The SAE proxies and linearity assumption are introduced as modeling choices at the framework definition stage, but the resulting percentages are not tautological with those choices. The derivation therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework depends on the domain assumption that SAE latents can serve as proxies for visual concepts and on the linearity assumption for recoverability analysis; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)

Domain assumption: Individual SAE latents serve as proxies for recurring and relatively disentangled visual patterns in the model's internal computations. Invoked when defining the task-anchored latent feature space for concept-level analysis.

Domain assumption: Linearity assumption allows recovery of seemingly lost concept-level information. Used to support the claim that a large portion of lost information can be recovered.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "We decompose forgetting into apparent concept deletion, recoverability, and decodability... a large portion of seemingly lost concept-level information can often be recovered under linearity assumption"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: "individual SAE latents are treated as concept proxies for recurring and relatively disentangled visual patterns"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

