CI-CBM: Class-Incremental Concept Bottleneck Model for Interpretable Continual Learning

Amirhosein Javadi; Tara Javidi; Tsui-Wei Weng; Tuomas Oikarinen

arxiv: 2604.14519 · v1 · submitted 2026-04-16 · 💻 cs.LG · cs.CV

Pith reviewed 2026-05-10 12:27 UTC · model grok-4.3

keywords: class-incremental learning · concept bottleneck model · continual learning · interpretable AI · catastrophic forgetting · concept regularization · pseudo-concept generation

The pith

CI-CBM maintains human-interpretable concepts in class-incremental learning while matching black-box performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a new method called Class-Incremental Concept Bottleneck Model to handle the challenge of learning new classes over time without forgetting previous ones, all while keeping decisions explainable in terms of human concepts. It achieves this through techniques like concept regularization to keep concepts consistent and pseudo-concept generation to add new information smoothly. A reader would care because most continual learning methods either lose accuracy or become hard to interpret, but this claims to do both well. Experiments across seven datasets back the claim that it performs like black-box models and improves on previous interpretable ones by 36 percent accuracy on average.

Core claim

CI-CBM uses concept regularization and pseudo-concept generation to keep its decision process interpretable throughout incremental learning phases. Evaluated on seven datasets, it achieves performance comparable to black-box models and outperforms previous interpretable approaches in CIL, with an average 36% accuracy gain. It provides interpretable decisions on individual inputs and understandable global decision rules, and works in both pretrained and non-pretrained scenarios.

What carries the argument

The Class-Incremental Concept Bottleneck Model (CI-CBM) with concept regularization for stability and pseudo-concept generation for handling new classes.

If this is right

Interpretable decisions remain available for individual inputs and as global rules after learning new classes.
Accuracy stays comparable to black-box models across multiple datasets.
Outperforms prior interpretable methods in class-incremental learning by 36% on average.
Applies successfully whether the model backbone starts pretrained or is trained from scratch.
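
To make "interpretable decisions for individual inputs" concrete, here is a minimal sketch of how a generic concept bottleneck model can explain one prediction: the class logit is a linear function of concept activations, so each concept's contribution is its weight times its activation. This illustrates the general CBM idea and is not taken from the paper's code. The function name, concept names, weights, and activations are all hypothetical.

```python
from typing import Dict, List, Tuple


def concept_contributions(
    activations: Dict[str, float],
    class_weights: Dict[str, float],
    bias: float = 0.0,
) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the class logit and per-concept contributions, largest first.

    Assumes a linear final layer over concept activations (generic CBM
    setup); concepts missing from class_weights are treated as weight 0.
    """
    contribs = [
        (name, class_weights.get(name, 0.0) * act)
        for name, act in activations.items()
    ]
    # Rank concepts by the magnitude of their effect on the logit.
    contribs.sort(key=lambda item: abs(item[1]), reverse=True)
    logit = bias + sum(value for _, value in contribs)
    return logit, contribs


# Hypothetical concept activations for one image and weights for one class.
activations = {"long snout": 2.0, "bony plates": 1.0, "feathers": 0.5}
weights = {"long snout": 0.5, "bony plates": 1.5, "feathers": -1.0}

logit, ranked = concept_contributions(activations, weights)
for name, value in ranked:
    print(f"{name:>12}: {value:+.2f}")
print(f"logit = {logit:.2f}")
```

Under these assumptions the explanation for a prediction is the ranked list itself, and a global rule for a class is its weight vector over concepts.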

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Adapting this to other continual learning scenarios like task-incremental learning could be a natural next step.
Stable concepts might allow users to correct model behavior by modifying specific concepts without retraining.
Verifying the approach on even larger and more diverse datasets would strengthen the generality claim.

Load-bearing premise

That concept regularization and pseudo-concept generation preserve stable, meaningful human-interpretable concepts across incremental phases without systematic errors or task-specific tuning.

What would settle it

Observing that concepts drift into non-interpretable states or that accuracy no longer matches black-box levels after several incremental phases on the evaluated datasets would challenge the claim.

Figures

Figures reproduced from arXiv: 2604.14519 by Amirhosein Javadi, Tara Javidi, Tsui-Wei Weng, Tuomas Oikarinen.

**Figure 2.** Overview of our pipeline for Class Incremental Concept Bottleneck Model (CI-CBM). Color-coded…

**Figure 3.** Illustration of the pseudo-feature generation procedure with a toy example. In Phase 1, the model…

**Figure 4.** Visualization of decision boundaries in the feature space. Black lines indicate Bayes-optimal boundaries, and colored regions denote predicted class assignments. Dashed circles represent one standard deviation from each class mean. Left: original class feature distributions, where differences in variance lead to curved boundaries between old and new classes. Right: pseudo-feature distributions generated by…

**Figure 5.** Experiment III: Average accuracy curves for CIFAR-100, TinyImageNet, and ImageNet-Subset over 10 learning phases, comparing CI-CBM with other unrestricted ResNet-based methods.

**Figure 6.** Visualization of model reasoning and concept contributions for an image of the Sturgeon class,…
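
The Figure 4 caption attributes curved boundaries to differences in class variance. A short worked derivation shows why, under the illustrative assumption (not stated in the caption) that each class k is an isotropic Gaussian with mean \mu_k and variance \sigma_k^2 in d dimensions, with equal class priors.

```latex
% Log-density of class k (assumed isotropic Gaussian):
\log p(x \mid k) = -\tfrac{d}{2}\log(2\pi) - d\log\sigma_k
                   - \frac{\lVert x - \mu_k \rVert^2}{2\sigma_k^2}

% Bayes-optimal boundary between classes 1 and 2 (equal priors):
\frac{\lVert x - \mu_1 \rVert^2}{2\sigma_1^2} + d\log\sigma_1
  = \frac{\lVert x - \mu_2 \rVert^2}{2\sigma_2^2} + d\log\sigma_2

% Equal variances: the quadratic terms cancel and the boundary is a hyperplane.
\sigma_1 = \sigma_2 \;\Longrightarrow\;
  2(\mu_2 - \mu_1)^\top x = \lVert \mu_2 \rVert^2 - \lVert \mu_1 \rVert^2

% Unequal variances: the coefficient of x^\top x is nonzero, so the boundary is curved.
\sigma_1 \neq \sigma_2 \;\Longrightarrow\;
  \left(\frac{1}{2\sigma_1^2} - \frac{1}{2\sigma_2^2}\right) x^\top x + \dots = 0
```

In this simplified model, equalizing the variances of old and new classes is what turns a curved boundary into a linear one.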

Original abstract

Catastrophic forgetting remains a fundamental challenge in continual learning, in which models often forget previous knowledge when fine-tuned on a new task. This issue is especially pronounced in class-incremental learning (CIL), which is the most challenging setting in continual learning. Existing methods to address catastrophic forgetting often sacrifice either model interpretability or accuracy. To address this challenge, we introduce Class-Incremental Concept Bottleneck Model (CI-CBM), which leverages effective techniques, including concept regularization and pseudo-concept generation, to maintain interpretable decision processes throughout incremental learning phases. Through extensive evaluation on seven datasets, CI-CBM achieves comparable performance to black-box models and outperforms previous interpretable approaches in CIL, with an average 36% accuracy gain. CI-CBM provides interpretable decisions on individual inputs and understandable global decision rules, as shown in our experiments, thereby demonstrating that human-understandable concepts can be maintained during incremental learning without compromising model performance. Our approach is effective in both pretrained and non-pretrained scenarios; in the latter, the backbone is trained from scratch during the first learning phase. Code is publicly available at github.com/importAmir/CI-CBM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

CI-CBM pairs concept regularization and pseudo-concept generation with bottleneck models for class-incremental learning, delivering usable accuracy gains but thin evidence that the concepts stay stable and meaningful.

The new piece is the specific pairing of concept regularization to stabilize existing concepts and pseudo-concept generation to handle new classes, all inside a concept bottleneck setup for CIL. The abstract positions this as distinct from prior work, and the results claim it matches black-box accuracy while beating earlier interpretable CIL methods by 36% on average across seven datasets. Code release and coverage of both pretrained and from-scratch backbones are practical pluses, and they report both local decisions and global rules in the experiments. That combination addresses a real tension between forgetting and interpretability without obvious circularity in the method itself.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces the Class-Incremental Concept Bottleneck Model (CI-CBM) to address catastrophic forgetting in class-incremental learning (CIL) while preserving interpretability. It combines concept regularization and pseudo-concept generation with concept bottleneck models, claiming that on seven datasets the approach achieves accuracy comparable to black-box models and an average 36% gain over prior interpretable CIL methods. The paper states that CI-CBM supports both instance-level interpretable decisions and global decision rules, works in pretrained and from-scratch settings, and releases code publicly.

Significance. If the claims regarding stable, human-interpretable concepts hold, the work would meaningfully advance interpretable continual learning by showing that interpretability need not be traded off against accuracy in the challenging CIL setting. The public code release is a clear strength that supports reproducibility.

major comments (2)

[Abstract] The central performance claim of an 'average 36% accuracy gain' over previous interpretable approaches is presented without naming the baselines, reporting per-dataset accuracies, error bars, or statistical tests. This makes it impossible to assess whether the gain is robust or driven by particular dataset characteristics.

[Experiments] (as referenced in the abstract) No quantitative metrics for concept quality, stability, or interpretability across incremental phases (e.g., concept alignment scores, phase-to-phase consistency of global rules, or human ratings) are described, nor are ablations on regularization strength or pseudo-concept generation. Because the distinguishing contribution is the preservation of meaningful concepts rather than accuracy alone, this omission is load-bearing for the interpretability claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which helps clarify the presentation of our results and strengthen the evaluation of interpretability. We address each major comment below.

Referee: [Abstract] The central performance claim of an 'average 36% accuracy gain' over previous interpretable approaches is presented without naming the baselines, reporting per-dataset accuracies, error bars, or statistical tests. This makes it impossible to assess whether the gain is robust or driven by particular dataset characteristics.

Authors: We agree that the abstract would benefit from additional context on the performance claim. The detailed results (the specific prior interpretable CIL baselines, per-dataset accuracies, error bars from multiple runs, and statistical comparisons) are fully reported in the Experiments section with tables and figures. In the revised manuscript we will update the abstract to name the primary baselines and explicitly note that the 36% average gain is computed from per-dataset results with error bars.

revision: yes

Referee: [Experiments] (as referenced in the abstract) No quantitative metrics for concept quality, stability, or interpretability across incremental phases (e.g., concept alignment scores, phase-to-phase consistency of global rules, or human ratings) are described, nor are ablations on regularization strength or pseudo-concept generation. Because the distinguishing contribution is the preservation of meaningful concepts rather than accuracy alone, this omission is load-bearing for the interpretability claim.

Authors: We acknowledge that quantitative metrics would provide stronger, more direct support for the claim of preserved concept quality. Our current experiments demonstrate interpretability via explicit instance-level concept attributions and global decision rules that remain consistent across phases, with accuracy comparable to black-box models serving as supporting evidence of concept stability. We did not originally include concept alignment scores, phase-to-phase consistency metrics, human ratings, or component ablations. We will add ablations on regularization strength and pseudo-concept generation, plus quantitative measures of concept consistency (e.g., overlap of active concepts across phases), to the revised manuscript.

revision: yes
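
The rebuttal's proposed "overlap of active concepts across phases" is not defined further. One plausible reading, offered only as a sketch, is the Jaccard overlap between the sets of concepts with nonzero weight for a class in consecutive phases. The choice of Jaccard, the tolerance parameter, and all concept names and weights are assumptions for illustration.

```python
from typing import Dict, List, Set


def active_concepts(weights: Dict[str, float], tol: float = 0.0) -> Set[str]:
    """Concepts whose weight magnitude exceeds tol (hypothetical criterion)."""
    return {name for name, w in weights.items() if abs(w) > tol}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def phase_consistency(phases: List[Dict[str, float]]) -> List[float]:
    """Overlap of active concepts between each pair of consecutive phases."""
    sets = [active_concepts(p) for p in phases]
    return [jaccard(x, y) for x, y in zip(sets, sets[1:])]


# Hypothetical per-phase concept weights for a single class.
phases = [
    {"fins": 0.8, "scales": 0.5, "fur": 0.0},
    {"fins": 0.7, "scales": 0.0, "gills": 0.3},
]
print(phase_consistency(phases))
```

A value near 1 would indicate that the same concepts keep driving the class across phases. It says nothing about whether those concepts are meaningful to a human, which is the referee's separate concern.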

Circularity Check

0 steps flagged

No circularity: empirical method with independent algorithmic components and external validation

The paper presents CI-CBM as a new algorithmic approach combining concept regularization and pseudo-concept generation for class-incremental learning. Its central claims rest on empirical results across seven datasets, including accuracy comparisons to black-box and prior interpretable baselines. No equations are provided that define a quantity in terms of itself or rename a fitted parameter as a prediction. No self-citations are invoked to justify uniqueness theorems or load-bearing assumptions. The derivation chain consists of stated techniques plus experimental measurements that can be independently reproduced or falsified, satisfying the criteria for a self-contained, non-circular contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The method rests on standard machine-learning assumptions that meaningful human concepts exist and can be extracted and regularized; no new free parameters, axioms, or invented entities are declared in the abstract.

Reference graph

Works this paper leans on

3 extracted references · 3 canonical work pages

[1] Length filter: Remove concepts longer than 30 characters.

[2] Class similarity filter: Remove concepts with cosine similarity > 0.85 to any class name, using Sentence-Transformer and CLIP text embeddings.

[3] Redundancy filter: Remove near-duplicate concepts with cosine similarity > 0.9 to any earlier concept in the set. This filtered set is then used to update the concept set. All steps are automated and applied incrementally for each new phase. The full pipeline is implemented in the released codebase. A12 Impact of Distillation Regularizer Weight on Accuracy. ...
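
The three filtering steps above can be sketched as a single pass over candidate concepts. This is a minimal illustration, not the released implementation: the function names are hypothetical, and a single caller-supplied `embed` function stands in for the Sentence-Transformer and CLIP text embeddings (how the two are combined is not stated in the extracted text). The demo uses a toy letter-count embedding purely so the example runs without external models.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_concepts(
    candidates: List[str],
    class_names: List[str],
    existing: List[str],
    embed: Callable[[str], Vector],  # assumed stand-in for the text encoders
    max_len: int = 30,
    class_thresh: float = 0.85,
    dup_thresh: float = 0.9,
) -> List[str]:
    """Apply the length, class-similarity, and redundancy filters in order."""
    class_vecs = [embed(name) for name in class_names]
    kept = list(existing)
    kept_vecs = [embed(c) for c in kept]
    for concept in candidates:
        if len(concept) > max_len:  # [1] length filter
            continue
        vec = embed(concept)
        if any(cosine(vec, cv) > class_thresh for cv in class_vecs):  # [2]
            continue
        if any(cosine(vec, kv) > dup_thresh for kv in kept_vecs):  # [3]
            continue
        kept.append(concept)
        kept_vecs.append(vec)
    return kept


def toy_embed(text: str) -> List[float]:
    """Toy letter-count embedding, for demonstration only."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


candidates = [
    "striped fur",
    "tiger",  # too similar to a class name
    "fur striped",  # near-duplicate of an earlier concept
    "a very long concept description exceeding thirty chars",
]
kept = filter_concepts(candidates, ["tiger"], [], toy_embed)
print(kept)
```

Passing the previous phase's concept set as `existing` mirrors the incremental use described above, where each new phase's candidates are checked against concepts already in the set.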
