HydraCIL: Decoupled Class-Incremental Learning through Prototype-Guided Multi-Head Classifiers

Daniel Vila-Cruz; Laura Mor\'an-Fern\'andez; Ver\'onica Bol\'on-Canedo

arxiv: 2606.09960 · v1 · pith:FB5M7PCJnew · submitted 2026-06-08 · 💻 cs.LG · cs.AI

HydraCIL: Decoupled Class-Incremental Learning through Prototype-Guided Multi-Head Classifiers

Daniel Vila-Cruz , Laura Mor\'an-Fern\'andez , Ver\'onica Bol\'on-Canedo This is my paper

Pith reviewed 2026-06-27 17:01 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords class-incremental learningcontinual learningprototype-guided classifiersmulti-head classifiersfrozen backboneedge AIefficient continual learning

0 comments

The pith

HydraCIL freezes the backbone once and trains only lightweight task-specific heads selected by prototypes at inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents HydraCIL as a class-incremental learning approach that decouples feature extraction from classifier training by keeping a pre-trained backbone frozen. For each new task, features are extracted once and a dedicated lightweight head is learned, with class prototypes used to route inputs to the correct head during testing. This design targets embedded and resource-limited settings where repeated full-model retraining is impractical. Experiments on CIFAR-100, ImageNet-100, CoRe50, and Flowers102 show the method reaches or exceeds the accuracy of prior continual learning techniques while using far less training time and energy.

Core claim

HydraCIL performs class-incremental learning by extracting features once from a frozen backbone and training a separate lightweight classifier head for each task, using prototypes to select the appropriate head at inference time and thereby avoiding catastrophic forgetting without retraining the feature extractor.

What carries the argument

Prototype-guided multi-head classifiers, in which each task receives its own head and learned prototypes determine head selection at test time.

If this is right

Accuracy matches or exceeds prior state-of-the-art CIL methods on CIFAR-100, ImageNet-100, CoRe50, and Flowers102.
Training time drops because the backbone is never retrained after the initial pass.
Carbon footprint decreases due to the reduction in repeated forward and backward passes through the full model.
The approach supports rapid adaptation on embedded devices with limited compute and energy budgets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The fixed backbone may improve stability across many tasks by preventing drift in earlier representations.
Memory growth remains linear with the number of heads but stays modest because each head is lightweight.
The prototype routing mechanism could be tested on streaming data where task boundaries are unknown.

Load-bearing premise

That features from a frozen backbone remain sufficiently discriminative for new classes without any backbone adaptation or task-specific feature tuning.

What would settle it

On a benchmark where new classes require feature representations that differ markedly from those produced by the frozen backbone, HydraCIL accuracy falls substantially below methods that adapt the full network.

Figures

Figures reproduced from arXiv: 2606.09960 by Daniel Vila-Cruz, Laura Mor\'an-Fern\'andez, Ver\'onica Bol\'on-Canedo.

**Figure 1.** Figure 1: HydraCIL (k = 2) performance measured across different tasks splits on CIFAR-100 dataset. Table IX illustrates the performance of HydraCIL when executed on a CPU compared to a GPU. While the GPU naturally offers accelerated processing, the CPU-based training remains relatively efficient even compared to RMM-FOSTER and DER executed on GPU. TABLE IX RESULTS OF HYDRACIL TRAINED USING A CPU ON CIFAR-100. Tasks… view at source ↗

read the original abstract

We present HydraCIL, a decoupled continual learning model based on prototype-guided multi-head classifiers, targeting sustainable deployment in embedded and resource-constrained environments. While most Class-Incremental Learning (CIL) methods rely on powerful hardware and long retraining cycles, real-world systems, such as robots or edge AI devices, must adapt quickly with limited resources. HydraCIL addresses this gap by freezing the backbone and decoupling feature extraction from learning. For each task, features are extracted once and a lightweight, task-specific classifier head is created, avoiding costly backbone retraining. At inference, HydraCIL selects the appropriate head via similarity with prototypes. Experiments on CIFAR-100, ImageNet-100, CoRe50, and Flowers102 datasets show that HydraCIL matches or outperforms state-of-the-art CIL methods while significantly reducing training time and carbon footprint, making it a practical solution for continual learning in real-world and embedded settings, where energy efficiency and rapid adaptation are critical.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

HydraCIL freezes the backbone and swaps in prototype-selected heads to cut compute in CIL, but the whole thing depends on pre-trained features staying good enough for new classes.

read the letter

The key point is that this paper freezes a pre-trained backbone, pulls features once per task, trains a lightweight head, and picks the right head at inference by prototype similarity. That decoupling is the main move, aimed at edge devices where full retraining costs too much time and energy.

The efficiency focus is the part that lands. The authors target real constraints in robotics and embedded AI, and they run the method on CIFAR-100, ImageNet-100, CoRe50, and Flowers102 while claiming parity with existing CIL work plus big drops in training time and carbon footprint. If the numbers check out in the full experiments, the architecture is simple enough that practitioners could actually try it.

The soft spot is exactly the one the stress-test note flags. The method assumes the frozen features remain linearly separable for every new class without any backbone updates. Standard benchmarks overlap heavily with common pre-training data, so they do not stress what happens under distribution shift. If a new class lands in a region where the fixed extractor is weak, the multi-head setup cannot recover accuracy. The abstract also gives no concrete metrics, baseline details, or protocol, which makes it hard to judge how close the claimed parity really is.

The paper engages the resource problem directly and avoids obvious circular claims. It is aimed at people who need continual learning on constrained hardware rather than at theorists chasing marginal accuracy gains. A reader working on practical deployment could extract the architecture and test it themselves.

I would send it to peer review. The efficiency claim is worth referee scrutiny even if the feature-quality assumption needs harder experiments.

Referee Report

1 major / 0 minor

Summary. The paper proposes HydraCIL, a decoupled class-incremental learning method that freezes a pre-trained backbone, extracts features once per task to train lightweight prototype-guided multi-head classifiers, and selects the appropriate head at inference via prototype similarity. It claims this achieves performance matching or exceeding state-of-the-art CIL methods on CIFAR-100, ImageNet-100, CoRe50, and Flowers102 while substantially reducing training time and carbon footprint, targeting resource-constrained embedded deployment.

Significance. If the accuracy claims are substantiated with rigorous metrics and the frozen-feature assumption holds, the work would offer a practical route to low-cost continual learning on edge devices by eliminating backbone retraining, directly addressing energy and latency constraints in real-world CIL applications.

major comments (1)

[Experiments] The central performance claim rests on the untested assumption that features from the frozen backbone remain sufficiently discriminative and linearly separable for new classes; the paper evaluates only on benchmarks (CIFAR-100, ImageNet-100) whose pre-training overlap is high and does not report results under distribution shift that would stress this assumption.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for highlighting the importance of validating the frozen-backbone assumption under distribution shift. We address the concern directly below and commit to strengthening the experimental section.

read point-by-point responses

Referee: [Experiments] The central performance claim rests on the untested assumption that features from the frozen backbone remain sufficiently discriminative and linearly separable for new classes; the paper evaluates only on benchmarks (CIFAR-100, ImageNet-100) whose pre-training overlap is high and does not report results under distribution shift that would stress this assumption.

Authors: We acknowledge that CIFAR-100 and ImageNet-100 exhibit substantial pre-training overlap with typical ImageNet-initialized backbones. However, the manuscript also reports results on CoRe50 (robotic manipulation objects) and Flowers102 (fine-grained natural images), which introduce measurable domain shifts relative to standard pre-training corpora. To directly test the assumption under stronger distribution shift, we will add new experiments in the revision using datasets with greater domain gaps (e.g., medical or satellite imagery) and will include quantitative measures of feature separability (e.g., linear probe accuracy and prototype cosine similarity statistics) for both in-distribution and shifted tasks. These additions will be placed in a new subsection of the experimental analysis. revision: yes

Circularity Check

0 steps flagged

No circularity in claimed derivation or predictions

full rationale

The paper describes an engineering method (freeze backbone, extract features once per task, train lightweight prototype-guided heads, select at inference by similarity) and validates it empirically on standard benchmarks. No equations, first-principles derivations, or statistical predictions are presented that reduce by construction to fitted inputs or self-citations. The central design choices are explicit architectural decisions, not outputs claimed to be derived from prior results within the paper. External benchmarks provide independent falsifiability, so the work is self-contained with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review yields minimal ledger entries; no free parameters, axioms, or invented entities are explicitly stated.

axioms (1)

domain assumption Frozen backbone features remain adequate for new classes without retraining
Central to the decoupling claim; stated implicitly by the decision to freeze the backbone.

pith-pipeline@v0.9.1-grok · 5719 in / 1024 out tokens · 16065 ms · 2026-06-27T17:01:21.785663+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

18 extracted references · 3 canonical work pages

[1]

Class-incremental learning: A survey,

D.-W. Zhou, Q.-W. Wang, Z.-H. Qi, H.-J. Ye, D.-C. Zhan, and Z. Liu, “Class-incremental learning: A survey,”IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

2024
[2]

Vision transformers on the edge: A comprehensive survey of model compression and acceleration strategies,

S. Saha and L. Xu, “Vision transformers on the edge: A comprehensive survey of model compression and acceleration strategies,”Neurocom- puting, p. 130417, 2025

2025
[3]

Der: Dynamically expandable representation for class incremental learning,

S. Yan, J. Xie, and X. He, “Der: Dynamically expandable representation for class incremental learning,” inProceedings of the IEEE/CVF confer- ence on computer vision and pattern recognition, 2021, pp. 3014–3023

2021
[4]

Foster: Feature boosting and compression for class-incremental learning,

F.-Y . Wang, D.-W. Zhou, H.-J. Ye, and D.-C. Zhan, “Foster: Feature boosting and compression for class-incremental learning,” inEuropean conference on computer vision. Springer, 2022, pp. 398–414

2022
[5]

A model or 603 exemplars: Towards memory-efficient class-incremental learning,

D.-W. Zhou, Q.-W. Wang, H.-J. Ye, and D.-C. Zhan, “A model or 603 exemplars: Towards memory-efficient class-incremental learning,”arXiv preprint arXiv:2205.13218, 2022

work page arXiv 2022
[6]

Rmm: Reinforced memory manage- ment for class-incremental learning,

Y . Liu, B. Schiele, and Q. Sun, “Rmm: Reinforced memory manage- ment for class-incremental learning,”Advances in neural information processing systems, vol. 34, pp. 3478–3490, 2021

2021
[7]

Self-sustaining representation expansion for non-exemplar class-incremental learning,

K. Zhu, W. Zhai, Y . Cao, J. Luo, and Z.-J. Zha, “Self-sustaining representation expansion for non-exemplar class-incremental learning,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 9296–9305

2022
[8]

Fetril: Feature translation for exemplar-free class-incremental learning,

G. Petit, A. Popescu, H. Schindler, D. Picard, and B. Delezoide, “Fetril: Feature translation for exemplar-free class-incremental learning,” in Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2023, pp. 3911–3920

2023
[9]

Representation robustness and feature expansion for exemplar-free class-incremental learning,

Y . Luo, H. Ge, Y . Liu, and C. Wu, “Representation robustness and feature expansion for exemplar-free class-incremental learning,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 7, pp. 5306–5320, 2023

2023
[10]

How efficient are today’s continual learning algorithms?

M. Y . Harun, J. Gallardo, T. L. Hayes, and C. Kanan, “How efficient are today’s continual learning algorithms?” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 2431–2436

2023
[11]

Remind your neural network to prevent catastrophic forgetting,

T. L. Hayes, K. Kafle, R. Shrestha, M. Acharya, and C. Kanan, “Remind your neural network to prevent catastrophic forgetting,” inEuropean conference on computer vision. Springer, 2020, pp. 466–483

2020
[12]

Efficient single-step framework for incremental class learning in neural networks,

A. Dopico-Castro, O. Fontenla-Romero, B. Guijarro-Berdiñas, and A. Alonso-Betanzos, “Efficient single-step framework for incremental class learning in neural networks,”arXiv preprint arXiv:2509.11285, 2025

work page arXiv 2025
[13]

Courty, V

B. Courty, V . Schmidt, S. Luccioni, K. Goyal, M. Coutarel, B. Feld, J. Lecourt, L. Connell, A. Saboni, Inimaz, supatomic, M. Léval, L. Blanche, A. Cruveiller, ouminasara, F. Zhao, A. Joshi, A. Bogroff, H. de Lavoreille, N. Laskaris, E. Abati, D. Blank, Z. Wang, A. Catovic, M. Alençon, M. St˛ echły, C. Bauer, L. O. N. de Araújo, JPW, and MinervaBooks, “ml...

work page doi:10.5281/zenodo.11171501 2024
[14]

Imagenet: A large-scale hierarchical image database,

J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in2009 IEEE conference on computer vision and pattern recognition. IEEE, 2009, pp. 248–255

2009
[15]

Learning multiple layers of features from tiny images,

A. Krizhevsky, G. Hintonet al., “Learning multiple layers of features from tiny images,” 2009

2009
[16]

Core50: a new dataset and benchmark for continuous object recognition,

V . Lomonaco and D. Maltoni, “Core50: a new dataset and benchmark for continuous object recognition,” inConference on robot learning. PMLR, 2017, pp. 17–26

2017
[17]

Automated flower classification over a large number of classes,

M.-E. Nilsback and A. Zisserman, “Automated flower classification over a large number of classes,” in2008 Sixth Indian conference on computer vision, graphics & image processing. IEEE, 2008, pp. 722–729. 6

2008
[18]

Fast and frugal transfer learning via precomputed features and adaptive normal- ization,

D. Vila-Cruz, V . Bolón-Canedo, and L. Morán-Fernández, “Fast and frugal transfer learning via precomputed features and adaptive normal- ization,” inInternational Conference on Intelligent Data Engineering and Automated Learning. Springer, 2025, pp. 143–149. 7

2025

[1] [1]

Class-incremental learning: A survey,

D.-W. Zhou, Q.-W. Wang, Z.-H. Qi, H.-J. Ye, D.-C. Zhan, and Z. Liu, “Class-incremental learning: A survey,”IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

2024

[2] [2]

Vision transformers on the edge: A comprehensive survey of model compression and acceleration strategies,

S. Saha and L. Xu, “Vision transformers on the edge: A comprehensive survey of model compression and acceleration strategies,”Neurocom- puting, p. 130417, 2025

2025

[3] [3]

Der: Dynamically expandable representation for class incremental learning,

S. Yan, J. Xie, and X. He, “Der: Dynamically expandable representation for class incremental learning,” inProceedings of the IEEE/CVF confer- ence on computer vision and pattern recognition, 2021, pp. 3014–3023

2021

[4] [4]

Foster: Feature boosting and compression for class-incremental learning,

F.-Y . Wang, D.-W. Zhou, H.-J. Ye, and D.-C. Zhan, “Foster: Feature boosting and compression for class-incremental learning,” inEuropean conference on computer vision. Springer, 2022, pp. 398–414

2022

[5] [5]

A model or 603 exemplars: Towards memory-efficient class-incremental learning,

D.-W. Zhou, Q.-W. Wang, H.-J. Ye, and D.-C. Zhan, “A model or 603 exemplars: Towards memory-efficient class-incremental learning,”arXiv preprint arXiv:2205.13218, 2022

work page arXiv 2022

[6] [6]

Rmm: Reinforced memory manage- ment for class-incremental learning,

Y . Liu, B. Schiele, and Q. Sun, “Rmm: Reinforced memory manage- ment for class-incremental learning,”Advances in neural information processing systems, vol. 34, pp. 3478–3490, 2021

2021

[7] [7]

Self-sustaining representation expansion for non-exemplar class-incremental learning,

K. Zhu, W. Zhai, Y . Cao, J. Luo, and Z.-J. Zha, “Self-sustaining representation expansion for non-exemplar class-incremental learning,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 9296–9305

2022

[8] [8]

Fetril: Feature translation for exemplar-free class-incremental learning,

G. Petit, A. Popescu, H. Schindler, D. Picard, and B. Delezoide, “Fetril: Feature translation for exemplar-free class-incremental learning,” in Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2023, pp. 3911–3920

2023

[9] [9]

Representation robustness and feature expansion for exemplar-free class-incremental learning,

Y . Luo, H. Ge, Y . Liu, and C. Wu, “Representation robustness and feature expansion for exemplar-free class-incremental learning,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 7, pp. 5306–5320, 2023

2023

[10] [10]

How efficient are today’s continual learning algorithms?

M. Y . Harun, J. Gallardo, T. L. Hayes, and C. Kanan, “How efficient are today’s continual learning algorithms?” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 2431–2436

2023

[11] [11]

Remind your neural network to prevent catastrophic forgetting,

T. L. Hayes, K. Kafle, R. Shrestha, M. Acharya, and C. Kanan, “Remind your neural network to prevent catastrophic forgetting,” inEuropean conference on computer vision. Springer, 2020, pp. 466–483

2020

[12] [12]

Efficient single-step framework for incremental class learning in neural networks,

A. Dopico-Castro, O. Fontenla-Romero, B. Guijarro-Berdiñas, and A. Alonso-Betanzos, “Efficient single-step framework for incremental class learning in neural networks,”arXiv preprint arXiv:2509.11285, 2025

work page arXiv 2025

[13] [13]

Courty, V

B. Courty, V . Schmidt, S. Luccioni, K. Goyal, M. Coutarel, B. Feld, J. Lecourt, L. Connell, A. Saboni, Inimaz, supatomic, M. Léval, L. Blanche, A. Cruveiller, ouminasara, F. Zhao, A. Joshi, A. Bogroff, H. de Lavoreille, N. Laskaris, E. Abati, D. Blank, Z. Wang, A. Catovic, M. Alençon, M. St˛ echły, C. Bauer, L. O. N. de Araújo, JPW, and MinervaBooks, “ml...

work page doi:10.5281/zenodo.11171501 2024

[14] [14]

Imagenet: A large-scale hierarchical image database,

J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in2009 IEEE conference on computer vision and pattern recognition. IEEE, 2009, pp. 248–255

2009

[15] [15]

Learning multiple layers of features from tiny images,

A. Krizhevsky, G. Hintonet al., “Learning multiple layers of features from tiny images,” 2009

2009

[16] [16]

Core50: a new dataset and benchmark for continuous object recognition,

V . Lomonaco and D. Maltoni, “Core50: a new dataset and benchmark for continuous object recognition,” inConference on robot learning. PMLR, 2017, pp. 17–26

2017

[17] [17]

Automated flower classification over a large number of classes,

M.-E. Nilsback and A. Zisserman, “Automated flower classification over a large number of classes,” in2008 Sixth Indian conference on computer vision, graphics & image processing. IEEE, 2008, pp. 722–729. 6

2008

[18] [18]

Fast and frugal transfer learning via precomputed features and adaptive normal- ization,

D. Vila-Cruz, V . Bolón-Canedo, and L. Morán-Fernández, “Fast and frugal transfer learning via precomputed features and adaptive normal- ization,” inInternational Conference on Intelligent Data Engineering and Automated Learning. Springer, 2025, pp. 143–149. 7

2025