Shapley Neuron Values for Continual Learning: Which Neurons Matter Most?
Pith reviewed 2026-05-20 20:24 UTC · model grok-4.3
The pith
Shapley values can identify which neurons to freeze so a network learns new tasks without forgetting old ones or storing data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present Shapley Neuron Valuation (SNV), a framework that treats neurons as players in a cooperative game and computes their marginal contribution to the network's output on prior tasks. SNV then freezes neurons with the highest values to protect earlier knowledge while leaving lower-value neurons plastic for new learning. On ImageNet-1k this yields +2.88 percent accuracy in class-incremental learning and +6.46 percent in task-incremental learning over the strongest buffer-free baseline.
What carries the argument
Shapley Neuron Valuation (SNV), which assigns each neuron a value equal to its average marginal contribution across all possible coalitions of neurons, to decide which neurons must remain unchanged during subsequent training.
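For reference, the "average marginal contribution across all possible coalitions" is the standard Shapley value from cooperative game theory, written out below. The symbols are generic and not taken from the paper: N is the set of n neurons, S is a coalition that excludes neuron i, and v is the characteristic function, whose exact definition this summary does not state.

```latex
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,\bigl[\, v(S \cup \{i\}) - v(S) \,\bigr]
```

The sum runs over 2^(n-1) coalitions per neuron, so an exact evaluation is infeasible for networks of realistic size and some approximation is presumably required.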
If this is right
- SNV enables continual learning on large image datasets without storing past examples.
- Accuracy gains appear in both class-incremental and task-incremental protocols.
- The network size stays constant because only selected neurons are frozen rather than new layers added.
- Neuron importance is computed once per task and then used to set a binary freeze mask.
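The last point, turning per-neuron scores into a binary freeze mask, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `freeze_mask` and `masked_update`, the top-fraction selection rule, the plain SGD step, and the example scores are all hypothetical.

```python
def freeze_mask(scores, freeze_fraction):
    """Return a 0/1 mask marking the top-scoring neurons as frozen (1)."""
    if not 0.0 <= freeze_fraction <= 1.0:
        raise ValueError("freeze_fraction must lie in [0, 1]")
    n_freeze = int(round(freeze_fraction * len(scores)))
    # Rank neuron indices by score, highest first; ties keep index order.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    frozen = set(ranked[:n_freeze])
    return [1 if i in frozen else 0 for i in range(len(scores))]


def masked_update(weights, grads, mask, lr):
    """Apply one SGD step, leaving neurons with mask entry 1 unchanged."""
    return [w if m else w - lr * g for w, g, m in zip(weights, grads, mask)]


# Hypothetical per-neuron importance values (assumed, not from the paper).
scores = [0.40, 0.05, 0.30, 0.10, 0.15]
mask = freeze_mask(scores, freeze_fraction=0.4)
new_weights = masked_update([1.0] * 5, [0.5] * 5, mask, lr=0.1)
```

In this toy run the two highest-scoring neurons (indices 0 and 2) keep their weights while the other three take a gradient step. The network size is unchanged, which matches the constant-size property noted in the list.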
Where Pith is reading between the lines
- The same scoring could be applied to other layer types if an equivalent notion of 'neuron' is defined.
- Periodic re-calculation of Shapley values after several tasks might further reduce forgetting under distribution shift.
- Combining SNV with lightweight regularization on the plastic neurons could produce additive gains.
Load-bearing premise
That the importance ranking of neurons computed on past tasks will still mark the right neurons to protect when entirely new tasks arrive.
What would settle it
Run the same continual-learning schedule on ImageNet-1k but freeze neurons chosen at random instead of by SNV scores; if the accuracy gap over baselines vanishes, the method's advantage is not explained by the Shapley ranking.
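A control of this kind needs random masks that freeze exactly as many neurons as the SNV mask, so that only the selection rule differs. The sketch below shows one way to build them; `random_mask_like`, the example mask, and the choice of five seeds are illustrative assumptions.

```python
import random


def random_mask_like(snv_mask, seed):
    """Freeze as many neurons as snv_mask does, chosen uniformly at random."""
    rng = random.Random(seed)
    n_freeze = sum(snv_mask)
    frozen = set(rng.sample(range(len(snv_mask)), n_freeze))
    return [1 if i in frozen else 0 for i in range(len(snv_mask))]


# Hypothetical SNV freeze mask (assumed for illustration).
snv_mask = [1, 0, 1, 0, 0, 1, 0, 0]
# One control mask per seed, so the comparison can be averaged over runs.
controls = [random_mask_like(snv_mask, seed=s) for s in range(5)]
```

Using several seeds matters here: a single random mask could overlap the SNV selection by chance and understate or overstate the gap.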
Original abstract
Continual learning enables neural networks to learn tasks sequentially without forgetting previously acquired knowledge. However, neural networks suffer from catastrophic forgetting, where learning new tasks degrades performance on earlier ones. We address this problem with Shapley Neuron Valuation (SNV), a principled framework, grounded in cooperative game theory, that quantifies neuron importance in continual learning. SNV selectively freezes important neurons while keeping others plastic, enabling buffer-free continual learning without expanding the architecture. Experiments on ImageNet-1k show that SNV consistently outperforms existing buffer-free methods. In particular, SNV improves accuracy by +2.88% in the class-incremental learning scenario and by +6.46% in the task-incremental learning scenario compared to the second-best baseline.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Shapley Neuron Valuation (SNV), a framework grounded in cooperative game theory to quantify the importance of individual neurons for continual learning. SNV computes Shapley values using a value function on the current task's data to rank neurons, then selectively freezes the most important ones while keeping others plastic. This approach aims to mitigate catastrophic forgetting in a buffer-free manner without expanding the network architecture. Experiments on ImageNet-1k report accuracy gains of +2.88% in class-incremental learning and +6.46% in task-incremental learning over the second-best baseline.
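How the Shapley values are approximated is not specified in the material reviewed here. One widely used estimator is permutation sampling: draw random orderings of the players and average each player's marginal gain when it joins the coalition. The sketch below illustrates that estimator on a toy additive game; the function name, the toy weights, and the sample count are assumptions, and `toy_value` merely stands in for "task performance with only the neurons in S active".

```python
import random


def shapley_permutation_estimate(players, value_fn, n_permutations, seed=0):
    """Estimate Shapley values by averaging marginal gains over random orderings."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(n_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition = set()
        prev = value_fn(frozenset(coalition))
        for p in order:
            coalition.add(p)
            cur = value_fn(frozenset(coalition))
            totals[p] += cur - prev  # marginal contribution of p in this ordering
            prev = cur
    return {p: t / n_permutations for p, t in totals.items()}


# Toy additive game (hypothetical): each "neuron" adds a fixed amount.
weights = {"n0": 0.5, "n1": 0.3, "n2": 0.2}


def toy_value(coalition):
    return sum(weights[p] for p in coalition)


estimates = shapley_permutation_estimate(list(weights), toy_value, n_permutations=50)
```

In an additive game every ordering yields the same marginal gain, so the estimate recovers each weight up to floating-point error. A real network is not additive, so the estimates would be noisy and the number of sampled permutations becomes a cost-versus-variance trade-off, with one evaluation of the value function per neuron per permutation.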
Significance. If the central claim holds, SNV would provide a principled, game-theoretic method for identifying neurons to protect across tasks, advancing buffer-free continual learning. The grounding in cooperative game theory and the scale of the ImageNet-1k experiments are strengths that could influence future work on neuron-level regularization. However, the significance hinges on whether task-local Shapley rankings generalize to prevent forgetting, which requires further substantiation.
major comments (2)
- [Abstract and §4 (Experiments)] The headline accuracy gains (+2.88% class-incremental, +6.46% task-incremental) are stated without derivation details for the Shapley approximation, baseline descriptions, statistical tests, error bars, or ablation results on the importance threshold or value function. This prevents assessment of whether the central claim is supported or influenced by post-hoc choices.
- [§3 (SNV framework)] The load-bearing assumption that Shapley values computed via value function v on the current task's data identify neurons whose freezing prevents forgetting on future tasks lacks supporting derivation or experiment. No analysis demonstrates that marginal contributions under v on task t correlate with cross-task preservation or stability under distribution shift; the selected neurons could simply be those most active on the current distribution.
minor comments (1)
- [§3] The notation for the characteristic function v(S) and how it is defined for neuron coalitions in the continual-learning setting could be clarified with an explicit equation.
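As an illustration of the kind of explicit equation requested, one possible characteristic function, in the style of ablation-based neuron valuation, is given below. This is an assumption made for illustration and not the paper's definition: f_S denotes the network with every neuron outside the coalition S zeroed out, D_t the data of task t, and A a performance metric such as accuracy.

```latex
v_t(S) \;=\; A\bigl(f_S,\, D_t\bigr) \;-\; A\bigl(f_{\emptyset},\, D_t\bigr), \qquad S \subseteq N
```

Subtracting the performance of the fully ablated network gives v_t(\emptyset) = 0, the usual normalization for a cooperative game. Under this form the value depends only on task t's data, which is the point the second major comment raises.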
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive suggestions. We address the major comments point by point below, providing clarifications and outlining the revisions we will make to strengthen the manuscript. We believe these changes will better support the central claims of our work on Shapley Neuron Valuation for continual learning.
Point-by-point responses
- Referee: [Abstract and §4 (Experiments)] The headline accuracy gains (+2.88% class-incremental, +6.46% task-incremental) are stated without derivation details for the Shapley approximation, baseline descriptions, statistical tests, error bars, or ablation results on the importance threshold or value function. This prevents assessment of whether the central claim is supported or influenced by post-hoc choices.
  Authors: We appreciate this observation and agree that more detailed reporting is necessary for reproducibility and to substantiate the claims. In the revised version, we will include: (1) the specific method for approximating Shapley values, such as the number of samples or the algorithm used (e.g., Monte Carlo sampling); (2) comprehensive descriptions of the baselines, including their implementation details and hyperparameters; (3) results presented with error bars from at least 3 independent runs and statistical tests (e.g., Wilcoxon signed-rank test) to confirm significance of the gains; (4) ablation studies varying the importance threshold (e.g., freezing the top 10%, 20%, 30% of neurons) and different value functions v (e.g., accuracy-based vs. loss-based). These additions will be placed in an expanded Section 4 and supplementary material.
  Revision: yes
- Referee: [§3 (SNV framework)] The load-bearing assumption that Shapley values computed via value function v on the current task's data identify neurons whose freezing prevents forgetting on future tasks lacks supporting derivation or experiment. No analysis demonstrates that marginal contributions under v on task t correlate with cross-task preservation or stability under distribution shift; the selected neurons could simply be those most active on the current distribution.
  Authors: This is a valid concern regarding the theoretical grounding. The SNV framework posits that neurons with high Shapley values on the current task's data are those that contribute most to the model's performance on that task, and by freezing them, we preserve the representations learned so far. While we do not provide a formal proof that these marginal contributions directly correlate with future task stability, the empirical results demonstrate that this selection leads to better retention of previous knowledge compared to baselines that do not use such principled selection. To address this, in the revision we will add a subsection in §3 providing a more detailed motivation based on the cooperative game theory interpretation, arguing that high-value neurons are critical for the function approximation on the seen data distribution. We will also include an experiment analyzing the overlap of important neurons across tasks or the forgetting rate when using SNV vs. random freezing. However, we maintain that the primary validation comes from the end-to-end performance improvements on ImageNet-1k, which show reduced catastrophic forgetting.
  Revision: partial
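The overlap analysis proposed in the second response could take several forms. One simple option is the Jaccard overlap between the top-k neuron sets obtained after different tasks, sketched below. The helper names, the choice of Jaccard as the metric, and the example scores are assumptions for illustration only.

```python
def top_k(scores, k):
    """Indices of the k highest-scoring neurons."""
    return set(sorted(range(len(scores)), key=lambda i: -scores[i])[:k])


def jaccard(a, b):
    """Size of the intersection over size of the union; 1.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical per-neuron scores measured after two consecutive tasks.
scores_task1 = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]
scores_task2 = [0.8, 0.2, 0.1, 0.9, 0.7, 0.3]
overlap = jaccard(top_k(scores_task1, 3), top_k(scores_task2, 3))
```

Here two of the three top neurons are shared across tasks, giving an overlap of 0.5. A consistently low overlap would suggest that the ranking is task-local, which is the scenario the referee's second comment warns about.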
Circularity Check
SNV applies standard Shapley valuation to neurons with no reduction of the continual-learning claim to fitted inputs or self-citations
The paper grounds SNV directly in cooperative game theory by defining neuron importance via the Shapley value of a value function v computed on the current task's data distribution. This construction is independent of the target continual-learning outcome (future-task accuracy after freezing); the link between current-task marginal contributions and cross-task stability is presented as an empirical hypothesis rather than a definitional identity. No equations equate the protection mask to a fit on forgetting metrics, no self-citation supplies a uniqueness theorem that forces the method, and the reported accuracy gains are measured on held-out future tasks rather than being recovered by construction from the valuation step itself. The derivation therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- importance threshold for freezing
axioms (1)
- domain assumption: Shapley values computed over neuron coalitions accurately reflect contribution to task performance in a neural network