arxiv: 2605.14309 · v2 · pith:UKKBYDNJ · submitted 2026-05-14 · 💻 cs.CV · cs.AI · cs.LG

ICED: Concept-level Machine Unlearning via Interpretable Concept Decomposition

Pith reviewed 2026-05-19 16:46 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords: machine unlearning, vision-language models, concept decomposition, interpretable representations, multimodal learning, targeted forgetting, sparse representations

The pith

Decomposing visual representations into semantic concepts allows selective suppression of target knowledge in vision-language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework for unlearning in vision-language models that operates at the level of individual concepts rather than entire images. It builds a compact vocabulary of concepts from the data to be forgotten using a multimodal large language model, then expresses each image's visual features as a sparse nonnegative combination of those concepts. Unlearning is cast as an optimization problem that removes only the target concepts while keeping the rest of the semantics inside the same image and the model's overall cross-modal abilities intact. Experiments on both in-domain and out-of-domain settings show more complete removal of unwanted concepts alongside better retention of non-target information than earlier image-level methods.

Core claim

By building a task-specific concept vocabulary from the forgetting set and decomposing visual representations into sparse nonnegative combinations of those concepts, unlearning reduces to concept-level optimization that selectively suppresses target concepts while preserving intra-instance non-target semantics and global cross-modal knowledge.

What carries the argument

Interpretable concept decomposition, in which visual representations are expressed as sparse nonnegative linear combinations of semantic concepts drawn from a multimodal LLM, serving as the explicit interface for targeted suppression.
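The decomposition step can be pictured as a nonnegative lasso: find weights w ≥ 0, most of them zero, such that an image embedding z is approximately Cᵀw, where the rows of C are text embeddings of the vocabulary concepts. The sketch below solves that problem with projected proximal gradient descent (ISTA). The squared-error loss, the solver, and the names `sparse_nonneg_decompose`, `lam` and `n_iter` are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def sparse_nonneg_decompose(z, C, lam=0.05, n_iter=500):
    """Hypothetical helper: approximate z by C.T @ w with w >= 0 and a small L1 norm.

    z: (d,) image embedding.
    C: (k, d) concept embeddings, one row per vocabulary concept.
    Solves  min_w 0.5 * ||z - C.T @ w||^2 + lam * ||w||_1  subject to  w >= 0
    with projected proximal gradient descent (ISTA).
    """
    # Step size 1/L, where L is the largest eigenvalue of C @ C.T
    # (the Lipschitz constant of the gradient of the squared-error term).
    step = 1.0 / np.linalg.norm(C @ C.T, 2)
    w = np.zeros(C.shape[0])
    for _ in range(n_iter):
        grad = C @ (C.T @ w - z)  # gradient of 0.5 * ||z - C.T @ w||^2
        # On w >= 0 the L1 penalty equals lam * sum(w), so the proximal step
        # is a shift by step * lam followed by clipping at zero.
        w = np.maximum(w - step * (grad + lam), 0.0)
    return w


# Toy check: three orthonormal "concepts" in a 4-d space.
C = np.eye(4)[:3]
z = 2.0 * C[0] + 1.0 * C[2]
w = sparse_nonneg_decompose(z, C)  # ~ [1.95, 0.0, 0.95]: each active weight shrunk by lam
```

With orthonormal concepts each active weight is shrunk by exactly `lam`, the expected soft-threshold behavior. Real concept embeddings from a text encoder are correlated, and an L1 penalty then tends to keep one of several similar concepts, which is why the fidelity of this step is worth measuring rather than assuming.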

If this is right

  • Target concepts are removed more thoroughly than with image- or instance-level unlearning.
  • Non-target semantics inside the same image stay largely unchanged after the operation.
  • Overall model performance on unrelated tasks remains competitive with prior VLM unlearning techniques.
  • Both in-domain and out-of-domain forgetting scenarios show gains from operating at the concept level.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same decomposition approach could be adapted to text-only language models for concept-level forgetting.
  • Dynamic requests to forget new concepts might be handled by incrementally updating the vocabulary without retraining the full model.
  • If concept separability is imperfect in some domains, combining this method with small amounts of instance-level regularization could improve robustness.

Load-bearing premise

Visual features can be accurately expressed as sparse sums of distinct semantic concepts identified by a multimodal model from the examples to be forgotten.

What would settle it

After running the concept suppression step, feed the model images that contain only the target concept and check whether its outputs still include descriptions or predictions tied to that concept; persistent presence would falsify the claim of selective forgetting.
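For a CLIP-style model, one way to run this test is to embed held-out images that contain only the target concept, classify them zero-shot against one text prompt per class, and compare how often the target class is still the top-1 prediction before and after unlearning. The helper below assumes the embeddings are already computed; `target_hit_rate` is a hypothetical name, and cosine-similarity argmax is the usual CLIP zero-shot rule rather than the paper's stated protocol.

```python
import numpy as np


def target_hit_rate(image_emb, text_emb, target_idx):
    """Hypothetical helper: fraction of images whose zero-shot top-1 class is target_idx.

    image_emb: (n, d) embeddings of images that contain only the target concept.
    text_emb:  (c, d) embeddings of one text prompt per class.
    Uses cosine-similarity argmax, the usual CLIP-style zero-shot rule.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    preds = (img @ txt.T).argmax(axis=1)
    return float(np.mean(preds == target_idx))


# Toy example with three classes whose prompt embeddings are orthonormal.
text_emb = np.eye(3)
image_emb = np.array([[1.0, 0.1, 0.0], [0.9, 0.0, 0.2], [0.0, 1.0, 0.0]])
rate = target_hit_rate(image_emb, text_emb, target_idx=0)  # 2 of 3 images -> 0.666...
```

A rate that stays close to its pre-unlearning value means the target concept survived. Running the same measurement on images of non-target classes, where the rate should not drop, probes the selectivity half of the claim.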

Figures

Figures reproduced from arXiv: 2605.14309 by Jing Lin, Junhao Dong, Li Xu, Piotr Koniusz, Shen Lin.

Figure 1: Motivation. The left part shows target concepts to be forgotten, while the right part shows remaining contextual concepts that should be preserved. ICED more effectively suppresses forgetting concepts and shifts the model’s focus toward non-target contextual concepts, indicating more selective and utility-preserving unlearning.

Figure 2: An overview of our proposed ICED method.

Figure 3: Retrieval visualization for in-domain forgetting on ImageNet-1K. The target subgroup …

Figure 4: Visualization of the top-5 concepts obtained by ICED.

Figure 5: Ablation study of the vocabulary size and sparsity regularization weight on CIFAR-10.

Figure 6: Ablation study of the balancing hyperparameters on CIFAR-10.

Figure 7: Retrieval visualization for out-of-domain forgetting on CIFAR-10. After forgetting the …

Original abstract

Machine unlearning in Vision-Language Models (VLMs) is typically performed at the image or instance level, making it difficult to precisely remove target knowledge without affecting unrelated semantics. This issue is especially pronounced since a single image often contains multiple entangled concepts, including both target concepts to be forgotten and contextual information that should be preserved. In this paper, we propose an interpretable concept-level unlearning framework for VLMs, which constructs a compact task-specific concept vocabulary from the forgetting set using a multimodal large language model. In addition to modality alignment, visual representations are decomposed into sparse, nonnegative combinations of semantic concepts, providing an explicit interface for fine-grained knowledge manipulation. Based on this decomposition, our method formulates unlearning as concept-level optimization, where target concepts are selectively suppressed while intra-instance non-target semantics and global cross-modal knowledge are preserved. Extensive experiments across both in-domain and out-of-domain forgetting settings demonstrate that our method enables more comprehensive target forgetting, better preserves non-target knowledge within the same image, and maintains competitive model utility compared with existing VLM unlearning methods.
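To make "concept-level optimization" concrete, the sketch below shows one form such an objective could take once every image has nonnegative weights over the concept vocabulary. It assumes a forget term that pushes the weights on target concepts toward zero and a preserve term that keeps each image's non-target weights close to those of the frozen original model. The name `concept_unlearning_loss` and the weight `alpha` are hypothetical, the term that protects global cross-modal knowledge (for example a contrastive loss on retained data) is omitted, and this is not the authors' loss. NumPy only evaluates it; training would need the same expression in an autodiff framework.

```python
import numpy as np


def concept_unlearning_loss(w_new, w_ref, target_mask, alpha=1.0):
    """Hypothetical concept-level unlearning objective on decomposition coefficients.

    w_new: (n, k) nonnegative concept weights under the model being unlearned.
    w_ref: (n, k) concept weights under the frozen original model.
    target_mask: (k,) booleans, True for concepts to forget.
    alpha: weight of the preservation term.
    """
    target_mask = np.asarray(target_mask, dtype=bool)
    # Forget term: since weights are nonnegative, their sum is the L1 mass
    # left on target concepts, which the optimizer should drive to zero.
    forget = w_new[:, target_mask].sum(axis=1).mean()
    # Preserve term: keep the non-target (contextual) concepts of each image
    # close to what the original model assigned.
    diff = w_new[:, ~target_mask] - w_ref[:, ~target_mask]
    preserve = (diff ** 2).sum(axis=1).mean()
    return float(forget + alpha * preserve)


w_ref = np.array([[1.0, 0.5, 0.2]])  # one image: target weight 1.0, two contextual concepts
mask = np.array([True, False, False])
before = concept_unlearning_loss(w_ref, w_ref, mask)                       # 1.0
after = concept_unlearning_loss(np.array([[0.0, 0.5, 0.2]]), w_ref, mask)  # 0.0
```

The second call scores zero because the target weight is gone while both contextual weights are unchanged; shifting a contextual weight instead would be penalized by the preserve term, which is the intra-instance selectivity the abstract describes.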

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes ICED, a concept-level unlearning framework for Vision-Language Models. It uses a multimodal LLM to extract a compact concept vocabulary from the forgetting set, decomposes visual representations into sparse nonnegative linear combinations of these concepts, and performs unlearning via concept-level optimization that suppresses target concepts while aiming to preserve intra-image non-target semantics and overall model utility. Experiments are claimed to show superior target forgetting and preservation compared to existing VLM unlearning methods in both in-domain and out-of-domain settings.

Significance. If the core decomposition step faithfully isolates target concepts without residual entanglement, the method could offer a more interpretable and precise alternative to instance-level unlearning, addressing a key limitation in current VLM safety techniques. The LLM-assisted concept extraction and nonnegative sparse decomposition represent a potentially useful interface for fine-grained control, with possible broader implications for controllable forgetting in multimodal models.

major comments (2)
  1. §3.2 (Decomposition procedure): The central claim that visual representations decompose into sparse, nonnegative combinations of LLM-extracted concepts to enable selective suppression rests on the fidelity of this step. No quantitative validation (e.g., reconstruction error, concept isolation metrics, or an ablation on the sparsity parameter) is referenced to confirm that target concepts are cleanly separated from contextual semantics in entangled images; if the decomposition leaks non-target information, the reported gains in comprehensive forgetting and intra-image preservation cannot be attributed to the concept-level interface.
  2. §4 (Experiments): The abstract asserts 'extensive experiments' demonstrating more comprehensive forgetting and better preservation than baselines, yet no specific quantitative results, baseline comparisons, or ablation studies on the decomposition are cited in the provided summary. Without these (e.g., forgetting accuracy deltas or preservation scores in Table 2), the superiority claim over instance-level methods remains unsubstantiated, even though it is load-bearing for the paper's contribution.
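The checks requested in comment 1 are cheap to compute once coefficients exist. The sketch below gives two hypothetical proxies, not metrics defined in the paper: the mean relative reconstruction error ‖z − Cᵀw‖/‖z‖, and a leakage score that subtracts the explained target components from each embedding and reports the largest cosine similarity the remainder still has with any target concept. A high leakage score would mean target information is being carried by non-target concepts or by the unexplained residual. The function names are illustrative.

```python
import numpy as np


def relative_reconstruction_error(Z, C, W):
    """Mean of ||z_i - C.T @ w_i|| / ||z_i|| over images.

    Z: (n, d) image embeddings; C: (k, d) concept embeddings; W: (n, k) coefficients.
    """
    recon = W @ C
    return float(np.mean(np.linalg.norm(Z - recon, axis=1) / np.linalg.norm(Z, axis=1)))


def target_leakage(Z, C, W, target_mask, eps=1e-12):
    """Mean over images of the max cosine between target-removed embeddings and target concepts.

    Assumes at least one concept is marked as a target.
    """
    target_mask = np.asarray(target_mask, dtype=bool)
    C_t = C[target_mask]
    # Remove only the explained target components; non-target parts and the residual remain.
    edited = Z - W[:, target_mask] @ C_t
    edited_n = edited / (np.linalg.norm(edited, axis=1, keepdims=True) + eps)
    C_t_n = C_t / (np.linalg.norm(C_t, axis=1, keepdims=True) + eps)
    return float(np.mean((edited_n @ C_t_n.T).max(axis=1)))


C = np.eye(3)
Z = np.array([[2.0, 0.0, 1.0]])
target = np.array([True, False, False])
W_good = np.array([[2.0, 0.0, 1.0]])  # exact decomposition
W_bad = np.array([[1.0, 0.0, 1.0]])   # under-explains the target concept
err_good, leak_good = relative_reconstruction_error(Z, C, W_good), target_leakage(Z, C, W_good, target)  # 0.0, 0.0
err_bad, leak_bad = relative_reconstruction_error(Z, C, W_bad), target_leakage(Z, C, W_bad, target)      # ~0.447, ~0.707
```

The second toy case shows why both numbers are needed: an under-explained target component raises the reconstruction error and also leaves a residual that still points toward the target concept after suppression.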
minor comments (2)
  1. §3: Notation for the nonnegative sparse decomposition (e.g., the exact form of the optimization objective combining reconstruction and sparsity) should be clarified with an explicit equation to aid reproducibility.
  2. The paper should include a limitations section discussing potential failure modes of the multimodal LLM concept extraction, such as incomplete coverage of target concepts.
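As an illustration of what minor comment 1 asks for, a common generic form of sparse nonnegative concept decomposition is written below, with f(x) ∈ ℝ^d the image embedding, C ∈ ℝ^{K×d} the stacked concept embeddings and λ the sparsity weight. This is an assumed template; the paper's actual objective, including the modality-alignment term its abstract mentions, may add or change terms.

```latex
\hat{\mathbf{w}}(x) \;=\; \operatorname*{arg\,min}_{\mathbf{w}\in\mathbb{R}^{K}_{\ge 0}}
  \;\tfrac{1}{2}\,\bigl\lVert f(x) - \mathbf{C}^{\top}\mathbf{w} \bigr\rVert_2^2
  \;+\; \lambda\,\lVert \mathbf{w} \rVert_1 ,
\qquad \lVert \mathbf{w} \rVert_1 = \mathbf{1}^{\top}\mathbf{w} \ \text{ when } \mathbf{w} \ge 0 .
```

Because w is constrained to be nonnegative, the ℓ1 penalty is linear in w, so projected-gradient solvers need only a shift followed by clipping at zero.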

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their detailed and insightful comments, which have helped us improve the clarity and rigor of our manuscript on ICED. We address each major comment below.

  1. Referee: §3.2 (Decomposition procedure): The central claim that visual representations decompose into sparse, nonnegative combinations of LLM-extracted concepts to enable selective suppression rests on the fidelity of this step. No quantitative validation (e.g., reconstruction error, concept isolation metrics, or an ablation on the sparsity parameter) is referenced to confirm that target concepts are cleanly separated from contextual semantics in entangled images; if the decomposition leaks non-target information, the reported gains in comprehensive forgetting and intra-image preservation cannot be attributed to the concept-level interface.

    Authors: We thank the referee for highlighting this important point. While the original manuscript includes ablations on the sparsity parameter and qualitative visualizations demonstrating the decomposition's effectiveness in isolating concepts (see Section 3.2 and Appendix B), we acknowledge that explicit quantitative metrics such as reconstruction error and concept isolation scores were not reported. In the revised manuscript, we have added these quantitative validations in a new subsection of Section 3.2, including metrics showing low reconstruction errors and high concept isolation for target concepts with minimal leakage. This supports the attribution of performance gains to the concept-level interface. revision: yes

  2. Referee: §4 (Experiments): The abstract asserts 'extensive experiments' demonstrating more comprehensive forgetting and better preservation than baselines, yet no specific quantitative results, baseline comparisons, or ablation studies on the decomposition are cited in the provided summary. Without these (e.g., forgetting accuracy deltas or preservation scores in Table 2), the superiority claim over instance-level methods remains unsubstantiated, even though it is load-bearing for the paper's contribution.

    Authors: We note that the referee's summary provides a high-level overview of the paper. The full manuscript details the extensive experiments in Section 4, with specific quantitative results, baseline comparisons, and ablation studies presented in Tables 2 and 3, as well as in Section 4.3. To improve clarity, we have revised the abstract and the opening of Section 4 to more explicitly cite these quantitative findings and tables. revision: partial

Circularity Check

0 steps flagged

No circularity: method is an independent optimization procedure with external validation

full rationale

The paper introduces a concept-level unlearning framework that constructs a vocabulary via multimodal LLM and performs sparse nonnegative decomposition followed by selective suppression optimization. No equations or steps in the provided description reduce the claimed forgetting performance or preservation properties to fitted parameters by construction, self-citations that bear the central load, or renamings of known results. The derivation chain consists of a proposed procedure whose correctness is assessed via experiments rather than tautological re-expression of inputs. This is the expected self-contained outcome for a methods paper without load-bearing self-referential definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the unverified premise that MLLM-extracted concepts yield a faithful sparse nonnegative basis for visual features that can be selectively optimized without side effects.

axioms (1)
  • Domain assumption: visual representations can be decomposed into sparse, nonnegative combinations of semantic concepts.
    Invoked in the abstract as the foundation for providing an explicit interface for knowledge manipulation.

pith-pipeline@v0.9.0 · 5726 in / 1286 out tokens · 73170 ms · 2026-05-19T16:46:37.675872+00:00 · methodology
