ICED: Concept-level Machine Unlearning via Interpretable Concept Decomposition

Jing Lin; Junhao Dong; Li Xu; Piotr Koniusz; Shen Lin

arxiv: 2605.14309 · v2 · pith:UKKBYDNJnew · submitted 2026-05-14 · 💻 cs.CV · cs.AI · cs.LG

Pith reviewed 2026-05-19 16:46 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG

keywords machine unlearning · vision-language models · concept decomposition · interpretable representations · multimodal learning · targeted forgetting · sparse representations

The pith

Decomposing visual representations into semantic concepts allows selective suppression of target knowledge in vision-language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework for unlearning in vision-language models that operates at the level of individual concepts rather than entire images. It builds a compact vocabulary of concepts from the data to be forgotten using a multimodal large language model, then expresses each image's visual features as a sparse nonnegative combination of those concepts. Unlearning is cast as an optimization problem that removes only the target concepts while keeping the rest of the semantics inside the same image and the model's overall cross-modal abilities intact. Experiments on both in-domain and out-of-domain settings show more complete removal of unwanted concepts alongside better retention of non-target information than earlier image-level methods.

Core claim

By building a task-specific concept vocabulary from the forgetting set and decomposing visual representations into sparse nonnegative combinations of those concepts, unlearning reduces to concept-level optimization that selectively suppresses target concepts while preserving intra-instance non-target semantics and global cross-modal knowledge.
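
As a rough illustration of what concept-level optimization could look like, the toy loss below penalizes remaining activation on target concepts and drift on the non-target ones. The function name, the `alpha` and `beta` weights, and the two loss terms are assumptions made for illustration. The summary does not give the paper's actual objective, and the global cross-modal term is left out here.

```python
import numpy as np

def concept_unlearning_loss(w_new, w_old, target_mask, alpha=1.0, beta=1.0):
    """Toy concept-level loss (hypothetical, not the paper's objective).

    w_new, w_old: nonnegative concept coefficients after / before unlearning.
    target_mask:  boolean array, True where a concept should be forgotten.
    """
    # Forget term: total activation still assigned to target concepts.
    forget = np.sum(w_new[target_mask])
    # Retain term: squared drift of the non-target coefficients.
    retain = np.sum((w_new[~target_mask] - w_old[~target_mask]) ** 2)
    return float(alpha * forget + beta * retain)

# Toy coefficients: concept 0 is the target, concepts 1 and 2 are context.
w_old = np.array([0.8, 0.3, 0.1])
target_mask = np.array([True, False, False])

w_unchanged = w_old.copy()                 # no unlearning applied
w_suppressed = np.array([0.0, 0.3, 0.1])   # target removed, context intact

loss_unchanged = concept_unlearning_loss(w_unchanged, w_old, target_mask)
loss_suppressed = concept_unlearning_loss(w_suppressed, w_old, target_mask)
```

In this toy setting the loss is lowest when only the target coefficient is driven to zero, which is the selective behavior the core claim describes.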

What carries the argument

Interpretable concept decomposition, in which visual representations are expressed as sparse nonnegative linear combinations of semantic concepts drawn from a multimodal LLM, serving as the explicit interface for targeted suppression.
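
To make the decomposition idea concrete, here is a minimal sketch of nonnegative sparse coding solved by projected proximal gradient descent (ISTA with a nonnegativity constraint). It is a generic solver, not the paper's procedure. The concept matrix `C`, the penalty `lam`, and the random toy data are assumptions standing in for real concept and image embeddings.

```python
import numpy as np

def nonneg_sparse_code(v, C, lam=0.05, n_iter=500):
    """Approximate v (d,) as C.T @ w with w >= 0 and an L1 penalty on w.

    C is a (k, d) matrix whose rows are concept embeddings (hypothetical).
    Minimizes 0.5 * ||v - C.T @ w||^2 + lam * sum(w) subject to w >= 0.
    """
    w = np.zeros(C.shape[0])
    # Step size 1/L, where L is the largest eigenvalue of C @ C.T.
    step = 1.0 / (np.linalg.norm(C, 2) ** 2)
    for _ in range(n_iter):
        grad = C @ (C.T @ w - v)
        # Gradient step followed by the proximal map of lam * sum(w) on w >= 0.
        w = np.maximum(w - step * (grad + lam), 0.0)
    return w

# Toy data: 8 random unit-norm "concepts" in a 32-dimensional space.
rng = np.random.default_rng(0)
C = rng.normal(size=(8, 32))
C /= np.linalg.norm(C, axis=1, keepdims=True)

# A feature built from concepts 1 and 5 only.
w_true = np.zeros(8)
w_true[[1, 5]] = [1.0, 0.5]
v = C.T @ w_true

w = nonneg_sparse_code(v, C, lam=0.01)
residual = np.linalg.norm(C.T @ w - v)
```

The coefficient vector `w` is the "explicit interface" in this picture: each entry says how strongly one named concept contributes to the feature.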

If this is right

Target concepts are removed more thoroughly than with image- or instance-level unlearning.
Non-target semantics inside the same image stay largely unchanged after the operation.
Overall model performance on unrelated tasks remains competitive with prior VLM unlearning techniques.
Both in-domain and out-of-domain forgetting scenarios show gains from operating at the concept level.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same decomposition approach could be adapted to text-only language models for concept-level forgetting.
Dynamic requests to forget new concepts might be handled by incrementally updating the vocabulary without retraining the full model.
If concept separability is imperfect in some domains, combining this method with small amounts of instance-level regularization could improve robustness.

Load-bearing premise

Visual features can be accurately expressed as sparse sums of distinct semantic concepts identified by a multimodal model from the examples to be forgotten.

What would settle it

After running the concept suppression step, feed the model images that contain only the target concept and check whether its outputs still include descriptions or predictions tied to that concept; persistent presence would falsify the claim of selective forgetting.
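
One way to run this check on a CLIP-style model is to measure how often target-only images are still assigned the target label by zero-shot cosine similarity. The sketch below assumes embeddings have already been extracted. `target_hit_rate` and the toy arrays are hypothetical names and data, not part of the paper.

```python
import numpy as np

def target_hit_rate(image_emb, text_emb, target_idx):
    """Fraction of images whose top-1 zero-shot label is the target concept.

    image_emb: (n, d) embeddings of target-only images.
    text_emb:  (m, d) embeddings of m candidate label prompts.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    pred = (img @ txt.T).argmax(axis=1)  # cosine-similarity prediction
    return float((pred == target_idx).mean())

# Toy setup: three label prompts on orthogonal axes, label 0 is the target.
text_emb = np.eye(3)
emb_before = np.array([[0.9, 0.1, 0.0], [0.8, 0.0, 0.2]])  # original model
emb_after = np.array([[0.1, 0.9, 0.0], [0.2, 0.1, 0.7]])   # after unlearning

rate_before = target_hit_rate(emb_before, text_emb, target_idx=0)
rate_after = target_hit_rate(emb_after, text_emb, target_idx=0)
```

A hit rate that stays high after unlearning would count against the selective-forgetting claim, while a drop toward zero would be consistent with it.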

Figures

Figures reproduced from arXiv: 2605.14309 by Jing Lin, Junhao Dong, Li Xu, Piotr Koniusz, Shen Lin.

**Figure 1.** Motivation. The left part shows target concepts to be forgotten, while the right part shows remaining contextual concepts that should be preserved. ICED more effectively suppresses forgetting concepts and shifts the model’s focus toward non-target contextual concepts, indicating more selective and utility-preserving unlearning.

**Figure 2.** An overview of our proposed ICED method.

**Figure 3.** Retrieval visualization for in-domain forgetting on ImageNet-1K. The target subgroup …

**Figure 4.** Visualization of the top-5 concepts obtained by ICED.

**Figure 5.** Ablation study of the vocabulary size and sparsity regularization weight on CIFAR-10.

**Figure 6.** Ablation study of the balancing hyperparameters on CIFAR-10.

**Figure 7.** Retrieval visualization for out-of-domain forgetting on CIFAR-10. After forgetting the …

Original abstract

Machine unlearning in Vision-Language Models (VLMs) is typically performed at the image or instance level, making it difficult to precisely remove target knowledge without affecting unrelated semantics. This issue is especially pronounced since a single image often contains multiple entangled concepts, including both target concepts to be forgotten and contextual information that should be preserved. In this paper, we propose an interpretable concept-level unlearning framework for VLMs, which constructs a compact task-specific concept vocabulary from the forgetting set using a multimodal large language model. In addition to modality alignment, visual representations are decomposed into sparse, nonnegative combinations of semantic concepts, providing an explicit interface for fine-grained knowledge manipulation. Based on this decomposition, our method formulates unlearning as concept-level optimization, where target concepts are selectively suppressed while intra-instance non-target semantics and global cross-modal knowledge are preserved. Extensive experiments across both in-domain and out-of-domain forgetting settings demonstrate that our method enables more comprehensive target forgetting, better preserves non-target knowledge within the same image, and maintains competitive model utility compared with existing VLM unlearning methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shifts VLM unlearning to concepts via MLLM vocabulary plus nonnegative sparse decomposition, which targets a real entanglement problem but still needs the numbers to show it works better than instance baselines.

The main point is that this work moves unlearning from whole images to individual concepts by first pulling a task-specific vocabulary out of the forgetting set with a multimodal LLM, then decomposing visual representations into sparse nonnegative combinations of those concepts so you can suppress targets while trying to keep the rest of the image intact. That pipeline is the actual novelty relative to the instance-level methods they cite. It directly tackles the practical issue that one photo often mixes things you want to forget with things you need to keep, and the decomposition step is meant to give an explicit, interpretable knob for selective suppression plus preservation of cross-modal knowledge.

The framing is straightforward and the goal is useful for privacy or safety maintenance where blanket forgetting is too coarse. What they do well is name the entanglement limitation clearly and sketch an optimization that operates at the concept level rather than the instance level. The stress-test worry about whether the decomposition cleanly separates target from non-target concepts in entangled scenes is worth checking, but the abstract alone does not let us see the actual loss terms, sparsity enforcement, or ablations that would confirm fidelity. The claim of more comprehensive forgetting and better intra-image preservation is stated, yet no quantitative results, baselines, or validation details appear in the summary, so the central advantage is still unproven on the page.

This is the kind of paper that would interest people working on VLM safety, regulatory compliance, or interpretable editing. A reader who already follows machine unlearning literature would get value from seeing the concrete pipeline even if the gains turn out modest. It deserves a serious referee because the idea is distinct enough and the problem it names is real; referees can press on the empirical side and the decomposition assumptions without the work being incoherent on its own terms.

I would send it out for review rather than desk reject.

Referee Report

2 major / 2 minor

Summary. The paper proposes ICED, a concept-level unlearning framework for Vision-Language Models. It uses a multimodal LLM to extract a compact concept vocabulary from the forgetting set, decomposes visual representations into sparse nonnegative linear combinations of these concepts, and performs unlearning via concept-level optimization that suppresses target concepts while aiming to preserve intra-image non-target semantics and overall model utility. Experiments are claimed to show superior target forgetting and preservation compared to existing VLM unlearning methods in both in-domain and out-of-domain settings.

Significance. If the core decomposition step faithfully isolates target concepts without residual entanglement, the method could offer a more interpretable and precise alternative to instance-level unlearning, addressing a key limitation in current VLM safety techniques. The LLM-assisted concept extraction and nonnegative sparse decomposition represent a potentially useful interface for fine-grained control, with possible broader implications for controllable forgetting in multimodal models.

major comments (2)

§3.2 (Decomposition procedure): The central claim that visual representations decompose into sparse, nonnegative combinations of LLM-extracted concepts to enable selective suppression rests on the fidelity of this step. No quantitative validation (e.g., reconstruction error, concept isolation metrics, or ablation on sparsity parameter) is referenced to confirm that target concepts are cleanly separated from contextual semantics in entangled images; if the decomposition leaks non-target information, the reported gains in comprehensive forgetting and intra-image preservation cannot be attributed to the concept-level interface.
§4 (Experiments): The abstract asserts 'extensive experiments' demonstrating more comprehensive forgetting and better preservation than baselines, yet no specific quantitative results, baseline comparisons, or ablation studies on the decomposition are cited in the provided summary. Without these (e.g., forgetting accuracy deltas or preservation scores in Table 2), the superiority claim over instance-level methods remains unsubstantiated and load-bearing for the paper's contribution.
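
The fidelity checks requested in the first major comment could be computed along the following lines. This is a sketch under stated assumptions: `relative_reconstruction_error` and `residual_target_similarity` are hypothetical metric names, and the small concept matrix is toy data chosen so that two concepts overlap on purpose.

```python
import numpy as np

def relative_reconstruction_error(v, C, w):
    """||v - C.T @ w|| / ||v|| for feature v, concept rows C, coefficients w."""
    return float(np.linalg.norm(v - C.T @ w) / np.linalg.norm(v))

def residual_target_similarity(C, w, target_idx):
    """Max cosine between the target-suppressed reconstruction and any
    target concept; values above zero indicate leakage through overlap."""
    w_edit = w.copy()
    w_edit[target_idx] = 0.0          # suppress the target coefficients
    v_edit = C.T @ w_edit
    norm = np.linalg.norm(v_edit)
    if norm == 0.0:
        return 0.0
    targets = C[target_idx]
    cos = (targets @ v_edit) / (np.linalg.norm(targets, axis=1) * norm)
    return float(cos.max())

# Toy concepts: row 1 overlaps with row 0 (cosine 0.6), row 2 is orthogonal.
C = np.array([[1.0, 0.0, 0.0],
              [0.6, 0.8, 0.0],
              [0.0, 0.0, 1.0]])
w = np.array([1.0, 0.5, 0.0])
v = C.T @ w

err = relative_reconstruction_error(v, C, w)
leak = residual_target_similarity(C, w, target_idx=[0])
```

Here the reconstruction is exact, yet suppressing concept 0 still leaves a cosine of about 0.6 with the target direction, because the retained concept 1 is correlated with it. That is the kind of residual entanglement the referee is asking the authors to quantify.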

minor comments (2)

[§3] Notation for the nonnegative sparse decomposition (e.g., the exact form of the optimization objective combining reconstruction and sparsity) should be clarified with an explicit equation to aid reproducibility.
The paper should include a limitations section discussing potential failure modes of the multimodal LLM concept extraction, such as incomplete coverage of target concepts.
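
For reference, the explicit objective requested in the first minor comment would typically take the standard nonnegative sparse coding form below. This is an assumed textbook formulation, not an equation taken from the paper. Here $\mathbf{v} \in \mathbb{R}^d$ is a visual feature, $\mathbf{C} \in \mathbb{R}^{k \times d}$ stacks $k$ concept embeddings as rows, $\mathbf{w}$ holds the concept coefficients, and $\lambda$ is the sparsity weight; all four symbols are introduced here for illustration.

```latex
\min_{\mathbf{w} \ge 0} \;
  \tfrac{1}{2}\,\bigl\lVert \mathbf{v} - \mathbf{C}^{\top}\mathbf{w} \bigr\rVert_2^2
  + \lambda \,\lVert \mathbf{w} \rVert_1
```

The first term measures reconstruction fidelity and the second promotes sparsity, so $\lambda$ trades one against the other. Because $\mathbf{w} \ge 0$, the $\ell_1$ norm reduces to the plain sum of the coefficients.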

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their detailed and insightful comments, which have helped us improve the clarity and rigor of our manuscript on ICED. We address each major comment below.

Referee: §3.2 (Decomposition procedure): The central claim that visual representations decompose into sparse, nonnegative combinations of LLM-extracted concepts to enable selective suppression rests on the fidelity of this step. No quantitative validation (e.g., reconstruction error, concept isolation metrics, or ablation on sparsity parameter) is referenced to confirm that target concepts are cleanly separated from contextual semantics in entangled images; if the decomposition leaks non-target information, the reported gains in comprehensive forgetting and intra-image preservation cannot be attributed to the concept-level interface.

Authors: We thank the referee for highlighting this important point. While the original manuscript includes ablations on the sparsity parameter and qualitative visualizations demonstrating the decomposition's effectiveness in isolating concepts (see Section 3.2 and Appendix B), we acknowledge that explicit quantitative metrics such as reconstruction error and concept isolation scores were not reported. In the revised manuscript, we have added these quantitative validations in a new subsection of Section 3.2, including metrics showing low reconstruction errors and high concept isolation for target concepts with minimal leakage. This supports the attribution of performance gains to the concept-level interface.

Revision: yes

Referee: §4 (Experiments): The abstract asserts 'extensive experiments' demonstrating more comprehensive forgetting and better preservation than baselines, yet no specific quantitative results, baseline comparisons, or ablation studies on the decomposition are cited in the provided summary. Without these (e.g., forgetting accuracy deltas or preservation scores in Table 2), the superiority claim over instance-level methods remains unsubstantiated and load-bearing for the paper's contribution.

Authors: We note that the referee's summary provides a high-level overview of the paper. The full manuscript details the extensive experiments in Section 4, with specific quantitative results, baseline comparisons, and ablation studies presented in Tables 2 and 3, as well as in Section 4.3. To improve clarity, we have revised the abstract and the opening of Section 4 to more explicitly cite these quantitative findings and tables.

Revision: partial

Circularity Check

0 steps flagged

No circularity: method is an independent optimization procedure with external validation

The paper introduces a concept-level unlearning framework that constructs a vocabulary via multimodal LLM and performs sparse nonnegative decomposition followed by selective suppression optimization. No equations or steps in the provided description reduce the claimed forgetting performance or preservation properties to fitted parameters by construction, self-citations that bear the central load, or renamings of known results. The derivation chain consists of a proposed procedure whose correctness is assessed via experiments rather than tautological re-expression of inputs. This is the expected self-contained outcome for a methods paper without load-bearing self-referential definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the unverified premise that MLLM-extracted concepts yield a faithful sparse nonnegative basis for visual features that can be selectively optimized without side effects.

axioms (1)

domain assumption: Visual representations can be decomposed into sparse, nonnegative combinations of semantic concepts.
Invoked in the abstract as the foundation for providing an explicit interface for knowledge manipulation.

[1] [1]

Learning transferable visual models from natural language supervi- sion,

A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clarket al., “Learning transferable visual models from natural language supervi- sion,” inInternational Conference on Machine Learning, 2021, pp. 8748–8763

work page 2021

[2] [2]

Trustworthy ai: From principles to practices,

B. Li, P. Qi, B. Liu, S. Di, J. Liu, J. Pei, J. Yi, and B. Zhou, “Trustworthy ai: From principles to practices,”ACM Computing Surveys, vol. 55, no. 9, pp. 1–46, 2023

work page 2023

[3] [3]

Allies teach better than enemies: Inverse adversaries for robust knowledge distillation,

J. Dong, R. Z. Moayedi, Y .-S. Ong, and S.-M. Moosavi-Dezfooli, “Allies teach better than enemies: Inverse adversaries for robust knowledge distillation,”IEEE Transactions on Pattern Analysis and Machine Intelligence, 2026

work page 2026

[4] [4]

Tug-of-war no more: Harmonizing accuracy and robustness in vision-language models via stability-aware task vector merging,

J. Dong, X. Qu, C. Zhang, S. Q. Rong, N. D. Thai, W. Pan, X. Li, T. Liu, P. Koniusz, and Y .-S. Ong, “Tug-of-war no more: Harmonizing accuracy and robustness in vision-language models via stability-aware task vector merging,” inThe Fourteenth International Conference on Learning Representations, 2026

work page 2026

[5] [5]

Deepaw: A customized dnn watermarking scheme against unreliable participants,

S. Lin, X. Zhang, X. Ma, X. Chen, and W. Susilo, “Deepaw: A customized dnn watermarking scheme against unreliable participants,”IEEE Transactions on Network Science and Engineer- ing, 2025

work page 2025

[6] [6]

Can bad teaching induce forgetting? unlearning in deep networks using an incompetent teacher,

V . S. Chundawat, A. K. Tarun, M. Mandal, and M. Kankanhalli, “Can bad teaching induce forgetting? unlearning in deep networks using an incompetent teacher,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 37, 2023, pp. 7210–7217

work page 2023

[7] [7]

Erm-ktp: Knowledge-level machine unlearning via knowledge transfer,

S. Lin, X. Zhang, C. Chen, X. Chen, and W. Susilo, “Erm-ktp: Knowledge-level machine unlearning via knowledge transfer,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 20 147–20 155

work page 2023

[8] [8]

Boundary unlearning: Rapid forgetting of deep networks via shifting the decision boundary,

M. Chen, W. Gao, G. Liu, K. Peng, and C. Wang, “Boundary unlearning: Rapid forgetting of deep networks via shifting the decision boundary,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 7766–7775

work page 2023

[9] [9]

Gdr-gma: Machine unlearning via direction- rectified and magnitude-adjusted gradients,

S. Lin, X. Zhang, W. Susilo, X. Chen, and J. Liu, “Gdr-gma: Machine unlearning via direction- rectified and magnitude-adjusted gradients,” inProceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 9087–9095

work page 2024

[10] [10]

Learning to unlearn while retaining: Combating gradient conflicts in machine unlearning,

G. Patel and Q. Qiu, “Learning to unlearn while retaining: Combating gradient conflicts in machine unlearning,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 4211–4221

work page 2025

[11] [11]

Safe-clip: Removing nsfw concepts from vision-and-language models,

S. Poppi, T. Poppi, F. Cocchi, M. Cornia, L. Baraldi, and R. Cucchiara, “Safe-clip: Removing nsfw concepts from vision-and-language models,” inEuropean Conference on Computer Vision, 2024, pp. 340–356

work page 2024

[12] [12]

Multidelete for multimodal machine unlearning,

J. Cheng and H. Amiri, “Multidelete for multimodal machine unlearning,” inEuropean Confer- ence on Computer Vision, 2024, pp. 165–184

work page 2024

[13] [13]

Targeted unlearning with single layer unlearning gradient,

Z. Cai, Y . Tan, and M. S. Asif, “Targeted unlearning with single layer unlearning gradient,” in International Conference on Machine Learning, 2025, pp. 6257–6290

work page 2025

[14] T. Yang, L. Dai, X. Wang, M. Cheng, Y. Tian, and X. Zhang, “Cliperase: Efficient unlearning of visual-textual associations in clip,” in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, 2025, pp. 30 438–30 452.

[15] Z. Zhang, G. Liu, C. Fleming, R. R. Kompella, and C. Xu, “Targeted forgetting of image subgroups in clip models,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 9870–9880.

[16] J. Dong, H. Zhu, Y. Zhang, X. Qu, Y.-S. Ong, and P. Koniusz, “Machine unlearning via task simplex arithmetic,” in The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.

[17] M. Moayeri, K. Rezaei, M. Sanjabi, and S. Feizi, “Text-to-concept (and back) via cross-model alignment,” in Proceedings of the 40th International Conference on Machine Learning, 2023, pp. 25 037–25 060.

[18] M. Yuksekgonul, M. Wang, and J. Zou, “Post-hoc concept bottleneck models,” in The Eleventh International Conference on Learning Representations, 2023. [Online]. Available: https://openreview.net/forum?id=nA5AZ8CEyow

[19] T. Yun, U. Bhalla, E. Pavlick, and C. Sun, “Do vision-language pretrained models learn composable primitive concepts?” Transactions on Machine Learning Research, 2023. [Online]. Available: https://openreview.net/forum?id=YwNrPLjHSL

[20] C. Chen, B. Zhang, L. Cao, J. Shen, T. Gunter, A. Jose, A. Toshev, Y. Zheng, J. Shlens, R. Pang et al., “Stair: Learning sparse text and image representation in grounded tokens,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 15 079–15 094.

[21] Y. Gandelsman, A. A. Efros, and J. Steinhardt, “Interpreting CLIP’s image representation via text-based decomposition,” in The Twelfth International Conference on Learning Representations, 2024. [Online]. Available: https://openreview.net/forum?id=5Ca9sSzuDp

[22] A. Chattopadhyay, R. Pilgrim, and R. Vidal, “Information maximization perspective of orthogonal matching pursuit with applications to explainable ai,” in Proceedings of the 37th International Conference on Neural Information Processing Systems, 2023, pp. 2956–2990.

[23] U. Bhalla, A. Oesterling, S. Srinivas, F. P. Calmon, and H. Lakkaraju, “Interpreting clip with sparse linear concept embeddings (splice),” in Proceedings of the 38th International Conference on Neural Information Processing Systems, 2024, pp. 84 298–84 328.

[24] J. Dong, C. Zhang, X. Qu, Z. Ma, P. Koniusz, and Y. S. Ong, “Robust superalignment: Weak-to-strong robustness generalization for vision-language models,” Advances in Neural Information Processing Systems, vol. 38, pp. 18 345–18 377, 2025.

[25] A. Kravets and V. P. Namboodiri, “Zero-shot class unlearning in clip with synthetic samples,” in 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, 2025, pp. 6456–6464.

[26] J. Dong, P. Koniusz, X. Qu, and Y.-S. Ong, “Stabilizing modality gap & lowering gradient norms improve zero-shot adversarial robustness of vlms,” in Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1, 2025, pp. 236–247.

[27] S. Santurkar, D. Tsipras, and A. Madry, “BREEDS: benchmarks for subpopulation shift,” in 9th International Conference on Learning Representations, 2021. [Online]. Available: https://openreview.net/forum?id=mQPBmvyAuk

[28] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.

[29] A. Krizhevsky, “Learning multiple layers of features from tiny images,” Master’s thesis, University of Toronto, 2009.

[30] A. Warnecke, L. Pirch, C. Wressnegger, and K. Rieck, “Machine unlearning of features and labels,” in Proceedings 2023 Network and Distributed System Security Symposium, 2023.

[31] A. Thudi, G. Deza, V. Chandrasekaran, and N. Papernot, “Unrolling sgd: Understanding factors influencing machine unlearning,” in 2022 IEEE 7th European Symposium on Security and Privacy, 2022, pp. 303–319.

[32] A. Golatkar, A. Achille, and S. Soatto, “Eternal sunshine of the spotless net: Selective forgetting in deep networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9304–9312.

[33] J. Foster, K. Fogarty, S. Schoepf, Z. Dugue, C. Öztireli, and A. Brintrup, “An information theoretic approach to machine unlearning,” 2024. [Online]. Available: https://arxiv.org/abs/2402.01401

[34] V. S. Chundawat, A. K. Tarun, M. Mandal, and M. Kankanhalli, “Zero-shot machine unlearning,” IEEE Transactions on Information Forensics and Security, vol. 18, pp. 2345–2354, 2023.

[35] L. Bossard, M. Guillaumin, and L. Van Gool, “Food-101–mining discriminative components with random forests,” in European Conference on Computer Vision, 2014, pp. 446–461.

[36] A. Coates, A. Ng, and H. Lee, “An analysis of single-layer networks in unsupervised feature learning,” in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, pp. 215–223.

[37] A. Barbu, D. Mayo, J. Alverio, W. Luo, C. Wang, D. Gutfreund, J. Tenenbaum, and B. Katz, “Objectnet: A large-scale bias-controlled dataset for pushing the limits of object recognition models,” in Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2019, pp. 9453–9463.

A Additional Descriptions of ICED

Algorithm 1 summ...