POUR: A Provably Optimal Method for Unlearning Representations via Neural Collapse

arxiv: 2511.19339 · v2 · submitted 2025-11-24 · 💻 cs.CV

Anjie Le, Can Peng, Yuyuan Liu, J. Alison Noble

Pith reviewed 2026-05-17 05:48 UTC · model grok-4.3

classification 💻 cs.CV

keywords machine unlearning · neural collapse · equiangular tight frame · representation learning · computer vision · forgetting operator

The pith

Orthogonal projection of neural collapse ETF structures yields a provably optimal operator for unlearning representations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method for machine unlearning that targets internal feature representations in vision models instead of only the classifier head. It sets up a three-way balance among effective removal of unwanted data, faithful retention of other knowledge, and preservation of class separation. Using neural collapse, the work shows that an orthogonal projection applied to the simplex equiangular tight frame of collapsed representations stays an ETF in lower dimensions and therefore acts as an optimal forgetting operator. The resulting POUR method comes in closed-form projection and distillation versions, includes a new Representation Unlearning Score, and is tested on CIFAR-10/100 and PathMNIST.

Core claim

Building on neural collapse theory, the orthogonal projection of a simplex Equiangular Tight Frame remains an ETF in a lower dimensional space, yielding a provably optimal forgetting operator that realizes the required trade-off between forgetting efficacy, retention fidelity, and class separation.
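The core claim can be checked by direct computation. The worked derivation below is a sketch under the usual idealization: C ≥ 3 unit-norm class means forming a simplex ETF. The symbols v_i, v_A, P_A and u_i are notation chosen here for illustration and may differ from the paper's.

```latex
% Assumption: C >= 3 unit-norm class means v_1, ..., v_C forming a simplex ETF.
\[
\|v_i\| = 1, \qquad \langle v_i, v_j \rangle = -\frac{1}{C-1} \quad (i \neq j).
\]
% Forget class A by projecting onto the orthogonal complement of v_A.
\[
P_A = I - v_A v_A^{\top}, \qquad P_A v_A = 0, \qquad
u_i = P_A v_i = v_i + \frac{1}{C-1}\, v_A \quad (i \neq A).
\]
% Norms and inner products of the projected retained means.
\[
\|u_i\|^2 = 1 - \frac{1}{(C-1)^2} = \frac{C(C-2)}{(C-1)^2}, \qquad
\langle u_i, u_j \rangle = -\frac{1}{C-1} - \frac{1}{(C-1)^2} = -\frac{C}{(C-1)^2}
\quad (i \neq j).
\]
% Pairwise cosine after projection.
\[
\cos\angle(u_i, u_j) = \frac{-C/(C-1)^2}{C(C-2)/(C-1)^2} = -\frac{1}{C-2}.
\]
```

The retained means therefore keep equal norms and a common pairwise cosine of −1/(C−2), which is the simplex ETF value for C−1 classes, while the forgotten mean is mapped to zero.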

What carries the argument

The orthogonal projection of a simplex Equiangular Tight Frame (ETF) that remains an ETF after reduction to lower dimensions, used as the forgetting operator.

If this is right

Representation-level unlearning achieves closed-form solutions without full retraining.
The three-way trade-off among forgetting, retention, and separation is satisfied by construction.
The Representation Unlearning Score provides a direct metric for feature-level performance.
POUR variants outperform prior unlearning methods on both classification accuracy and representation metrics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar geometric projections could be tested in domains where neural collapse has been observed beyond vision.
The approach points toward using representation geometry for targeted data removal in privacy-sensitive applications.
Adaptive choice of projection dimension might handle different numbers of classes to forget in a single model.

Load-bearing premise

Trained model representations have reached neural collapse so the ETF projection serves as an optimal forgetting operator without harming retention or separation.

What would settle it

The claim would be undermined if, after the projection step, representations of the classes to forget remained linearly separable from those of the retained classes, or if accuracy on the retained classes dropped substantially.

Figures

Figures reproduced from arXiv: 2511.19339 by Anjie Le, Can Peng, J. Alison Noble, Yuyuan Liu.

**Figure 1.** Grad-CAM visualization on PathMNIST before and after unlearning. Each row shows a tissue class. After applying POUR on the adipose class, its Grad-CAM signal vanishes, while the retained classes (debris, lymphocytes, mucus) preserve clear and distinct attention patterns. A growing literature on machine unlearning has explored how to make models forget a specific class, subset, or concept without retrainin…

**Figure 2.** C = 4 simplex ETF. One vertex v1 along +z; the other three lie at z = −1/3 with equal 120° separation in xy. Orthogonal projection onto v1⊥ (z = 0) yields an equilateral triangle formed by u2, u3, u4. 3.2. ETF Stability under Projection. The second property concerns the robustness of ETF geometry under dimensionality reduction. Geometrically, removing one vertex of a regular simplex and projecting the rem…
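The C = 4 configuration in this caption is small enough to check numerically. The sketch below rebuilds the four vertices from the caption's description and projects out v1; the helper names (`dot`, `project_out`, `cosine`) and the unit-norm scaling are choices made here for illustration, not taken from the paper.

```python
import math

def dot(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def project_out(v, unit_dir):
    """Orthogonally project v onto the complement of the unit vector unit_dir."""
    coeff = dot(v, unit_dir)
    return [x - coeff * d for x, d in zip(v, unit_dir)]

def cosine(a, b):
    """Cosine of the angle between a and b."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

# C = 4 simplex ETF in R^3: v1 on +z, the rest at z = -1/3, 120 degrees apart in xy.
radius = 2.0 * math.sqrt(2.0) / 3.0  # assumption: scaled so every vertex has unit norm
v1 = [0.0, 0.0, 1.0]
others = [
    [radius * math.cos(t), radius * math.sin(t), -1.0 / 3.0]
    for t in (0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0)
]

# Forget the class at v1: project the remaining vertices onto the plane z = 0.
projected = [project_out(v, v1) for v in others]

pairs = [(0, 1), (0, 2), (1, 2)]
cos_before = [cosine(others[i], others[j]) for i, j in pairs]
cos_after = [cosine(projected[i], projected[j]) for i, j in pairs]
norms_after = [math.sqrt(dot(u, u)) for u in projected]

print("cosines before:", [round(c, 6) for c in cos_before])
print("cosines after: ", [round(c, 6) for c in cos_after])
print("norms after:   ", [round(n, 6) for n in norms_after])
```

Before projection the retained vertices meet at the C = 4 simplex cosine of −1/3; afterwards they share one norm and a pairwise cosine of −1/2, the equilateral triangle the caption describes.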

**Figure 3.** Overview of the POUR framework. During training, the unlearning module applies an orthogonal projection operator P_A on the feature space of the original model to remove the contribution of the forgotten class A. The unlearned feature extractor θ′ is optimized via an L2 loss to align its projected features with those of the original extractor θ using the unlearning data. This alignment preserves the Neural…
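One possible reading of the L2 alignment described in this caption is sketched below. It assumes the distillation target is the original extractor's feature with the forget-class direction projected out, and that the direction is given by the forget-class mean. Both are assumptions made for illustration; the function names are hypothetical, and the paper's actual loss may differ.

```python
import math

def dot(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    """Rescale v to unit Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def project_out(feature, unit_dir):
    """Apply P_A = I - a a^T for a unit direction a."""
    coeff = dot(feature, unit_dir)
    return [x - coeff * d for x, d in zip(feature, unit_dir)]

def l2_alignment_loss(student_feats, teacher_feats, forget_mean):
    """Mean squared distance between student features and projected teacher features.

    Assumption: the target is P_A applied to the original extractor's feature,
    with class A represented by the direction of forget_mean.
    """
    forget_dir = unit(forget_mean)
    total = 0.0
    for s, t in zip(student_feats, teacher_feats):
        target = project_out(t, forget_dir)
        total += sum((a - b) ** 2 for a, b in zip(s, target))
    return total / len(student_feats)

# Toy batch in R^3 where the forget direction is the third axis.
forget_mean = [0.0, 0.0, 2.0]
teacher = [[1.0, 2.0, 3.0], [0.0, 1.0, -1.0]]
already_projected = [project_out(t, unit(forget_mean)) for t in teacher]

loss_unchanged = l2_alignment_loss(teacher, teacher, forget_mean)
loss_projected = l2_alignment_loss(already_projected, teacher, forget_mean)
print(loss_unchanged, loss_projected)
```

In this toy batch a student that copies the teacher is penalized only for what it keeps along the forget direction, while a student that already outputs projected features incurs zero loss.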

**Figure 4.** b. Therefore, supervision on the forget set is lower and therefore forgetting is harder, as discussed in Section 2.3. Methods such as gradient ascent and random labels largely disrupt the structure of the retained classes. Boundary Shrink and Boundary Expand, though among the stronger baselines, fail to reproduce the structure of the retrained model representations as effectively as POUR. 5.3. Unlearning o…

**Figure 5.** Classifier weight angle distributions. The green dashed line denotes the mean pairwise angle, while the red dashed line marks the ideal NC angle. The closeness between the two reflects how well the classifier aligns with NC geometry at convergence. position and suppresses discriminatory directions associated with the forget class. Boundary Shrink and Boundary Expand [6] perform local decision-boundary adj…

**Figure 1.** Grad-CAM visualization on PathMNIST before and after unlearning. Each row shows a tissue class. Only after POUR unlearning, the Grad-CAM signal vanishes. Geometrically grounded forgetting. Several methods exploit the geometry of learned representations. Kodge et al. [22] proposed a gradient-free method that explicitly computes class-specific subspaces via singular value decomposition and suppresses disc…

Original abstract

In computer vision, machine unlearning aims to remove the influence of specific visual concepts or training images without retraining from scratch. Studies show that existing approaches often modify the classifier while leaving internal representations intact, resulting in incomplete forgetting. In this work, we extend the notion of unlearning to the representation level, deriving a three-term interplay between forgetting efficacy, retention fidelity, and class separation. Building on Neural Collapse theory, we show that the orthogonal projection of a simplex Equiangular Tight Frame (ETF) remains an ETF in a lower dimensional space, yielding a provably optimal forgetting operator. We further introduce the Representation Unlearning Score (RUS) to quantify representation-level forgetting and retention fidelity. Building on this, we introduce POUR (Provably Optimal Unlearning of Representations), a geometric projection method with closed-form (POUR-P) and a feature-level unlearning variant under a distillation scheme (POUR-D). Experiments on CIFAR-10/100 and PathMNIST demonstrate that POUR achieves effective unlearning while preserving retained knowledge, outperforming state-of-the-art unlearning methods on both classification-level and representation-level metrics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

POUR uses neural collapse to define a projection-based forgetting operator and adds the RUS metric, but the optimality guarantee rests on exact ETF collapse that real models only approximate.

The one or two things to know about this paper: it proposes POUR, a method for unlearning at the representation level in computer vision models that leverages neural collapse theory, specifically orthogonal projections that preserve the equiangular tight frame properties of simplex ETFs; and it introduces a new metric, the Representation Unlearning Score (RUS), to evaluate how well forgetting and retention are balanced at the feature level.

What is actually new here is the application of the projection property from neural collapse to create a forgetting operator. The authors show that projecting the representations in a way that reduces the dimension while keeping the ETF structure allows for a provably optimal balance between forgetting specific concepts, keeping performance on retained classes, and maintaining class separation. This leads to two implementations: POUR-P, a closed-form geometric projection, and POUR-D, which uses a distillation scheme at the feature level. The RUS metric seems useful because it directly quantifies representation-level changes rather than relying solely on classifier accuracy drops.

The paper does well in providing experimental results on standard benchmarks (CIFAR-10 and CIFAR-100) as well as the medical imaging dataset PathMNIST. It claims to outperform state-of-the-art unlearning methods on both classification metrics and the proposed representation-level ones, which suggests the approach has practical merit in achieving more complete forgetting with less damage to the model's overall utility.

The soft spots lie mainly in the reliance on the neural collapse assumption. Optimality is proven under the condition that the model has exactly collapsed to the ideal simplex ETF geometry, with equal norms and angles. In real training on finite datasets, however, collapse is only approximate, so the guarantees might weaken. The stress test note points this out correctly, and the paper would benefit from either showing how close the trained models are to ideal collapse or providing some robustness analysis for when the class means deviate. The three-term interplay is mentioned, but without the full derivations it is hard to assess whether there are hidden assumptions or whether the method truly avoids fitting to the unlearning data. The soundness score from the abstract review is low for a reason, but assuming the full paper fills in the details, this is a minor-to-moderate concern rather than a fatal one.

This work is primarily for researchers in machine unlearning, especially those interested in representation-level interventions for privacy or bias removal in vision tasks. A reader who follows the neural collapse literature or works on geometric deep learning would find the extension interesting. It has sufficient novelty in the operator construction and enough empirical backing to deserve a serious referee, even if some qualifications on the theoretical guarantees are needed. I would recommend sending this to peer review.

Referee Report

3 major / 2 minor

Summary. The paper proposes POUR for representation-level unlearning in computer vision. It extends unlearning beyond classifiers to internal representations, derives a three-term interplay among forgetting efficacy, retention fidelity, and class separation, and shows that the orthogonal projection of a simplex Equiangular Tight Frame (ETF) remains an ETF in lower dimensions. This property is used to construct a provably optimal forgetting operator. The work introduces the Representation Unlearning Score (RUS) and presents two instantiations: a closed-form geometric projection (POUR-P) and a distillation-based feature-level variant (POUR-D). Experiments on CIFAR-10/100 and PathMNIST report that POUR outperforms prior unlearning methods on both classification accuracy and representation-level metrics.

Significance. If the ETF projection property and its optimality transfer to the approximate neural collapse observed in practice, the work supplies a geometrically interpretable, closed-form route to representation unlearning that explicitly balances the three-term trade-off. The introduction of RUS and the distinction between POUR-P and POUR-D are concrete contributions that could be adopted by subsequent studies. The empirical gains on standard benchmarks are encouraging, yet the overall significance hinges on whether the theoretical guarantee remains meaningful once the exact-simplex assumption is relaxed.

major comments (3)

[Abstract and §3] Abstract and §3 (Theoretical Analysis): The central claim that orthogonal projection of a simplex ETF yields a provably optimal forgetting operator balancing the three-term interplay is asserted without derivation details, error bounds, or verification that the projected frame satisfies the equal-angle and equal-norm conditions required for optimality. This step is load-bearing for the 'provably optimal' label.
[§2 and §4] §2 (Neural Collapse Background) and §4 (Method): The optimality derivation assumes exact neural collapse so that class means form an ideal simplex ETF (origin-centered, equal norms, equal inner products). On finite CIFAR-10/100 training the Gram matrix of empirical class means deviates from this geometry; the manuscript provides no quantitative bound on how such deviations degrade the separation or retention guarantees of the projected operator.
[§4.1] §4.1 (POUR-P Construction): The three-term interplay is introduced as the objective that the projection optimizes, yet it is not shown to reduce to a parameter-free projection; the mapping from the interplay coefficients to the choice of projection subspace appears to require additional fitting or hyper-parameters not stated in the closed-form claim.

minor comments (2)

[§5] The definition and normalization of the Representation Unlearning Score (RUS) should be given explicitly with equations relating it to the three terms; its numerical range and invariance properties are currently unclear from the abstract.
[§6] Figure captions and experimental tables should report the precise unlearning ratio (fraction of classes or samples removed) and the number of retained classes for each dataset to allow direct comparison with prior work.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough review and insightful comments on our work. We address each of the major comments point by point below, providing clarifications and indicating where revisions will be made to strengthen the manuscript.

Referee: [Abstract and §3] Abstract and §3 (Theoretical Analysis): The central claim that orthogonal projection of a simplex ETF yields a provably optimal forgetting operator balancing the three-term interplay is asserted without derivation details, error bounds, or verification that the projected frame satisfies the equal-angle and equal-norm conditions required for optimality. This step is load-bearing for the 'provably optimal' label.

Authors: We appreciate this observation. Section 3 of the manuscript derives the key property: the orthogonal projection of a simplex ETF onto the subspace spanned by a subset of its vectors yields another simplex ETF in the reduced dimension. The proof explicitly verifies that the projected frame maintains equal vector norms and equal pairwise angles (inner products of -1/(k-1) for k remaining classes). This property ensures the projection optimally balances the three-term interplay—complete forgetting of the unlearned class by nulling its direction, while preserving retention fidelity and class separation through the lower-dimensional ETF structure. Regarding error bounds, we will add a new paragraph in the revised §3 discussing the sensitivity to small perturbations from the ideal ETF geometry. revision: yes
Referee: [§2 and §4] §2 (Neural Collapse Background) and §4 (Method): The optimality derivation assumes exact neural collapse so that class means form an ideal simplex ETF (origin-centered, equal norms, equal inner products). On finite CIFAR-10/100 training the Gram matrix of empirical class means deviates from this geometry; the manuscript provides no quantitative bound on how such deviations degrade the separation or retention guarantees of the projected operator.

Authors: We acknowledge that the theoretical analysis assumes exact neural collapse for the provable optimality. In practice, as documented in the neural collapse literature, training on CIFAR-10 and CIFAR-100 leads to approximate collapse with minor deviations in the class-mean Gram matrix. Our empirical results on these datasets show that POUR-P and POUR-D still outperform prior methods on both accuracy and the proposed RUS metric. To address the lack of quantitative bounds, we will include in the revision an analysis measuring the deviation (e.g., via the distance to the ideal ETF Gram matrix) and its correlation with unlearning performance across the experiments. revision: yes
Referee: [§4.1] §4.1 (POUR-P Construction): The three-term interplay is introduced as the objective that the projection optimizes, yet it is not shown to reduce to a parameter-free projection; the mapping from the interplay coefficients to the choice of projection subspace appears to require additional fitting or hyper-parameters not stated in the closed-form claim.

Authors: The construction in §4.1 defines the projection subspace as the span of the class means corresponding to the retained classes, which is uniquely determined by the data and the unlearning request. This choice directly optimizes the three-term objective without tunable coefficients or hyperparameters because the ETF preservation guarantees the balance: the projection removes the unlearned class contribution (forgetting) while the resulting frame ensures equal separation and fidelity for retained classes. The closed-form nature comes from computing the orthogonal projector matrix explicitly from the retained means. We will revise §4.1 to include the explicit mapping and equations demonstrating the reduction to this parameter-free form. revision: yes
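The second response proposes measuring the distance between the empirical class-mean Gram matrix and the ideal ETF Gram matrix. A minimal sketch of one such measure follows. It assumes the class means are centered by their global mean and normalized to unit length before comparison, so it captures angular deviation only and ignores unequal norms. The function name `etf_gram_deviation` and the choice of the Frobenius norm are illustrative assumptions, not the authors' stated metric.

```python
import math

def etf_gram_deviation(class_means):
    """Frobenius distance between the Gram matrix of centered, unit-normalized
    class means and the ideal simplex ETF Gram matrix.

    Assumption: no class mean coincides with the global mean (nonzero norms).
    """
    num_classes = len(class_means)
    dim = len(class_means[0])
    global_mean = [sum(m[d] for m in class_means) / num_classes for d in range(dim)]
    centered = [[m[d] - global_mean[d] for d in range(dim)] for m in class_means]
    normalized = []
    for c in centered:
        norm = math.sqrt(sum(x * x for x in c))
        normalized.append([x / norm for x in c])
    # Ideal simplex ETF: ones on the diagonal, -1/(C-1) elsewhere.
    off_diag = -1.0 / (num_classes - 1)
    squared_error = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            gram_ij = sum(a * b for a, b in zip(normalized[i], normalized[j]))
            ideal_ij = 1.0 if i == j else off_diag
            squared_error += (gram_ij - ideal_ij) ** 2
    return math.sqrt(squared_error)

# Standard basis vectors center to an exact simplex ETF; the second set does not.
ideal_means = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
skewed_means = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
ideal_dev = etf_gram_deviation(ideal_means)
skewed_dev = etf_gram_deviation(skewed_means)
print(ideal_dev, skewed_dev)
```

A value near zero indicates class means close to the ideal geometry; larger values could then be compared against unlearning performance, as the response suggests.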

Circularity Check

0 steps flagged

No significant circularity; derivation builds on external Neural Collapse theory and independent ETF projection math

full rationale

The paper's core step shows that orthogonal projection preserves the simplex ETF property under Neural Collapse assumptions, then uses this to define a forgetting operator balancing forgetting, retention, and separation. This is a direct mathematical claim resting on established NC literature (not self-citation) and ETF geometry, without reducing to fitted parameters, self-definitional loops, or load-bearing prior work by the same authors. The RUS metric and POUR variants are downstream applications rather than circular inputs. The derivation remains self-contained against external ETF benchmarks and does not rename known results or smuggle ansatzes via citation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the domain assumption that neural collapse occurs in the trained models and on the mathematical property that orthogonal projections preserve ETF structure. No free parameters are mentioned. The new score RUS and the POUR variants are introduced constructs without independent evidence outside the paper.

axioms (2)

domain assumption: Neural collapse theory holds for the trained vision models under consideration.
The derivation of the optimal forgetting operator is built directly on the simplex ETF geometry that neural collapse predicts.
standard math: Orthogonal projection of a simplex ETF remains an ETF in lower dimensions.
This preservation property is invoked to establish the provably optimal character of the forgetting operator.

invented entities (1)

Representation Unlearning Score (RUS) · no independent evidence
purpose: To quantify the three-term interplay of forgetting efficacy, retention fidelity, and class separation at the representation level
New metric defined in the paper to evaluate representation-level unlearning.

pith-pipeline@v0.9.0 · 5503 in / 1562 out tokens · 60975 ms · 2026-05-17T05:48:54.049416+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: the orthogonal projection of a simplex Equiangular Tight Frame (ETF) remains an ETF in a lower dimensional space, yielding a provably optimal forgetting operator

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: Building on Neural Collapse theory, we show that the orthogonal projection of a simplex Equiangular Tight Frame (ETF) remains an ETF

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 1 internal anchor

[1] Laurent Bourtoule, Varun Chandrasekaran, Christopher A. Choquette-Choo, Haoche Jia, Adeline Travers, Benjamin Zhang, David Lie, and Nicolas Papernot. Machine unlearning. In IEEE Symposium on Security and Privacy (SP), pages 141–159, 2021.
[2] Xavier F Cadet, Anastasia Borovykh, Mohammad Malekzadeh, Sara Ahmadi-Abhari, and Hamed Haddadi. Deep unlearn: Benchmarking machine unlearning. arXiv preprint arXiv:2410.01276, 2024.
[3] Yinzhi Cao and Junfeng Yang. Towards making systems forget with machine unlearning. In IEEE Symposium on Security and Privacy (SP), pages 463–480, 2015.
[4] CCPA. California consumer privacy act of 2018, 2018. California Civil Code §1798.100 et seq.
[5] Sungmin Cha, Sungjun Cho, Dasol Hwang, Honglak Lee, Taesup Moon, and Moontae Lee. Learning to unlearn: Instance-wise unlearning for pre-trained classifiers. In Proceedings of the AAAI conference on artificial intelligence, pages 11186–11194, 2024.
[6] Min Chen, Weizhuo Gao, Gaoyang Liu, Kai Peng, and Chen Wang. Boundary unlearning: Rapid forgetting of deep networks via shifting the decision boundary. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[7] Vishwaraj S. Chundawat, Ayush K. Tarun, Murari Mandal, and Mohan Kankanhalli. Can bad teaching induce forgetting? unlearning in deep networks using an incompetent teacher. In AAAI Conference on Artificial Intelligence (AAAI), pages 7210–7217, 2023.
[8] Vishwaraj S. Chundawat, Ayush K. Tarun, Murari Mandal, and Mohan S. Kankanhalli. Zero-shot machine unlearning. IEEE Transactions on Information Forensics and Security, 18:2345–2354, 2023.
[9] Marco Cotogni, Jacopo Bonato, Luigi Sabetta, Francesco Pelosin, and Alessandro Nicolosi. Duck: Distance-based unlearning via centroid kinematics. arXiv preprint arXiv:2312.02052, 2023.
[10] Huy Dang, Zhenyu Yang, Qinghua He, Jiadong Wang, Taiji Wei, Zhizheng Li, Chenliang Gong, Shuaiwen Hu, and Yingbin Liang. Neural collapse for cross-entropy class-imbalanced regime. In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024.
[11] Cong Fang, Hangfeng He, Qi Long, and Weijie J. Su. Exploring deep neural networks via layer-peeled model: Minority collapse in imbalanced training. Proceedings of the National Academy of Sciences, 118(43):e2103091118, 2021.
[12] GDPR. Regulation (eu) 2016/679 of the european parliament and of the council of 27 april 2016 (general data protection regulation), 2016. Official Journal of the European Union, L 119, 1-88.
[13] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9301–9309, 2020.
[14] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Forgetting outside the box: Scrubbing deep networks of information accessible from input-output observations. In European Conference on Computer Vision (ECCV), pages 383–398,
[15] Laura Graves, Vineel Nagisetty, and Vijay Ganesh. Amnesiac machine learning. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11516–11524, 2021.
[16] Chuan Guo, Tom Goldstein, Awni Y. Hannun, and Laurens van der Maaten. Certified data removal from machine learning models. arXiv preprint arXiv:1911.03030, 2019.
[17] X. Y. Han, Vardan Papyan, and David L. Donoho. Neural collapse under mse loss: Proximity to and dynamics on the central path. In International Conference on Learning Representations (ICLR), 2022.
[18] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
[19] Weihao Hong and Shuyang Ling. Neural collapse for unconstrained feature model under class-imbalanced regime. Journal of Machine Learning Research, 25, 2024.
[20] Arthur Jacot, Peter Súkeník, Zihan Wang, and Marco Mondelli. Wide neural networks trained with weight decay provably exhibit neural collapse, 2024.
[21] Yongwoo Kim, Sungmin Cha, and Donghyun Kim. Are we truly forgetting? a critical re-examination of machine unlearning evaluation protocols. arXiv preprint arXiv:2503.06991,
[22] Sangamesh Kodge, Gobinda Saha, and Kaushik Roy. Deep unlearning: Fast and efficient gradient-free class forgetting. Transactions on Machine Learning Research (TMLR), 2024. to appear.
[23] Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited. In International conference on machine learning, pages 3519–3529. PMLR, 2019.
[24] Alexey Kravets and Vinay P. Namboodiri. Zero-shot class unlearning in CLIP with synthetic samples. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV),
[25] Na Li, Chunyi Zhou, Yansong Gao, Hui Chen, Zhi Zhang, Boyu Kuang, and Anmin Fu. Machine unlearning: Taxonomy, metrics, applications, challenges, and prospects. IEEE Transactions on Neural Networks and Learning Systems, 2025.
[26] Jianfeng Lu and Stefan Steinerberger. Neural collapse under cross-entropy loss. Applied and Computational Harmonic Analysis, 59:224–241, 2022.
[27] Vardan Papyan, X. Y. Han, and David L. Donoho. Prevalence of neural collapse during the terminal phase of deep learning training. Proceedings of the National Academy of Sciences (PNAS), 117(40):24652–24663, 2020.
[28] PIPL. Personal information protection law of the people's republic of china, 2021. Adopted at the 30th Meeting of the Standing Committee of the Thirteenth National People's Congress.
[29] Prabhav Sanga, Jaskaran Singh, and Arun Kumar Dubey. Train once, forget precisely: Anchored optimization for efficient post-hoc unlearning. arXiv preprint arXiv:2506.14515,
[30] Aditi Sekhari, Jayadev Acharya, Gautam Kamath, and Abhradeep Thakurta Suresh. Remember what you want to forget: Algorithms for machine unlearning. In Advances in Neural Information Processing Systems (NeurIPS), pages 18075–18086, 2021.
[31] Ayush K Tarun, Vikram S Chundawat, Murari Mandal, and Mohan Kankanhalli. Fast yet effective machine unlearning. IEEE Transactions on Neural Networks and Learning Systems, 35(9):13046–13055, 2023.
[32] Heng Xu, Tianqing Zhu, Lefeng Zhang, Wanlei Zhou, and Philip S. Yu. Machine unlearning: A survey, 2023.
[33] Hongren Yan, Yuhua Qian, Furong Peng, Jiachen Luo, Zheqing Zhu, and Feijiang Li. Neural collapse to multiple centers for imbalanced data. In Advances in Neural Information Processing Systems (NeurIPS), 2024.
[34] Jiancheng Yang, Rui Shi, Donglai Wei, Zequan Liu, Lin Zhao, Bilian Ke, Hanspeter Pfister, and Bingbing Ni. Medmnist v2 - a large-scale lightweight benchmark for 2d and 3d biomedical image classification. Scientific Data, 10(1):41, 2023.
[35] Tianyu Yang, Lisen Dai, Xiangqi Wang, Minhao Cheng, Yapeng Tian, and Xiangliang Zhang. CLIPErase: Efficient unlearning of visual-textual associations in CLIP. In Annual Meeting of the Association for Computational Linguistics (ACL), 2025.
[36] Yufan Zhang, Chenyang Si, Jianfeng Zhang, Yingcong Chen, and Wayne Wu. Forget-me-not: Learning to forget in text-to-image diffusion models. In Advances in Neural Information Processing Systems (NeurIPS), 2024.
[37] Yu Zhou, Dian Zheng, Qijie Mo, Renjie Lu, Kun-Yu Lin, and Wei-Shi Zheng. Decoupled distillation to erase: A general unlearning method for any class-centric tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
[38] Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, and Qing Qu. A geometric analysis of neural collapse with unconstrained features. In Advances in Neural Information Processing Systems (NeurIPS), 2021.

Supplementary Material

Appendix Table of Contents
[39]

Proof on Decomposition ofK-Bound 1.2

Additional Justifications 1.1. Proof on Decomposition ofK-Bound 1.2. Justification on CKA USage

work page
[40]

Training Assumptions 2.2

Neural Collapse 2.1. Training Assumptions 2.2. Neural Collapse Statements

work page
[41]

Geometric Optimality of the Simplex ETF 3.2

ETF Implies Bayes Optimality 3.1. Geometric Optimality of the Simplex ETF 3.2. Bayes-Optimal Nearest Class Mean Rule

work page
[42]

Closure of Projection 4.2

Proof of Main Theorem 4.1. Closure of Projection 4.2. Optimality of Projection

work page
[43]

right to be forgotten,

More Related Work Machine Unlearning.The problem of removing specific training data from a model, often motivated by privacy reg- ulations such as the “right to be forgotten,” was first for- malized in the systems security community [3]. The semi- nal work of Bourtoule et al. [1] introduced theSISAframe- work, partitioning training data across multiple sh...

work page
[44]

Sekhari et al

proposed certified removal via influence-based updates. Sekhari et al. [30] provided theoretical guarantees for ap- proximate unlearning in general models. For deep networks, approaches include amnesiac unlearning [15], which inverts stored gradients, and Fisher information–based scrubbing [13, 14], which perturbs weights along sensitive directions. Other...

work page
[45]

Yet, none of the previous approaches con- nects to the phenomenon of Neural Collapse [27], wherein class features converge to a simplex equiangular tight frame

proposed a gradient-free method that explicitly com- putes class-specific subspaces via singular value decomposi- tion and suppresses discriminatory directions associated with the forget class. Yet, none of the previous approaches con- nects to the phenomenon of Neural Collapse [27], wherein class features converge to a simplex equiangular tight frame. Co...

work page
[46]

introduced a zero-shot unlearning method for CLIP that generates synthetic forget samples via gradient ascent

work page
[47]

Proof on Decomposition of K-Bound Let Z denote the feature space and P(Z) the set of prob- ability measures on it

Additional Justifications 1.1. Proof on Decomposition of K-Bound Let Z denote the feature space and P(Z) the set of prob- ability measures on it. Fix a symmetric function class F ⊆ {φ:Z →R} (i.e., φ∈ F ⇒ −φ∈ F ). For an Integral Probability Metric (IPM) defined as K(P, Q) = sup φ∈F Ez∼P [φ(z)]−Ez∼Q[φ(z)] , P, Q∈ P(Z), the following property holds. 1 Propo...

work page
[48]

Training and modeling assumptions

Neural Collapse 2.1. Training and modeling assumptions. Below are the standard Neural Collapse (NC) assumptions: • (A1) Interpolation / TPT:The network is trained to near- zero training error and then further optimized in the termi- nal phase of training (TPT) under standard protocols such as SGD or Adam with decays [27]. • (A2) Overparameterization:The m...

work page
[49]

ETF Implies Bayes Optimality We present a formal statement and proof of Proposition 3.1. First, we show that the simplex Equiangular Tight Frame (ETF) configuration is geometrically optimal: it maximizes the minimum pairwise angle among class means and there- fore maximizes the multiclass angular margin of the Nearest Class Mean (NCM) classifier. Second, ...

work page
[50]

Closure of Projection Note that asimplex ETF{v i}C i=1 ⊂R C−1 satisfies ∥vi∥= 1, v ⊤ i vj =− 1 C−1 (i̸=j), CX i=1 vi = 0

Proof of Main Theorem 4.1. Closure of Projection Note that asimplex ETF{v i}C i=1 ⊂R C−1 satisfies ∥vi∥= 1, v ⊤ i vj =− 1 C−1 (i̸=j), CX i=1 vi = 0. Equivalently, its Gram matrix has1on the diagonal and constant off-diagonal entries−1/(C−1). Theorem 4.1(Projection of a Simplex ETF).Let {vi}C i=1 ⊂R C−1 be a simplex ETF . Fixv1 and let P=I−v 1v⊤ 1 be the o...

work page
[51]

2.(Isotropic Gaussian conditionals)conditional on classi, θ(x)|(y=i)∼ N(µ i, σ2Id), with ∥µi∥= 1 and {µi}C i=1 coinciding with the ETF directions{v i}from NC (i.e.µ i =v i)

(Balanced classes)class priors are uniform: Pr(y=i) = 1/Cfori∈ Y. 2.(Isotropic Gaussian conditionals)conditional on classi, θ(x)|(y=i)∼ N(µ i, σ2Id), with ∥µi∥= 1 and {µi}C i=1 coinciding with the ETF directions{v i}from NC (i.e.µ i =v i). Fix a classu∈ Yand define P=I−v uv⊤ u ,˜v i = P vi ∥P vi∥ (i̸=u), so that by Proposition 3.2 the vectors {˜vi}i̸=u fo...

work page

[1] Lucas Bourtoule, Varun Chandrasekaran, Christopher A. Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. Machine unlearning. In IEEE Symposium on Security and Privacy (SP), pages 141–159, 2021.

[2] Xavier F Cadet, Anastasia Borovykh, Mohammad Malekzadeh, Sara Ahmadi-Abhari, and Hamed Haddadi. Deep unlearn: Benchmarking machine unlearning. arXiv preprint arXiv:2410.01276, 2024.

[3] Yinzhi Cao and Junfeng Yang. Towards making systems forget with machine unlearning. In IEEE Symposium on Security and Privacy (SP), pages 463–480, 2015.

[4] CCPA. California Consumer Privacy Act of 2018, 2018. California Civil Code §1798.100 et seq.

[5] Sungmin Cha, Sungjun Cho, Dasol Hwang, Honglak Lee, Taesup Moon, and Moontae Lee. Learning to unlearn: Instance-wise unlearning for pre-trained classifiers. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11186–11194, 2024.

[6] Min Chen, Weizhuo Gao, Gaoyang Liu, Kai Peng, and Chen Wang. Boundary unlearning: Rapid forgetting of deep networks via shifting the decision boundary. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

[7] Vikram S. Chundawat, Ayush K. Tarun, Murari Mandal, and Mohan Kankanhalli. Can bad teaching induce forgetting? Unlearning in deep networks using an incompetent teacher. In AAAI Conference on Artificial Intelligence (AAAI), pages 7210–7217, 2023.

[8] Vikram S. Chundawat, Ayush K. Tarun, Murari Mandal, and Mohan S. Kankanhalli. Zero-shot machine unlearning. IEEE Transactions on Information Forensics and Security, 18:2345–2354, 2023.

[9] Marco Cotogni, Jacopo Bonato, Luigi Sabetta, Francesco Pelosin, and Alessandro Nicolosi. Duck: Distance-based unlearning via centroid kinematics. arXiv preprint arXiv:2312.02052, 2023.

[10] Huy Dang, Zhenyu Yang, Qinghua He, Jiadong Wang, Taiji Wei, Zhizheng Li, Chenliang Gong, Shuaiwen Hu, and Yingbin Liang. Neural collapse for cross-entropy class-imbalanced regime. In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024.

[11] Cong Fang, Hangfeng He, Qi Long, and Weijie J. Su. Exploring deep neural networks via layer-peeled model: Minority collapse in imbalanced training. Proceedings of the National Academy of Sciences, 118(43):e2103091118, 2021.

[12] GDPR. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 (General Data Protection Regulation), 2016. Official Journal of the European Union, L 119, 1–88.

[13] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9301–9309, 2020.

[14] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Forgetting outside the box: Scrubbing deep networks of information accessible from input-output observations. In European Conference on Computer Vision (ECCV), pages 383–398,

[15] Laura Graves, Vineel Nagisetty, and Vijay Ganesh. Amnesiac machine learning. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11516–11524, 2021.

[16] Chuan Guo, Tom Goldstein, Awni Y. Hannun, and Laurens van der Maaten. Certified data removal from machine learning models. arXiv preprint arXiv:1911.03030, 2019.

[17] X. Y. Han, Vardan Papyan, and David L. Donoho. Neural collapse under MSE loss: Proximity to and dynamics on the central path. In International Conference on Learning Representations (ICLR), 2022.

[18] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.

[19] Weihao Hong and Shuyang Ling. Neural collapse for unconstrained feature model under class-imbalanced regime. Journal of Machine Learning Research, 25, 2024.

[20] Arthur Jacot, Peter Súkeník, Zihan Wang, and Marco Mondelli. Wide neural networks trained with weight decay provably exhibit neural collapse, 2024.

[21] Yongwoo Kim, Sungmin Cha, and Donghyun Kim. Are we truly forgetting? A critical re-examination of machine unlearning evaluation protocols. arXiv preprint arXiv:2503.06991,

[22] Sangamesh Kodge, Gobinda Saha, and Kaushik Roy. Deep unlearning: Fast and efficient gradient-free class forgetting. Transactions on Machine Learning Research (TMLR), 2024. To appear.

[23] Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited. In International Conference on Machine Learning, pages 3519–3529. PMLR, 2019.

[24] Alexey Kravets and Vinay P. Namboodiri. Zero-shot class unlearning in CLIP with synthetic samples. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV),

[25] Na Li, Chunyi Zhou, Yansong Gao, Hui Chen, Zhi Zhang, Boyu Kuang, and Anmin Fu. Machine unlearning: Taxonomy, metrics, applications, challenges, and prospects. IEEE Transactions on Neural Networks and Learning Systems, 2025.

[26] Jianfeng Lu and Stefan Steinerberger. Neural collapse under cross-entropy loss. Applied and Computational Harmonic Analysis, 59:224–241, 2022.

[27] Vardan Papyan, X. Y. Han, and David L. Donoho. Prevalence of neural collapse during the terminal phase of deep learning training. Proceedings of the National Academy of Sciences (PNAS), 117(40):24652–24663, 2020.

[28] PIPL. Personal Information Protection Law of the People’s Republic of China, 2021. Adopted at the 30th Meeting of the Standing Committee of the Thirteenth National People’s Congress.

[29] Prabhav Sanga, Jaskaran Singh, and Arun Kumar Dubey. Train once, forget precisely: Anchored optimization for efficient post-hoc unlearning. arXiv preprint arXiv:2506.14515,

[30] Ayush Sekhari, Jayadev Acharya, Gautam Kamath, and Ananda Theertha Suresh. Remember what you want to forget: Algorithms for machine unlearning. In Advances in Neural Information Processing Systems (NeurIPS), pages 18075–18086, 2021.

[31] Ayush K Tarun, Vikram S Chundawat, Murari Mandal, and Mohan Kankanhalli. Fast yet effective machine unlearning. IEEE Transactions on Neural Networks and Learning Systems, 35(9):13046–13055, 2023.

[32] Heng Xu, Tianqing Zhu, Lefeng Zhang, Wanlei Zhou, and Philip S. Yu. Machine unlearning: A survey, 2023.

[33] Hongren Yan, Yuhua Qian, Furong Peng, Jiachen Luo, Zheqing Zhu, and Feijiang Li. Neural collapse to multiple centers for imbalanced data. In Advances in Neural Information Processing Systems (NeurIPS), 2024.

[34] Jiancheng Yang, Rui Shi, Donglai Wei, Zequan Liu, Lin Zhao, Bilian Ke, Hanspeter Pfister, and Bingbing Ni. MedMNIST v2 - a large-scale lightweight benchmark for 2D and 3D biomedical image classification. Scientific Data, 10(1):41, 2023.

[35] Tianyu Yang, Lisen Dai, Xiangqi Wang, Minhao Cheng, Yapeng Tian, and Xiangliang Zhang. CLIPErase: Efficient unlearning of visual-textual associations in CLIP. In Annual Meeting of the Association for Computational Linguistics (ACL), 2025.

[36] Yufan Zhang, Chenyang Si, Jianfeng Zhang, Yingcong Chen, and Wayne Wu. Forget-me-not: Learning to forget in text-to-image diffusion models. In Advances in Neural Information Processing Systems (NeurIPS), 2024.

[37] Yu Zhou, Dian Zheng, Qijie Mo, Renjie Lu, Kun-Yu Lin, and Wei-Shi Zheng. Decoupled distillation to erase: A general unlearning method for any class-centric tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

[38] Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, and Qing Qu. A geometric analysis of neural collapse with unconstrained features. In Advances in Neural Information Processing Systems (NeurIPS), 2021.

POUR: A Provably Optimal Method for Unlearning Representations via Neural Collapse
Supplementary Material

Appendix Table of Contents

Additional Justifications
1.1. Proof on Decomposition of K-Bound
1.2. Justification on CKA Usage

Neural Collapse
2.1. Training Assumptions
2.2. Neural Collapse Statements

ETF Implies Bayes Optimality
3.1. Geometric Optimality of the Simplex ETF
3.2. Bayes-Optimal Nearest Class Mean Rule

Proof of Main Theorem
4.1. Closure of Projection
4.2. Optimality of Projection

More Related Work

Machine Unlearning. The problem of removing specific training data from a model, often motivated by privacy regulations such as the “right to be forgotten,” was first formalized in the systems security community [3]. The seminal work of Bourtoule et al. [1] introduced the SISA framework, partitioning training data across multiple sh...

proposed certified removal via influence-based updates. Sekhari et al. [30] provided theoretical guarantees for approximate unlearning in general models. For deep networks, approaches include amnesiac unlearning [15], which inverts stored gradients, and Fisher information–based scrubbing [13, 14], which perturbs weights along sensitive directions. Other...

proposed a gradient-free method that explicitly computes class-specific subspaces via singular value decomposition and suppresses discriminatory directions associated with the forget class. Yet, none of the previous approaches connects to the phenomenon of Neural Collapse [27], wherein class features converge to a simplex equiangular tight frame. Co...

introduced a zero-shot unlearning method for CLIP that generates synthetic forget samples via gradient ascent

Additional Justifications

1.1. Proof on Decomposition of K-Bound

Let $\mathcal{Z}$ denote the feature space and $\mathcal{P}(\mathcal{Z})$ the set of probability measures on it. Fix a symmetric function class $\mathcal{F} \subseteq \{\varphi : \mathcal{Z} \to \mathbb{R}\}$ (i.e., $\varphi \in \mathcal{F} \Rightarrow -\varphi \in \mathcal{F}$). For an Integral Probability Metric (IPM) defined as

$$K(P, Q) = \sup_{\varphi \in \mathcal{F}} \Bigl( \mathbb{E}_{z \sim P}[\varphi(z)] - \mathbb{E}_{z \sim Q}[\varphi(z)] \Bigr), \qquad P, Q \in \mathcal{P}(\mathcal{Z}),$$

the following property holds. Propo...
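As a concrete instance of the IPM definition above, the sketch below uses one particular symmetric function class chosen purely for illustration: linear functionals $\varphi_w(z) = \langle w, z \rangle$ with $\|w\|_2 \le 1$. This class is an assumption of the example, not the class used in the paper. For it, the supremum has the closed form $\|\mathbb{E}_P[z] - \mathbb{E}_Q[z]\|_2$, which the code compares against a brute-force search over random unit directions. The function name `linear_ipm` and the synthetic Gaussian samples are hypothetical.

```python
import numpy as np

def linear_ipm(x_p, x_q):
    """Empirical IPM for F = {z -> <w, z> : ||w||_2 <= 1}.

    For this (assumed) class the supremum is attained at w parallel to the
    mean gap, so K(P, Q) equals the Euclidean norm of the mean difference.
    """
    return float(np.linalg.norm(x_p.mean(axis=0) - x_q.mean(axis=0)))

rng = np.random.default_rng(0)
x_p = rng.normal(0.0, 1.0, size=(500, 8))  # synthetic samples standing in for P
x_q = rng.normal(0.5, 1.0, size=(500, 8))  # synthetic samples standing in for Q

k_pq = linear_ipm(x_p, x_q)

# Brute-force check: no sampled unit direction can exceed the closed form.
w = rng.standard_normal((2000, 8))
w /= np.linalg.norm(w, axis=1, keepdims=True)
gaps = w @ (x_p.mean(axis=0) - x_q.mean(axis=0))
sampled_sup = float(gaps.max())

print(f"closed form: {k_pq:.4f}, best sampled direction: {sampled_sup:.4f}")
```

Because the class is symmetric, swapping the arguments leaves the value unchanged, which is the role the symmetry assumption plays in the definition.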

Neural Collapse

2.1. Training and modeling assumptions.

Below are the standard Neural Collapse (NC) assumptions:
• (A1) Interpolation / TPT: The network is trained to near-zero training error and then further optimized in the terminal phase of training (TPT) under standard protocols such as SGD or Adam with decays [27].
• (A2) Overparameterization: The m...

ETF Implies Bayes Optimality

We present a formal statement and proof of Proposition 3.1. First, we show that the simplex Equiangular Tight Frame (ETF) configuration is geometrically optimal: it maximizes the minimum pairwise angle among class means and therefore maximizes the multiclass angular margin of the Nearest Class Mean (NCM) classifier. Second, ...

Proof of Main Theorem

4.1. Closure of Projection

Note that a simplex ETF $\{v_i\}_{i=1}^{C} \subset \mathbb{R}^{C-1}$ satisfies

$$\|v_i\| = 1, \qquad v_i^\top v_j = -\frac{1}{C-1} \ (i \neq j), \qquad \sum_{i=1}^{C} v_i = 0.$$

Equivalently, its Gram matrix has $1$ on the diagonal and constant off-diagonal entries $-1/(C-1)$.

Theorem 4.1 (Projection of a Simplex ETF). Let $\{v_i\}_{i=1}^{C} \subset \mathbb{R}^{C-1}$ be a simplex ETF. Fix $v_1$ and let $P = I - v_1 v_1^\top$ be the o...
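The closure property can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's implementation: it builds a simplex ETF as the columns of $\sqrt{C/(C-1)}\,(I - \tfrac{1}{C}\mathbf{1}\mathbf{1}^\top)$, which embeds the frame in $\mathbb{R}^{C}$ rather than $\mathbb{R}^{C-1}$ (the Gram matrix is the same), projects out one vector with $P = I - v_u v_u^\top$, renormalizes the remaining $C-1$ vectors, and inspects their Gram matrix. The helper names `simplex_etf` and `project_out` are hypothetical.

```python
import numpy as np

def simplex_etf(C):
    """Return a (C, C) matrix whose columns are unit vectors with
    pairwise inner product -1/(C-1) and zero sum."""
    centering = np.eye(C) - np.ones((C, C)) / C
    return np.sqrt(C / (C - 1)) * centering

def project_out(V, u):
    """Apply P = I - v_u v_u^T to the other columns and renormalize them."""
    v_u = V[:, u]
    P = np.eye(V.shape[0]) - np.outer(v_u, v_u)
    kept = np.delete(V, u, axis=1)           # drop the projected-out vector
    W = P @ kept
    return W / np.linalg.norm(W, axis=0, keepdims=True)

C = 5
V = simplex_etf(C)
W = project_out(V, u=0)

G = W.T @ W                                   # Gram matrix of the C-1 survivors
off_diag = G[~np.eye(C - 1, dtype=bool)]

print("diagonal:", np.round(np.diag(G), 6))
print("off-diagonal values:", np.round(np.unique(np.round(off_diag, 6)), 6))
print("simplex ETF value for C-1 vectors:", -1.0 / (C - 2))
```

The off-diagonal entries follow from a short calculation: $P v_i = v_i + v_u/(C-1)$, so $\|P v_i\|^2 = 1 - 1/(C-1)^2$ and $(P v_i)^\top (P v_j) = -C/(C-1)^2$ for $i \neq j$, whose ratio is $-1/(C-2)$, the inner product of a simplex ETF on $C-1$ vectors.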

1. (Balanced classes) class priors are uniform: $\Pr(y = i) = 1/C$ for $i \in \mathcal{Y}$.
2. (Isotropic Gaussian conditionals) conditional on class $i$, $\theta(x) \mid (y = i) \sim \mathcal{N}(\mu_i, \sigma^2 I_d)$, with $\|\mu_i\| = 1$ and $\{\mu_i\}_{i=1}^{C}$ coinciding with the ETF directions $\{v_i\}$ from NC (i.e. $\mu_i = v_i$).

Fix a class $u \in \mathcal{Y}$ and define

$$P = I - v_u v_u^\top, \qquad \tilde{v}_i = \frac{P v_i}{\|P v_i\|} \ (i \neq u),$$

so that by Proposition 3.2 the vectors $\{\tilde{v}_i\}_{i \neq u}$ fo...
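A small simulation can make this setup concrete. The sketch below is an assumption-laden illustration, not the paper's experiment: it samples features from the balanced isotropic Gaussian model with ETF means, applies $P = I - v_u v_u^\top$ for a forget class $u$, and classifies the retained-class features with a nearest-class-mean rule against the normalized projected means $\tilde{v}_i$. The values of `C`, `sigma`, `n_per_class`, and `u` are arbitrary choices, and the ETF is embedded in $\mathbb{R}^{C}$ for convenience.

```python
import numpy as np

rng = np.random.default_rng(0)
C, sigma, n_per_class, u = 5, 0.1, 200, 0     # illustrative values only

# Simplex ETF as columns v_i (embedded in R^C; same Gram matrix as in R^{C-1}).
V = np.sqrt(C / (C - 1)) * (np.eye(C) - np.ones((C, C)) / C)

# Balanced classes, features theta(x) | y=i ~ N(v_i, sigma^2 I).
labels = np.repeat(np.arange(C), n_per_class)
feats = V[:, labels].T + sigma * rng.standard_normal((C * n_per_class, C))

# Forgetting operator for class u (P is symmetric, so feats @ P applies it row-wise).
P = np.eye(C) - np.outer(V[:, u], V[:, u])
proj = feats @ P

# Normalized projected means for the retained classes.
retained = np.array([i for i in range(C) if i != u])
means = P @ V[:, retained]
means /= np.linalg.norm(means, axis=0, keepdims=True)

# Nearest class mean over retained classes; with equal-norm means this
# is the same as picking the largest inner product.
keep = labels != u
scores = proj[keep] @ means
pred = retained[scores.argmax(axis=1)]
retain_acc = float((pred == labels[keep]).mean())

# The forget-class mean is annihilated by P, so only noise remains.
forget_mean_norm = float(np.linalg.norm(proj[~keep].mean(axis=0)))

print(f"retained-class NCM accuracy: {retain_acc:.3f}")
print(f"norm of projected forget-class mean: {forget_mean_norm:.3f}")
```

With low noise the retained classes stay well separated after projection, while the forget-class mean collapses toward the origin because $P v_u = 0$.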