Neural Collapse by Design: Learning Class Prototypes on the Hypersphere

Mihalis A. Nicolaou; Panagiotis Koromilas; Theodoros Giannakopoulos; Yannis Panagakis

arxiv: 2605.20302 · v2 · pith:WJSXJN5Dnew · submitted 2026-05-19 · 💻 cs.LG · cs.CV

Pith reviewed 2026-05-22 09:44 UTC · model grok-4.3

keywords: neural collapse · cross-entropy · supervised contrastive learning · class prototypes · hypersphere · normalized loss · transfer learning · robustness

The pith

Both cross-entropy and supervised contrastive learning are variants of prototype contrast on the unit hypersphere, and fixing each at its failure point reaches neural collapse by design.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Supervised classification has a theoretical optimum called neural collapse, yet neither standard cross-entropy nor supervised contrastive learning fully reaches it in practice. Cross-entropy leaves radial degrees of freedom unconstrained and converges to a degenerate geometry, while contrastive learning reaches the geometry during pretraining but discards it in the subsequent linear probing step. The paper shows that the two paradigms are different appearances of the same underlying method: contrasting class prototypes on the unit hypersphere. From the cross-entropy side it introduces normalized losses that import a large effective negative set plus decoupled alignment and uniformity terms. From the contrastive side it proves that the objective already optimizes for a classifier whose weights are exactly the class mean embeddings, making linear probing redundant and harmful. The resulting methods improve accuracy and converge faster, and the learned geometry yields gains in transfer learning, class imbalance, and corruption robustness.

Core claim

Both cross-entropy and supervised contrastive learning are different appearances of the same method that contrasts prototypes on the unit hypersphere, and closing the gap requires fixing each at its point of failure. From the cross-entropy side we propose NTCE and NONL, two normalized losses that import contrastive optimization's missing ingredients: a large effective negative set and decoupled alignment and uniformity terms. From the supervised contrastive side we prove that its objective already optimizes throughout training for a principled classifier whose weights are the class mean embeddings, making linear probing both redundant and harmful.

What carries the argument

Contrasting class prototypes on the unit hypersphere, where features and prototypes are forced to unit length so that alignment to the correct prototype and uniformity across prototypes can be optimized separately.
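The idea of unit-length features and prototypes with separable alignment and uniformity terms can be sketched with a plain normalized softmax. This is a minimal illustration, not the paper's NTCE or NONL (their exact forms are not given here); the function names, the `temperature` default, and the split into an "alignment" and a "uniformity" part are assumptions made for the example.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Project rows onto the unit hypersphere (eps guards against zero vectors).
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

def prototype_contrast_loss(features, prototypes, labels, temperature=0.1):
    """Softmax cross-entropy over cosine similarities to class prototypes.

    Hypothetical helper: returns (total, alignment, uniformity) with
    total = alignment + uniformity.
    """
    z = l2_normalize(features)        # (N, d) unit-norm features
    w = l2_normalize(prototypes)      # (K, d) unit-norm prototypes
    logits = z @ w.T / temperature    # (N, K) scaled cosine similarities
    n = z.shape[0]
    # Alignment part: pull each feature toward its own class prototype.
    alignment = -logits[np.arange(n), labels].mean()
    # Uniformity part: log-sum-exp over all prototypes (numerically stable).
    m = logits.max(axis=1, keepdims=True)
    uniformity = (m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))).mean()
    return alignment + uniformity, alignment, uniformity
```

Because both inputs are normalized inside the function, rescaling the features leaves the loss unchanged, which is one way to see that the radial degree of freedom has been removed.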

If this is right

NTCE and NONL surpass cross-entropy accuracy on four benchmarks including ImageNet-1K while approximating neural collapse at 95 percent or higher.
They reach converged neural collapse metrics on four of five measures in under 7.5 percent of the iterations needed by cross-entropy.
Supervised contrastive learning with fixed prototypes matches the accuracy of linear probing without the hours-long post-training classifier phase.
The learned geometry produces a 5.5 percent mean relative gain in transfer learning and up to 8.7 percent under severe class imbalance.
Robustness to corruptions improves on ImageNet-C.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the unification is correct, the same hyperspherical prototype contrast principle could be applied directly to other supervised losses without first running contrastive pretraining.
The proof that class-mean embeddings are the optimal classifier weights suggests initializing or periodically resetting the final linear layer to running class means during training.
The approach may extend naturally to multi-label or long-tailed settings where maintaining uniform prototype spacing on the sphere is especially valuable.
Viewing all supervised training as prototype placement on the sphere implies that the final geometry matters more than the particular loss used to reach it.
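The suggestion above about setting classifier weights to class means can be sketched as a nearest-class-mean classifier on the hypersphere. This is an illustrative sketch only: `class_mean_prototypes` and `predict` are hypothetical names, every class is assumed to appear at least once in `labels`, and nothing here reproduces the paper's training procedure.

```python
import numpy as np

def class_mean_prototypes(features, labels, num_classes):
    # Normalize features to unit length before averaging.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    protos = np.zeros((num_classes, z.shape[1]))
    for k in range(num_classes):
        # Assumption: class k has at least one sample, so the mean is defined.
        protos[k] = z[labels == k].mean(axis=0)
    # Re-normalize so each prototype also lies on the unit hypersphere.
    return protos / np.linalg.norm(protos, axis=1, keepdims=True)

def predict(features, prototypes):
    # Assign each sample to the prototype with the highest cosine similarity.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    return np.argmax(z @ prototypes.T, axis=1)
```

In a training loop, the returned prototypes could be copied into a bias-free final layer at whatever interval is convenient; whether that helps is exactly the kind of question the extension leaves open.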

Load-bearing premise

That the main defect in cross-entropy is its unconstrained radial degrees of freedom, and that importing a large effective negative set plus decoupled alignment and uniformity terms will produce neural collapse without new side effects or hidden fitting.

What would settle it

Train a standard ResNet on ImageNet-1K with the proposed NTCE loss and check whether neural collapse metrics stay below 95 percent or final accuracy fails to exceed ordinary cross-entropy; if either occurs the central claim is refuted.
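A check of this kind needs some way to measure collapse from penultimate-layer features. The sketch below computes two simplified diagnostics: a within-to-between class variability ratio (in the spirit of NC1) and the largest deviation of pairwise class-mean cosines from the simplex ETF value of -1/(K-1) (in the spirit of NC2). The paper's exact metrics and the definition of its 95 percent threshold are not given here, so `nc_diagnostics` and both quantities are assumptions for illustration.

```python
import numpy as np

def nc_diagnostics(features, labels, num_classes):
    """Hypothetical helper returning (variability_ratio, etf_gap).

    Both values approach 0 as the features approach neural collapse.
    Assumes every class has at least one sample and num_classes >= 2.
    """
    global_mean = features.mean(axis=0)
    means = np.stack([features[labels == k].mean(axis=0)
                      for k in range(num_classes)])
    # Mean squared distance of samples to their own class mean.
    within = np.mean(np.sum((features - means[labels]) ** 2, axis=1))
    # Mean squared distance of class means to the global mean.
    centered = means - global_mean
    between = np.mean(np.sum(centered ** 2, axis=1))
    variability_ratio = within / between
    # Pairwise cosines between centered class means.
    unit = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    cos = unit @ unit.T
    off_diag = cos[~np.eye(num_classes, dtype=bool)]
    # A simplex ETF has every off-diagonal cosine equal to -1/(K-1).
    etf_gap = np.max(np.abs(off_diag + 1.0 / (num_classes - 1)))
    return variability_ratio, etf_gap
```

Tracking both numbers over training iterations for cross-entropy and for a normalized loss would give a rough version of the comparison described above.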

Figures

Figures reproduced from arXiv: 2605.20302 by Mihalis A. Nicolaou, Panagiotis Koromilas, Theodoros Giannakopoulos, Yannis Panagakis.

**Figure 1.** Supervised learning as learning class prototypes on the hypersphere. (a) Cross-entropy with unconstrained features z, weights W, and biases b leaves radial degrees of freedom free, preventing convergence to NC. (b) SCL pretraining maps features onto $S^{d-1}$ via a projection head, producing representations that approach within-class collapse (NC1) and maximal between-class separation (NC2). (c) Standard pract…

**Figure 2.** NC convergence on CIFAR-100. Six metrics vs. training iterations; red dashed lines mark the 95% NC threshold and circles denote each method’s convergence

**Figure 3.** Validation Accuracy (%) Phase Diagrams. Classification accuracy on validation set. Higher values indicate better generalization performance. Each subplot shows the performance landscape across temperature and batch size hyperparameters for different loss functions: NormFace, NTCE, and NONL. Brighter regions indicate superior performance. White contour lines indicate iso-performance curves for detailed anal…

Original abstract

Supervised classification has a theoretical optimum, Neural Collapse (NC), yet neither of its two dominant paradigms reaches it in practice. Cross entropy (CE) leaves radial degrees of freedom unconstrained and converges to a degenerate geometry, while supervised contrastive learning (SCL) drives features toward NC during pretraining but discards this structure in a post hoc linear probing phase. We show that both paradigms are different appearances of the same method that contrasts prototypes on the unit hypersphere, and that closing the gap requires fixing each at its point of failure. From the CE side, we propose NTCE and NONL, two normalized losses that import contrastive optimization's missing ingredients into classifier learning: a large effective negative set and decoupled alignment and uniformity terms. From the SCL side, we prove that SCL's objective already optimizes throughout training for a principled classifier whose weights are the class mean embeddings, making linear probing both redundant and harmful. Empirically, on four benchmarks including ImageNet-1K, NTCE and NONL surpass CE accuracy, closely approximate NC ($\geq 95\%$), and match CE's converged NC on 4/5 metrics in under $7.5\%$ of its iterations, while SCL with fixed prototypes matches linear probing without the hours-long classifier training phase. The learned geometry yields $+5.5\%$ mean relative improvement in transfer learning, up to $+8.7\%$ under severe class imbalance, and improved robustness to corruptions on ImageNet-C. Our work recasts supervised learning as prototype learning on the hypersphere, with NC reached by design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper unifies CE and SCL as hypersphere prototype contrast, proves SCL already optimizes class-mean classifiers, and gives normalized losses that reach NC by design with reported gains on ImageNet.

The main things to know are the unification of cross-entropy and supervised contrastive learning as prototype contrasting on the unit hypersphere, plus the proof that SCL already optimizes for class-mean embeddings so linear probing becomes redundant. The authors also introduce NTCE and NONL to fix the radial freedom in CE by adding a large effective negative set and decoupled alignment/uniformity terms. These changes are meant to produce neural collapse directly during training rather than as an afterthought.

The reported results show higher accuracy than standard CE, NC metrics at or above 95 percent on four benchmarks including ImageNet-1K, convergence in under 7.5 percent of the iterations CE needs, and gains in transfer learning and robustness to corruptions. The proof on SCL classifiers is a useful clarification of why existing practice works the way it does.

The soft spots are in how tightly the unification holds. SCL contrasts against instance-level negatives in the batch, while the new CE variants contrast against class prototypes. For the two to be the same method in a strict sense, importing a large negative set into CE would have to produce gradients that match the SCL case exactly once features are normalized. If the temperature scaling or weighting differs, the geometries end up similar but not identical, so the claim that both are appearances of the same method is more conceptual than exact. The abstract does not lay out the precise reformulation or ablations that would confirm the match, which leaves some room for doubt on whether the NC guarantee transfers without side effects. This is a moderate issue worth checking rather than a load-bearing flaw.

The work builds on prior neural collapse results, but the new losses and the SCL proof add distinct elements. Readers working on representation geometry, supervised contrastive methods, or ways to induce collapse without extra stages would find it useful. It deserves peer review because the claims are concrete enough on standard benchmarks and the proof is verifiable in principle, even if revisions are needed on the negative sampling details.

Referee Report

2 major / 2 minor

Summary. The paper claims that neural collapse (NC) is the theoretical optimum for supervised classification but is not reached by standard cross-entropy (CE) or supervised contrastive learning (SCL). It argues both paradigms are equivalent appearances of prototype contrasting on the unit hypersphere. From the CE side it introduces NTCE and NONL, normalized losses that add a large effective negative set plus decoupled alignment/uniformity terms. From the SCL side it proves that the SCL objective already optimizes for a classifier whose weights are the class-mean embeddings, rendering linear probing redundant. Experiments on ImageNet-1K and three other benchmarks report accuracy gains, NC metrics ≥95%, faster convergence, +5.5% mean relative transfer improvement, and better robustness under imbalance and corruptions.

Significance. If the unification and the SCL proof hold, the work would recast supervised learning as hypersphere prototype learning and supply a design principle for reaching NC without post-hoc stages. The explicit proof that SCL optimizes class-mean embeddings is a clear strength, as is the empirical demonstration of high NC metrics and transfer gains on ImageNet-1K. The practical speed-up (NC reached in <7.5% of CE iterations) and robustness improvements would be noteworthy if the theoretical equivalence is made rigorous.

major comments (2)

[Introduction / unification section] Introduction and the unification section: the central claim that CE and SCL are 'different appearances of the same method' that contrasts prototypes on the unit hypersphere requires an explicit gradient-level equivalence between NTCE's class-prototype negatives and SCL's instance-level batch negatives. The abstract states that NTCE imports 'a large effective negative set,' yet without showing that the resulting gradient w.r.t. normalized features exactly parallels the SCL gradient (including temperature scaling and negative weighting), the 'same method' assertion remains approximate and the transfer of the NC-by-design guarantee is not guaranteed.
[SCL proof section] SCL proof (the section containing the claim that SCL optimizes for class-mean embeddings): the proof must specify whether the class means are treated as fixed or dynamically updated during training, and how the resulting classifier is shown to be 'principled' throughout optimization rather than only at convergence. If the derivation assumes fixed prototypes, it does not directly establish that linear probing is redundant or harmful at every stage.
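To make the gradient question in the first major comment concrete, the worked derivation below uses a generic normalized prototype softmax as a stand-in. The symbols $z$ (unit-norm feature), $w_k$ (unit-norm prototype of class $k$), $y$ (true class), $\tau$ (temperature), and $p_k$ (softmax weight) are assumptions introduced for this sketch; the loss is not claimed to be NTCE or the SCL objective.

```latex
\ell(z) = -\frac{z^\top w_y}{\tau}
          + \log \sum_{k=1}^{K} \exp\!\left(\frac{z^\top w_k}{\tau}\right),
\qquad \|z\| = \|w_k\| = 1

p_k = \frac{\exp(z^\top w_k / \tau)}{\sum_{j=1}^{K} \exp(z^\top w_j / \tau)}

\nabla_z \ell = \frac{1}{\tau}\Big(\sum_{k=1}^{K} p_k w_k - w_y\Big)
              = \frac{1}{\tau} \sum_{k \neq y} p_k \,(w_k - w_y)
```

The second equality uses $\sum_k p_k = 1$. This is the Euclidean gradient; on the sphere only its tangential component $(I - z z^\top)\nabla_z \ell$ moves $z$. A softmax-style loss over batch negatives has the same weighted-difference form with sample embeddings in place of the $w_k$, so a gradient-level comparison would come down to which vectors enter the sum and how $\tau$ shapes the weights $p_k$.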

minor comments (2)

[Abstract / Experiments] The abstract reports NC metrics '≥95%' and 'match CE's converged NC on 4/5 metrics' but does not state the precise NC metrics used or whether they are averaged over multiple seeds; this should be clarified in the experimental section.
[Method section] Notation for the new losses (NTCE, NONL) should be introduced with explicit equations showing the alignment and uniformity terms before the empirical comparisons.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments help clarify the scope of our unification claim and the SCL proof. We respond to each major comment below and indicate the revisions we will make.

Referee: [Introduction / unification section] Introduction and the unification section: the central claim that CE and SCL are 'different appearances of the same method' that contrasts prototypes on the unit hypersphere requires an explicit gradient-level equivalence between NTCE's class-prototype negatives and SCL's instance-level batch negatives. The abstract states that NTCE imports 'a large effective negative set,' yet without showing that the resulting gradient w.r.t. normalized features exactly parallels the SCL gradient (including temperature scaling and negative weighting), the 'same method' assertion remains approximate and the transfer of the NC-by-design guarantee is not guaranteed.

Authors: We agree that the unification would benefit from greater precision on the gradient relationship. The manuscript presents the equivalence at the level of the optimized geometry and the shared objective of prototype alignment plus uniformity on the hypersphere, rather than claiming identical per-step gradients. NTCE replaces instance negatives with class prototypes while preserving the normalized feature space and the decoupled alignment/uniformity structure; this yields the same NC fixed point but with different negative sampling. In the revision we will add a short appendix that derives the NTCE gradient for normalized features and explicitly compares its alignment term and effective negative weighting to the SCL gradient (including temperature). We will also qualify the abstract and introduction to state that the methods are equivalent in their design principle and NC optimum, while noting that the negative sets differ in construction.

revision: partial

Referee: [SCL proof section] SCL proof (the section containing the claim that SCL optimizes for class-mean embeddings): the proof must specify whether the class means are treated as fixed or dynamically updated during training, and how the resulting classifier is shown to be 'principled' throughout optimization rather than only at convergence. If the derivation assumes fixed prototypes, it does not directly establish that linear probing is redundant or harmful at every stage.

Authors: We thank the referee for this clarification request. The proof treats the class means as the empirical means computed from the current embeddings at each training step; these means are therefore updated dynamically as the feature extractor evolves. The derivation shows that, for any fixed feature distribution at a given iteration, the SCL objective is minimized with respect to the linear classifier precisely when the classifier weights equal those instantaneous class means. Because this optimality condition holds with respect to the features present at every step, the resulting classifier remains principled throughout training rather than only at convergence. In the revision we will rewrite the relevant section to state explicitly that the means are recomputed from the current batch statistics at each iteration and that the optimality argument applies to the instantaneous feature distribution, thereby establishing redundancy of linear probing at all stages.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivations are independent reformulations with external benchmarks.

The paper's core claims rest on explicit reformulations of CE into normalized variants (NTCE/NONL) that add large negative sets and decoupled alignment/uniformity terms, plus a proof that SCL's loss already targets class-mean embeddings on the hypersphere. These steps are presented as direct mathematical mappings and empirical checks on ImageNet-1K and other datasets rather than reductions to fitted parameters or self-citation chains. No load-bearing premise collapses to a prior result by the same authors or to an ansatz smuggled via citation; the unification is derived from the loss gradients and geometry, not assumed by construction. The work remains self-contained against external validation metrics.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review is based solely on the abstract; no explicit free parameters, axioms, or invented entities are stated beyond standard deep-learning assumptions such as the existence of a unit hypersphere geometry.

pith-pipeline@v0.9.0 · 5836 in / 1147 out tokens · 43150 ms · 2026-05-22T09:44:43.043096+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear

Paper passage: "We show that both paradigms are different appearances of the same method that contrasts prototypes on the unit hypersphere... NTCE and NONL... decoupled alignment and uniformity terms."

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · tag: unclear

Paper passage: "Theorem 4.1 (Neural Collapse optimality of normalized losses)... simplex ETF class means... classifier–feature alignment"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

85 extracted references · 85 canonical work pages · 3 internal anchors

[1]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Are All Losses Created Equal: A Neural Collapse Perspective , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =

work page
[2]

International Conference on Learning Representations , year=

On the Role of Neural Collapse in Transfer Learning , author=. International Conference on Learning Representations , year=

work page
[3]

Advances in Neural Information Processing Systems , volume =

Inducing Neural Collapse in Imbalanced Learning: Do We Really Need a Learnable Classifier at the End of Deep Neural Network? , author =. Advances in Neural Information Processing Systems , volume =

work page
[4]

Advances in Neural Information Processing Systems , volume =

Hyperspherical Prototype Networks , author =. Advances in Neural Information Processing Systems , volume =

work page
[5]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , year =

Equiangular Basis Vectors , author =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , year =

work page
[6]

International Conference on Learning Representations (ICLR) , year =

Visual Recognition with Deep Nearest Centroids , author =. International Conference on Learning Representations (ICLR) , year =

work page
[7]

Proceedings of the AAAI Conference on Artificial Intelligence , volume =

Neural Collapse Inspired Knowledge Distillation , author =. Proceedings of the AAAI Conference on Artificial Intelligence , volume =

work page
[8]

Proceedings of the International Conference on Learning Representations , year=

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations , author=. Proceedings of the International Conference on Learning Representations , year=

work page
[9]

International conference on machine learning , pages=

Understanding contrastive representation learning through alignment and uniformity on the hypersphere , author=. International conference on machine learning , pages=. 2020 , organization=

work page 2020
[10]

Advances in neural information processing systems , volume=

Formal guarantees on the robustness of a classifier against adversarial manipulation , author=. Advances in neural information processing systems , volume=

work page
[11]

International conference on machine learning , pages=

A simple framework for contrastive learning of visual representations , author=. International conference on machine learning , pages=. 2020 , organization=

work page 2020
[12]

L2-constrained Softmax Loss for Discriminative Face Verification

L2-Constrained Softmax Loss for Discriminative Face Verification , author =. arXiv preprint arXiv:1703.09507 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[13]

Proceedings of the IEEE International Conference on Computer Vision (ICCV) , pages =

No Fuss Distance Metric Learning Using Proxies , author =. Proceedings of the IEEE International Conference on Computer Vision (ICCV) , pages =. 2017 , doi =

work page 2017
[14]

International Conference on Learning Representations (ICLR) , year =

Fixed Non-Negative Orthogonal Classifier: Inducing Zero-Mean Neural Collapse with Feature Dimension Separation , author =. International Conference on Learning Representations (ICLR) , year =

work page
[15]

Proceedings of the National Academy of Sciences , volume=

Prevalence of neural collapse during the terminal phase of deep learning training , author=. Proceedings of the National Academy of Sciences , volume=. 2020 , doi=

work page 2020
[16]

Journal of Machine Learning Research , volume=

The Implicit Bias of Gradient Descent on Separable Data , author=. Journal of Machine Learning Research , volume=. 2018 , url=

work page 2018
[17]

Advances in Neural Information Processing Systems (NeurIPS) , pages=

Spectrally-normalized margin bounds for neural networks , author=. Advances in Neural Information Processing Systems (NeurIPS) , pages=

work page
[18]

Neyshabur, Behnam and Bhojanapalli, Srinadh and Srebro, Nathan , booktitle=. A. 2018 , url=

work page 2018
[19]

, booktitle=

Kornblith, Simon and Shlens, Jonathon and Le, Quoc V. , booktitle=. Do Better. 2019 , url=

work page 2019
[20]

Advances in Neural Information Processing Systems (NeurIPS) , year=

Supervised Contrastive Learning , author=. Advances in Neural Information Processing Systems (NeurIPS) , year=

work page
[21]

Advances in Neural Information Processing Systems (NeurIPS) , year=

Robustness of classifiers: from adversarial to random noise , author=. Advances in Neural Information Processing Systems (NeurIPS) , year=

work page
[22]

2020 , url=

Gavin Weiguang Ding and Yash Sharma and Kry Yik Chau Lui and Ruitong Huang , booktitle=. 2020 , url=

work page 2020
[23]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =

work page
[24]

Applied and Computational Harmonic Analysis , volume=

Neural Collapse under Cross-Entropy Loss , author=. Applied and Computational Harmonic Analysis , volume=. 2022 , doi=

work page 2022
[25]

International Conference on Machine Learning , pages=

Unveiling the Dynamics of Information Interplay in Supervised Learning , author=. International Conference on Machine Learning , pages=. 2024 , organization=

work page 2024
[26]

2007 15th European signal processing conference , pages=

The effective rank: A measure of effective dimensionality , author=. 2007 15th European signal processing conference , pages=. 2007 , organization=

work page 2007
[27]

Proceedings of the 41st International Conference on Machine Learning , pages=

Matrix information theory for self-supervised learning , author=. Proceedings of the 41st International Conference on Machine Learning , pages=

work page
[28]

Advances in Neural Information Processing Systems , volume=

Imbalance trouble: Revisiting neural-collapse geometry , author=. Advances in Neural Information Processing Systems , volume=

work page
[29]

International Conference on Machine Learning , pages=

On the optimization landscape of neural collapse under mse loss: Global optimality with unconstrained features , author=. International Conference on Machine Learning , pages=. 2022 , organization=

work page 2022
[30]

Advances in Neural Information Processing Systems , volume=

A geometric analysis of neural collapse with unconstrained features , author=. Advances in Neural Information Processing Systems , volume=

work page
[31]

Biometrika , volume=

On the Existence of Maximum Likelihood Estimates in Logistic Regression Models , author=. Biometrika , volume=. 1984 , publisher=

work page 1984
[32]

Journal of Machine Learning Research , volume=

Neural Collapse for Unconstrained Feature Model under Cross-Entropy Loss with Imbalanced Data , author=. Journal of Machine Learning Research , volume=

work page
[33]

, booktitle=

Wang, Feng and Xiang, Xiang and Cheng, Jian and Yuille, Alan L. , booktitle=. 2017 , doi=

work page 2017
[34]

Liu, Weiyang and Wen, Yandong and Yu, Zhiding and Li, Ming and Raj, Bhiksha and Song, Le , booktitle=

work page
[35]

Wang, Hao and Wang, Yitong and Zhou, Zheng and Ji, Xing and Gong, Dihong and Zhou, Jingchao and Li, Zhifeng and Liu, Wei , booktitle=

work page
[36]

Deng, Jiankang and Guo, Jia and Xue, Niannan and Zafeiriou, Stefanos , booktitle=

work page
[37]

Han and Vardan Papyan and David L

X.Y. Han and Vardan Papyan and David L. Donoho , booktitle=. Neural Collapse Under. 2022 , url=

work page 2022
[38]

Proceedings of the National Academy of Sciences , volume=

Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training , author=. Proceedings of the National Academy of Sciences , volume=. 2021 , doi=

work page 2021
[39]

arXiv preprint arXiv:2110.02796 , year=

An Unconstrained Layer-Peeled Perspective on Neural Collapse , author=. arXiv preprint arXiv:2110.02796 , year=

work page arXiv
[40]

A Theoretical Framework for Preventing Class Collapse in Supervised Contrastive Learning , author=

work page
[41]

Proceedings of the 39th International Conference on Machine Learning , series =

Extended Unconstrained Features Model for Exploring Deep Neural Collapse , author =. Proceedings of the 39th International Conference on Machine Learning , series =. 2022 , publisher =

work page 2022
[42]

International Conference on Machine Learning , pages=

Dissecting supervised contrastive learning , author=. International Conference on Machine Learning , pages=. 2021 , organization=

work page 2021
[43]

CoRR , volume=

Leyan Pan and Xinyuan Cao , title=. CoRR , volume=. 2023 , cdate=

work page 2023
[44]

arXiv preprint arXiv:2011.11619 , year =

Neural Collapse with Unconstrained Features , author =. arXiv preprint arXiv:2011.11619 , year =

work page arXiv 2011
[45]

International Conference on Learning Representations , year =

An Unconstrained Layer-Peeled Perspective on Neural Collapse , author =. International Conference on Learning Representations , year =

work page
[46]

International Conference on Learning Representations , year =

Long-Tail Learning via Logit Adjustment , author =. International Conference on Learning Representations , year =

work page
[47]

Advances in Neural Information Processing Systems , volume =

Prototypical Networks for Few-shot Learning , author =. Advances in Neural Information Processing Systems , volume =

work page
[48]

Advances in Neural Information Processing Systems , volume=

Guiding neural collapse: Optimising towards the nearest simplex equiangular tight frame , author=. Advances in Neural Information Processing Systems , volume=

work page
[49]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Momentum contrast for unsupervised visual representation learning , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[50]

International Conference on Machine Learning , pages=

Bridging Mini-Batch and Asymptotic Analysis in Contrastive Learning: From InfoNCE to Kernel-Based Losses , author=. International Conference on Machine Learning , pages=. 2024 , organization=

work page 2024
[51]

European Conference on Computer Vision , pages=

Decoupled contrastive learning , author=. European Conference on Computer Vision , pages=. 2022 , organization=

work page 2022
[52]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Unsupervised feature learning by cross-level instance-group discrimination , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[53]

Journal of Machine Learning Research , volume =

Neural Collapse for Unconstrained Feature Model under Class-Imbalance , author =. Journal of Machine Learning Research , volume =. 2024 , url =

work page 2024
[54]

arXiv preprint arXiv:2202.08384 , year =

Limitations of Neural Collapse for Understanding Generalization in Deep Learning , author =. arXiv preprint arXiv:2202.08384 , year =

work page arXiv
[55]

Proceedings of the National Academy of Sciences , volume =

Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training , author =. Proceedings of the National Academy of Sciences , volume =. 2021 , doi =

work page 2021
[56]

Advances in Computational Mathematics , volume =

Finite Normalized Tight Frames , author =. Advances in Computational Mathematics , volume =. 2003 , doi =

work page 2003
[57]

Experimental Mathematics , volume =

Packing Lines, Planes, etc.: Packings in Grassmannian Spaces , author =. Experimental Mathematics , volume =. 1996 , url =

work page 1996
[58] Bengio, Yoshua and LeCun, Yann. Scaling Learning Algorithms Towards
[59] Hinton, Geoffrey E. and Osindero, Simon and Teh, Yee Whye. A Fast Learning Algorithm for Deep Belief Nets.
[60] Deep learning. 2016.
[61] Kini, Ganesh Ramachandra and Vakilian, Vala and Behnia, Tina and Gilani Tehrani-Saadi, Jaiden and Thrampoulidis, Christos. Symmetric Neural-Collapse Representations with Supervised Contrastive Loss: The Impact of
[62] Engineering the Neural Collapse Geometry of Supervised-Contrastive Loss. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024.
[63] Neural Collapse versus Low-Rank Bias: Is Deep Neural Collapse Really Optimal? Advances in Neural Information Processing Systems (NeurIPS).
[64] Deep Neural Collapse Is Provably Optimal for the Deep Unconstrained Features Model. Advances in Neural Information Processing Systems (NeurIPS).
[65] Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning. International Conference on Machine Learning (ICML).
[66] Why Do Better Loss Functions Lead to Less Transferable Features? Advances in Neural Information Processing Systems (NeurIPS).
[67] Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning. International Conference on Machine Learning (ICML).
[68] Associative Embedding: End-to-End Learning for Joint Detection and Grouping. Advances in Neural Information Processing Systems (NeurIPS).
[69] Semantic Instance Segmentation with a Discriminative Loss Function. arXiv preprint arXiv:1708.02551.
[70] Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[71] Huang, Lei and Liu, Xianglong and Lang, Bo and Yu, Adams Wei and Wang, Yongliang and Li, Bo. Orthogonal Weight Normalization: Solution to Optimization over Multiple Dependent
[72] Can We Gain More from Orthogonality Regularizations in Training Deep Networks? Advances in Neural Information Processing Systems (NeurIPS).
[73] Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks. Advances in Neural Information Processing Systems (NeurIPS).
[74] Spectral Normalization for Generative Adversarial Networks. International Conference on Learning Representations (ICLR).
[75] Layer Rotation: a Surprisingly Simple Indicator of Generalization in Deep Networks? ICML Workshop on Identifying and Understanding Deep Learning Phenomena.
[76] Goyal, Priya et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. arXiv preprint arXiv:1706.02677.
[77] Deep Residual Learning for Image Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[78] Deng, Jia and Dong, Wei and Socher, Richard and Li, Li-Jia and Li, Kai and Fei-Fei, Li.
[79] Learning Multiple Layers of Features from Tiny Images.
[80] Bossard, Lukas and Guillaumin, Matthieu and Van Gool, Luc.

