pith. sign in

arxiv: 2605.20302 · v2 · pith:WJSXJN5Dnew · submitted 2026-05-19 · 💻 cs.LG · cs.CV

Neural Collapse by Design: Learning Class Prototypes on the Hypersphere

Pith reviewed 2026-05-22 09:44 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords neural collapsecross-entropysupervised contrastive learningclass prototypeshyperspherenormalized losstransfer learningrobustness
0
0 comments X

The pith

Both cross-entropy and supervised contrastive learning are variants of prototype contrast on the unit hypersphere, and fixing each at its failure point reaches neural collapse by design.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Supervised classification has a theoretical optimum called neural collapse yet neither standard cross-entropy nor supervised contrastive learning fully reaches it in practice. Cross-entropy leaves radial degrees of freedom unconstrained and converges to a degenerate geometry, while contrastive learning reaches the geometry during pretraining but discards it in the subsequent linear probing step. The paper shows that the two paradigms are different appearances of the same underlying method of contrasting class prototypes on the unit hypersphere. From the cross-entropy side it introduces normalized losses that import a large effective negative set plus decoupled alignment and uniformity terms. From the contrastive side it proves that the objective already optimizes for a classifier whose weights are exactly the class mean embeddings, making linear probing redundant and harmful. The resulting geometry improves accuracy, converges faster, and yields gains in transfer learning, class imbalance, and corruption robustness.

Core claim

Both cross-entropy and supervised contrastive learning are different appearances of the same method that contrasts prototypes on the unit hypersphere, and closing the gap requires fixing each at its point of failure. From the cross-entropy side we propose NTCE and NONL, two normalized losses that import contrastive optimization's missing ingredients: a large effective negative set and decoupled alignment and uniformity terms. From the supervised contrastive side we prove that its objective already optimizes throughout training for a principled classifier whose weights are the class mean embeddings, making linear probing both redundant and harmful.

What carries the argument

Contrasting class prototypes on the unit hypersphere, where features and prototypes are forced to unit length so that alignment to the correct prototype and uniformity across prototypes can be optimized separately.

If this is right

  • NTCE and NONL surpass cross-entropy accuracy on four benchmarks including ImageNet-1K while approximating neural collapse at 95 percent or higher.
  • They reach converged neural collapse metrics on four of five measures in under 7.5 percent of the iterations needed by cross-entropy.
  • Supervised contrastive learning with fixed prototypes matches the accuracy of linear probing without the hours-long post-training classifier phase.
  • The learned geometry produces a 5.5 percent mean relative gain in transfer learning and up to 8.7 percent under severe class imbalance.
  • Robustness to corruptions improves on ImageNet-C.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the unification is correct, the same hyperspherical prototype contrast principle could be applied directly to other supervised losses without first running contrastive pretraining.
  • The proof that class-mean embeddings are the optimal classifier weights suggests initializing or periodically resetting the final linear layer to running class means during training.
  • The approach may extend naturally to multi-label or long-tailed settings where maintaining uniform prototype spacing on the sphere is especially valuable.
  • Viewing all supervised training as prototype placement on the sphere implies that the final geometry matters more than the particular loss used to reach it.

Load-bearing premise

The main defect in cross-entropy is its unconstrained radial degrees of freedom and that importing a large effective negative set plus decoupled alignment and uniformity terms will produce neural collapse without new side effects or hidden fitting.

What would settle it

Train a standard ResNet on ImageNet-1K with the proposed NTCE loss and check whether neural collapse metrics stay below 95 percent or final accuracy fails to exceed ordinary cross-entropy; if either occurs the central claim is refuted.

Figures

Figures reproduced from arXiv: 2605.20302 by Mihalis A. Nicolaou, Panagiotis Koromilas, Theodoros Giannakopoulos, Yannis Panagakis.

Figure 1
Figure 1. Figure 1: Supervised learning as learning class prototypes on the hypersphere. (a) Cross-entropy with unconstrained features z, weights W, and biases b leaves radial degrees of freedom free, preventing convergence to NC. (b) SCL pretraining maps features onto S d−1 via a projection head, producing representations that approach within-class collapse (NC1) and maximal between-class separation (NC2). (c) Standard pract… view at source ↗
Figure 2
Figure 2. Figure 2: NC convergence on CIFAR-100. Six metrics vs. training iterations; red dashed lines mark the 95% NC threshold and circles denote each method’s convergence [PITH_FULL_IMAGE:figures/full_fig_p027_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Validation Accuracy (%) Phase Diagrams. Classification accuracy on validation set. Higher values indicate better generalization performance. Each subplot shows the performance landscape across temperature and batch size hyperparameters for different loss functions: NormFace, NTCE, and NONL. Brighter regions indicate superior performance. White contour lines indicate iso-performance curves for detailed anal… view at source ↗
read the original abstract

Supervised classification has a theoretical optimum, Neural Collapse (NC), yet neither of its two dominant paradigms reaches it in practice. Cross entropy (CE) leaves radial degrees of freedom unconstrained and converges to a degenerate geometry, while supervised contrastive learning (SCL) drives features toward NC during pretraining but discards this structure in a post hoc linear probing phase. We show that both paradigms are different appearances of the same method that contrasts prototypes on the unit hypersphere, and that closing the gap requires fixing each at its point of failure. From the CE side, we propose NTCE and NONL, two normalized losses that import contrastive optimization's missing ingredients into classifier learning: a large effective negative set and decoupled alignment and uniformity terms. From the SCL side, we prove that SCL's objective already optimizes throughout training for a principled classifier whose weights are the class mean embeddings, making linear probing both redundant and harmful. Empirically, on four benchmarks including ImageNet-1K, NTCE and NONL surpass CE accuracy, closely approximate NC ($\geq 95\%$), and match CE's converged NC on 4/5 metrics in under $7.5\%$ of its iterations, while SCL with fixed prototypes matches linear probing without the hours-long classifier training phase. The learned geometry yields $+5.5\%$ mean relative improvement in transfer learning, up to $+8.7\%$ under severe class imbalance, and improved robustness to corruptions on ImageNet-C. Our work recasts supervised learning as prototype learning on the hypersphere, with NC reached by design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that neural collapse (NC) is the theoretical optimum for supervised classification but is not reached by standard cross-entropy (CE) or supervised contrastive learning (SCL). It argues both paradigms are equivalent appearances of prototype contrasting on the unit hypersphere. From the CE side it introduces NTCE and NONL, normalized losses that add a large effective negative set plus decoupled alignment/uniformity terms. From the SCL side it proves that the SCL objective already optimizes for a classifier whose weights are the class-mean embeddings, rendering linear probing redundant. Experiments on ImageNet-1K and three other benchmarks report accuracy gains, NC metrics ≥95%, faster convergence, +5.5% mean relative transfer improvement, and better robustness under imbalance and corruptions.

Significance. If the unification and the SCL proof hold, the work would recast supervised learning as hypersphere prototype learning and supply a design principle for reaching NC without post-hoc stages. The explicit proof that SCL optimizes class-mean embeddings is a clear strength, as is the empirical demonstration of high NC metrics and transfer gains on ImageNet-1K. The practical speed-up (NC reached in <7.5% of CE iterations) and robustness improvements would be noteworthy if the theoretical equivalence is made rigorous.

major comments (2)
  1. [Introduction / unification section] Introduction and the unification section: the central claim that CE and SCL are 'different appearances of the same method' that contrasts prototypes on the unit hypersphere requires an explicit gradient-level equivalence between NTCE's class-prototype negatives and SCL's instance-level batch negatives. The abstract states that NTCE imports 'a large effective negative set,' yet without showing that the resulting gradient w.r.t. normalized features exactly parallels the SCL gradient (including temperature scaling and negative weighting), the 'same method' assertion remains approximate and the transfer of the NC-by-design guarantee is not guaranteed.
  2. [SCL proof section] SCL proof (the section containing the claim that SCL optimizes for class-mean embeddings): the proof must specify whether the class means are treated as fixed or dynamically updated during training, and how the resulting classifier is shown to be 'principled' throughout optimization rather than only at convergence. If the derivation assumes fixed prototypes, it does not directly establish that linear probing is redundant or harmful at every stage.
minor comments (2)
  1. [Abstract / Experiments] The abstract reports NC metrics '≥95%' and 'match CE's converged NC on 4/5 metrics' but does not state the precise NC metrics used or whether they are averaged over multiple seeds; this should be clarified in the experimental section.
  2. [Method section] Notation for the new losses (NTCE, NONL) should be introduced with explicit equations showing the alignment and uniformity terms before the empirical comparisons.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments help clarify the scope of our unification claim and the SCL proof. We respond to each major comment below and indicate the revisions we will make.

read point-by-point responses
  1. Referee: [Introduction / unification section] Introduction and the unification section: the central claim that CE and SCL are 'different appearances of the same method' that contrasts prototypes on the unit hypersphere requires an explicit gradient-level equivalence between NTCE's class-prototype negatives and SCL's instance-level batch negatives. The abstract states that NTCE imports 'a large effective negative set,' yet without showing that the resulting gradient w.r.t. normalized features exactly parallels the SCL gradient (including temperature scaling and negative weighting), the 'same method' assertion remains approximate and the transfer of the NC-by-design guarantee is not guaranteed.

    Authors: We agree that the unification would benefit from greater precision on the gradient relationship. The manuscript presents the equivalence at the level of the optimized geometry and the shared objective of prototype alignment plus uniformity on the hypersphere, rather than claiming identical per-step gradients. NTCE replaces instance negatives with class prototypes while preserving the normalized feature space and the decoupled alignment/uniformity structure; this yields the same NC fixed point but with different negative sampling. In the revision we will add a short appendix that derives the NTCE gradient for normalized features and explicitly compares its alignment term and effective negative weighting to the SCL gradient (including temperature). We will also qualify the abstract and introduction to state that the methods are equivalent in their design principle and NC optimum, while noting that the negative sets differ in construction. revision: partial

  2. Referee: [SCL proof section] SCL proof (the section containing the claim that SCL optimizes for class-mean embeddings): the proof must specify whether the class means are treated as fixed or dynamically updated during training, and how the resulting classifier is shown to be 'principled' throughout optimization rather than only at convergence. If the derivation assumes fixed prototypes, it does not directly establish that linear probing is redundant or harmful at every stage.

    Authors: We thank the referee for this clarification request. The proof treats the class means as the empirical means computed from the current embeddings at each training step; these means are therefore updated dynamically as the feature extractor evolves. The derivation shows that, for any fixed feature distribution at a given iteration, the SCL objective is minimized with respect to the linear classifier precisely when the classifier weights equal those instantaneous class means. Because this optimality condition holds with respect to the features present at every step, the resulting classifier remains principled throughout training rather than only at convergence. In the revision we will rewrite the relevant section to state explicitly that the means are recomputed from the current batch statistics at each iteration and that the optimality argument applies to the instantaneous feature distribution, thereby establishing redundancy of linear probing at all stages. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivations are independent reformulations with external benchmarks.

full rationale

The paper's core claims rest on explicit reformulations of CE into normalized variants (NTCE/NONL) that add large negative sets and decoupled alignment/uniformity terms, plus a proof that SCL's loss already targets class-mean embeddings on the hypersphere. These steps are presented as direct mathematical mappings and empirical checks on ImageNet-1K and other datasets rather than reductions to fitted parameters or self-citation chains. No load-bearing premise collapses to a prior result by the same authors or to an ansatz smuggled via citation; the unification is derived from the loss gradients and geometry, not assumed by construction. The work remains self-contained against external validation metrics.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review is based solely on the abstract; no explicit free parameters, axioms, or invented entities are stated beyond standard deep-learning assumptions such as the existence of a unit hypersphere geometry.

pith-pipeline@v0.9.0 · 5836 in / 1147 out tokens · 43150 ms · 2026-05-22T09:44:43.043096+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

85 extracted references · 85 canonical work pages · 3 internal anchors

  1. [1]

    Advances in Neural Information Processing Systems (NeurIPS) , year =

    Are All Losses Created Equal: A Neural Collapse Perspective , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =

  2. [2]

    International Conference on Learning Representations , year=

    On the Role of Neural Collapse in Transfer Learning , author=. International Conference on Learning Representations , year=

  3. [3]

    Advances in Neural Information Processing Systems , volume =

    Inducing Neural Collapse in Imbalanced Learning: Do We Really Need a Learnable Classifier at the End of Deep Neural Network? , author =. Advances in Neural Information Processing Systems , volume =

  4. [4]

    Advances in Neural Information Processing Systems , volume =

    Hyperspherical Prototype Networks , author =. Advances in Neural Information Processing Systems , volume =

  5. [5]

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , year =

    Equiangular Basis Vectors , author =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , year =

  6. [6]

    International Conference on Learning Representations (ICLR) , year =

    Visual Recognition with Deep Nearest Centroids , author =. International Conference on Learning Representations (ICLR) , year =

  7. [7]

    Proceedings of the AAAI Conference on Artificial Intelligence , volume =

    Neural Collapse Inspired Knowledge Distillation , author =. Proceedings of the AAAI Conference on Artificial Intelligence , volume =

  8. [8]

    Proceedings of the International Conference on Learning Representations , year=

    Benchmarking Neural Network Robustness to Common Corruptions and Perturbations , author=. Proceedings of the International Conference on Learning Representations , year=

  9. [9]

    International conference on machine learning , pages=

    Understanding contrastive representation learning through alignment and uniformity on the hypersphere , author=. International conference on machine learning , pages=. 2020 , organization=

  10. [10]

    Advances in neural information processing systems , volume=

    Formal guarantees on the robustness of a classifier against adversarial manipulation , author=. Advances in neural information processing systems , volume=

  11. [11]

    International conference on machine learning , pages=

    A simple framework for contrastive learning of visual representations , author=. International conference on machine learning , pages=. 2020 , organization=

  12. [12]

    L2-constrained Softmax Loss for Discriminative Face Verification

    L2-Constrained Softmax Loss for Discriminative Face Verification , author =. arXiv preprint arXiv:1703.09507 , year =

  13. [13]

    Proceedings of the IEEE International Conference on Computer Vision (ICCV) , pages =

    No Fuss Distance Metric Learning Using Proxies , author =. Proceedings of the IEEE International Conference on Computer Vision (ICCV) , pages =. 2017 , doi =

  14. [14]

    International Conference on Learning Representations (ICLR) , year =

    Fixed Non-Negative Orthogonal Classifier: Inducing Zero-Mean Neural Collapse with Feature Dimension Separation , author =. International Conference on Learning Representations (ICLR) , year =

  15. [15]

    Proceedings of the National Academy of Sciences , volume=

    Prevalence of neural collapse during the terminal phase of deep learning training , author=. Proceedings of the National Academy of Sciences , volume=. 2020 , doi=

  16. [16]

    Journal of Machine Learning Research , volume=

    The Implicit Bias of Gradient Descent on Separable Data , author=. Journal of Machine Learning Research , volume=. 2018 , url=

  17. [17]

    Advances in Neural Information Processing Systems (NeurIPS) , pages=

    Spectrally-normalized margin bounds for neural networks , author=. Advances in Neural Information Processing Systems (NeurIPS) , pages=

  18. [18]

    Neyshabur, Behnam and Bhojanapalli, Srinadh and Srebro, Nathan , booktitle=. A. 2018 , url=

  19. [19]

    , booktitle=

    Kornblith, Simon and Shlens, Jonathon and Le, Quoc V. , booktitle=. Do Better. 2019 , url=

  20. [20]

    Advances in Neural Information Processing Systems (NeurIPS) , year=

    Supervised Contrastive Learning , author=. Advances in Neural Information Processing Systems (NeurIPS) , year=

  21. [21]

    Advances in Neural Information Processing Systems (NeurIPS) , year=

    Robustness of classifiers: from adversarial to random noise , author=. Advances in Neural Information Processing Systems (NeurIPS) , year=

  22. [22]

    2020 , url=

    Gavin Weiguang Ding and Yash Sharma and Kry Yik Chau Lui and Ruitong Huang , booktitle=. 2020 , url=

  23. [23]

    Advances in Neural Information Processing Systems (NeurIPS) , year =

    Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =

  24. [24]

    Applied and Computational Harmonic Analysis , volume=

    Neural Collapse under Cross-Entropy Loss , author=. Applied and Computational Harmonic Analysis , volume=. 2022 , doi=

  25. [25]

    International Conference on Machine Learning , pages=

    Unveiling the Dynamics of Information Interplay in Supervised Learning , author=. International Conference on Machine Learning , pages=. 2024 , organization=

  26. [26]

    2007 15th European signal processing conference , pages=

    The effective rank: A measure of effective dimensionality , author=. 2007 15th European signal processing conference , pages=. 2007 , organization=

  27. [27]

    Proceedings of the 41st International Conference on Machine Learning , pages=

    Matrix information theory for self-supervised learning , author=. Proceedings of the 41st International Conference on Machine Learning , pages=

  28. [28]

    Advances in Neural Information Processing Systems , volume=

    Imbalance trouble: Revisiting neural-collapse geometry , author=. Advances in Neural Information Processing Systems , volume=

  29. [29]

    International Conference on Machine Learning , pages=

    On the optimization landscape of neural collapse under mse loss: Global optimality with unconstrained features , author=. International Conference on Machine Learning , pages=. 2022 , organization=

  30. [30]

    Advances in Neural Information Processing Systems , volume=

    A geometric analysis of neural collapse with unconstrained features , author=. Advances in Neural Information Processing Systems , volume=

  31. [31]

    Biometrika , volume=

    On the Existence of Maximum Likelihood Estimates in Logistic Regression Models , author=. Biometrika , volume=. 1984 , publisher=

  32. [32]

    Journal of Machine Learning Research , volume=

    Neural Collapse for Unconstrained Feature Model under Cross-Entropy Loss with Imbalanced Data , author=. Journal of Machine Learning Research , volume=

  33. [33]

    , booktitle=

    Wang, Feng and Xiang, Xiang and Cheng, Jian and Yuille, Alan L. , booktitle=. 2017 , doi=

  34. [34]

    Liu, Weiyang and Wen, Yandong and Yu, Zhiding and Li, Ming and Raj, Bhiksha and Song, Le , booktitle=

  35. [35]

    Wang, Hao and Wang, Yitong and Zhou, Zheng and Ji, Xing and Gong, Dihong and Zhou, Jingchao and Li, Zhifeng and Liu, Wei , booktitle=

  36. [36]

    Deng, Jiankang and Guo, Jia and Xue, Niannan and Zafeiriou, Stefanos , booktitle=

  37. [37]

    Han and Vardan Papyan and David L

    X.Y. Han and Vardan Papyan and David L. Donoho , booktitle=. Neural Collapse Under. 2022 , url=

  38. [38]

    Proceedings of the National Academy of Sciences , volume=

    Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training , author=. Proceedings of the National Academy of Sciences , volume=. 2021 , doi=

  39. [39]

    arXiv preprint arXiv:2110.02796 , year=

    An Unconstrained Layer-Peeled Perspective on Neural Collapse , author=. arXiv preprint arXiv:2110.02796 , year=

  40. [40]

    A Theoretical Framework for Preventing Class Collapse in Supervised Contrastive Learning , author=

  41. [41]

    Proceedings of the 39th International Conference on Machine Learning , series =

    Extended Unconstrained Features Model for Exploring Deep Neural Collapse , author =. Proceedings of the 39th International Conference on Machine Learning , series =. 2022 , publisher =

  42. [42]

    International Conference on Machine Learning , pages=

    Dissecting supervised contrastive learning , author=. International Conference on Machine Learning , pages=. 2021 , organization=

  43. [43]

    CoRR , volume=

    Leyan Pan and Xinyuan Cao , title=. CoRR , volume=. 2023 , cdate=

  44. [44]

    arXiv preprint arXiv:2011.11619 , year =

    Neural Collapse with Unconstrained Features , author =. arXiv preprint arXiv:2011.11619 , year =

  45. [45]

    International Conference on Learning Representations , year =

    An Unconstrained Layer-Peeled Perspective on Neural Collapse , author =. International Conference on Learning Representations , year =

  46. [46]

    International Conference on Learning Representations , year =

    Long-Tail Learning via Logit Adjustment , author =. International Conference on Learning Representations , year =

  47. [47]

    Advances in Neural Information Processing Systems , volume =

    Prototypical Networks for Few-shot Learning , author =. Advances in Neural Information Processing Systems , volume =

  48. [48]

    Advances in Neural Information Processing Systems , volume=

    Guiding neural collapse: Optimising towards the nearest simplex equiangular tight frame , author=. Advances in Neural Information Processing Systems , volume=

  49. [49]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Momentum contrast for unsupervised visual representation learning , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  50. [50]

    International Conference on Machine Learning , pages=

    Bridging Mini-Batch and Asymptotic Analysis in Contrastive Learning: From InfoNCE to Kernel-Based Losses , author=. International Conference on Machine Learning , pages=. 2024 , organization=

  51. [51]

    European Conference on Computer Vision , pages=

    Decoupled contrastive learning , author=. European Conference on Computer Vision , pages=. 2022 , organization=

  52. [52]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Unsupervised feature learning by cross-level instance-group discrimination , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  53. [53]

    Journal of Machine Learning Research , volume =

    Neural Collapse for Unconstrained Feature Model under Class-Imbalance , author =. Journal of Machine Learning Research , volume =. 2024 , url =

  54. [54]

    arXiv preprint arXiv:2202.08384 , year =

    Limitations of Neural Collapse for Understanding Generalization in Deep Learning , author =. arXiv preprint arXiv:2202.08384 , year =

  55. [55]

    Proceedings of the National Academy of Sciences , volume =

    Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training , author =. Proceedings of the National Academy of Sciences , volume =. 2021 , doi =

  56. [56]

    Advances in Computational Mathematics , volume =

    Finite Normalized Tight Frames , author =. Advances in Computational Mathematics , volume =. 2003 , doi =

  57. [57]

    Experimental Mathematics , volume =

    Packing Lines, Planes, etc.: Packings in Grassmannian Spaces , author =. Experimental Mathematics , volume =. 1996 , url =

  58. [58]

    Scaling Learning Algorithms Towards

    Bengio, Yoshua and LeCun, Yann , booktitle =. Scaling Learning Algorithms Towards

  59. [59]

    and Osindero, Simon and Teh, Yee Whye , journal =

    Hinton, Geoffrey E. and Osindero, Simon and Teh, Yee Whye , journal =. A Fast Learning Algorithm for Deep Belief Nets , volume =

  60. [60]

    2016 , publisher=

    Deep learning , author=. 2016 , publisher=

  61. [61]

    Symmetric Neural-Collapse Representations with Supervised Contrastive Loss: The Impact of

    Kini, Ganesh Ramachandra and Vakilian, Vala and Behnia, Tina and Gilani Tehrani-Saadi, Jaiden and Thrampoulidis, Christos , booktitle =. Symmetric Neural-Collapse Representations with Supervised Contrastive Loss: The Impact of

  62. [62]

    ICASSP 2024 -- 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages =

    Engineering the Neural Collapse Geometry of Supervised-Contrastive Loss , author =. ICASSP 2024 -- 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages =. 2024 , organization =

  63. [63]

    Advances in Neural Information Processing Systems (NeurIPS) , year =

    Neural Collapse versus Low-Rank Bias: Is Deep Neural Collapse Really Optimal? , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =

  64. [64]

    Advances in Neural Information Processing Systems (NeurIPS) , year =

    Deep Neural Collapse Is Provably Optimal for the Deep Unconstrained Features Model , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =

  65. [65]

    International Conference on Machine Learning (ICML) , year =

    Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning , author =. International Conference on Machine Learning (ICML) , year =

  66. [66]

    Advances in Neural Information Processing Systems (NeurIPS) , year =

    Why Do Better Loss Functions Lead to Less Transferable Features? , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =

  67. [67]

    International Conference on Machine Learning (ICML) , year =

    Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning , author =. International Conference on Machine Learning (ICML) , year =

  68. [68]

    Advances in Neural Information Processing Systems (NeurIPS) , year =

    Associative Embedding: End-to-End Learning for Joint Detection and Grouping , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =

  69. [69]

    Semantic Instance Segmentation with a Discriminative Loss Function

    Semantic Instance Segmentation with a Discriminative Loss Function , author =. arXiv preprint arXiv:1708.02551 , year =

  70. [70]

    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , year =

    Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth , author =. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , year =

  71. [71]

    Orthogonal Weight Normalization: Solution to Optimization over Multiple Dependent

    Huang, Lei and Liu, Xianglong and Lang, Bo and Yu, Adams Wei and Wang, Yongliang and Li, Bo , booktitle =. Orthogonal Weight Normalization: Solution to Optimization over Multiple Dependent

  72. [72]

    Advances in Neural Information Processing Systems (NeurIPS) , year =

    Can We Gain More from Orthogonality Regularizations in Training Deep Networks? , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =

  73. [73]

    Advances in Neural Information Processing Systems (NeurIPS) , year =

    Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =

  74. [74]

    International Conference on Learning Representations (ICLR) , year =

    Spectral Normalization for Generative Adversarial Networks , author =. International Conference on Learning Representations (ICLR) , year =

  75. [75]

    ICML Workshop on Identifying and Understanding Deep Learning Phenomena , year =

    Layer Rotation: a Surprisingly Simple Indicator of Generalization in Deep Networks? , author =. ICML Workshop on Identifying and Understanding Deep Learning Phenomena , year =

  76. [76]

    Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

    Goyal, Priya and Doll. Accurate, Large Minibatch. arXiv preprint arXiv:1706.02677 , year =

  77. [77]

    IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , year =

    Deep Residual Learning for Image Recognition , author =. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , year =

  78. [78]

    Deng, Jia and Dong, Wei and Socher, Richard and Li, Li-Jia and Li, Kai and Fei-Fei, Li , booktitle =

  79. [79]

    Learning Multiple Layers of Features from Tiny Images , author =

  80. [80]

    Bossard, Lukas and Guillaumin, Matthieu and Van Gool, Luc , booktitle =

Showing first 80 references.