Symmetrization of Loss Functions for Robust Training of Neural Networks in the Presence of Noisy Labels
Pith reviewed 2026-05-21 08:14 UTC · model grok-4.3
The pith
Symmetrizing cross-entropy yields a unique convex multi-class unhinged loss that approximates other symmetric losses near equal scores.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The multi-class unhinged loss obtained by symmetrizing cross-entropy is the unique convex multi-class symmetric loss under suitable assumptions. It further serves as the linear approximation of any symmetric loss around score vectors with equal components.
What carries the argument
The unique decomposition of any multi-class loss function into a symmetric component plus a class-insensitive term, which converts an arbitrary loss into one that meets the symmetry condition required for label-noise robustness.
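A minimal sketch of this symmetrization step, using the formula quoted in the Lean-theorem section of this page, L_sym(z, y) := L(z, y) − (1/C) Σ_k L(z, k). It assumes cross-entropy is applied to a softmax of the score vector z; the function names are illustrative, and the paper's exact normalization of the unhinged loss may differ by constants.

```python
import numpy as np

def cross_entropy(z, y):
    """Softmax cross-entropy for score vector z and label y (assumed form)."""
    z = np.asarray(z, dtype=float)
    m = z.max()  # shift by the max for numerical stability
    return -z[y] + m + np.log(np.exp(z - m).sum())

def symmetrize(loss, z, y):
    """L_sym(z, y) = L(z, y) - (1/C) * sum_k L(z, k)."""
    C = len(z)
    return loss(z, y) - sum(loss(z, k) for k in range(C)) / C

def linear_unhinged(z, y):
    """Linear form obtained by hand: the log-sum-exp terms cancel."""
    z = np.asarray(z, dtype=float)
    return z.mean() - z[y]

z = np.array([2.0, -1.0, 0.5, 0.0])
sym = [symmetrize(cross_entropy, z, y) for y in range(len(z))]
lin = [linear_unhinged(z, y) for y in range(len(z))]
print(np.allclose(sym, lin))  # the symmetrized loss is linear in z
print(abs(sum(sym)) < 1e-9)   # the sum over labels does not depend on z
```

Under this assumption the log-sum-exp term is the class-insensitive part that gets subtracted, which is why the result is linear in the scores.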
If this is right
- Training proceeds robustly without explicit estimation of noise transition matrices when symmetric losses are used.
- The multi-class unhinged loss supplies a canonical convex baseline against which other symmetric losses can be compared.
- SGCE and alpha-MAE let practitioners trade off beta-smoothness for empirical performance while keeping the symmetry guarantee.
- The approximation result implies that all symmetric losses share the same local behavior near uniform score vectors.
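The local-approximation claim can be checked numerically for one concrete symmetric loss. The sketch below assumes MAE on softmax outputs as the symmetric loss and mean(z) − z[y] as the linear unhinged form (both are illustrative choices, not taken from the paper's code). It compares gradients at a score vector with equal components using central finite differences.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mae(z, y):
    """MAE between softmax(z) and one-hot(y); equals 2 * (1 - p_y)."""
    return 2.0 * (1.0 - softmax(z)[y])

def numerical_grad(f, z, eps=1e-6):
    """Central finite-difference gradient of f at z."""
    g = np.zeros_like(z)
    for j in range(len(z)):
        d = np.zeros_like(z)
        d[j] = eps
        g[j] = (f(z + d) - f(z - d)) / (2 * eps)
    return g

C, y = 5, 2
z0 = np.full(C, 0.7)  # score vector with equal components
grad_mae = numerical_grad(lambda z: mae(z, y), z0)

# Gradient of the linear form mean(z) - z[y]: 1/C everywhere, minus 1 at index y.
grad_unhinged = np.full(C, 1.0 / C)
grad_unhinged[y] -= 1.0

ratio = grad_mae / grad_unhinged
print(ratio)  # every entry should be close to 2 / C
```

In this example the two gradients are proportional with factor 2/C, so the first-order expansion of MAE at z0 matches the linear form up to a positive scale and an additive constant.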
Where Pith is reading between the lines
- The decomposition technique could be applied to other base losses such as focal loss to generate additional robust variants.
- The local approximation property suggests that optimization dynamics of different symmetric losses coincide to first order near uniform predictions.
- Empirical verification on datasets with structured rather than uniform noise could test how far the uniqueness and robustness extend.
Load-bearing premise
The symmetry condition on a loss function is assumed to deliver theoretical robustness guarantees against label noise.
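For orientation, the symmetry condition is usually stated in the noisy-label literature as: the sum of the loss over all labels is the same constant for every score vector. The sketch below assumes that definition and softmax outputs; it checks the condition empirically for MAE (which satisfies it) and cross-entropy (which does not). Function names are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(z, y):
    return -np.log(softmax(z)[y])

def mae(z, y):
    # MAE between softmax(z) and one-hot(y) simplifies to 2 * (1 - p_y).
    return 2.0 * (1.0 - softmax(z)[y])

def label_sum(loss, z):
    """Sum of the loss over every possible label for one score vector."""
    return sum(loss(z, k) for k in range(len(z)))

C = 4
rng = np.random.default_rng(0)
scores = rng.normal(size=(100, C))  # random score vectors

mae_sums = np.array([label_sum(mae, z) for z in scores])
ce_sums = np.array([label_sum(cross_entropy, z) for z in scores])

print(np.allclose(mae_sums, 2 * (C - 1)))  # constant, so MAE is symmetric
print(ce_sums.std())                       # varies with z, so CE is not
```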
What would settle it
An experiment that measures test accuracy on controlled synthetic label noise and finds that the symmetrized multi-class unhinged loss performs no better than its non-symmetric counterpart would falsify the robustness claim.
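Such an experiment needs a way to inject controlled label noise. A minimal sketch of symmetric (uniform) label flipping follows; the function name, the noise rate, and the toy label array are hypothetical, and the paper's actual noise protocol is not described on this page.

```python
import numpy as np

def add_symmetric_noise(labels, num_classes, rate, seed=0):
    """Flip each label with probability `rate` to a different class chosen uniformly."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()
    flip = rng.random(len(labels)) < rate
    # Offsets in 1..num_classes-1 guarantee the flipped label differs from the clean one.
    offsets = rng.integers(1, num_classes, size=int(flip.sum()))
    noisy[flip] = (labels[flip] + offsets) % num_classes
    return noisy

clean = np.arange(10000) % 10  # toy labels for 10 classes
noisy = add_symmetric_noise(clean, num_classes=10, rate=0.4)
observed_rate = (noisy != clean).mean()
print(observed_rate)  # should be close to the requested rate
```

Training the same network once with the symmetrized loss and once with its non-symmetric counterpart on `noisy`, then evaluating on clean test labels, would give the comparison described above.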
Original abstract
Labeling a training set is often expensive and susceptible to errors, making the design of robust loss functions for label noise an important problem. The symmetry condition provides theoretical guarantees for robustness to such noise. In this work, we study a symmetrization method arising from the unique decomposition of any multi-class loss function into a symmetric component and a class-insensitive term. In particular, symmetrizing the cross-entropy loss leads to a linear multi-class extension of the unhinged loss. Unlike in the binary case, the multi-class version must have specific coefficients in order to satisfy the symmetry condition. Under suitable assumptions, we show that this multi-class unhinged loss is the unique convex multi-class symmetric loss. We also show that it has a fundamental local role: the linear approximation of any symmetric loss around score vectors with equal components is equivalent to the multi-class unhinged loss. We then introduce SGCE and alpha-MAE, two loss functions that interpolate between the multi-class unhinged loss and the Mean Absolute Error while allowing control of the beta-smoothness of the loss. Experiments on standard noisy-label benchmarks show competitive performance compared with existing robust loss functions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that any multi-class loss admits a unique decomposition into a symmetric component plus a class-insensitive term. Symmetrizing cross-entropy produces a linear multi-class unhinged loss whose coefficients are fixed by the symmetry condition. Under suitable assumptions this loss is the unique convex multi-class symmetric loss and equals the first-order approximation to any symmetric loss at score vectors with equal components. Two interpolating families (SGCE and alpha-MAE) are introduced that trade off between the unhinged loss and MAE while controlling beta-smoothness; experiments report competitive performance on standard noisy-label benchmarks.
Significance. If the uniqueness and approximation results hold under clearly stated conditions, the work supplies a principled foundation for symmetric robust losses and identifies a canonical convex member of the class. The interpolating losses add practical utility by letting practitioners tune smoothness. The contribution would be strengthened by explicit assumptions and more detailed empirical reporting.
major comments (3)
- [Abstract / uniqueness theorem] Abstract and theoretical section on uniqueness: the claim that the multi-class unhinged loss is the unique convex multi-class symmetric loss rests on unspecified 'suitable assumptions.' Because this statement is load-bearing for both the theoretical positioning and the motivation for SGCE/alpha-MAE, the assumptions must be enumerated explicitly (e.g., convexity class, gradient bounds, or domain restrictions) and the proof must show they are necessary rather than implicit in the construction.
- [Decomposition theorem] Decomposition section: the uniqueness of the decomposition of an arbitrary multi-class loss into symmetric part plus class-insensitive term is asserted but the derivation is not supplied in the abstract and must be given in full, including verification that the decomposition is independent of any further restrictions on the loss.
- [Experimental results] Experiments section: performance is summarized only as 'competitive' with no error bars, no tabulated numerical comparisons to baselines, and no dataset or hyper-parameter details visible. This weakens support for the practical utility of the interpolating losses.
minor comments (2)
- [Abstract] Abstract: a one-sentence indication of the scope of the 'suitable assumptions' would help readers assess the result without reading the full proof.
- [Notation / local approximation] Notation: ensure the score vector notation (equal-component vectors) is defined once and used consistently when stating the local approximation property.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and have prepared revisions to strengthen the manuscript.
- Referee: [Abstract / uniqueness theorem] Abstract and theoretical section on uniqueness: the claim that the multi-class unhinged loss is the unique convex multi-class symmetric loss rests on unspecified 'suitable assumptions.' Because this statement is load-bearing for both the theoretical positioning and the motivation for SGCE/alpha-MAE, the assumptions must be enumerated explicitly (e.g., convexity class, gradient bounds, or domain restrictions) and the proof must show they are necessary rather than implicit in the construction.
  Authors: We agree that the assumptions should be stated explicitly. In the revised manuscript we will enumerate them at the theorem statement (convexity, continuous differentiability, and domain restricted to the interior of the probability simplex) and revise the proof to indicate precisely where each assumption is invoked. We will also add a short discussion with counter-examples showing that uniqueness can fail when convexity is dropped, thereby clarifying necessity within the stated class.
  Revision: yes
- Referee: [Decomposition theorem] Decomposition section: the uniqueness of the decomposition of an arbitrary multi-class loss into symmetric part plus class-insensitive term is asserted but the derivation is not supplied in the abstract and must be given in full, including verification that the decomposition is independent of any further restrictions on the loss.
  Authors: The derivation appears in the theoretical section but is indeed concise. We will expand it in the revision to a self-contained step-by-step argument: define the symmetric component via averaging the loss over label permutations that preserve the score vector symmetry, subtract to obtain the class-insensitive remainder, and prove uniqueness by showing that any other split would violate either symmetry or class-insensitivity. Independence from additional restrictions follows directly from the construction, which uses only the symmetry axiom.
  Revision: yes
- Referee: [Experimental results] Experiments section: performance is summarized only as 'competitive' with no error bars, no tabulated numerical comparisons to baselines, and no dataset or hyper-parameter details visible. This weakens support for the practical utility of the interpolating losses.
  Authors: We acknowledge the need for more detailed reporting. The revised version will replace the summary statement with tables containing mean accuracy and standard deviation over five independent runs, explicit numerical comparisons against all baselines, and a new subsection (plus appendix) listing datasets, noise models and rates, optimizer settings, learning-rate schedules, and all hyper-parameters used for SGCE and alpha-MAE.
  Revision: yes
Circularity Check
Uniqueness of multi-class unhinged loss under suitable assumptions follows from symmetry decomposition without reducing to self-definition or fitted inputs
The paper starts from the symmetry condition on losses, which is an external robustness property, and uses the asserted unique decomposition of any multi-class loss into symmetric component plus class-insensitive term to construct the symmetrized cross-entropy. This yields the multi-class unhinged loss with specific coefficients. The claim that it is the unique convex multi-class symmetric loss is explicitly conditioned on suitable assumptions, and the local linear approximation property is shown by direct expansion around equal-component score vectors. No derivation step renames a fitted parameter as a prediction, imports uniqueness via self-citation, or defines the target result into the inputs by construction. The central theoretical positioning therefore remains independent of the paper's own fitted values or prior author results.
Axiom & Free-Parameter Ledger
free parameters (2)
- alpha
- beta
axioms (2)
- domain assumption: Any multi-class loss admits a unique decomposition into a symmetric component and a class-insensitive term.
- domain assumption: The symmetry condition on the loss guarantees robustness to label noise under suitable assumptions.
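The first axiom (unique decomposition) can be motivated by a short calculation. The sketch below is not the paper's proof; it assumes the symmetry condition means that the sum of the symmetric part over all C labels equals a constant c, and it writes the class-insensitive term as a function g(z) of the scores only (both symbols are introduced here for illustration).

```latex
\begin{align*}
  &\text{Suppose } L(z,y) = S(z,y) + g(z)
    \quad\text{with}\quad \sum_{k=1}^{C} S(z,k) = c . \\
  &\text{Summing over labels: }
    \sum_{k=1}^{C} L(z,k) = c + C\, g(z)
    \;\Longrightarrow\;
    g(z) = \frac{1}{C}\sum_{k=1}^{C} L(z,k) - \frac{c}{C} . \\
  &\text{Hence }
    S(z,y) = L(z,y) - \frac{1}{C}\sum_{k=1}^{C} L(z,k) + \frac{c}{C} .
\end{align*}
```

Under these assumptions the symmetric part is determined by L up to the additive constant c/C, which is consistent with the "unique (up to constants)" wording quoted in the Lean-theorem section.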
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: echoes)
  ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  There is a unique (up to constants) decomposition of a loss function into a sum of a symmetric loss function and a class-insensitive term. The symmetric component is given by L_sym(z, y) := L(z, y) − (1/C) Σ_k L(z, k).
- IndisputableMonolith/Foundation/LogicAsFunctionalEquation.lean, SatisfiesLawsOfLogic + derivedCost uniqueness (tag: echoes)
  ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  The multi-class unhinged loss is the unique convex, non-trivial, non-increasing, multi-class symmetric loss function satisfying the property of invariance to permutations (up to an additive and a multiplicative constant).
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.