Symmetrization of Loss Functions for Robust Training of Neural Networks in the Presence of Noisy Labels

Alexandre Lemire Paquin; Brahim Chaib-draa; Philippe Giguère

arxiv: 2605.20347 · v1 · pith:YRTVWUDNnew · submitted 2026-05-19 · 💻 cs.LG · stat.ML

Pith reviewed 2026-05-21 08:14 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords noisy labels · robust loss functions · symmetric losses · multi-class unhinged loss · label noise robustness · neural network training

The pith

Symmetrizing cross-entropy yields a unique convex multi-class unhinged loss that approximates other symmetric losses near equal scores.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Any multi-class loss decomposes uniquely into a symmetric component and a class-insensitive term. Applying this decomposition to cross-entropy produces a linear multi-class unhinged loss whose coefficients are fixed by the symmetry condition. Under suitable assumptions this loss is the only convex multi-class symmetric loss, and it equals the first-order Taylor approximation of any symmetric loss around score vectors whose components are all equal. The authors also define SGCE and alpha-MAE, two losses that interpolate between the multi-class unhinged loss and mean absolute error, allow control of the beta-smoothness of the loss, and preserve the symmetry property.

Core claim

The multi-class unhinged loss obtained by symmetrizing cross-entropy is the unique convex multi-class symmetric loss under suitable assumptions. It further serves as the linear approximation of any symmetric loss around score vectors with equal components.

What carries the argument

The unique decomposition of any multi-class loss function into a symmetric component plus a class-insensitive term, which converts an arbitrary loss into one that meets the symmetry condition required for label-noise robustness.
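
As a minimal sketch of this decomposition, assuming the symmetric component has the form L_sym(z, y) = L(z, y) - (1/C) Σ_k L(z, k) quoted in the Lean-theorem section of this page, the NumPy snippet below symmetrizes softmax cross-entropy and compares the result with a linear loss. The function names (cross_entropy, symmetrize, unhinged) and the example score vector are illustrative choices, not taken from the paper, and the paper's own normalization of the unhinged loss may differ by additive or multiplicative constants.

```python
import numpy as np

def cross_entropy(z, y):
    """Softmax cross-entropy for a score vector z and an integer label y."""
    z = np.asarray(z, dtype=float)
    m = z.max()  # shift for numerical stability
    return -z[y] + m + np.log(np.sum(np.exp(z - m)))

def symmetrize(loss, z, y):
    """Symmetric component: loss(z, y) minus its average over all labels."""
    num_classes = len(z)
    average = sum(loss(z, k) for k in range(num_classes)) / num_classes
    return loss(z, y) - average

def unhinged(z, y):
    """Linear form that symmetrized cross-entropy reduces to: -z_y + mean(z)."""
    z = np.asarray(z, dtype=float)
    return -z[y] + z.mean()

# Illustrative score vector (assumption, not from the paper).
z = np.array([2.0, -1.0, 0.5, 0.0])
sym_ce = [symmetrize(cross_entropy, z, y) for y in range(len(z))]

# Symmetry: the losses summed over all labels give a constant (here zero).
print("sum over labels:", sum(sym_ce))
# The log-sum-exp term cancels, leaving a loss that is linear in the scores.
print("max gap to linear form:",
      max(abs(sym_ce[y] - unhinged(z, y)) for y in range(len(z))))
```

The log-sum-exp term of cross-entropy does not depend on the label, so it is absorbed entirely by the class-insensitive term; only the linear part survives in the symmetric component.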

If this is right

Training proceeds robustly without explicit estimation of noise transition matrices when symmetric losses are used.
The multi-class unhinged loss supplies a canonical convex baseline against which other symmetric losses can be compared.
SGCE and alpha-MAE let practitioners trade off beta-smoothness for empirical performance while keeping the symmetry guarantee.
The approximation result implies that all symmetric losses share the same local behavior near uniform score vectors.
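
The local-behavior point above can be checked numerically for one familiar symmetric loss. The sketch below assumes MAE computed between softmax probabilities and the one-hot label, and it assumes the unhinged loss in the form -z_y + mean(z); both conventions, and every name in the code, are illustrative. It is a single worked instance, not a substitute for the paper's general proof.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def mae_loss(z, y):
    """L1 distance between softmax(z) and the one-hot label; equals 2 * (1 - p_y)."""
    p = softmax(z)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    return np.abs(p - onehot).sum()

def numerical_grad(f, z, eps=1e-6):
    """Central finite-difference gradient of f at z."""
    g = np.zeros_like(z)
    for j in range(len(z)):
        step = np.zeros_like(z)
        step[j] = eps
        g[j] = (f(z + step) - f(z - step)) / (2 * eps)
    return g

C, y = 5, 2                 # number of classes and label (arbitrary choices)
z0 = np.full(C, 0.7)        # score vector with equal components

grad_mae = numerical_grad(lambda z: mae_loss(z, y), z0)

# Gradient of the assumed unhinged form -z_y + mean(z): 1/C everywhere, minus 1 at y.
grad_unhinged = np.full(C, 1.0 / C)
grad_unhinged[y] -= 1.0

# At equal scores the MAE gradient is the unhinged gradient scaled by 2 / C.
print(np.allclose(grad_mae, (2.0 / C) * grad_unhinged, atol=1e-6))
```

The scale factor 2 / C is specific to this MAE convention; the claim in the text is about equivalence up to additive and multiplicative constants.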

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The decomposition technique could be applied to other base losses such as focal loss to generate additional robust variants.
The local approximation property suggests that optimization dynamics of different symmetric losses coincide to first order near uniform predictions.
Empirical verification on datasets with structured rather than uniform noise could test how far the uniqueness and robustness extend.

Load-bearing premise

The symmetry condition on a loss function is assumed to deliver theoretical robustness guarantees against label noise.

What would settle it

An experiment that measures test accuracy on controlled synthetic label noise and finds that the symmetrized multi-class unhinged loss performs no better than its non-symmetric counterpart would falsify the robustness claim.
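
One way to set up the controlled synthetic noise such an experiment needs is sketched below. It assumes the common convention in which each label is flipped with probability noise_rate to a different class chosen uniformly at random; the paper's exact noise protocol is not given on this page, and the function name inject_symmetric_noise and its parameters are hypothetical.

```python
import numpy as np

def inject_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Return a copy of labels where each entry is flipped with probability
    noise_rate to a different class drawn uniformly at random."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()
    flip = rng.random(labels.shape[0]) < noise_rate
    # Offsets in 1..num_classes-1 guarantee that a flipped label differs from the original.
    offsets = rng.integers(1, num_classes, size=labels.shape[0])
    noisy[flip] = (labels[flip] + offsets[flip]) % num_classes
    return noisy

# Illustrative usage on placeholder labels (not data from the paper).
clean = np.arange(1000) % 10
noisy = inject_symmetric_noise(clean, noise_rate=0.4, num_classes=10)
print("observed flip fraction:", np.mean(noisy != clean))
```

Training the symmetrized loss and its non-symmetric counterpart on the same noisy labels, then comparing accuracy on a clean test set, would give the comparison described above.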

Figures

Figures reproduced from arXiv: 2605.20347 by Alexandre Lemire Paquin, Brahim Chaib-draa, Philippe Giguère.

Original abstract

Labeling a training set is often expensive and susceptible to errors, making the design of robust loss functions for label noise an important problem. The symmetry condition provides theoretical guarantees for robustness to such noise. In this work, we study a symmetrization method arising from the unique decomposition of any multi-class loss function into a symmetric component and a class-insensitive term. In particular, symmetrizing the cross-entropy loss leads to a linear multi-class extension of the unhinged loss. Unlike in the binary case, the multi-class version must have specific coefficients in order to satisfy the symmetry condition. Under suitable assumptions, we show that this multi-class unhinged loss is the unique convex multi-class symmetric loss. We also show that it has a fundamental local role: the linear approximation of any symmetric loss around score vectors with equal components is equivalent to the multi-class unhinged loss. We then introduce SGCE and alpha-MAE, two loss functions that interpolate between the multi-class unhinged loss and the Mean Absolute Error while allowing control of the beta-smoothness of the loss. Experiments on standard noisy-label benchmarks show competitive performance compared with existing robust loss functions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper extends the unhinged loss to multi-class with a uniqueness result among convex symmetric losses and a local approximation property, but the assumptions stay vague and experiments are only summarized.

The main thing here is a multi-class unhinged loss that satisfies the symmetry condition for label noise robustness. They decompose any multi-class loss into a symmetric part plus a class-insensitive term, then symmetrize cross-entropy to get this extension. Unlike the binary version, it needs specific coefficients. Under suitable assumptions they prove it is the unique convex symmetric loss, and they show it acts as the linear approximation to any symmetric loss near score vectors with equal components. From there they build SGCE and alpha-MAE as interpolations between this loss and mean absolute error, with a knob for beta-smoothness. Experiments on standard noisy benchmarks are called competitive with existing robust losses.

Referee Report

3 major / 2 minor

Summary. The paper claims that any multi-class loss admits a unique decomposition into a symmetric component plus a class-insensitive term. Symmetrizing cross-entropy produces a linear multi-class unhinged loss whose coefficients are fixed by the symmetry condition. Under suitable assumptions this loss is the unique convex multi-class symmetric loss and equals the first-order approximation to any symmetric loss at score vectors with equal components. Two interpolating families (SGCE and alpha-MAE) are introduced that trade off between the unhinged loss and MAE while controlling beta-smoothness; experiments report competitive performance on standard noisy-label benchmarks.

Significance. If the uniqueness and approximation results hold under clearly stated conditions, the work supplies a principled foundation for symmetric robust losses and identifies a canonical convex member of the class. The interpolating losses add practical utility by letting practitioners tune smoothness. The contribution would be strengthened by explicit assumptions and more detailed empirical reporting.

major comments (3)

[Abstract / uniqueness theorem] Abstract and theoretical section on uniqueness: the claim that the multi-class unhinged loss is the unique convex multi-class symmetric loss rests on unspecified 'suitable assumptions.' Because this statement is load-bearing for both the theoretical positioning and the motivation for SGCE/alpha-MAE, the assumptions must be enumerated explicitly (e.g., convexity class, gradient bounds, or domain restrictions) and the proof must show they are necessary rather than implicit in the construction.
[Decomposition theorem] Decomposition section: the uniqueness of the decomposition of an arbitrary multi-class loss into symmetric part plus class-insensitive term is asserted but the derivation is not supplied in the abstract and must be given in full, including verification that the decomposition is independent of any further restrictions on the loss.
[Experimental results] Experiments section: performance is summarized only as 'competitive' with no error bars, no tabulated numerical comparisons to baselines, and no dataset or hyper-parameter details visible. This weakens support for the practical utility of the interpolating losses.

minor comments (2)

[Abstract] Abstract: a one-sentence indication of the scope of the 'suitable assumptions' would help readers assess the result without reading the full proof.
[Notation / local approximation] Notation: ensure the score vector notation (equal-component vectors) is defined once and used consistently when stating the local approximation property.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and have prepared revisions to strengthen the manuscript.

Referee: [Abstract / uniqueness theorem] Abstract and theoretical section on uniqueness: the claim that the multi-class unhinged loss is the unique convex multi-class symmetric loss rests on unspecified 'suitable assumptions.' Because this statement is load-bearing for both the theoretical positioning and the motivation for SGCE/alpha-MAE, the assumptions must be enumerated explicitly (e.g., convexity class, gradient bounds, or domain restrictions) and the proof must show they are necessary rather than implicit in the construction.

Authors: We agree that the assumptions should be stated explicitly. In the revised manuscript we will enumerate them at the theorem statement (convexity, continuous differentiability, and domain restricted to the interior of the probability simplex) and revise the proof to indicate precisely where each assumption is invoked. We will also add a short discussion with counter-examples showing that uniqueness can fail when convexity is dropped, thereby clarifying necessity within the stated class. revision: yes
Referee: [Decomposition theorem] Decomposition section: the uniqueness of the decomposition of an arbitrary multi-class loss into symmetric part plus class-insensitive term is asserted but the derivation is not supplied in the abstract and must be given in full, including verification that the decomposition is independent of any further restrictions on the loss.

Authors: The derivation appears in the theoretical section but is indeed concise. We will expand it in the revision to a self-contained step-by-step argument: define the symmetric component via averaging the loss over label permutations that preserve the score vector symmetry, subtract to obtain the class-insensitive remainder, and prove uniqueness by showing that any other split would violate either symmetry or class-insensitivity. Independence from additional restrictions follows directly from the construction, which uses only the symmetry axiom. revision: yes
Referee: [Experimental results] Experiments section: performance is summarized only as 'competitive' with no error bars, no tabulated numerical comparisons to baselines, and no dataset or hyper-parameter details visible. This weakens support for the practical utility of the interpolating losses.

Authors: We acknowledge the need for more detailed reporting. The revised version will replace the summary statement with tables containing mean accuracy and standard deviation over five independent runs, explicit numerical comparisons against all baselines, and a new subsection (plus appendix) listing datasets, noise models and rates, optimizer settings, learning-rate schedules, and all hyper-parameters used for SGCE and alpha-MAE. revision: yes

Circularity Check

0 steps flagged

Uniqueness of multi-class unhinged loss under suitable assumptions follows from symmetry decomposition without reducing to self-definition or fitted inputs

The paper starts from the symmetry condition on losses, which is an external robustness property, and uses the asserted unique decomposition of any multi-class loss into symmetric component plus class-insensitive term to construct the symmetrized cross-entropy. This yields the multi-class unhinged loss with specific coefficients. The claim that it is the unique convex multi-class symmetric loss is explicitly conditioned on suitable assumptions, and the local linear approximation property is shown by direct expansion around equal-component score vectors. No derivation step renames a fitted parameter as a prediction, imports uniqueness via self-citation, or defines the target result into the inputs by construction. The central theoretical positioning therefore remains independent of the paper's own fitted values or prior author results.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The work rests on the symmetry condition for robustness and the uniqueness of the loss decomposition. No new physical entities are introduced. The alpha and beta parameters in the interpolating losses are tunable but not fitted to the final performance metric.

free parameters (2)

alpha
Interpolation weight between multi-class unhinged loss and MAE in alpha-MAE.
beta
Controls smoothness of the loss surface in SGCE and alpha-MAE.

axioms (2)

domain assumption: Any multi-class loss admits a unique decomposition into a symmetric component and a class-insensitive term.
Invoked to derive the symmetrized cross-entropy loss.
domain assumption: The symmetry condition on the loss guarantees robustness to label noise under suitable assumptions.
Central premise for all robustness claims.
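
The first axiom can be made concrete with a short derivation. The LaTeX sketch below assumes the usual definitions, which are not quoted from the paper: a loss S is symmetric when its sum over all C labels is a constant independent of the score vector z, and a term T is class-insensitive when it does not depend on the label y. The symbols S_1, S_2, T_1, T_2, D and c_S are introduced here for illustration only.

```latex
% Assumed definitions (not quoted from the paper):
%   S is symmetric          if  \sum_{k=1}^{C} S(z,k) = c_S  for every score vector z,
%   T is class-insensitive  if  T(z,y) = T(z)  does not depend on the label y.
\begin{align*}
\text{Existence:}\quad
  & L(z,y) = \underbrace{L(z,y) - \tfrac{1}{C}\sum_{k=1}^{C} L(z,k)}_{L_{\mathrm{sym}}(z,y)}
           + \underbrace{\tfrac{1}{C}\sum_{k=1}^{C} L(z,k)}_{T(z)},
  \qquad \sum_{y=1}^{C} L_{\mathrm{sym}}(z,y) = 0. \\
\text{Uniqueness:}\quad
  & S_1(z,y) + T_1(z) = S_2(z,y) + T_2(z)
  \;\Longrightarrow\; S_1(z,y) - S_2(z,y) = T_2(z) - T_1(z) =: D(z), \\
  & \sum_{y=1}^{C} \bigl(S_1(z,y) - S_2(z,y)\bigr) = c_{S_1} - c_{S_2} = C\,D(z)
  \;\Longrightarrow\; D(z) = \tfrac{c_{S_1} - c_{S_2}}{C}.
\end{align*}
```

Under these assumed definitions, two decompositions of the same loss can differ only by a constant shifted between the two parts, which is consistent with the "unique (up to constants)" wording quoted elsewhere on this page.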

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

There is a unique (up to constants) decomposition of a loss function into a sum of a symmetric loss function and a class-insensitive term. The symmetric component is given by $L_{\mathrm{sym}}(z,y) := L(z,y) - \frac{1}{C}\sum_{k} L(z,k)$.
IndisputableMonolith/Foundation/LogicAsFunctionalEquation.lean SatisfiesLawsOfLogic + derivedCost uniqueness echoes

The multi-class unhinged loss is the unique convex, non-trivial, non-increasing, multi-class symmetric loss function satisfying the property of invariance to permutations (up to an additive and a multiplicative constant).

Reference graph

Works this paper leans on

141 extracted references · 141 canonical work pages · 1 internal anchor

[1]

Adaptive Supervision Online Learning for Vision Based Autonomous Systems , author=

work page
[2]

2013 , publisher =

Concentration Inequalities: A Nonasymptotic Theory of Independence , author =. 2013 , publisher =

work page 2013
[3]

Neural Networks , year =

Stability analysis of stochastic gradient descent for homogeneous neural networks and linear classifiers , author =. Neural Networks , year =

work page
[4]

International Conference on Machine Learning (ICML) , pages=

Train faster, generalize better: Stability of stochastic gradient descent , author=. International Conference on Machine Learning (ICML) , pages=. 2016 , publisher=

work page 2016
[5]

2005 , publisher=

The Generic Chaining: Upper and Lower Bounds of Stochastic Processes , author=. 2005 , publisher=

work page 2005
[6]

CoRR , volume =

Arindam Banerjee and Tiancong Chen and Yingxue Zhou , title =. CoRR , volume =. 2020 , url =. 2002.09956 , archivePrefix =

work page arXiv 2020
[7]

International Conference on Learning Representations (ICLR) , year =

Behnam Neyshabur and Srinadh Bhojanapalli and David McAllester and Nathan Srebro , title =. International Conference on Learning Representations (ICLR) , year =

work page
[8]

Proceedings of The 25th International Conference on Artificial Intelligence and Statistics (AISTATS) , year =

Fergus Immanuel Biggs and Benjamin Guedj , title =. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics (AISTATS) , year =

work page
[9]

CoRR , volume =

Paul Viallard and Pascal Germain and Amaury Habrard and Emilie Morvant , title =. CoRR , volume =. 2021 , url =. 2102.08649 , archivePrefix =

work page arXiv 2021
[10]

McAllester , title =

David A. McAllester , title =. Proceedings of the 12th Annual Conference on Computational Learning Theory (COLT) , year =

work page
[11]

Advances in Neural Information Processing Systems (NeurIPS) , year =

John Langford and John Shawe-Taylor , title =. Advances in Neural Information Processing Systems (NeurIPS) , year =

work page
[12]

PAC-Bayes Bounds for the Risk of the Majority Vote and the Variance of the Gibbs Classifier , booktitle =

Alexandre Lacasse and Fran. PAC-Bayes Bounds for the Risk of the Majority Vote and the Variance of the Gibbs Classifier , booktitle =. 2007 , volume =

work page 2007
[13]

Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm , journal =

Pascal Germain and Alexandre Lacasse and Fran. Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm , journal =. 2015 , volume =

work page 2015
[14]

Second-Order PAC-Bayesian Bounds for Weighted Majority Votes , booktitle =

Andr. Second-Order PAC-Bayesian Bounds for Weighted Majority Votes , booktitle =. 2020 , volume =

work page 2020
[15]

2014 , isbn =

Shai Shalev-Shwartz and Shai Ben-David , title =. 2014 , isbn =

work page 2014
[16]

A vector-contraction inequality for Rademacher complexities

Andreas Maurer , title =. CoRR , volume =. 2016 , url =. 1605.00251 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2016
[17]

2018 , address =

High-Dimensional Probability: An Introduction with Applications in Data Science , author =. 2018 , address =

work page 2018
[18]

, title =

Erdogdu, Murat A. , title =. 2022 , note =

work page 2022
[19]

arXiv preprint arXiv:2006.07279 , year =

PAC-Bayes Unleashed: Generalisation Bounds with Unbounded Losses , author =. arXiv preprint arXiv:2006.07279 , year =

work page arXiv 2006
[20]

Advances in Neural Information Processing Systems 29 (NeurIPS 2016) , pages =

PAC-Bayesian Theory Meets Bayesian Inference , author =. Advances in Neural Information Processing Systems 29 (NeurIPS 2016) , pages =. 2016 , url =

work page 2016
[21]

Journal of Machine Learning Research , volume =

Pierre Alquier and James Ridgway and Nicolas Chopin , title =. Journal of Machine Learning Research , volume =. 2016 , publisher =

work page 2016
[22]

Active Negative Loss Functions for Learning with Noisy Labels , url =

Ye, Xichen and Li, Xiaoqiang and dai, songmin and Liu, Tong and Sun, Yan and Tong, Weiqin , booktitle =. Active Negative Loss Functions for Learning with Noisy Labels , url =

work page
[23]

Journal of Machine Learning Research , volume=

On the Dynamics Under the Unhinged Loss and Beyond , author=. Journal of Machine Learning Research , volume=. 2023 , url=

work page 2023
[24]

International Conference on Learning Representations , year=

Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations , author=. International Conference on Learning Representations , year=

work page
[25]

2019 IEEE/CVF International Conference on Computer Vision (ICCV) , year=

Symmetric Cross Entropy for Robust Learning With Noisy Labels , author=. 2019 IEEE/CVF International Conference on Computer Vision (ICCV) , year=

work page 2019
[26]

NIPS , year=

Learning with Noisy Labels , author=. NIPS , year=

work page
[27]

Proceedings of the 40th International Conference on Machine Learning , year =

Dixian Zhu and Yiming Ying and Tianbao Yang , title =. Proceedings of the 40th International Conference on Machine Learning , year =

work page
[28]

Williamson , editor =

Brendan van Rooyen and Aditya Krishna Menon and Robert C. Williamson , editor =. Learning with Symmetric Label Noise: The Importance of Being Unhinged , booktitle =. 2015 , url =

work page 2015
[29]

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , month =

Kim, Youngdong and Yim, Junho and Yun, Juseung and Kim, Junmo , title =. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , month =

work page
[30]

2017 , eprint=

WebVision Database: Visual Learning and Understanding from Web Data , author=. 2017 , eprint=

work page 2017
[31]

Berg and Li Fei-Fei , Title =

Olga Russakovsky and Jia Deng and Hao Su and Jonathan Krause and Sanjeev Satheesh and Sean Ma and Zhiheng Huang and Andrej Karpathy and Aditya Khosla and Michael Bernstein and Alexander C. Berg and Li Fei-Fei , Title =. 2015 , journal =. doi:10.1007/s11263-015-0816-y , volume=

work page doi:10.1007/s11263-015-0816-y 2015
[32]

The Twelfth International Conference on Learning Representations , year=

Robust Classification via Regression for Learning with Noisy Labels , author=. The Twelfth International Conference on Learning Representations , year=

work page
[33]

Yang and S

S. Yang and S. Wu and E. Yang and B. Han and Y. Liu and M. Xu and G. Niu and T. Liu , journal =. A Parametrical Model for Instance-Dependent Label Noise , year =. doi:10.1109/TPAMI.2023.3301876 , publisher =

work page doi:10.1109/tpami.2023.3301876 2023
[34]

ImageNet: A large-scale hierarchical image database , year=

Deng, Jia and Dong, Wei and Socher, Richard and Li, Li-Jia and Kai Li and Li Fei-Fei , booktitle=. ImageNet: A large-scale hierarchical image database , year=

work page
[35]

International Conference on Machine Learning , year=

MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels , author=. International Conference on Machine Learning , year=

work page
[36]

Proceedings of the 38th International Conference on Machine Learning , pages =

Asymmetric Loss Functions for Learning with Noisy Labels , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , editor =

work page 2021
[37]

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , year=

Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach , author=. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , year=

work page 2017
[38]

, title =

Ghosh, Aritra and Manwani, Naresh and Sastry, P.S. , title =. Neurocomput. , month =. 2015 , issue_date =. doi:10.1016/j.neucom.2014.09.081 , abstract =

work page doi:10.1016/j.neucom.2014.09.081 2015
[39]

Proceedings of the AAAI Conference on Artificial Intelligence , author=

Risk Minimization in the Presence of Label Noise , volume=. Proceedings of the AAAI Conference on Artificial Intelligence , author=. 2016 , month=. doi:10.1609/aaai.v30i1.10293 , abstractNote=

work page doi:10.1609/aaai.v30i1.10293 2016
[40]

Loss Decomposition and Centroid Estimation for Positive and Unlabeled Learning , year=

Gong, Chen and Shi, Hong and Liu, Tongliang and Zhang, Chuang and Yang, Jian and Tao, Dacheng , journal=. Loss Decomposition and Centroid Estimation for Positive and Unlabeled Learning , year=

work page
[41]

Centroid Estimation With Guaranteed Efficiency: A General Framework for Weakly Supervised Learning , year=

Gong, Chen and Yang, Jian and You, Jane and Sugiyama, Masashi , journal=. Centroid Estimation With Guaranteed Efficiency: A General Framework for Weakly Supervised Learning , year=

work page
[42]

Multi-class Label Noise Learning via Loss Decomposition and Centroid Estimation , booktitle =

Yongliang Ding and Tao Zhou and Chuang Zhang and Yijing Luo and Juan Tang and Chen Gong , editor =. Multi-class Label Noise Learning via Loss Decomposition and Centroid Estimation , booktitle =. 2022 , url =. doi:10.1137/1.9781611977172.29 , timestamp =

work page doi:10.1137/1.9781611977172.29 2022
[43]

Proceedings of The 33rd International Conference on Machine Learning , pages =

Loss factorization, weakly supervised learning and label noise robustness , author =. Proceedings of The 33rd International Conference on Machine Learning , pages =. 2016 , editor =

work page 2016
[44]

Proceedings of the 37th International Conference on Machine Learning , pages =

Normalized Loss Functions for Deep Learning with Noisy Labels , author =. Proceedings of the 37th International Conference on Machine Learning , pages =. 2020 , editor =

work page 2020
[45]

Image classification with deep learning in the presence of noisy labels:

G. Image classification with deep learning in the presence of noisy labels:. Knowl. Based Syst. , volume =. 2021 , url =. doi:10.1016/j.knosys.2021.106771 , timestamp =

work page doi:10.1016/j.knosys.2021.106771 2021
[46]

Proceedings of the 34th International Conference on Neural Information Processing Systems , articleno =

Yao, Yu and Liu, Tongliang and Han, Bo and Gong, Mingming and Deng, Jiankang and Niu, Gang and Sugiyama, Masashi , title =. Proceedings of the 34th International Conference on Neural Information Processing Systems , articleno =. 2020 , isbn =

work page 2020
[47]

Proceedings of the 38th International Conference on Machine Learning , pages =

Provably End-to-end Label-noise Learning without Anchor Points , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , editor =

work page 2021
[48]

Proceedings of the 40th International Conference on Machine Learning , pages =

Label Distributionally Robust Losses for Multi-class Classification: Consistency, Robustness and Adaptivity , author =. Proceedings of the 40th International Conference on Machine Learning , pages =. 2023 , editor =

work page 2023
[49]

, title =

Zhang, Zhilu and Sabuncu, Mert R. , title =. Proceedings of the 32nd International Conference on Neural Information Processing Systems , pages =. 2018 , publisher =

work page 2018
[50]

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence , articleno =

Feng, Lei and Shu, Senlin and Lin, Zhuoyi and Lv, Fengmao and Li, Li and An, Bo , title =. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence , articleno =. 2021 , isbn =

work page 2021
[51]

2020 , eprint=

How does Early Stopping Help Generalization against Label Noise? , author=. 2020 , eprint=

work page 2020
[52]

A Closer Look at Memorization in Deep Networks , year =

Arpit, Devansh and Jastrzundefinedbski, Stanis. A Closer Look at Memorization in Deep Networks , year =. Proceedings of the 34th International Conference on Machine Learning - Volume 70 , pages =

work page
[53]

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) , year=

Contrastive Learning Improves Model Robustness Under Label Noise , author=. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) , year=

work page 2021
[54]

Proceedings of the 39th International Conference on Machine Learning , pages =

Investigating Why Contrastive Learning Benefits Robustness against Label Noise , author =. Proceedings of the 39th International Conference on Machine Learning , pages =. 2022 , editor =

work page 2022
[55]

Pattern Recognition and Computer Vision: 5th Chinese Conference, PRCV 2022, Shenzhen, China, November 4–7, 2022, Proceedings, Part II , pages =

Chen, Yipeng and Ban, Xiaojuan and Xu, Ke , title =. Pattern Recognition and Computer Vision: 5th Chinese Conference, PRCV 2022, Shenzhen, China, November 4–7, 2022, Proceedings, Part II , pages =. 2022 , isbn =. doi:10.1007/978-3-031-18910-4_49 , abstract =

work page doi:10.1007/978-3-031-18910-4_49 2022
[56]

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics , pages =

Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks , author =. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics , pages =. 2020 , editor =

work page 2020
[57]

Proceedings of the 34th International Conference on Neural Information Processing Systems , articleno =

Liu, Sheng and Niles-Weed, Jonathan and Razavian, Narges and Fernandez-Granda, Carlos , title =. Proceedings of the 34th International Conference on Neural Information Processing Systems , articleno =. 2020 , isbn =

work page 2020
[58]

Ghosh, Aritra and Kumar, Himanshu and Sastry, P. S. , title =. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence , pages =. 2017 , publisher =

work page 2017
[59]

Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education

Clancey, William J. Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education. Proceedings of the Eighth International Joint Conference on Artificial Intelligence (IJCAI-83)

work page
[60]

Classification Problem Solving

Clancey, William J. Classification Problem Solving. Proceedings of the Fourth National Conference on Artificial Intelligence

work page
[61]

, title =

Robinson, Arthur L. , title =. 1980 , doi =. https://science.sciencemag.org/content/208/4447/1019.full.pdf , journal =

work page 1980
[62]

New Ways to Make Microcircuits Smaller---Duplicate Entry

Robinson, Arthur L. New Ways to Make Microcircuits Smaller---Duplicate Entry. Science

work page
[63]

Clancey and Glenn Rennels , abstract =

Diane Warner Hasling and William J. Clancey and Glenn Rennels , abstract =. Strategic explanations for a diagnostic consultation system , journal =. 1984 , issn =. doi:https://doi.org/10.1016/S0020-7373(84)80003-6 , url =

work page doi:10.1016/s0020-7373(84)80003-6 1984
[64]

and Rennels, Glenn R

Hasling, Diane Warner and Clancey, William J. and Rennels, Glenn R. and Test, Thomas. Strategic Explanations in Consultation---Duplicate. The International Journal of Man-Machine Studies

work page
[65]

Poligon: A System for Parallel Problem Solving

Rice, James. Poligon: A System for Parallel Problem Solving

work page
[66]

Transfer of Rule-Based Expertise through a Tutorial Dialogue

Clancey, William J. Transfer of Rule-Based Expertise through a Tutorial Dialogue

work page
[67]

The Engineering of Qualitative Models

Clancey, William J. The Engineering of Qualitative Models

work page
[68]

2008 , eprint=

Crime and punishment in scientific research , author=. 2008 , eprint=

work page 2008
[69]

Pluto: The 'Other' Red Planet

NASA. Pluto: The 'Other' Red Planet

work page
[70]

Self-Distillation: Towards Efficient and Compact Neural Networks , year=

Zhang, Linfeng and Bao, Chenglong and Ma, Kaisheng , journal=. Self-Distillation: Towards Efficient and Compact Neural Networks , year=

work page
[71]

Sharper Bounds for Uniformly Stable Algorithms. Proceedings of Thirty Third Conference on Learning Theory. 2020.

[72]

Yi Zhou and Yingbin Liang and Huishuai Zhang. Mach. Learn. 2022. doi:10.1007/s10994-021-06056-w

[73]

Wenlong Mou and Liwei Wang and Xiyu Zhai and Kai Zheng. Generalization Bounds of. Conference On Learning Theory. 2018.

[74]

Yunwen Lei and Yiming Ying. CoRR. 2020. arXiv:2006.08157

[75]

Shalev-Shwartz, Shai and Ben-David, Shai.

[76]

Richards, Dominic and Rabbat, Mike. Learning with Gradient Descent and Weakly Convex Losses. 2021. doi:10.48550/ARXIV.2101.04968

[77]

Alex Krizhevsky.

[78]

LeCun, Yann and Cortes, Corinna.

[79]

The Break-Even Point on Optimization Trajectories of Deep Neural Networks. International Conference on Learning Representations.

[80]

Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence. Advances in Neural Information Processing Systems 32. 2019.
