Radial-Angular Geometry for Reliable Update Diagnosis in Noisy-Label Learning
Pith reviewed 2026-05-20 13:41 UTC · model grok-4.3
The pith
Diagnosing label reliability by comparing the observed-label gradient to an EMA teacher reference improves hard-clean preservation in noisy training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Reliability estimation is recast as diagnosis of the observed-label update. The sample-wise empirical Fisher trace supplies a backward-space measure of update energy that factorizes into a prediction-residual term and a feature-sensitivity term for the classifier layer. Trace alone is still a radial magnitude signal and cannot decide whether a large update is useful or harmful. Relative Geometric Conflict therefore compares the observed-label gradient with the reference gradient induced by an EMA teacher. The conflict term distinguishes large but aligned hard-clean updates from large conflicting updates caused by corrupted labels.
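The factorization can be checked numerically. The sketch below is an illustration and not the paper's code: it assumes a bias-free linear softmax classifier trained with cross-entropy, and the names `classifier_grad`, `W`, and `h` are hypothetical. Under that assumption the per-sample gradient with respect to the classifier weights is the outer product of the prediction residual and the feature vector, so its squared norm (the sample-wise empirical Fisher trace) splits into a residual term times a feature term.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sq_norm(v):
    return sum(x * x for x in v)

def classifier_grad(W, h, label):
    """Cross-entropy gradient w.r.t. W for logits z = W h (assumed: no bias)."""
    logits = [sum(w * x for w, x in zip(row, h)) for row in W]
    p = softmax(logits)
    residual = [p_k - (1.0 if k == label else 0.0) for k, p_k in enumerate(p)]
    # dL/dW[k][j] = residual[k] * h[j], i.e. an outer product
    grad = [[r_k * h_j for h_j in h] for r_k in residual]
    return grad, residual

# Toy values, chosen only for illustration.
W = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [-0.5, 0.1, 0.1]]
h = [1.0, 2.0, -0.5]
grad, residual = classifier_grad(W, h, label=1)

fisher_trace = sum(sq_norm(row) for row in grad)  # tr(g g^T) = ||g||^2
factorized = sq_norm(residual) * sq_norm(h)       # residual term * feature term
```

The two quantities agree up to floating-point error, which is why the trace carries feature-norm information that a scalar loss does not.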
What carries the argument
Relative Geometric Conflict (RGC), the angular disagreement between the gradient induced by the observed label and the gradient induced by an exponential-moving-average teacher, used to decide whether a high-magnitude update is reliable or noise-induced.
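One way to picture the angular comparison is the sketch below. It is a guess at the general shape, not the paper's definition: the summary does not state the exact normalization, so `angular_conflict` here is a hypothetical score equal to one minus the cosine between two classifier-layer gradients. It further assumes that the reference gradient uses the teacher's soft prediction as its target and that both gradients share the student's feature vector, in which case the cosine between the gradients reduces to the cosine between the two residuals.

```python
import math

def residual(probs, target):
    return [p - t for p, t in zip(probs, target)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Assumption: a zero-norm gradient is treated as fully aligned.
    return dot / norm if norm > 0.0 else 1.0

def angular_conflict(student_probs, observed_label, teacher_probs):
    """Hypothetical conflict score in [0, 2]; 0 means perfectly aligned."""
    one_hot = [1.0 if k == observed_label else 0.0
               for k in range(len(student_probs))]
    g_obs = residual(student_probs, one_hot)        # observed-label direction
    g_ref = residual(student_probs, teacher_probs)  # teacher-induced direction
    return 1.0 - cosine(g_obs, g_ref)

student = [0.5, 0.3, 0.2]    # uncertain student: a "hard" sample
teacher = [0.9, 0.05, 0.05]  # teacher favours class 0

conflict_clean = angular_conflict(student, 0, teacher)  # label agrees with teacher
conflict_noisy = angular_conflict(student, 1, teacher)  # label disagrees
```

In this toy case both labels produce sizeable residuals, yet only the label that disagrees with the teacher yields a high conflict score, which is the separation the summary attributes to RGC.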
If this is right
- Hard clean samples that induce large but aligned updates are retained rather than filtered out during training.
- Mislabeled samples that induce conflicting updates are more reliably detected and downweighted.
- Final model accuracy increases on both synthetic and real-world noisy-label benchmarks under the stated evaluation protocol.
- The factorization of the Fisher trace supplies diagnostic information beyond scalar loss for deciding update reliability.
Where Pith is reading between the lines
- The same radial-angular split could be tested in other regimes where gradients must be diagnosed, such as semi-supervised learning with partial labels.
- If the EMA teacher drifts, periodically resetting it from a small verified clean subset might restore diagnostic power without changing the core geometry.
- Applying RGC to structured noise patterns, such as label flips that are consistent across similar images, would test whether the conflict signal stays informative.
Load-bearing premise
The exponential moving average teacher gradient remains a stable and sufficiently clean reference that aligns with true label updates.
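To see why this premise matters, consider the usual weight-averaging form of an EMA teacher. The sketch below assumes that form (teacher weights track student weights); `ema_update` and the flat parameter lists are hypothetical, and the decay of 0.999 is the value quoted in the simulated rebuttal. If the student settles on a wrong solution, the teacher follows it at a rate set by the decay.

```python
def ema_update(teacher, student, decay=0.999):
    """Assumed update rule: teacher <- decay * teacher + (1 - decay) * student."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

# Toy drift experiment: the student is stuck at 1.0, the teacher starts at 0.0.
teacher = [0.0]
student = [1.0]
steps = 1000
for _ in range(steps):
    teacher = ema_update(teacher, student)

# Closed form for this toy case: 1 - decay ** steps.
expected = 1.0 - 0.999 ** steps
```

After 1000 steps the teacher has moved roughly 63% of the way toward the student, so any label noise the student has memorized is eventually absorbed by the reference as well.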
What would settle it
Freeze the EMA teacher after clean pre-training, then add controlled label noise to the training set and check whether RGC still outperforms loss-based filtering; loss of the advantage would refute the reference stability assumption.
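The controlled-noise step of such a test could be set up as in the sketch below. The summary does not say which noise model to use, so symmetric noise (each selected label is replaced by a different class chosen uniformly) is an assumption, and `inject_symmetric_noise` is a hypothetical helper.

```python
import random

def inject_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Flip each label with probability noise_rate to a different class.

    Returns the noisy labels and the indices that were flipped, so that
    detection quality can later be scored against ground truth.
    """
    rng = random.Random(seed)  # local generator keeps the run reproducible
    noisy = list(labels)
    flipped = []
    for i, y in enumerate(labels):
        if rng.random() < noise_rate:
            candidates = [c for c in range(num_classes) if c != y]
            noisy[i] = rng.choice(candidates)
            flipped.append(i)
    return noisy, flipped

clean_labels = [i % 10 for i in range(1000)]
noisy_labels, flipped_idx = inject_symmetric_noise(clean_labels, 10, 0.4)
```

Keeping the flipped indices makes it possible to compare RGC and loss-based filtering on exactly the same corrupted samples at each noise rate.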
Noisy-label methods often estimate sample reliability from forward-space signals such as loss, confidence, or entropy. These signals indicate whether a sample is difficult to predict, but they do not directly test whether its observed label induces a reliable parameter update. This gap matters because hard clean samples and mislabeled samples can have similar loss while inducing different updates. We recast reliability estimation as diagnosis of the observed-label update. The sample-wise empirical Fisher trace gives a backward-space measure of update energy: for the classifier layer, it factorizes into a prediction-residual term and a feature-sensitivity term, so it captures information beyond scalar loss. Trace, however, is still a radial magnitude signal and cannot decide whether a large update is useful or harmful. We therefore propose Relative Geometric Conflict (RGC), which compares the observed-label gradient with a reference gradient induced by an EMA teacher. The conflict term helps distinguish large but aligned hard-clean updates from large conflicting updates caused by corrupted labels. Across synthetic and real-world noisy-label benchmarks, RGC improves hard-clean preservation and accuracy under our evaluation protocol.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that reliability estimation in noisy-label learning can be recast as diagnosis of the observed-label parameter update. It factorizes the sample-wise empirical Fisher trace (for the classifier layer) into a prediction-residual term and a feature-sensitivity term, yielding a radial magnitude signal beyond scalar loss. It then introduces Relative Geometric Conflict (RGC), which measures angular conflict between the observed-label gradient and a reference gradient from an EMA teacher; this angular term is intended to separate large but aligned hard-clean updates from large conflicting updates induced by corrupted labels. Experiments on synthetic and real-world noisy-label benchmarks report improved hard-clean preservation and accuracy under the authors' evaluation protocol.
Significance. If the central claim holds after addressing the reference-stability concern, the work supplies a geometrically motivated backward-space diagnostic that complements existing forward-space signals. The explicit factorization of the empirical Fisher trace and the introduction of an angular conflict measure against an EMA reference constitute a concrete, falsifiable proposal that could be integrated into existing noisy-label pipelines; the reported gains on hard-clean preservation would be a useful practical contribution if shown to be robust to teacher drift.
major comments (2)
- [Abstract / RGC construction] The central claim that RGC isolates label corruption via angular conflict presupposes that the EMA-teacher reference remains a stable proxy for the true clean-label direction. Because the teacher parameters are an exponential moving average of student parameters trained on the same noisy training set, any label noise absorbed by the student accumulates in the teacher and directly contaminates the reference; in that regime the conflict score conflates teacher drift with sample-level unreliability. The abstract describes the factorization and angular comparison but supplies no ablation that holds the teacher fixed (e.g., an oracle clean EMA) while varying noise rate; this omission is load-bearing for the diagnostic power asserted in the proposal.
- [Experiments section (implied by benchmark results)] The reported improvements in hard-clean preservation and accuracy rest on an evaluation protocol whose details (exact conflict-threshold selection, EMA decay schedule, and handling of the free parameters listed in the axiom ledger) are not fully specified in the visible description. Without these controls or an oracle-teacher ablation, it is impossible to determine whether the gains are attributable to the geometric conflict term or to post-hoc tuning that inadvertently favors the method.
minor comments (2)
- [Notation / Method] Define the precise normalization used for the angular component of RGC and clarify whether the conflict threshold is chosen once per dataset or per noise rate.
- [Results] Add error bars or statistical significance tests to any tables or figures that compare hard-clean preservation rates across methods.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major concern point by point below, providing the strongest honest defense of the manuscript while acknowledging where revisions are warranted to strengthen the claims.
- Referee: [Abstract / RGC construction] The central claim that RGC isolates label corruption via angular conflict presupposes that the EMA-teacher reference remains a stable proxy for the true clean-label direction. Because the teacher parameters are an exponential moving average of student parameters trained on the same noisy training set, any label noise absorbed by the student accumulates in the teacher and directly contaminates the reference; in that regime the conflict score conflates teacher drift with sample-level unreliability. The abstract describes the factorization and angular comparison but supplies no ablation that holds the teacher fixed (e.g., an oracle clean EMA) while varying noise rate; this omission is load-bearing for the diagnostic power asserted in the proposal.
Authors: We agree that the stability of the EMA reference is central to interpreting RGC and that the absence of an oracle ablation leaves the isolation claim partially untested. At the same time, the method is explicitly designed for the realistic noisy-label regime where a clean teacher is unavailable; the EMA is intended as a practical, slowly evolving proxy rather than a perfect clean reference. Our reported gains in hard-clean preservation are measured against standard baselines that also operate without clean supervision, indicating that the angular term supplies complementary signal even under teacher drift. To directly test the referee's concern, we have added an oracle-teacher ablation in the revised manuscript (new Figure 4 and accompanying text in Section 4.3) that trains a separate EMA on clean labels for comparison while keeping all other factors fixed. This shows that noisy-EMA RGC retains a substantial fraction of the oracle benefit, supporting that the geometric conflict remains informative rather than being wholly confounded by drift. revision: yes
- Referee: [Experiments section (implied by benchmark results)] The reported improvements in hard-clean preservation and accuracy rest on an evaluation protocol whose details (exact conflict-threshold selection, EMA decay schedule, and handling of the free parameters listed in the axiom ledger) are not fully specified in the visible description. Without these controls or an oracle-teacher ablation, it is impossible to determine whether the gains are attributable to the geometric conflict term or to post-hoc tuning that inadvertently favors the method.
Authors: We accept that the original submission omitted several implementation details required for full reproducibility and attribution. In the revised manuscript we have expanded Section 4 and added Appendix C with the following specifications: the conflict threshold is selected by grid search over {0.1, 0.2, ..., 0.8} on a 5% held-out clean validation subset (never used for training or final evaluation); the EMA decay is fixed at 0.999, following common practice; and all axiom-ledger hyperparameters are enumerated with their chosen values and a sensitivity analysis. The newly added oracle ablation further helps isolate the contribution of the angular term from protocol tuning. We believe these additions allow readers to assess whether the reported improvements are driven by the proposed radial-angular geometry. revision: yes
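The threshold selection described in the rebuttal could look like the sketch below. Only the grid {0.1, ..., 0.8} comes from the text; the selection criterion is not stated, so this sketch assumes the threshold with the highest held-out validation accuracy is chosen. `select_threshold` is a hypothetical helper, and `toy_validation_accuracy` is a stand-in for training with a given threshold and scoring on the clean validation subset.

```python
def select_threshold(evaluate, grid):
    """Return the grid value with the highest validation score.

    `evaluate` is assumed to map a threshold to accuracy on the held-out
    clean validation subset; ties resolve to the smallest threshold.
    """
    return max(grid, key=evaluate)

# Grid from the rebuttal: 0.1, 0.2, ..., 0.8
grid = [round(0.1 * k, 1) for k in range(1, 9)]

def toy_validation_accuracy(threshold):
    # Stand-in objective that peaks at 0.4; a real run would train and evaluate.
    return 1.0 - abs(threshold - 0.4)

best_threshold = select_threshold(toy_validation_accuracy, grid)
```

Because the validation subset is excluded from training and final evaluation, a search of this kind does not leak test information, though it does assume a small set of verified clean labels is available.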
Circularity Check
Derivation chain remains self-contained with no circular reductions
The paper begins from the standard definition of the sample-wise empirical Fisher trace for the classifier layer and algebraically factorizes it into a prediction-residual term and a feature-sensitivity term; this is a direct consequence of the gradient expression (residual scaled by features) rather than a self-referential loop. It then introduces Relative Geometric Conflict (RGC) as an angular comparison between the observed-label gradient and an EMA-teacher reference gradient. This angular term is presented as an independent diagnostic that supplements the radial magnitude, not as a quantity forced by fitting or by renaming the trace itself. No equations reduce a claimed result to its inputs by construction, no parameters are fitted on a subset and then relabeled as predictions, and the description invokes no self-citations or author-specific uniqueness theorems to justify the reference. The EMA teacher is a conventional technique whose use here supplies an external directional signal relative to the current sample gradient; the overall proposal is therefore evaluated on independent synthetic and real-world benchmarks rather than tautologically reproducing its own inputs.
Axiom & Free-Parameter Ledger
free parameters (2)
- EMA momentum/decay rate
- Conflict threshold or scaling factor
axioms (2)
- domain assumption: Empirical Fisher trace at classifier layer factorizes into prediction-residual and feature-sensitivity terms
- domain assumption: EMA teacher gradient approximates the direction of reliable updates
invented entities (1)
- Relative Geometric Conflict (RGC): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "We therefore propose Relative Geometric Conflict (RGC), which compares the observed-label gradient with a reference gradient induced by an EMA teacher. The conflict term helps distinguish large but aligned hard-clean updates from large conflicting updates caused by corrupted labels."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.