Holistic Reliability Propagation: Decoupling Annotation and Prediction for Robust Noisy-Label

Jingyang Mao; Ningkang Peng; Yanhui Gu

arxiv: 2605.20725 · v1 · pith:NKFRL3VGnew · submitted 2026-05-20 · 💻 cs.CV

Holistic Reliability Propagation: Decoupling Annotation and Prediction for Robust Noisy-Label

Jingyang Mao , Ningkang Peng , Yanhui Gu This is my paper

Pith reviewed 2026-05-21 05:17 UTC · model grok-4.3

classification 💻 cs.CV

keywords noisy labelsreliability estimationbilevel meta-learningMixupcontrastive learningimage classificationrobust trainingpseudo-labeling

0 comments

The pith

Separating reliability estimates for given labels and model predictions improves accuracy under noisy labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that current noisy-label methods often merge annotation and prediction reliabilities into one weight even though the two can fail independently. It proposes estimating two separate batch-normalized scalars per sample through bilevel meta-learning, then routing each to its own training objective. One scalar gates a reliability-aware Mixup on the input side while the other selects pseudo-label positives in a contrastive branch. The approach is tested on both synthetic and real-world benchmarks where it raises average accuracy and holds up at high noise levels.

Core claim

Holistic Reliability Propagation decouples the two sources by producing independent alpha and beta reliability scalars via bilevel meta-learning, then applies alpha through global gating to Mixup on the input branch and uses beta to gate pseudo-label positives on the contrastive branch, yielding higher classification accuracy than combined-reliability baselines on noisy data.

What carries the argument

Bilevel meta-learning that outputs two unconstrained, batch-normalized reliability scalars per sample, routed separately by Holistic Reliability Propagation to Mixup and contrastive objectives.

If this is right

Average accuracy rises over strong baselines on both synthetic and real-world noisy-label benchmarks.
Performance stays competitive even when noise rates are highest.
Annotation errors and prediction errors can be handled by different objectives instead of a single combined weight.
Global gating on the Mixup branch and beta-gated positives on the contrastive branch each receive tailored reliability signals.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same decoupling idea could apply to other multi-objective training pipelines where input and label signals have mismatched reliability profiles.
Meta-learned scalars might replace hand-crafted noise models in settings with mixed label and feature noise.
Testing the method on video or audio classification would check whether the two-branch routing generalizes beyond images.

Load-bearing premise

The bilevel meta-learning step can learn two meaningfully independent reliability scalars that remain useful when sent to different branches without extra constraints or validation.

What would settle it

An experiment that forces the two scalars to be identical or highly correlated and still matches or exceeds HRP accuracy would falsify the value of the decoupling.

Figures

Figures reproduced from arXiv: 2605.20725 by Jingyang Mao, Ningkang Peng, Yanhui Gu.

**Figure 2.** Figure 2: Analysis of disentangled reliabilities, CDCL pair [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: Comprehensive analysis of the HRP framework. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

read the original abstract

Learning with noisy labels in multimedia classification often combines external annotations and model predictions into a single reliability weight, even though the two sources can fail for different reasons. We instead estimate disentangled reliabilities: bilevel meta-learning produces two batch-normalized scalars per sample, alpha for the given label and beta for the pseudo-label, without constraining them to sum to one. Holistic Reliability Propagation (HRP) then routes them to different objectives, using reliability-aware Mixup with global gating on the input branch and beta-gated pseudo-label positives on the contrastive branch. On synthetic and real-world benchmarks, HRP improves average accuracy over strong baselines and remains competitive at the highest noise rates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper splits reliability into two separate scalars for annotation and pseudo-labels then routes them to different losses, but offers little evidence the split is doing real work.

read the letter

The core idea here is to stop lumping annotation reliability and model prediction reliability into one weight. Instead they run bilevel meta-learning to get two batch-normalized scalars per sample, alpha for the given label and beta for the pseudo-label, with no sum-to-one constraint. They then send alpha into a reliability-aware Mixup on the input side and beta into gating the positive pairs for the contrastive loss. That routing choice is the clearest departure from earlier single-weight noisy-label methods.

Referee Report

3 major / 2 minor

Summary. The manuscript presents Holistic Reliability Propagation (HRP) for robust learning with noisy labels in classification tasks. It decouples annotation reliability (alpha) and prediction reliability (beta) using bilevel meta-learning to generate two independent batch-normalized scalars per sample without a sum-to-one constraint. These scalars are then propagated holistically: alpha gates a reliability-aware Mixup on the input branch, while beta gates pseudo-label positives in the contrastive branch. Experiments on synthetic and real-world benchmarks demonstrate improved average accuracy compared to strong baselines, with competitiveness maintained at high noise levels.

Significance. If the core assumption of meaningful disentanglement holds and the routing yields genuine additive gains, HRP could advance noisy-label methods by allowing separate treatment of annotation versus prediction errors across loss terms. The bilevel meta-learning formulation for producing the two scalars is a technically distinctive choice that, if supported by targeted validation, might generalize to other multi-source reliability settings in multimedia and vision tasks.

major comments (3)

[Methods (bilevel meta-learning and reliability scalars)] The central claim that alpha and beta are disentangled and capture distinct failure modes is load-bearing for attributing gains to the holistic routing. No correlation analysis between the two scalars, per-sample disagreement cases, or ablation (e.g., single-scalar variants or forced-sum-to-one baseline) is provided to rule out redundancy or collapse; without this, the reported improvements over baselines cannot be confidently linked to the decoupling mechanism.
[Experiments and Results] Experimental results state average accuracy gains but supply no error bars, statistical significance tests, or full ablation tables isolating the contribution of separate alpha/beta routing versus standard Mixup+contrastive baselines. This weakens the ability to verify robustness of the improvements, especially the claim of remaining competitive at the highest noise rates.
[Holistic Reliability Propagation description] The bilevel optimization is presented as producing two independent scalars routed to global gating (Mixup) and beta-gated positives (contrastive), yet no analysis confirms that the absence of a sum-to-one constraint actually yields non-redundant values useful for the two branches; a simple correlation plot or failure-mode case study would directly test this.

minor comments (2)

[Implementation details] Ensure all hyperparameters of the bilevel procedure (inner/outer learning rates, number of meta-steps) are fully specified for reproducibility.
[Discussion] Add a brief discussion of computational overhead introduced by the bilevel meta-learning relative to single-reliability baselines.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We appreciate the emphasis on providing stronger empirical support for the disentanglement of alpha and beta and for the robustness of the reported gains. Below we respond to each major comment and describe the revisions we will incorporate.

read point-by-point responses

Referee: [Methods (bilevel meta-learning and reliability scalars)] The central claim that alpha and beta are disentangled and capture distinct failure modes is load-bearing for attributing gains to the holistic routing. No correlation analysis between the two scalars, per-sample disagreement cases, or ablation (e.g., single-scalar variants or forced-sum-to-one baseline) is provided to rule out redundancy or collapse; without this, the reported improvements over baselines cannot be confidently linked to the decoupling mechanism.

Authors: We agree that explicit validation of disentanglement is important for linking gains to the proposed routing. The bilevel meta-learning objective is constructed so that alpha is optimized with respect to annotation consistency while beta is optimized with respect to prediction consistency on the contrastive branch; this separation of objectives, together with the lack of a sum-to-one constraint, is intended to permit independent values. To strengthen the manuscript we will add (i) a correlation analysis between alpha and beta across training samples, (ii) selected per-sample disagreement cases, and (iii) ablations that compare the full HRP model against single-scalar variants and a forced-sum-to-one baseline. These additions will be placed in a new subsection of the experiments. revision: yes
Referee: [Experiments and Results] Experimental results state average accuracy gains but supply no error bars, statistical significance tests, or full ablation tables isolating the contribution of separate alpha/beta routing versus standard Mixup+contrastive baselines. This weakens the ability to verify robustness of the improvements, especially the claim of remaining competitive at the highest noise rates.

Authors: We acknowledge that the current presentation would benefit from greater statistical rigor. The reported numbers are averages over the standard benchmark splits, yet variability across random seeds and formal significance testing are not shown. In the revised version we will (i) report mean accuracy together with standard deviation over at least five independent runs, (ii) include paired statistical significance tests against the strongest baselines, and (iii) expand the ablation tables to isolate the incremental benefit of separate alpha/beta routing versus a combined reliability scalar and versus standard Mixup-plus-contrastive baselines. These updates will directly address the concern about robustness at high noise rates. revision: yes
Referee: [Holistic Reliability Propagation description] The bilevel optimization is presented as producing two independent scalars routed to global gating (Mixup) and beta-gated positives (contrastive), yet no analysis confirms that the absence of a sum-to-one constraint actually yields non-redundant values useful for the two branches; a simple correlation plot or failure-mode case study would directly test this.

Authors: The decision to forgo a sum-to-one constraint follows from the observation that annotation errors and prediction errors need not be complementary. Nevertheless, we recognize that empirical confirmation of non-redundancy is valuable. We will therefore include in the revision a correlation plot of the learned alpha and beta values as well as a small set of failure-mode case studies that illustrate how the two scalars are utilized differently by the Mixup and contrastive branches. These visualizations will be added to the method or experimental analysis section. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation introduces independent bilevel scalars without reducing to fitted inputs or self-citations

full rationale

The paper defines a bilevel meta-learning procedure to produce two batch-normalized scalars alpha (for annotation) and beta (for pseudo-label) without a sum-to-one constraint, then routes them separately into reliability-aware Mixup and beta-gated contrastive terms. No equations or sections in the manuscript reduce these scalars to quantities defined by the final evaluation data, nor does any load-bearing premise rest on a self-citation chain or uniqueness theorem from the same authors. The central claim remains an empirical proposal whose validity is tested on external benchmarks rather than being tautological by construction.

Axiom & Free-Parameter Ledger

1 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields limited visibility into free parameters or axioms. The two per-sample scalars alpha and beta are produced by meta-learning and appear central; their independence and lack of sum-to-one constraint are presented as design choices rather than derived results.

free parameters (1)

alpha and beta scalars
Batch-normalized reliability scalars per sample produced by bilevel meta-learning; their values are learned rather than fixed a priori.

pith-pipeline@v0.9.0 · 5641 in / 1218 out tokens · 30040 ms · 2026-05-21T05:17:53.948900+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/BranchSelection.lean branch_selection echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

forcing the two cues into a zero-sum pair, such as weights that must sum to one, is restrictive when neither source should strongly drive updates or when both are trustworthy

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 3 internal anchors

[1]

Eric Arazo, Diego Ortego, Paul Albert, Noel O’Connor, and Kevin McGuinness

work page
[2]

InInternational conference on machine learning

Unsupervised label noise modeling and loss correction. InInternational conference on machine learning. PMLR, 312–321

work page
[3]

Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, et al. 2017. A closer look at memorization in deep networks. In International conference on machine learning. PMLR, 233–242

work page 2017
[4]

Yingbin Bai, Erkun Yang, Bo Han, Yanhua Yang, Jiatong Li, Yinian Mao, Gang Niu, and Tongliang Liu. 2021. Understanding and improving early stopping for learning with noisy labels.Advances in Neural Information Processing Systems34 (2021), 24392–24403

work page 2021
[5]

David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin A Raffel. 2019. Mixmatch: A holistic approach to semi-supervised learning.Advances in neural information processing systems32 (2019)

work page 2019
[6]

Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. InInter- national conference on machine learning. PmLR, 1597–1607

work page 2020
[7]

Filipe R Cordeiro, Ragav Sachdeva, Vasileios Belagiannis, Ian Reid, and Gustavo Carneiro. 2023. Longremix: Robust learning with high confidence samples in a noisy label environment.Pattern recognition133 (2023), 109013

work page 2023
[8]

Fahimeh Fooladgar, Minh Nguyen Nhat To, Parvin Mousavi, and Purang Abol- maesumi. 2024. Manifold DivideMix: A semi-supervised contrastive learning framework for severe label noise. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4012–4021

work page 2024
[9]

Jacob Goldberger and Ehud Ben-Reuven. 2017. Training deep neural-networks using a noise adaptation layer. InInternational conference on learning representa- tions

work page 2017
[10]

Xian-Jin Gui, Wei Wang, and Zhang-Hao Tian. 2021. Towards understand- ing deep learning from noisy labels with small-loss criterion.arXiv preprint Holistic Reliability Propagation: Decoupling Annotation and Prediction for Robust Noisy-Label Learning arXiv:2106.09291(2021)

work page arXiv 2021
[11]

Bo Han, Quanming Yao, Xingrui Yu, Gang Niu, Miao Xu, Weihua Hu, Ivor Tsang, and Masashi Sugiyama. 2018. Co-teaching: Robust training of deep neural networks with extremely noisy labels.Advances in neural information processing systems31 (2018)

work page 2018
[12]

Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2020. Mo- mentum contrast for unsupervised visual representation learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 9729–9738

work page 2020
[13]

Dan Hendrycks and Kevin Gimpel. 2016. A baseline for detecting misclas- sified and out-of-distribution examples in neural networks.arXiv preprint arXiv:1610.02136(2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016
[14]

Dan Hendrycks, Mantas Mazeika, and Thomas Dietterich. 2018. Deep anomaly detection with outlier exposure.arXiv preprint arXiv:1812.04606(2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[15]

Nazmul Karim, Mamshad Nayeem Rizve, Nazanin Rahnavard, Ajmal Mian, and Mubarak Shah. 2022. Unicon: Combating label noise through uniform selection and contrastive learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 9676–9686

work page 2022
[16]

Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. 2020. Supervised contrastive learning.Advances in neural information processing systems33 (2020), 18661–18673

work page 2020
[17]

Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. 2018. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. Advances in neural information processing systems31 (2018)

work page 2018
[18]

Junnan Li, Richard Socher, and Steven CH Hoi. 2020. Dividemix: Learning with noisy labels as semi-supervised learning.arXiv preprint arXiv:2002.07394(2020)

work page arXiv 2020
[19]

Junnan Li, Caiming Xiong, and Steven C.H. Hoi. 2021. Learning From Noisy Data With Robust Representation Learning. InProceedings of the IEEE/CVF Inter- national Conference on Computer Vision (ICCV). 9485–9494

work page 2021
[20]

Shikun Li, Xiaobo Xia, Shiming Ge, and Tongliang Liu. 2022. Selective-supervised contrastive learning with noisy labels. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 316–325

work page 2022
[21]

Sheng Liu, Jonathan Niles-Weed, Narges Razavian, and Carlos Fernandez-Granda

work page
[22]

Early-learning regularization prevents memorization of noisy labels.Ad- vances in neural information processing systems33 (2020), 20331–20342

work page 2020
[23]

Weitang Liu, Xiaoyun Wang, John Owens, and Yixuan Li. 2020. Energy-based Out- of-distribution Detection. InAdvances in Neural Information Processing Systems (NeurIPS), Vol. 33. 21464–21475

work page 2020
[24]

Curtis Northcutt, Lu Jiang, and Isaac Chuang. 2021. Confident learning: Esti- mating uncertainty in dataset labels.Journal of Artificial Intelligence Research70 (2021), 1373–1411

work page 2021
[25]

Diego Ortego, Eric Arazo, Paul Albert, Noel E O’Connor, and Kevin McGuinness

work page
[26]

In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Multi-objective interpolation training for robustness to label noise. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 6606–6615

work page
[27]

Giorgio Patrini, Alessandro Rozza, Aditya Krishna Menon, Richard Nock, and Lizhen Qu. 2017. Making deep neural networks robust to label noise: A loss correction approach. InProceedings of the IEEE conference on computer vision and pattern recognition. 1944–1952

work page 2017
[28]

Ragav Sachdeva, Filipe Rolim Cordeiro, Vasileios Belagiannis, Ian Reid, and Gustavo Carneiro. 2023. ScanMix: Learning from severe label noise via semantic clustering and semi-supervised learning.Pattern recognition134 (2023), 109121

work page 2023
[29]

Jun Shu, Qi Xie, Lixuan Yi, Qian Zhao, Sanping Zhou, Zongben Xu, and Deyu Meng. 2019. Meta-Weight-Net: Learning an Explicit Mapping For Sample Weight- ing. InAdvances in Neural Information Processing Systems, Vol. 32. Curran Asso- ciates, Inc

work page 2019
[30]

Baoye Song, Shihao Zhao, Luyao Dang, Haoguang Wang, and Lin Xu. 2025. A survey on learning from data with label noise via deep neural networks.Systems Science & Control Engineering13, 1 (2025), 2488120

work page 2025
[31]

Hwanjun Song, Minseok Kim, and Jae-Gil Lee. 2019. Selfie: Refurbishing unclean samples for robust deep learning. InInternational conference on machine learning. PMLR, 5907–5915

work page 2019
[32]

Xiaobo Xia, Tongliang Liu, Bo Han, Nannan Wang, Mingming Gong, Haifeng Liu, Gang Niu, Dacheng Tao, and Masashi Sugiyama. 2020. Part-dependent label noise: Towards instance-dependent label noise.Advances in neural information processing systems33 (2020), 7597–7610

work page 2020
[33]

Ruixuan Xiao, Yiwen Dong, Haobo Wang, Lei Feng, Runze Wu, Gang Chen, and Junbo Zhao. 2022. Promix: Combating label noise via maximizing clean sample utility.arXiv preprint arXiv:2207.10276(2022)

work page arXiv 2022
[34]

Jingkang Yang, Pengyun Wang, Dejian Zou, Zitang Zhou, Kunyuan Ding, Wenx- uan Peng, Haoqi Wang, Guangyao Chen, Bo Li, Yiyou Sun, et al. 2022. Openood: Benchmarking generalized out-of-distribution detection.Advances in Neural Information Processing Systems35 (2022), 32598–32611

work page 2022
[35]

Xingrui Yu, Bo Han, Jiangchao Yao, Gang Niu, Ivor Tsang, and Masashi Sugiyama

work page
[36]

In International conference on machine learning

How does disagreement help generalization against label corruption?. In International conference on machine learning. PMLR, 7164–7173

work page
[37]

Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. 2017. mixup: Beyond empirical risk minimization.arXiv preprint arXiv:1710.09412 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[38]

Qian Zhang, Yi Zhu, Filipe R Cordeiro, and Qiu Chen. 2025. PSSCL: A progressive sample selection framework with contrastive loss designed for noisy labels. Pattern Recognition161 (2025), 111284

work page 2025
[39]

Yikai Zhang, Songzhu Zheng, Pengxiang Wu, Mayank Goswami, and Chao Chen

work page
[40]

arXiv preprint arXiv:2103.07756(2021)

Learning with feature-dependent label noise: A progressive approach. arXiv preprint arXiv:2103.07756(2021)

work page arXiv 2021
[41]

Zhilu Zhang and Mert Sabuncu. 2018. Generalized cross entropy loss for training deep neural networks with noisy labels.Advances in neural information processing systems31 (2018)

work page 2018
[42]

Jia-Xing Zhong, Nannan Li, Weijie Kong, Shan Liu, Thomas H Li, and Ge Li. 2019. Graph convolutional label noise cleaner: Train a plug-and-play action classifier for anomaly detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 1237–1246

work page 2019
[43]

Lungren, and Lei Xing

Yuyin Zhou, Xianhang Li, Fengze Liu, Qingyue Wei, Xuxi Chen, Lequan Yu, Cihang Xie, Matthew P. Lungren, and Lei Xing. 2024. L2B: Learning to Boot- strap Robust Models for Combating Label Noise. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 23523–23533

work page 2024

[1] [1]

Eric Arazo, Diego Ortego, Paul Albert, Noel O’Connor, and Kevin McGuinness

work page

[2] [2]

InInternational conference on machine learning

Unsupervised label noise modeling and loss correction. InInternational conference on machine learning. PMLR, 312–321

work page

[3] [3]

Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, et al. 2017. A closer look at memorization in deep networks. In International conference on machine learning. PMLR, 233–242

work page 2017

[4] [4]

Yingbin Bai, Erkun Yang, Bo Han, Yanhua Yang, Jiatong Li, Yinian Mao, Gang Niu, and Tongliang Liu. 2021. Understanding and improving early stopping for learning with noisy labels.Advances in Neural Information Processing Systems34 (2021), 24392–24403

work page 2021

[5] [5]

David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin A Raffel. 2019. Mixmatch: A holistic approach to semi-supervised learning.Advances in neural information processing systems32 (2019)

work page 2019

[6] [6]

Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. InInter- national conference on machine learning. PmLR, 1597–1607

work page 2020

[7] [7]

Filipe R Cordeiro, Ragav Sachdeva, Vasileios Belagiannis, Ian Reid, and Gustavo Carneiro. 2023. Longremix: Robust learning with high confidence samples in a noisy label environment.Pattern recognition133 (2023), 109013

work page 2023

[8] [8]

Fahimeh Fooladgar, Minh Nguyen Nhat To, Parvin Mousavi, and Purang Abol- maesumi. 2024. Manifold DivideMix: A semi-supervised contrastive learning framework for severe label noise. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4012–4021

work page 2024

[9] [9]

Jacob Goldberger and Ehud Ben-Reuven. 2017. Training deep neural-networks using a noise adaptation layer. InInternational conference on learning representa- tions

work page 2017

[10] [10]

Xian-Jin Gui, Wei Wang, and Zhang-Hao Tian. 2021. Towards understand- ing deep learning from noisy labels with small-loss criterion.arXiv preprint Holistic Reliability Propagation: Decoupling Annotation and Prediction for Robust Noisy-Label Learning arXiv:2106.09291(2021)

work page arXiv 2021

[11] [11]

Bo Han, Quanming Yao, Xingrui Yu, Gang Niu, Miao Xu, Weihua Hu, Ivor Tsang, and Masashi Sugiyama. 2018. Co-teaching: Robust training of deep neural networks with extremely noisy labels.Advances in neural information processing systems31 (2018)

work page 2018

[12] [12]

Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2020. Mo- mentum contrast for unsupervised visual representation learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 9729–9738

work page 2020

[13] [13]

Dan Hendrycks and Kevin Gimpel. 2016. A baseline for detecting misclas- sified and out-of-distribution examples in neural networks.arXiv preprint arXiv:1610.02136(2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016

[14] [14]

Dan Hendrycks, Mantas Mazeika, and Thomas Dietterich. 2018. Deep anomaly detection with outlier exposure.arXiv preprint arXiv:1812.04606(2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[15] [15]

Nazmul Karim, Mamshad Nayeem Rizve, Nazanin Rahnavard, Ajmal Mian, and Mubarak Shah. 2022. Unicon: Combating label noise through uniform selection and contrastive learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 9676–9686

work page 2022

[16] [16]

Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. 2020. Supervised contrastive learning.Advances in neural information processing systems33 (2020), 18661–18673

work page 2020

[17] [17]

Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. 2018. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. Advances in neural information processing systems31 (2018)

work page 2018

[18] [18]

Junnan Li, Richard Socher, and Steven CH Hoi. 2020. Dividemix: Learning with noisy labels as semi-supervised learning.arXiv preprint arXiv:2002.07394(2020)

work page arXiv 2020

[19] [19]

Junnan Li, Caiming Xiong, and Steven C.H. Hoi. 2021. Learning From Noisy Data With Robust Representation Learning. InProceedings of the IEEE/CVF Inter- national Conference on Computer Vision (ICCV). 9485–9494

work page 2021

[20] [20]

Shikun Li, Xiaobo Xia, Shiming Ge, and Tongliang Liu. 2022. Selective-supervised contrastive learning with noisy labels. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 316–325

work page 2022

[21] [21]

Sheng Liu, Jonathan Niles-Weed, Narges Razavian, and Carlos Fernandez-Granda

work page

[22] [22]

Early-learning regularization prevents memorization of noisy labels.Ad- vances in neural information processing systems33 (2020), 20331–20342

work page 2020

[23] [23]

Weitang Liu, Xiaoyun Wang, John Owens, and Yixuan Li. 2020. Energy-based Out- of-distribution Detection. InAdvances in Neural Information Processing Systems (NeurIPS), Vol. 33. 21464–21475

work page 2020

[24] [24]

Curtis Northcutt, Lu Jiang, and Isaac Chuang. 2021. Confident learning: Esti- mating uncertainty in dataset labels.Journal of Artificial Intelligence Research70 (2021), 1373–1411

work page 2021

[25] [25]

Diego Ortego, Eric Arazo, Paul Albert, Noel E O’Connor, and Kevin McGuinness

work page

[26] [26]

In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Multi-objective interpolation training for robustness to label noise. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 6606–6615

work page

[27] [27]

Giorgio Patrini, Alessandro Rozza, Aditya Krishna Menon, Richard Nock, and Lizhen Qu. 2017. Making deep neural networks robust to label noise: A loss correction approach. InProceedings of the IEEE conference on computer vision and pattern recognition. 1944–1952

work page 2017

[28] [28]

Ragav Sachdeva, Filipe Rolim Cordeiro, Vasileios Belagiannis, Ian Reid, and Gustavo Carneiro. 2023. ScanMix: Learning from severe label noise via semantic clustering and semi-supervised learning.Pattern recognition134 (2023), 109121

work page 2023

[29] [29]

Jun Shu, Qi Xie, Lixuan Yi, Qian Zhao, Sanping Zhou, Zongben Xu, and Deyu Meng. 2019. Meta-Weight-Net: Learning an Explicit Mapping For Sample Weight- ing. InAdvances in Neural Information Processing Systems, Vol. 32. Curran Asso- ciates, Inc

work page 2019

[30] [30]

Baoye Song, Shihao Zhao, Luyao Dang, Haoguang Wang, and Lin Xu. 2025. A survey on learning from data with label noise via deep neural networks.Systems Science & Control Engineering13, 1 (2025), 2488120

work page 2025

[31] [31]

Hwanjun Song, Minseok Kim, and Jae-Gil Lee. 2019. Selfie: Refurbishing unclean samples for robust deep learning. InInternational conference on machine learning. PMLR, 5907–5915

work page 2019

[32] [32]

Xiaobo Xia, Tongliang Liu, Bo Han, Nannan Wang, Mingming Gong, Haifeng Liu, Gang Niu, Dacheng Tao, and Masashi Sugiyama. 2020. Part-dependent label noise: Towards instance-dependent label noise.Advances in neural information processing systems33 (2020), 7597–7610

work page 2020

[33] [33]

Ruixuan Xiao, Yiwen Dong, Haobo Wang, Lei Feng, Runze Wu, Gang Chen, and Junbo Zhao. 2022. Promix: Combating label noise via maximizing clean sample utility.arXiv preprint arXiv:2207.10276(2022)

work page arXiv 2022

[34] [34]

Jingkang Yang, Pengyun Wang, Dejian Zou, Zitang Zhou, Kunyuan Ding, Wenx- uan Peng, Haoqi Wang, Guangyao Chen, Bo Li, Yiyou Sun, et al. 2022. Openood: Benchmarking generalized out-of-distribution detection.Advances in Neural Information Processing Systems35 (2022), 32598–32611

work page 2022

[35] [35]

Xingrui Yu, Bo Han, Jiangchao Yao, Gang Niu, Ivor Tsang, and Masashi Sugiyama

work page

[36] [36]

In International conference on machine learning

How does disagreement help generalization against label corruption?. In International conference on machine learning. PMLR, 7164–7173

work page

[37] [37]

Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. 2017. mixup: Beyond empirical risk minimization.arXiv preprint arXiv:1710.09412 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[38] [38]

Qian Zhang, Yi Zhu, Filipe R Cordeiro, and Qiu Chen. 2025. PSSCL: A progressive sample selection framework with contrastive loss designed for noisy labels. Pattern Recognition161 (2025), 111284

work page 2025

[39] [39]

Yikai Zhang, Songzhu Zheng, Pengxiang Wu, Mayank Goswami, and Chao Chen

work page

[40] [40]

arXiv preprint arXiv:2103.07756(2021)

Learning with feature-dependent label noise: A progressive approach. arXiv preprint arXiv:2103.07756(2021)

work page arXiv 2021

[41] [41]

Zhilu Zhang and Mert Sabuncu. 2018. Generalized cross entropy loss for training deep neural networks with noisy labels.Advances in neural information processing systems31 (2018)

work page 2018

[42] [42]

Jia-Xing Zhong, Nannan Li, Weijie Kong, Shan Liu, Thomas H Li, and Ge Li. 2019. Graph convolutional label noise cleaner: Train a plug-and-play action classifier for anomaly detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 1237–1246

work page 2019

[43] [43]

Lungren, and Lei Xing

Yuyin Zhou, Xianhang Li, Fengze Liu, Qingyue Wei, Xuxi Chen, Lequan Yu, Cihang Xie, Matthew P. Lungren, and Lei Xing. 2024. L2B: Learning to Boot- strap Robust Models for Combating Label Noise. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 23523–23533

work page 2024