pith. machine review for the scientific record.

arxiv: 2604.06468 · v2 · submitted 2026-04-07 · 💻 cs.LG · stat.ML

Recognition: no theorem link

Conformal Margin Risk Minimization: An Envelope Framework for Robust Learning under Label Noise


Pith reviewed 2026-05-10 19:28 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords: label noise · robust classification · conformal prediction · margin regularization · plug-and-play learning · noisy supervision

The pith

Conformal Margin Risk Minimization wraps any classification loss with one quantile-regularized term to focus training on high-margin samples under arbitrary label noise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CMRM as a plug-and-play addition to existing losses that requires no noise transition matrix, clean subset, or pretrained extractor. It computes the margin between the observed label and competing labels, then penalizes samples whose margin falls below a conformal quantile computed on each batch. This directs gradient updates toward likely-correct examples while suppressing probable errors. A generalization bound is derived that holds for any label noise provided the margin distribution satisfies mild regularity. Experiments across five base learners and six datasets with synthetic and real noise show accuracy gains up to 3.39 percent, smaller conformal prediction sets, and no degradation on clean data.
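To make the mechanism concrete, here is a minimal numerical sketch of such an envelope term. The function names, the sigmoid soft weighting, the hinge penalty, and the quantile level are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def margins(logits, labels):
    """Confidence margin: observed-label score minus the best competing score."""
    n = logits.shape[0]
    observed = logits[np.arange(n), labels]
    rival = logits.copy()
    rival[np.arange(n), labels] = -np.inf  # exclude the observed label
    return observed - rival.max(axis=1)

def cmrm_envelope(per_sample_loss, logits, labels, alpha=0.1, lam=1.0):
    """Wrap a base loss with a quantile-thresholded margin penalty (sketch).

    tau is the empirical alpha-quantile of the batch margins; samples below
    it receive soft weight < 0.5 and contribute a hinge penalty. The exact
    weighting and penalty form here are our assumptions.
    """
    m = margins(logits, labels)
    tau = np.quantile(m, alpha)                 # per-batch threshold
    w = 1.0 / (1.0 + np.exp(-(m - tau)))        # down-weight low-margin samples
    envelope = np.maximum(tau - m, 0.0).mean()  # penalize margins below tau
    return float((w * per_sample_loss).mean() + lam * envelope)
```

In this sketch a likely mislabeled sample (large negative margin) falls below the batch threshold, gets a soft weight under 0.5, and dominates the hinge term, so gradient updates concentrate on the high-margin remainder.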

Core claim

CMRM improves robustness to label noise for any base method by adding a single envelope term that thresholds the observed-label margin against a per-batch conformal quantile, thereby reweighting the loss toward high-margin samples without any privileged knowledge of the noise process.

What carries the argument

The conformal quantile threshold on the margin between the given label and alternatives; it supplies a method-agnostic uncertainty signal that regularizes the training loss to down-weight likely mislabeled points.

Load-bearing premise

The distribution of margins is regular enough that an empirical quantile computed on each batch reliably separates clean from noisy examples.
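A toy simulation (ours, not the paper's; the Gaussian margin model and every parameter are invented for illustration) shows why a per-batch empirical quantile can do this separating work when clean and noisy margins concentrate on opposite signs, as in the paper's Figure 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n, noise_rate, alpha = 10_000, 0.2, 0.2
is_noisy = rng.random(n) < noise_rate
# Toy margin model: clean samples concentrate on positive margins,
# mislabeled ones on negative. The two Gaussians are an assumption.
m = np.where(is_noisy,
             rng.normal(-2.0, 1.0, n),
             rng.normal(2.0, 1.0, n))
tau = np.quantile(m, alpha)         # per-batch empirical quantile
below = m < tau                     # samples the envelope would suppress
precision = is_noisy[below].mean()  # fraction of suppressed samples that are noisy
```

With well-separated modes the suppressed set is almost entirely noisy; when the clean and noisy margin distributions overlap heavily, the same quantile would discard clean samples, which is exactly the failure mode the regularity premise rules out.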

What would settle it

A dataset in which CMRM produces no accuracy gain or larger prediction sets despite the presence of label noise and the same base learner.

Figures

Figures reproduced from arXiv: 2604.06468 by Janardhan Rao Doppa, Peihong Li, Yan Yan, Yuanjie Shi, Zijian Zhang.

Figure 1
Figure 1. Confidence margin distributions for clean (blue) and noisy (orange) samples on CIFAR-100. (a) Class-conditional noise at 20% and (b) human annotation noise at 40%. In both cases, clean samples concentrate on positive margins while noisy samples shift to negative, showing that confidence margins can distinguish clean from noisy labels without assumptions on the noise process. …this threshold, a property tha… view at source ↗
Figure 2
Figure 2. Justification experiments for multi-class classification on CIFAR-100 with 20% synthetic label noise. Subfigure (a) shows the training dynamics of total loss (Total), classification loss (cl), and CMRM loss (cr) over epochs. CMRM exhibits stable and monotonic convergence alongside standard loss components. Subfigure (b) reports the ratio of noisy samples among those filtered out (soft weight < 0.5) by… view at source ↗
Figure 3
Figure 3. Training dynamics of LR+CMRM for binary classification on the Email dataset. Subfigure (a) Training dynamics of total loss (all loss), classification loss (cl loss), and CMRM loss (cr loss) over epochs. CMRM exhibits stable and monotonic convergence alongside standard loss components. Subfigure (b) τ− (negative class threshold) and τ+ (positive class threshold) of LR+CMRM during training. The separatio… view at source ↗
Figure 4
Figure 4. Histograms of positive confidence distributions for clean (blue) and noisy (orange) samples on the Credit dataset with 20% label noise. The top and bottom rows correspond to samples with observed labels Ỹ = 0 (negative) and Ỹ = 1 (positive), respectively. Distributions are obtained using LR (left) and LR+CMRM (right). CMRM induces a clearer separation between clean and noisy confidence distributions fo… view at source ↗
Figure 5
Figure 5. Training dynamics of total loss (Total), classification loss (cl), and CMRM loss (cr) for all base losses over epochs on CIFAR-100 with 20% synthetic label noise. Subfigure (a) CE+CMRM; Subfigure (b) Focal+CMRM; Subfigure (c) LDAM+CMRM; Subfigure (d) GCE+CMRM. CMRM exhibits stable and monotonic convergence alongside standard loss components. …with its +CMRM variant. Across all objectives, CMRM consistentl… view at source ↗
Figure 6
Figure 6. The ratio of noisy samples among those filtered out by CMRM at each epoch on CIFAR-100 with 20% synthetic label noise. Subfigure (a) CE+CMRM (α = 0.15); Subfigure (b) Focal+CMRM (α = 0.1); Subfigure (c) LDAM+CMRM (α = 0.15); Subfigure (d) GCE+CMRM (α = 0.05). CMRM consistently suppresses noisy examples by excluding low-margin samples during training. view at source ↗
Figure 7
Figure 7. Sensitivity to α for different base losses on CIFAR-100 with 20% synthetic label noise. Subfigure (a) CE+CMRM; Subfigure (b) Focal+CMRM; Subfigure (c) LDAM+CMRM; Subfigure (d) GCE+CMRM. CMRM maintains higher accuracy than CE across a range of α values, indicating robustness to hyperparameter α. view at source ↗
Figure 8
Figure 8. Sensitivity to λ for different base losses on CIFAR-100 with 20% synthetic label noise. Subfigure (a) CE+CMRM; Subfigure (b) Focal+CMRM; Subfigure (c) LDAM+CMRM; Subfigure (d) GCE+CMRM. CMRM maintains higher accuracy than CE across a range of λ values, indicating robustness to hyperparameter λ. view at source ↗
Figure 9
Figure 9. Kernel density estimate (KDE) of the margin distribution for different base losses on CIFAR-100 with 20% synthetic label noise. Subfigure (a) CE+CMRM (α = 0.15); Subfigure (b) Focal+CMRM (α = 0.1); Subfigure (c) LDAM+CMRM (α = 0.15); Subfigure (d) GCE+CMRM (α = 0.05). The vertical dashed line indicates the estimated τα(f). The density curve is smooth and strictly positive around τα(f), supporting the… view at source ↗
Figure 10
Figure 10. Dynamics of losses (top row) and two thresholds (τ+ and τ−, bottom row) for binary classification of all base losses (LR, Focal, SVM, GCE) with CMRM on the Email dataset. Top row shows the training dynamics of total loss (all loss), classification loss (cl loss), and CMRM regularization loss (CMRM loss) over epochs. CMRM exhibits stable and monotonic convergence alongside standard loss components. Bottom row… view at source ↗
read the original abstract

Most methods for learning with noisy labels require privileged knowledge such as noise transition matrices, clean subsets or pretrained feature extractors, resources typically unavailable when robustness is most needed. We propose Conformal Margin Risk Minimization (CMRM), a plug-and-play envelope framework that improves any classification loss under label noise by adding a single quantile-calibrated regularization term, with no privileged knowledge or training pipeline modification. CMRM measures the confidence margin between the observed label and competing labels, and thresholds it with a conformal quantile estimated per batch to focus training on high-margin samples while suppressing likely mislabeled ones. We derive a learning bound for CMRM under arbitrary label noise requiring only mild regularity of the margin distribution. Across five base methods and six benchmarks with synthetic and real-world noise, CMRM consistently improves accuracy (up to +3.39%), reduces conformal prediction set size (up to -20.44%) and does not hurt under 0% noise, showing that CMRM captures a method-agnostic uncertainty signal that existing mechanisms did not exploit.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes Conformal Margin Risk Minimization (CMRM), a plug-and-play envelope framework that augments any base classification loss with a single quantile-calibrated regularization term derived from the margin between the observed label and competing labels. The quantile is estimated conformally per batch without requiring noise transition matrices, clean subsets, or pretrained extractors. A learning bound is derived for CMRM under arbitrary label noise that requires only mild regularity of the margin distribution. Experiments across five base methods and six benchmarks (synthetic and real-world noise) report consistent accuracy gains (up to +3.39%), smaller conformal prediction sets (up to -20.44%), and no degradation at 0% noise.

Significance. If the bound holds, the work provides a method-agnostic uncertainty signal that existing robust-learning mechanisms have not exploited, with the practical advantage of requiring no pipeline changes or privileged information. The empirical evaluation on multiple bases and both synthetic/real noise is a clear strength, as is the demonstration that performance is not harmed on clean data. The approach correctly leverages established conformal prediction ideas in a novel regularization envelope.

major comments (1)
  1. [Abstract and theoretical section] Abstract and theoretical derivation (bound for CMRM under arbitrary label noise): the central claim that the bound requires 'only mild regularity of the margin distribution' is load-bearing, yet the manuscript does not specify the precise regularity conditions (e.g., continuity of the margin CDF, density bounded away from zero at the quantile level, or Lipschitz continuity). Arbitrary label noise can map clean margins to multimodal or discontinuous observed margins, which would invalidate the conformal quantile calibration guarantee used to control risk; a concrete counter-example or explicit assumption statement is needed to substantiate the bound.
minor comments (2)
  1. [Method description] The per-batch quantile estimation procedure should include a short pseudocode or explicit formula for the conformal score and quantile computation to aid reproducibility.
  2. [Experiments] Table captions and axis labels in the experimental results could more explicitly state the noise rates and base methods for each row/column to improve readability.
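For readers wanting the formula the referee requests, the standard split-conformal quantile from the conformal prediction literature takes the ⌈(n + 1)(1 − α)⌉-th smallest score; how CMRM applies it per batch is not specified here, so the following is a generic sketch rather than the paper's procedure.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """Standard split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th
    smallest score, which guarantees marginal coverage >= 1 - alpha under
    exchangeability. Per-batch use on margins is our assumption, not the
    paper's stated procedure."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:  # batch too small for this alpha; no finite threshold exists
        return float("inf")
    return float(np.sort(scores)[k - 1])
```

For example, with 100 scores and α = 0.1 the threshold is the 91st smallest score, and any batch smaller than ⌈(1 − α)/α⌉ cannot support a finite threshold at that level.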

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address the major comment below and will revise the manuscript accordingly to strengthen the theoretical presentation.

read point-by-point responses
  1. Referee: [Abstract and theoretical section] Abstract and theoretical derivation (bound for CMRM under arbitrary label noise): the central claim that the bound requires 'only mild regularity of the margin distribution' is load-bearing, yet the manuscript does not specify the precise regularity conditions (e.g., continuity of the margin CDF, density bounded away from zero at the quantile level, or Lipschitz continuity). Arbitrary label noise can map clean margins to multimodal or discontinuous observed margins, which would invalidate the conformal quantile calibration guarantee used to control risk; a concrete counter-example or explicit assumption statement is needed to substantiate the bound.

    Authors: We agree that the precise regularity conditions were not stated explicitly enough in the abstract and theoretical derivation, and that this warrants clarification. In the revised manuscript we will add an explicit assumption statement in the theoretical section: the (observed) margin random variable is assumed to possess a continuous CDF whose density is positive and bounded away from zero in an open neighborhood of the target quantile level. This is the minimal condition needed for the conformal quantile estimator to achieve exact finite-sample marginal coverage and for the subsequent risk bound to hold. We will also include a short remark explaining why this condition is mild and why it is compatible with arbitrary label noise. While label noise can certainly produce multimodal or discontinuous margin distributions, the conformal calibration step itself is distribution-free and relies only on exchangeability within each batch (which is preserved by construction). The regularity assumption rules out only degenerate cases in which the quantile is not uniquely defined; it does not require unimodality or global Lipschitz continuity. We will add a brief discussion of this point and, space permitting, a simple illustrative example showing that the bound continues to hold under moderate multimodality. No counter-example that would invalidate the conformal guarantee under the stated regularity has been identified in our analysis. revision: yes
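The rebuttal's claim that conformal calibration needs only exchangeability, not unimodality, can be checked empirically. The following toy experiment (ours, not the authors') draws deliberately bimodal scores, calibrates the conformal quantile on one half, and verifies that coverage on the other half still lands near 1 − α.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
alpha, trials, n = 0.1, 500, 200
total = 0.0
for _ in range(trials):
    # Deliberately multimodal scores: a mixture of two well-separated
    # Gaussians, an illustrative stand-in for noisy-label margins.
    comp = rng.random(2 * n) < 0.3
    s = np.where(comp,
                 rng.normal(-3.0, 0.5, 2 * n),
                 rng.normal(2.0, 1.0, 2 * n))
    cal, test = s[:n], s[n:]                # exchangeable split
    k = math.ceil((n + 1) * (1 - alpha))    # conformal quantile index
    tau = np.sort(cal)[k - 1]
    total += (test <= tau).mean()
coverage = total / trials  # should sit near 1 - alpha = 0.9
```

Coverage holds despite the two modes, consistent with the rebuttal: the regularity assumption is needed for the risk bound's quantile stability, not for the distribution-free coverage step itself.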

Circularity Check

0 steps flagged

No significant circularity; learning bound derived from standard conformal prediction theory with independent content

full rationale

The paper's central derivation is a learning bound for CMRM under arbitrary label noise that requires only mild regularity of the margin distribution. This bound is obtained by applying established conformal prediction quantile calibration to the observed margin between the given label and competitors, without reducing the bound to a fitted parameter or self-citation by construction. The per-batch quantile estimation is explicitly data-driven and does not tautologically define the risk control. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the derivation chain; the framework is presented as a plug-and-play envelope around any base loss, with empirical gains shown separately on benchmarks. The result therefore remains self-contained against external conformal prediction benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on conformal prediction principles plus one domain assumption; no new entities or fitted constants beyond standard conformal calibration are introduced in the abstract.

axioms (1)
  • domain assumption mild regularity of the margin distribution
    Invoked to derive the learning bound under arbitrary label noise.

pith-pipeline@v0.9.0 · 5494 in / 1248 out tokens · 72966 ms · 2026-05-10T19:28:53.395359+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

74 extracted references · 10 canonical work pages · 2 internal anchors
