pith. machine review for the scientific record.

arxiv: 2604.06468 · v2 · submitted 2026-04-07 · 💻 cs.LG · stat.ML

Recognition: no theorem link

Conformal Margin Risk Minimization: An Envelope Framework for Robust Learning under Label Noise


Pith reviewed 2026-05-10 19:28 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords: label noise · robust classification · conformal prediction · margin regularization · plug-and-play learning · noisy supervision

The pith

Conformal Margin Risk Minimization wraps any classification loss with one quantile-regularized term to focus training on high-margin samples under arbitrary label noise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CMRM as a plug-and-play addition to existing losses that requires no noise transition matrix, clean subset, or pretrained extractor. It computes the margin between the observed label and competing labels, then penalizes samples whose margin falls below a conformal quantile computed on each batch. This directs gradient updates toward likely-correct examples while suppressing probable errors. A generalization bound is derived that holds for any label noise provided the margin distribution satisfies mild regularity. Experiments across five base learners and six datasets with synthetic and real noise show accuracy gains up to 3.39 percent, smaller conformal prediction sets, and no degradation on clean data.
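To make the mechanism concrete, here is a minimal numerical sketch of such an envelope term. The function names, the sigmoid soft weighting, the hinge penalty, and the quantile level are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def margins(logits, labels):
    """Confidence margin: observed-label score minus the best competing score."""
    n = logits.shape[0]
    observed = logits[np.arange(n), labels]
    rival = logits.copy()
    rival[np.arange(n), labels] = -np.inf  # exclude the observed label
    return observed - rival.max(axis=1)

def cmrm_envelope(per_sample_loss, logits, labels, alpha=0.1, lam=1.0):
    """Wrap a base loss with a quantile-thresholded margin penalty (sketch).

    tau is the empirical alpha-quantile of the batch margins; samples below
    it receive soft weight < 0.5 and contribute a hinge penalty. The exact
    weighting and penalty form here are our assumptions.
    """
    m = margins(logits, labels)
    tau = np.quantile(m, alpha)                 # per-batch threshold
    w = 1.0 / (1.0 + np.exp(-(m - tau)))        # down-weight low-margin samples
    envelope = np.maximum(tau - m, 0.0).mean()  # penalize margins below tau
    return float((w * per_sample_loss).mean() + lam * envelope)
```

In this sketch a likely mislabeled sample (large negative margin) falls below the batch threshold, gets a soft weight under 0.5, and dominates the hinge term, so gradient updates concentrate on the high-margin remainder.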

Core claim

CMRM improves robustness to label noise for any base method by adding a single envelope term that thresholds the observed-label margin against a per-batch conformal quantile, thereby reweighting the loss toward high-margin samples without any privileged knowledge of the noise process.

What carries the argument

The conformal quantile threshold on the margin between the given label and alternatives; it supplies a method-agnostic uncertainty signal that regularizes the training loss to down-weight likely mislabeled points.

Load-bearing premise

The distribution of margins is regular enough that an empirical quantile computed on each batch reliably separates clean from noisy examples.
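A toy simulation (ours, not the paper's; the Gaussian margin model and every parameter are invented for illustration) shows why a per-batch empirical quantile can do this separating work when clean and noisy margins concentrate on opposite signs, as in the paper's Figure 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n, noise_rate, alpha = 10_000, 0.2, 0.2
is_noisy = rng.random(n) < noise_rate
# Toy margin model: clean samples concentrate on positive margins,
# mislabeled ones on negative. The two Gaussians are an assumption.
m = np.where(is_noisy,
             rng.normal(-2.0, 1.0, n),
             rng.normal(2.0, 1.0, n))
tau = np.quantile(m, alpha)         # per-batch empirical quantile
below = m < tau                     # samples the envelope would suppress
precision = is_noisy[below].mean()  # fraction of suppressed samples that are noisy
```

With well-separated modes the suppressed set is almost entirely noisy; when the clean and noisy margin distributions overlap heavily, the same quantile would discard clean samples, which is exactly the failure mode the regularity premise rules out.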

What would settle it

A dataset in which CMRM produces no accuracy gain or larger prediction sets despite the presence of label noise and the same base learner.

Figures

Figures reproduced from arXiv: 2604.06468 by Janardhan Rao Doppa, Peihong Li, Yan Yan, Yuanjie Shi, Zijian Zhang.

Figure 1
Figure 1. Confidence margin distributions for clean (blue) and noisy (orange) samples on CIFAR-100. (a) Class-conditional noise at 20% and (b) human annotation noise at 40%. In both cases, clean samples concentrate on positive margins while noisy samples shift to negative, showing that confidence margins can distinguish clean from noisy labels without assumptions on the noise process. …this threshold, a property tha… view at source ↗
Figure 2
Figure 2. Justification experiments for multi-class classification on CIFAR-100 with 20% synthetic label noise. Subfigure (a) shows the training dynamics of total loss (Total), classification loss (cl), and CMRM loss (cr) over epochs. CMRM exhibits stable and monotonic convergence alongside standard loss components. Subfigure (b) reports the ratio of noisy samples among those filtered out (soft weight < 0.5) by… view at source ↗
Figure 3
Figure 3. Training dynamics of LR+CMRM for binary classification on the Email dataset. Subfigure (a) Training dynamics of total loss (all loss), classification loss (cl loss), and CMRM loss (cr loss) over epochs. CMRM exhibits stable and monotonic convergence alongside standard loss components. Subfigure (b) τ− (negative class threshold) and τ+ (positive class threshold) of LR+CMRM during training. The separatio… view at source ↗
Figure 4
Figure 4. Histograms of positive confidence distributions for clean (blue) and noisy (orange) samples on the Credit dataset with 20% label noise. The top and bottom rows correspond to samples with observed labels Ỹ = 0 (negative) and Ỹ = 1 (positive), respectively. Distributions are obtained using LR (left) and LR+CMRM (right). CMRM induces a clearer separation between clean and noisy confidence distributions fo… view at source ↗
Figure 5
Figure 5. Training dynamics of total loss (Total), classification loss (cl), and CMRM loss (cr) for all base losses over epochs on CIFAR-100 with 20% synthetic label noise. Subfigure (a) CE+CMRM; Subfigure (b) Focal+CMRM; Subfigure (c) LDAM+CMRM; Subfigure (d) GCE+CMRM. CMRM exhibits stable and monotonic convergence alongside standard loss components. …with its +CMRM variant. Across all objectives, CMRM consistentl… view at source ↗
Figure 6
Figure 6. The ratio of noisy samples among those filtered out by CMRM at each epoch on CIFAR-100 with 20% synthetic label noise. Subfigure (a) CE+CMRM (α = 0.15); Subfigure (b) Focal+CMRM (α = 0.1); Subfigure (c) LDAM+CMRM (α = 0.15); Subfigure (d) GCE+CMRM (α = 0.05). CMRM consistently suppresses noisy examples by excluding low-margin samples during training. view at source ↗
Figure 7
Figure 7. Sensitivity to α for different base losses on CIFAR-100 with 20% synthetic label noise. Subfigure (a) CE+CMRM; Subfigure (b) Focal+CMRM; Subfigure (c) LDAM+CMRM; Subfigure (d) GCE+CMRM. CMRM maintains higher accuracy than CE across a range of α values, indicating robustness to hyperparameter α. view at source ↗
Figure 8
Figure 8. Sensitivity to λ for different base losses on CIFAR-100 with 20% synthetic label noise. Subfigure (a) CE+CMRM; Subfigure (b) Focal+CMRM; Subfigure (c) LDAM+CMRM; Subfigure (d) GCE+CMRM. CMRM maintains higher accuracy than CE across a range of λ values, indicating robustness to hyperparameter λ. view at source ↗
Figure 9
Figure 9. Kernel density estimate (KDE) of the margin distribution for different base losses on CIFAR-100 with 20% synthetic label noise. Subfigure (a) CE+CMRM (α = 0.15); Subfigure (b) Focal+CMRM (α = 0.1); Subfigure (c) LDAM+CMRM (α = 0.15); Subfigure (d) GCE+CMRM (α = 0.05). The vertical dashed line indicates the estimated τα(f). The density curve is smooth and strictly positive around τα(f), supporting the… view at source ↗
Figure 10
Figure 10. Dynamics of losses (top row) and two thresholds (τ+ and τ−, bottom row) for binary classification of all base losses (LR, Focal, SVM, GCE) with CMRM on the Email dataset. Top row shows the training dynamics of total loss (all loss), classification loss (cl loss), and CMRM regularization loss (CMRM loss) over epochs. CMRM exhibits stable and monotonic convergence alongside standard loss components. Bottom row… view at source ↗
read the original abstract

Most methods for learning with noisy labels require privileged knowledge such as noise transition matrices, clean subsets or pretrained feature extractors, resources typically unavailable when robustness is most needed. We propose Conformal Margin Risk Minimization (CMRM), a plug-and-play envelope framework that improves any classification loss under label noise by adding a single quantile-calibrated regularization term, with no privileged knowledge or training pipeline modification. CMRM measures the confidence margin between the observed label and competing labels, and thresholds it with a conformal quantile estimated per batch to focus training on high-margin samples while suppressing likely mislabeled ones. We derive a learning bound for CMRM under arbitrary label noise requiring only mild regularity of the margin distribution. Across five base methods and six benchmarks with synthetic and real-world noise, CMRM consistently improves accuracy (up to +3.39%), reduces conformal prediction set size (up to -20.44%) and does not hurt under 0% noise, showing that CMRM captures a method-agnostic uncertainty signal that existing mechanisms did not exploit.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes Conformal Margin Risk Minimization (CMRM), a plug-and-play envelope framework that augments any base classification loss with a single quantile-calibrated regularization term derived from the margin between the observed label and competing labels. The quantile is estimated conformally per batch without requiring noise transition matrices, clean subsets, or pretrained extractors. A learning bound is derived for CMRM under arbitrary label noise that requires only mild regularity of the margin distribution. Experiments across five base methods and six benchmarks (synthetic and real-world noise) report consistent accuracy gains (up to +3.39%), smaller conformal prediction sets (up to -20.44%), and no degradation at 0% noise.

Significance. If the bound holds, the work provides a method-agnostic uncertainty signal that existing robust-learning mechanisms have not exploited, with the practical advantage of requiring no pipeline changes or privileged information. The empirical evaluation on multiple bases and both synthetic/real noise is a clear strength, as is the demonstration that performance is not harmed on clean data. The approach correctly leverages established conformal prediction ideas in a novel regularization envelope.

major comments (1)
  1. [Abstract and theoretical section] Abstract and theoretical derivation (bound for CMRM under arbitrary label noise): the central claim that the bound requires 'only mild regularity of the margin distribution' is load-bearing, yet the manuscript does not specify the precise regularity conditions (e.g., continuity of the margin CDF, density bounded away from zero at the quantile level, or Lipschitz continuity). Arbitrary label noise can map clean margins to multimodal or discontinuous observed margins, which would invalidate the conformal quantile calibration guarantee used to control risk; a concrete counter-example or explicit assumption statement is needed to substantiate the bound.
minor comments (2)
  1. [Method description] The per-batch quantile estimation procedure should include a short pseudocode or explicit formula for the conformal score and quantile computation to aid reproducibility.
  2. [Experiments] Table captions and axis labels in the experimental results could more explicitly state the noise rates and base methods for each row/column to improve readability.
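For readers wanting the formula the referee requests, the standard split-conformal quantile from the conformal prediction literature takes the ⌈(n + 1)(1 − α)⌉-th smallest score; how CMRM applies it per batch is not specified here, so the following is a generic sketch rather than the paper's procedure.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """Standard split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th
    smallest score, which guarantees marginal coverage >= 1 - alpha under
    exchangeability. Per-batch use on margins is our assumption, not the
    paper's stated procedure."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:  # batch too small for this alpha; no finite threshold exists
        return float("inf")
    return float(np.sort(scores)[k - 1])
```

For example, with 100 scores and α = 0.1 the threshold is the 91st smallest score, and any batch smaller than ⌈(1 − α)/α⌉ cannot support a finite threshold at that level.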

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address the major comment below and will revise the manuscript accordingly to strengthen the theoretical presentation.

read point-by-point responses
  1. Referee: [Abstract and theoretical section] Abstract and theoretical derivation (bound for CMRM under arbitrary label noise): the central claim that the bound requires 'only mild regularity of the margin distribution' is load-bearing, yet the manuscript does not specify the precise regularity conditions (e.g., continuity of the margin CDF, density bounded away from zero at the quantile level, or Lipschitz continuity). Arbitrary label noise can map clean margins to multimodal or discontinuous observed margins, which would invalidate the conformal quantile calibration guarantee used to control risk; a concrete counter-example or explicit assumption statement is needed to substantiate the bound.

    Authors: We agree that the precise regularity conditions were not stated explicitly enough in the abstract and theoretical derivation, and that this warrants clarification. In the revised manuscript we will add an explicit assumption statement in the theoretical section: the (observed) margin random variable is assumed to possess a continuous CDF whose density is positive and bounded away from zero in an open neighborhood of the target quantile level. This is the minimal condition needed for the conformal quantile estimator to achieve exact finite-sample marginal coverage and for the subsequent risk bound to hold. We will also include a short remark explaining why this condition is mild and why it is compatible with arbitrary label noise. While label noise can certainly produce multimodal or discontinuous margin distributions, the conformal calibration step itself is distribution-free and relies only on exchangeability within each batch (which is preserved by construction). The regularity assumption rules out only degenerate cases in which the quantile is not uniquely defined; it does not require unimodality or global Lipschitz continuity. We will add a brief discussion of this point and, space permitting, a simple illustrative example showing that the bound continues to hold under moderate multimodality. No counter-example that would invalidate the conformal guarantee under the stated regularity has been identified in our analysis. revision: yes
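The rebuttal's claim that conformal calibration needs only exchangeability, not unimodality, can be checked empirically. The following toy experiment (ours, not the authors') draws deliberately bimodal scores, calibrates the conformal quantile on one half, and verifies that coverage on the other half still lands near 1 − α.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
alpha, trials, n = 0.1, 500, 200
total = 0.0
for _ in range(trials):
    # Deliberately multimodal scores: a mixture of two well-separated
    # Gaussians, an illustrative stand-in for noisy-label margins.
    comp = rng.random(2 * n) < 0.3
    s = np.where(comp,
                 rng.normal(-3.0, 0.5, 2 * n),
                 rng.normal(2.0, 1.0, 2 * n))
    cal, test = s[:n], s[n:]                # exchangeable split
    k = math.ceil((n + 1) * (1 - alpha))    # conformal quantile index
    tau = np.sort(cal)[k - 1]
    total += (test <= tau).mean()
coverage = total / trials  # should sit near 1 - alpha = 0.9
```

Coverage holds despite the two modes, consistent with the rebuttal: the regularity assumption is needed for the risk bound's quantile stability, not for the distribution-free coverage step itself.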

Circularity Check

0 steps flagged

No significant circularity; learning bound derived from standard conformal prediction theory with independent content

full rationale

The paper's central derivation is a learning bound for CMRM under arbitrary label noise that requires only mild regularity of the margin distribution. This bound is obtained by applying established conformal prediction quantile calibration to the observed margin between the given label and competitors, without reducing the bound to a fitted parameter or self-citation by construction. The per-batch quantile estimation is explicitly data-driven and does not tautologically define the risk control. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the derivation chain; the framework is presented as a plug-and-play envelope around any base loss, with empirical gains shown separately on benchmarks. The result therefore remains self-contained against external conformal prediction benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on conformal prediction principles plus one domain assumption; no new entities or fitted constants beyond standard conformal calibration are introduced in the abstract.

axioms (1)
  • domain assumption mild regularity of the margin distribution
    Invoked to derive the learning bound under arbitrary label noise.

pith-pipeline@v0.9.0 · 5494 in / 1248 out tokens · 72966 ms · 2026-05-10T19:28:53.395359+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

74 extracted references · 10 canonical work pages · 2 internal anchors
