LiLAW: Lightweight Learnable Adaptive Weighting to Learn Sample Difficulty & Improve Noisy Training

arxiv: 2509.20786 · v4 · submitted 2025-09-25 · 💻 cs.LG

Abhishek Moturu, Muhammad Muzammil, Anna Goldenberg, Babak Taati

Pith reviewed 2026-05-18 14:57 UTC · model grok-4.3

keywords: noisy training, adaptive weighting, sample difficulty, robust deep learning, medical imaging, generalization, lightweight methods

The pith

LiLAW learns sample difficulty weights for noisy training using three global scalars updated by one gradient step on a validation batch.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LiLAW to address noise and heterogeneity when training deep networks. It assigns each sample an adaptive loss weight by classifying it as easy, moderate, or hard through three learnable scalar parameters. After every training mini-batch the parameters receive a single gradient-descent update computed on a validation mini-batch, without any requirement that the validation data be clean or unbiased. Experiments on general and medical imaging datasets, multiple noise types and levels, loss functions, and architectures demonstrate consistent gains in accuracy and AUROC that are largest at high noise. A reader would care because the approach adds almost no overhead yet improves robustness across practical, resource-limited settings.

Core claim

LiLAW categorizes training samples into easy, moderate, and hard difficulty using three global learnable scalar parameters that are updated after each training mini-batch by a single gradient descent step performed on a validation mini-batch. This procedure allows the model to adaptively reweight the loss without requiring a clean or representative validation set and yields improved generalization on noisy data.

What carries the argument

Three global learnable scalar parameters that define loss weights for easy, moderate, and hard samples and are adjusted by one gradient descent step on a validation mini-batch.
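The single-step update can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the three weight terms follow the formulas quoted in the Lean-theorems passage further down this page, while combining them by summation, using the mean of weight times cross-entropy as the validation objective, and holding the model's predictions fixed during the step are all assumptions of this sketch. The names `weights_and_grads`, `weighted_loss`, and `meta_step`, the toy batch, and the learning rate are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def weights_and_grads(s_y, s_max, alpha, beta, delta):
    """Total weight W and (dW/dalpha, dW/dbeta, dW/ddelta) for one sample.

    s_y is the predicted probability of the given label, s_max the largest
    predicted probability. Summing the three terms is an assumption.
    """
    a = alpha * s_y - s_max
    b = -(beta * s_y - s_max)
    z = delta * s_y - s_max
    w_alpha, w_beta, w_delta = sigmoid(a), sigmoid(b), math.exp(-0.5 * z * z)
    grads = (
        w_alpha * (1.0 - w_alpha) * s_y,   # derivative of sigmoid(a) w.r.t. alpha
        -w_beta * (1.0 - w_beta) * s_y,    # derivative of sigmoid(b) w.r.t. beta
        -z * w_delta * s_y,                # derivative of exp(-z^2/2) w.r.t. delta
    )
    return w_alpha + w_beta + w_delta, grads

def weighted_loss(batch, params):
    """Mean of W_i * CE_i over a batch of (softmax_probs, label) pairs."""
    total = 0.0
    for probs, label in batch:
        s_y = probs[label]
        w, _ = weights_and_grads(s_y, max(probs), *params)
        total += w * -math.log(s_y)
    return total / len(batch)

def meta_step(val_batch, params, lr):
    """One gradient-descent step on (alpha, beta, delta); model held fixed."""
    grad = [0.0, 0.0, 0.0]
    for probs, label in val_batch:
        s_y = probs[label]
        ce = -math.log(s_y)
        _, g = weights_and_grads(s_y, max(probs), *params)
        for k in range(3):
            grad[k] += ce * g[k] / len(val_batch)
    return tuple(p - lr * g for p, g in zip(params, grad))

# Hypothetical validation mini-batch: (predicted probabilities, given label).
val_batch = [([0.95, 0.05], 0), ([0.60, 0.40], 0), ([0.30, 0.70], 0)]
params = (10.0, 2.0, 6.0)   # illustrative starting values for alpha, beta, delta
new_params = meta_step(val_batch, params, lr=0.1)
```

In a full training loop this step would run once after each model update on a training mini-batch, which is why the overhead stays small: only three scalars receive gradients. The analytic derivatives here stand in for what an autodiff framework would compute.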

If this is right

Accuracy and AUROC rise across general and medical imaging datasets under varied noise types and levels, with larger gains at higher noise.
The method works with multiple loss functions, architectures, pretraining regimes, linear probing, and full fine-tuning.
State-of-the-art results are obtained when synthetic and augmented data from SynPAIN, GAITGen, and ECG5000 are incorporated.
Fairness metrics improve on the Adult dataset.
The approach remains computationally lightweight and suitable for resource-constrained environments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same three-parameter update rule could be tested in online or streaming settings where new noisy samples arrive continuously.
Because validation data need not be clean, the method might reduce preprocessing costs in medical imaging pipelines that already contain label noise.
Difficulty weighting learned this way may complement curriculum-learning schedules that currently rely on hand-crafted or precomputed difficulty scores.

Load-bearing premise

A single gradient descent step on a validation mini-batch is enough to learn difficulty weights that work well on the full training distribution even when the validation batch is noisy.

What would settle it

Running LiLAW on a high-noise dataset where accuracy or AUROC does not improve or decreases relative to the unweighted baseline would falsify the central claim.

Figures

Figures reproduced from arXiv: 2509.20786 by Abhishek Moturu, Anna Goldenberg, Babak Taati, Muhammad Muzammil.

**Figure 2.** A graphical representation of the LiLAW weighting method. Darker areas correspond to …

**Figure 3.** Top-1 accuracy, top-5 accuracy, and AUROC with and without LiLAW using different …

**Figure 4.** Plots showing how α, β, δ change over the course of training with 0% uniform noise and 50% uniform noise on CIFAR-100-M.

**Figure 5.** Plots with means and standard deviations of top-1 accuracy, top-5 accuracy, and AUROC …

**Figure 6.** Accuracy with and without LiLAW on ten 2D datasets from MedMNISTv2.

**Figure 7.** AUROC with and without LiLAW on ten 2D datasets from MedMNISTv2.

Original abstract

Training deep neural networks with noise and data heterogeneity is a major challenge. We introduce Lightweight Learnable Adaptive Weighting (LiLAW), a method that dynamically adjusts the loss weight of each training sample based on its evolving difficulty, categorized as easy, moderate, and hard, using only three global learnable scalar parameters. LiLAW learns to adaptively prioritize samples by updating these parameters with a single gradient descent step on a validation mini-batch after each training mini-batch, without requiring a clean, unbiased validation set. Experiments across general and medical imaging datasets, several noise types and levels, loss functions, and architectures with and without pretraining, including linear probing and full fine-tuning, show that LiLAW consistently improves accuracy and AUROC, especially in higher-noise settings, without requiring excessive tuning. We also obtain state-of-the-art results incorporating synthetic and augmented data from SynPAIN, GAITGen, ECG5000, and improved fairness on the Adult dataset. LiLAW is lightweight, practical, and computationally efficient, making it an effective, scalable approach to boost generalization and robustness across diverse deep learning training setups, especially in resource-constrained settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

LiLAW's three global scalars updated by one noisy validation gradient step deliver reported gains on noisy data but lack a clear reason why that update separates signal from noise.

LiLAW keeps the weighting mechanism to three learnable scalars that control loss contributions for easy, moderate, and hard samples. After every training batch it takes one gradient step on a validation mini-batch to adjust those scalars, and the abstract says this works even when the validation data is drawn from the same noisy pool as training. The experiments span general vision datasets, medical imaging, multiple noise models, architectures, and both pre-trained and from-scratch training, with extra results on fairness and some synthetic data generators.

That range is the main practical strength: the method is cheap to run and the authors position it for settings where clean validation data is unavailable. The design itself is a modest variant on existing difficulty-aware weighting ideas, but the extreme parameter count and single-step rule make it easy to implement and test. The paper therefore earns credit for showing usable improvements in higher-noise regimes without extra hyper-parameter search.

The central weakness is that the update rule itself is not shown to be stable or corrective. When the validation batch contains the same label noise as the training set, the gradient can easily be dominated by mislabeled points, so the scalars may end up reinforcing rather than correcting the difficulty estimates. No derivation or fixed-point analysis is supplied to explain why one step should produce weights that generalize better, and the abstract supplies no error bars, ablation tables, or statistical tests to quantify how much of the reported lift comes from the weighting versus other factors. If the full experiments contain those controls and the gains survive them, the contribution is a useful engineering note. If they do not, the method reduces to another heuristic that sometimes helps.

This is the sort of paper that belongs in an applied machine-learning venue or a robustness workshop. Practitioners who already run noisy medical or sensor data would find the implementation details worth trying. A serious editor should send it to review so the experimental claims and the validation-update assumption can be checked directly rather than desk-rejected on the abstract alone.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces LiLAW, a lightweight adaptive weighting method for training deep networks under label noise and data heterogeneity. It categorizes samples into easy/moderate/hard difficulty levels and controls their loss contributions via three global learnable scalar parameters. After each training mini-batch update, the parameters are adjusted by a single gradient-descent step on a validation mini-batch drawn from the same (potentially noisy) distribution; the method claims this does not require a clean or unbiased validation set. Experiments across general and medical imaging datasets, multiple noise types/levels, loss functions, and architectures (with/without pretraining, linear probing, and fine-tuning) report consistent gains in accuracy and AUROC, with state-of-the-art results on synthetic/augmented data from SynPAIN, GAITGen, and ECG5000 plus improved fairness on Adult.

Significance. If the empirical gains are robust, LiLAW would provide a practical, low-overhead alternative to existing reweighting or meta-learning approaches for noisy-label training. Its use of only three scalar parameters, single-step updates, and explicit avoidance of clean validation data are genuine strengths for resource-constrained or real-world settings; the breadth of tested datasets, noise regimes, and training protocols (including fairness) adds to its potential utility if the central mechanism is shown to be stable.

major comments (2)

[§3 (method), update rule] θ_{t+1} = θ_t − η ∇_θ L_val(B_val; θ_t): the manuscript provides no derivation or fixed-point analysis showing that a single gradient step on a mini-batch drawn from the same noisy distribution produces weights whose stationary point preferentially down-weights mislabeled samples. Under symmetric or class-conditional noise this gradient can be dominated by noisy examples, raising the risk that the update reinforces rather than corrects difficulty estimates; this assumption is load-bearing for the claim that the method works without a clean validation set.
[Experiments section] Tables/figures reporting accuracy/AUROC: while consistent improvements are asserted across noise levels and architectures, the absence of reported error bars, ablation results on the three-parameter design versus alternatives, and statistical significance tests leaves the magnitude and reliability of the gains, especially in high-noise regimes, difficult to evaluate.
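
To make the first major comment concrete, the validation gradient can be written out explicitly. The derivation below is a sketch under assumptions that this page does not confirm: θ in the quoted update rule is read as the triple (α, β, δ), the per-sample weight is taken to be the sum of the three terms quoted in the Lean-theorems passage further down this page, the validation objective is the mean of weight times per-sample loss ℓ_i, and the model outputs s_i are held fixed during the step.

```latex
% Assumed weighted validation loss over a mini-batch B_val (outputs s_i fixed):
\mathcal{L}_{\mathrm{val}}(\alpha,\beta,\delta)
  = \frac{1}{|B_{\mathrm{val}}|} \sum_{i \in B_{\mathrm{val}}}
    \bigl(W_\alpha + W_\beta + W_\delta\bigr)(s_i,\tilde{y}_i)\,\ell_i

% Shorthand:
a_i = \alpha\, s_i[\tilde{y}_i] - \max(s_i), \qquad
b_i = -\bigl(\beta\, s_i[\tilde{y}_i] - \max(s_i)\bigr), \qquad
z_i = \delta\, s_i[\tilde{y}_i] - \max(s_i)

% Partial derivatives, using \sigma'(x) = \sigma(x)(1-\sigma(x)):
\frac{\partial \mathcal{L}_{\mathrm{val}}}{\partial \alpha}
  = \frac{1}{|B_{\mathrm{val}}|} \sum_i \ell_i\, \sigma(a_i)\bigl(1-\sigma(a_i)\bigr)\, s_i[\tilde{y}_i]

\frac{\partial \mathcal{L}_{\mathrm{val}}}{\partial \beta}
  = -\frac{1}{|B_{\mathrm{val}}|} \sum_i \ell_i\, \sigma(b_i)\bigl(1-\sigma(b_i)\bigr)\, s_i[\tilde{y}_i]

\frac{\partial \mathcal{L}_{\mathrm{val}}}{\partial \delta}
  = -\frac{1}{|B_{\mathrm{val}}|} \sum_i \ell_i\, z_i\, e^{-z_i^2/2}\, s_i[\tilde{y}_i]

% Single-step update from the comment, written per scalar (shown for alpha):
\alpha_{t+1} = \alpha_t - \eta\,
  \frac{\partial \mathcal{L}_{\mathrm{val}}}{\partial \alpha}\Big|_{(\alpha_t,\beta_t,\delta_t)}
```

Under these assumptions every summand is scaled by ℓ_i, so validation samples with large loss, which is where mislabeled examples tend to sit, contribute the most to the step. Whether the paper's actual objective shares this property is not something this page establishes.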

minor comments (2)

[Abstract] The sentence listing SynPAIN, GAITGen, ECG5000 and Adult fairness results is run-on and should be split for clarity.
[Notation] Define explicitly how the three scalar parameters map onto the easy/moderate/hard loss weights (e.g., via a softmax or piecewise function) and whether they are constrained to be positive.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and outline the revisions we will make to strengthen the manuscript.

Referee: [§3 (method), update rule] θ_{t+1} = θ_t − η ∇_θ L_val(B_val; θ_t): the manuscript provides no derivation or fixed-point analysis showing that a single gradient step on a mini-batch drawn from the same noisy distribution produces weights whose stationary point preferentially down-weights mislabeled samples. Under symmetric or class-conditional noise this gradient can be dominated by noisy examples, raising the risk that the update reinforces rather than corrects difficulty estimates; this assumption is load-bearing for the claim that the method works without a clean validation set.

Authors: We acknowledge that the current manuscript does not contain a formal derivation or fixed-point analysis of the single-gradient-step update. The update is motivated by the practical observation that a single step on a validation mini-batch (even when drawn from the same noisy distribution) yields a direction that improves downstream generalization, as evidenced by consistent gains across symmetric, class-conditional, and real-world noise regimes in our experiments. We agree that a more explicit discussion of the underlying assumptions would be valuable. In the revision we will add a dedicated paragraph in §3 that provides the design intuition, notes the lack of a full stationary-point guarantee, and discusses the empirical robustness observed under different noise models.
revision: partial

Referee: [Experiments section] Tables/figures reporting accuracy/AUROC: while consistent improvements are asserted across noise levels and architectures, the absence of reported error bars, ablation results on the three-parameter design versus alternatives, and statistical significance tests leaves the magnitude and reliability of the gains, especially in high-noise regimes, difficult to evaluate.

Authors: We agree that the current experimental presentation would benefit from additional statistical rigor. In the revised manuscript we will (i) report mean ± standard deviation over at least five independent runs for all accuracy and AUROC tables and figures, (ii) include an ablation study comparing the three-scalar design against variants with one, two, or five parameters, and (iii) add paired t-test p-values to quantify statistical significance of the reported improvements, with particular emphasis on the high-noise settings.
revision: yes
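
The reporting promised in items (i) and (iii) can be sketched with the Python standard library alone. This is an illustrative outline, not the authors' evaluation code: the function names are hypothetical, and the accuracy values are placeholders invented to exercise the code, not results from the paper. The sketch stops at the paired t statistic and its degrees of freedom; turning that into a p-value needs a Student-t CDF, which the standard library does not provide.

```python
import math
import statistics

def mean_std(values):
    """Mean and sample standard deviation across independent runs."""
    return statistics.mean(values), statistics.stdev(values)

def paired_t_statistic(with_method, baseline):
    """Paired t statistic and degrees of freedom for per-seed differences."""
    diffs = [a - b for a, b in zip(with_method, baseline)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

# Placeholder accuracies for five seeds; NOT results from the paper.
baseline = [0.50, 0.52, 0.49, 0.51, 0.50]
weighted = [0.53, 0.54, 0.52, 0.55, 0.53]

base_mean, base_std = mean_std(baseline)
lilaw_mean, lilaw_std = mean_std(weighted)
t_stat, dof = paired_t_statistic(weighted, baseline)
print(f"baseline {base_mean:.3f} ± {base_std:.3f}")
print(f"weighted {lilaw_mean:.3f} ± {lilaw_std:.3f}")
print(f"paired t = {t_stat:.2f} with {dof} degrees of freedom")
```

Pairing by seed matters here: both arms share the same data split and initialization, so the test is run on per-seed differences and not on two independent samples. With five runs the statistic has 4 degrees of freedom, for which the two-sided 5% critical value is 2.776.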

Circularity Check

0 steps flagged

No circularity: empirical method with independent experimental validation

The paper introduces LiLAW as a practical algorithm that updates three scalar weighting parameters via one gradient step on a validation mini-batch drawn from the training distribution. No derivation chain is presented that reduces a claimed prediction or first-principles result to its own inputs by construction. The core claim rests on empirical results across multiple datasets, noise levels, and architectures rather than on a self-referential mathematical identity or load-bearing self-citation. The method description is self-contained and does not invoke uniqueness theorems, ansatzes smuggled via prior work, or renaming of known results as new derivations.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on three learnable scalar parameters that are fitted during training and on the assumption that samples possess an evolving difficulty amenable to three-category classification.

free parameters (1)

three global learnable scalar parameters
These scalars control the loss weighting for easy, moderate, and hard samples and are updated via gradient descent.

axioms (1)

domain assumption: Training samples can be meaningfully categorized into easy, moderate, and hard based on evolving model difficulty.
This categorization is required for the adaptive weighting to function as described.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "Using only three learnable parameters, LiLAW adaptively prioritizes informative samples throughout training by updating these weights using a single mini-batch gradient descent step on the validation set after each training mini-batch"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: W_α(s_i, ỹ_i) = σ(α·s_i[ỹ_i] − max(s_i)), W_β = σ(−(β·s_i[ỹ_i] − max(s_i))), W_δ = exp(−(δ·s_i[ỹ_i] − max(s_i))²/2)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Robert Baldock, Hartmut Maennel, and Behnam Neyshabur

URLhttp://arxiv.org/abs/2008.11600. Robert Baldock, Hartmut Maennel, and Behnam Neyshabur. Deep learning through the lens of example difficulty.Advances in Neural Information Processing Systems, 34:10876–10889,

work page arXiv 2008

[2] [2]

Rethinking model prototyping through the medmnist+ dataset collection.arXiv preprint arXiv:2404.15786,

Sebastian Doerrich, Francesco Di Salvo, Julius Brockmann, and Christian Ledig. Rethinking model prototyping through the medmnist+ dataset collection.arXiv preprint arXiv:2404.15786,

work page arXiv

[3] [3]

Generalized uncertainty of deep neural networks: Taxonomy and applications.arXiv preprint arXiv:2302.01440,

Chengyu Dong. Generalized uncertainty of deep neural networks: Taxonomy and applications.arXiv preprint arXiv:2302.01440,

work page arXiv

[4] [5]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

URLhttps://arxiv.org/abs/2010.11929. Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks,

work page internal anchor Pith review Pith/arXiv arXiv 2010

[5] [6]

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

URLhttp://arxiv.org/abs/1703.03400. Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. On calibration of modern neural networks,

work page internal anchor Pith review Pith/arXiv arXiv

[6] [7]

On Calibration of Modern Neural Networks

URLhttp://arxiv.org/abs/1706.04599. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition.CoRR, abs/1512.03385,

work page internal anchor Pith review Pith/arXiv arXiv

[7] [8]

Deep Residual Learning for Image Recognition

URLhttp://arxiv.org/abs/1512.03385. Nishant Jain, Arun S. Suggala, and Pradeep Shenoy. Improving generalization via meta-learning on hard samples,

work page internal anchor Pith review Pith/arXiv arXiv

[8] [9]

Qingrui Jia, Xuhong Li, Lei Yu, Jiang Bian, Penghao Zhao, Shupeng Li, Haoyi Xiong, and Dejing Dou

URLhttp://arxiv.org/abs/2403.12236. Qingrui Jia, Xuhong Li, Lei Yu, Jiang Bian, Penghao Zhao, Shupeng Li, Haoyi Xiong, and Dejing Dou. Learning from training dynamics: Identifying mislabeled data beyond manually designed features,

work page arXiv

[9] [10]

Shenwang Jiang, Jianan Li, Ying Wang, Bo Huang, Zhang Zhang, and Tingfa Xu

URLhttp://arxiv.org/abs/2212.09321. Shenwang Jiang, Jianan Li, Ying Wang, Bo Huang, Zhang Zhang, and Tingfa Xu. Delving into sample loss curve to embrace noisy and imbalanced data,

work page arXiv

[10] [11]

Ziheng Jiang, Chiyuan Zhang, Kunal Talwar, and Michael C Mozer

URL http://arxiv.org/ abs/2201.00849. Ziheng Jiang, Chiyuan Zhang, Kunal Talwar, and Michael C Mozer. Characterizing structural regularities of labeled data in overparameterized models.arXiv preprint arXiv:2002.03206,

work page arXiv 2002

[11] [12]

Diederik P Kingma

URL https://arxiv.org/abs/2203.14542. Diederik P Kingma. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980,

work page arXiv

[12] [13]

Modulated Periodic Activations for Generalizable Local Functional Representations , rights =

doi: 10.1109/ICCV48922.2021.00502. URL https://ieeexplore.ieee.org/document/ 9709930. ISSN: 2380-7504. Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. CIFAR-10 (Canadian Institute for Advanced Research). 2009a. URLhttp://www.cs.toronto.edu/ ˜kriz/cifar.html. Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. Cifar-100 (canadian institute for advanced res...

work page doi:10.1109/iccv48922.2021.00502 2021

[13] [14]

Chao Liang, Linchao Zhu, Humphrey Shi, and Yi Yang

URLhttps://arxiv.org/abs/2002.07394. Chao Liang, Linchao Zhu, Humphrey Shi, and Yi Yang. Combating label noise with a general surrogate model for sample selection.International Journal of Computer Vision, December

work page arXiv 2002

[14] [15]

Focal Loss for Dense Object Detection

ISSN 1573-1405. doi: 10.1007/s11263-024-02324-z. URL http://dx.doi.org/10. 1007/s11263-024-02324-z. Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, and Piotr Doll´ar. Focal loss for dense object detection.CoRR, abs/1708.02002,


[15] [16]

Rafael Müller, Simon Kornblith, and Geoffrey Hinton

URL http://arxiv.org/abs/2206.07137. Rafael Müller, Simon Kornblith, and Geoffrey Hinton. When does label smoothing help?,


[16] [17]

arXiv preprint arXiv:1906.02629

URL http://arxiv.org/abs/1906.02629. Curtis G. Northcutt, Lu Jiang, and Isaac L. Chuang. Confident learning: Estimating uncertainty in dataset labels,


[17] [18]

Mansheej Paul, Surya Ganguli, and Gintare Karolina Dziugaite

URL http://arxiv.org/abs/1911.00068. Mansheej Paul, Surya Ganguli, and Gintare Karolina Dziugaite. Deep learning on a data diet: Finding important examples early in training. Advances in Neural Information Processing Systems, 34:20596–20607,


[18] [19]

Jason Rennie

URL http://arxiv.org/abs/2107.07075. PhysioToolkit PhysioBank. Physionet: components of a new research resource for complex physiologic signals. Circulation, 101(23):e215–e220,


[19] [20]

Iuliia Pliushch, Martin Mundt, Nicolas Lupp, and Visvanathan Ramesh

URL https://proceedings.neurips.cc/paper_files/paper/2020/file/c6102b3727b2a7d8b1bb6981147081ef-Paper.pdf. Iuliia Pliushch, Martin Mundt, Nicolas Lupp, and Visvanathan Ramesh. When deep classifiers agree: Analyzing correlations between learning order and image statistics. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23...


[20] [21]

Selective classification via neural network training dynamics. arXiv preprint arXiv:2205.13532,

Stephan Rabanser, Anvith Thudi, Kimia Hamidieh, Adam Dziedzic, and Nicolas Papernot. Selective classification via neural network training dynamics. arXiv preprint arXiv:2205.13532,


[21] [22]

Learning to Reweight Examples for Robust Deep Learning

URL http://arxiv.org/abs/1803.09050. Nabeel Seedat, Jonathan Crabbé, Ioana Bica, and Mihaela van der Schaar. Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data,


[22] [23]

Nabeel Seedat, Nicolas Huynh, Fergus Imrie, and Mihaela van der Schaar

URL http://arxiv.org/abs/2210.13043. Nabeel Seedat, Nicolas Huynh, Fergus Imrie, and Mihaela van der Schaar. You can’t handle the (dirty) truth: Data-centric insights improve pseudo-labeling, 2024a. URL http://arxiv.org/abs/2406.13733. Nabeel Seedat, Fergus Imrie, and Mihaela van der Schaar. Dissecting sample hardness: A fine-grained analysis of hardne...


[23] [24]

Mariya Toneva, Alessandro Sordoni, Remi Tachet des Combes, Adam Trischler, Yoshua Bengio, and Geoffrey J Gordon

URL http://arxiv.org/abs/2009.10795. Mariya Toneva, Alessandro Sordoni, Remi Tachet des Combes, Adam Trischler, Yoshua Bengio, and Geoffrey J Gordon. An empirical study of example forgetting during deep neural network learning. arXiv preprint arXiv:1812.05159,


[24] [25]

Toneva, A

URL http://arxiv.org/abs/1812.05159. Qizhou Wang, Feng Liu, Bo Han, Tongliang Liu, Chen Gong, Gang Niu, Mingyuan Zhou, and Masashi Sugiyama. Probabilistic margins for instance reweighting in adversarial training,


[25] [26]

Pengxiang Wu, Songzhu Zheng, Mayank Goswami, Dimitris Metaxas, and Chao Chen

URL http://arxiv.org/abs/2106.07904. Pengxiang Wu, Songzhu Zheng, Mayank Goswami, Dimitris Metaxas, and Chao Chen. A topological filter for learning with label noise,


[26] [27]

Yinjun Wu, Adam Stein, Jacob Gardner, and Mayur Naik

URL https://arxiv.org/abs/2012.04835. Yinjun Wu, Adam Stein, Jacob Gardner, and Mayur Naik. Learning to select pivotal samples for meta re-weighting,


[27] [28]

Xiaobo Xia, Tongliang Liu, Bo Han, Mingming Gong, Jun Yu, Gang Niu, and Masashi Sugiyama

URL http://arxiv.org/abs/2302.04418. Xiaobo Xia, Tongliang Liu, Bo Han, Mingming Gong, Jun Yu, Gang Niu, and Masashi Sugiyama. Sample selection with uncertainty of losses for learning with noisy labels,


[28] [29]

Han Xiao, Kashif Rasul, and Roland Vollgraf

URL http://arxiv.org/abs/2106.00445. Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. CoRR, abs/1708.07747,


[29] [30]

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

URL http://arxiv.org/abs/1708.07747. Da Xu, Yuting Ye, and Chuanwei Ruan. Understanding the role of importance weighting for deep learning,


[30] [31]

Jiancheng Yang, Rui Shi, and Bingbing Ni

URL http://arxiv.org/abs/2103.15209. Jiancheng Yang, Rui Shi, and Bingbing Ni. Medmnist classification decathlon: A lightweight automl benchmark for medical image analysis. In IEEE 18th International Symposium on Biomedical Imaging (ISBI), pp. 191–195,


[31] [32]

doi: 10.1109/TNNLS.2023.3284430

ISSN 2162-2388. doi: 10.1109/TNNLS.2023.3284430. URL https://ieeexplore.ieee.org/document/10155763. Conference Name: IEEE Transactions on Neural Networks and Learning Systems. Xiaoling Zhou, Ou Wu, Weiyao Zhu, and Ziyang Liang. Understanding difficulty-based sample weighting with a universal difficulty measure,


[32] [33]

URL http://arxiv.org/abs/2205.07427. A APPENDIX A.1 MOTIVATING EXAMPLE
Columns: Case (Label is [1,0]); Prediction; s_y; s_max; CE; then W_α, W_δ, W_β, W_L, W for α = 10, β = 2, δ = 6; then W_α, W_δ, W_β, W_L, W for α = 9, β = 3, δ = 7.
Correct & Confident; [0.95, 0.05]; 0.95; 0.95; 0.051; 0.999, 0.199, 0.500, 1.698, 0.087; 0.999, 0.000, 0.130, 1.129, 0.058
Correct & Unconfident; [0.60, 0.40]; 0.60; 0.60; 0.511; 0.998, 0.865, 0.500, 2.363, 1.208; 0.9...


[33] [34]

The total is O(|θ|·B), same as without LiLAW

= O(|θ|) for the model parameter gradients and the LiLAW parameter gradients, and O((|θ| + 3)·B) = O(|θ|·B) for the activations during the forward pass, where B is the batch size (assuming the same batch size for training and validation). The total is O(|θ|·B), same as without LiLAW. A.6 PERFORMANCE WITH VARIOUS NOISE LEVELS Noise Level (%) Top-1 Acc. (%...
