Beyond Loss Values: Robust Dynamic Pruning via Loss Trajectory Alignment
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-10 18:08 UTC · model grok-4.3
The pith
Tracking how loss evolves during training identifies noisy samples for pruning more reliably than their final loss value.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AlignPrune adds a Dynamic Alignment Score that ranks samples by how their loss curves align over time rather than by instantaneous loss magnitude, enabling more accurate removal of noisy examples during dynamic pruning and producing accuracy gains of up to 6.3 percent over prior methods on noisy benchmarks.
What carries the argument
Dynamic Alignment Score (DAS), a loss-trajectory-based ranking criterion that measures how closely each sample's loss sequence follows the pattern expected for clean data and is used to decide pruning order.
If this is right
- Existing dynamic pruning frameworks can be made robust to label noise by swapping their ranking criterion for DAS without changing model architecture or training loop.
- Accuracy under label noise improves consistently across multiple benchmarks and noise levels while pruning ratios remain unchanged.
- The method provides a generalizable way to handle pruning when training data contains mislabeled examples.
- No additional hyper-parameters or dataset-specific tuning are required beyond the original pruning schedule.
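The plug-and-play claim above can be made concrete with a sketch of the selection step that a module like AlignPrune would hook into. This is an illustrative assumption about the interface, not the paper's implementation; the function name and the toy criteria are ours.

```python
import numpy as np

def select_batch_indices(loss_history, keep_ratio, rank_fn):
    """Generic dynamic-pruning selection step over a (samples x steps)
    loss history. Existing frameworks effectively pass a last-loss
    criterion; an AlignPrune-style module would swap in a
    trajectory-based rank_fn, leaving this selection code, the model,
    and the training loop untouched."""
    h = np.asarray(loss_history, dtype=float)
    scores = rank_fn(h)                        # higher score = keep
    k = max(1, int(len(scores) * keep_ratio))
    return np.argsort(scores)[-k:]
```

With `rank_fn=lambda h: h[:, -1]` this reproduces instantaneous-loss ranking; a trajectory criterion is just a different `rank_fn` passed to the same step.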
Where Pith is reading between the lines
- Loss dynamics may capture sample difficulty and label quality more reliably than static loss snapshots in other noisy-label settings.
- The same trajectory idea could be tested for selecting data in semi-supervised or active learning pipelines.
- If trajectory alignment proves stable, it suggests pruning criteria should be defined over sequences rather than single training steps.
Load-bearing premise
Loss trajectories of noisy samples differ consistently from those of clean samples across different noise types and pruning ratios.
What would settle it
On a controlled noisy dataset, the central claim would be falsified if DAS retains noisy samples at rates no lower than single-loss ranking, or if accuracy does not rise when DAS replaces the baseline criterion.
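The proposed test boils down to comparing one number between criteria: the fraction of mislabeled samples that survive pruning. A hypothetical sketch of that metric (the function name and keep ratio are ours, not the paper's):

```python
import numpy as np

def noisy_retention_rate(scores, is_noisy, keep_ratio=0.7):
    """Fraction of the kept set that is mislabeled. The falsification
    test amounts to checking whether this number is lower under a
    DAS-style score than under single-loss ranking."""
    k = int(len(scores) * keep_ratio)
    kept = np.argsort(scores)[-k:]        # keep the k highest-scoring samples
    return np.asarray(is_noisy)[kept].mean()
```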
Original abstract
Existing dynamic data pruning methods often fail under noisy-label settings, as they typically rely on per-sample loss as the ranking criterion. This could mistakenly lead to preserving noisy samples due to their high loss values, resulting in significant performance drop. To address this, we propose AlignPrune, a noise-robust module designed to enhance the reliability of dynamic pruning under label noise. Specifically, AlignPrune introduces the Dynamic Alignment Score (DAS), which is a loss-trajectory-based criterion that enables more accurate identification of noisy samples, thereby improving pruning effectiveness. As a simple yet effective plug-and-play module, AlignPrune can be seamlessly integrated into state-of-the-art dynamic pruning frameworks, consistently outperforming them without modifying either the model architecture or the training pipeline. Extensive experiments on five widely-used benchmarks across various noise types and pruning ratios demonstrate the effectiveness of AlignPrune, boosting accuracy by up to 6.3% over state-of-the-art baselines. Our results offer a generalizable solution for pruning under noisy data, encouraging further exploration of learning in real-world scenarios. Code is available at: https://github.com/leonqin430/AlignPrune.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that existing dynamic pruning methods relying on per-sample loss fail under label noise by preserving high-loss noisy samples. It introduces AlignPrune as a plug-and-play module using a new Dynamic Alignment Score (DAS) derived from loss trajectories to more accurately identify noisy samples. The method integrates into existing frameworks without architecture or pipeline changes and reports accuracy gains of up to 6.3% over SOTA baselines across five benchmarks, various noise types, and pruning ratios.
Significance. If the DAS criterion demonstrably separates noisy from clean samples more reliably than loss alone without introducing dataset-specific biases or requiring extensive tuning, the work would offer a practical improvement for robust pruning in noisy real-world settings. The plug-and-play design and code release are positive for adoption, but the significance hinges on whether trajectory alignment provides independent, generalizable signal beyond what per-sample loss already captures.
major comments (4)
- [§3] §3 (Method), DAS definition: the abstract and high-level description do not specify the exact formula for computing alignment between loss trajectories (e.g., whether it uses cosine similarity, DTW, or another metric, and any normalization or windowing choices). This makes it impossible to assess whether DAS is truly parameter-free or reduces to a fitted quantity that could overfit to the observed training dynamics.
- [Experiments] Experiments section (results tables): the reported gains of up to 6.3% lack accompanying error bars, standard deviations across runs, or statistical tests. Without these, it is unclear whether the improvements over baselines are consistent or could be explained by variance in training or pruning ratio selection.
- [§4] §4 (Experiments), ablation studies: no ablations isolate the contribution of trajectory alignment from other factors such as the base pruning framework, hyperparameter choices for DAS, or simple alternatives like smoothed loss. This leaves the central claim that DAS enables 'more accurate identification of noisy samples' unverified against the skeptic concern of new selection biases at different pruning ratios.
- [§3.1–3.2] §3.1–3.2, noise-type invariance: the method assumes loss-trajectory shapes differ reliably between noisy and clean samples across symmetric and asymmetric noise without dataset-specific tuning. No theoretical characterization or targeted experiments (e.g., per-noise-type trajectory plots) are described to support this, risking that the reported gains are not generalizable.
minor comments (2)
- [Abstract] The abstract states 'Code is available at: https://github.com/leonqin430/AlignPrune' but does not indicate whether the released code includes the exact DAS implementation, hyperparameter settings used in the reported experiments, or scripts for reproducing the five-benchmark results.
- [§3] Notation for loss trajectories and alignment score should be introduced with explicit equations early in §3 to improve readability, rather than relying on prose descriptions.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will incorporate revisions to improve clarity, rigor, and completeness of the manuscript.
Point-by-point responses
-
Referee: [§3] §3 (Method), DAS definition: the abstract and high-level description do not specify the exact formula for computing alignment between loss trajectories (e.g., whether it uses cosine similarity, DTW, or another metric, and any normalization or windowing choices). This makes it impossible to assess whether DAS is truly parameter-free or reduces to a fitted quantity that could overfit to the observed training dynamics.
Authors: We thank the referee for pointing this out. We will revise §3 to explicitly provide the mathematical definition of the Dynamic Alignment Score (DAS), including the alignment metric used, normalization procedure, and windowing choices. This will clarify that it is computed directly from the loss trajectories without additional fitting or parameters that could lead to overfitting to the observed dynamics. revision: yes
-
Referee: [Experiments] Experiments section (results tables): the reported gains of up to 6.3% lack accompanying error bars, standard deviations across runs, or statistical tests. Without these, it is unclear whether the improvements over baselines are consistent or could be explained by variance in training or pruning ratio selection.
Authors: We agree that including measures of variability is important for assessing the reliability of the reported gains. In the revised version, we will add error bars (standard deviations from multiple runs) to the results tables and include statistical tests (e.g., t-tests) to confirm the significance of improvements over baselines. revision: yes
-
Referee: [§4] §4 (Experiments), ablation studies: no ablations isolate the contribution of trajectory alignment from other factors such as the base pruning framework, hyperparameter choices for DAS, or simple alternatives like smoothed loss. This leaves the central claim that DAS enables 'more accurate identification of noisy samples' unverified against the skeptic concern of new selection biases at different pruning ratios.
Authors: We will include additional ablation studies in the revised manuscript to isolate the contribution of the trajectory alignment component. This will include comparisons against smoothed loss alternatives, sensitivity to DAS hyperparameters, and analysis of selection at different pruning ratios to address potential biases. revision: yes
-
Referee: [§3.1–3.2] §3.1–3.2, noise-type invariance: the method assumes loss-trajectory shapes differ reliably between noisy and clean samples across symmetric and asymmetric noise without dataset-specific tuning. No theoretical characterization or targeted experiments (e.g., per-noise-type trajectory plots) are described to support this, risking that the reported gains are not generalizable.
Authors: Our experiments already demonstrate performance across symmetric and asymmetric noise without dataset-specific tuning. To further support the invariance, we will add per-noise-type loss trajectory visualization plots in the appendix and a discussion in §3.1–3.2. A complete theoretical characterization is not provided as the work is primarily empirical, but the added experiments will help verify generalizability. revision: partial
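The variance reporting promised in the second response could look like the following sketch. The per-seed accuracies are invented placeholders (not the paper's numbers); only the paired t-statistic computation is meant literally.

```python
import numpy as np

def paired_t(a, b):
    """Paired t-statistic over matched seeds; with n - 1 degrees of
    freedom, |t| > ~2.8 would make a 5-seed gain significant at p < 0.05."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))

# Hypothetical per-seed accuracies (placeholders, not reported results)
alignprune = [78.4, 77.9, 78.8, 78.1, 78.5]
baseline = [74.2, 73.8, 74.9, 74.0, 74.4]
t = paired_t(alignprune, baseline)
```

Reporting the mean gain with its standard deviation alongside such a statistic would address the referee's variance concern directly.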
Circularity Check
No significant circularity detected in DAS derivation or claims
Full rationale
The paper introduces AlignPrune and its Dynamic Alignment Score (DAS) as a novel loss-trajectory-based criterion for robust pruning under label noise. The abstract and description frame DAS as an independent plug-and-play module whose value is shown through empirical gains (up to 6.3%) on five benchmarks rather than by algebraic reduction to the input loss values or training trajectories. No equations, self-citations, or uniqueness theorems are presented that would make the central claim tautological or force the result by construction. The derivation chain remains self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
The relation between this paper passage and the cited Recognition theorem is unclear.
DAS_i^(t) = ρ(v_i^(t), v_ref^(t)), where ρ is the Pearson correlation; clean samples follow a monotonic loss decay aligning with the reference, while noisy samples exhibit erratic patterns (Sec. 3.2, Fig. 1c)
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
The relation between this paper passage and the cited Recognition theorem is unclear.
Plug-and-play replacement of loss-based ranking in InfoBatch/SeTa; no architecture or pipeline change (Algorithm 1)
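The DAS formula quoted above can be sketched directly. Taking the dataset-mean loss curve as v_ref is our assumption; this excerpt does not pin down the reference trajectory, and the window and normalization details remain unspecified.

```python
import numpy as np

def das(loss_hist):
    """DAS_i = Pearson correlation between sample i's loss trajectory
    v_i and a reference trajectory v_ref, taken here as the dataset-mean
    loss curve (an assumption; the excerpt does not specify v_ref)."""
    x = np.asarray(loss_hist, dtype=float)
    ref = x.mean(axis=0)
    xc = x - x.mean(axis=1, keepdims=True)   # center each trajectory
    rc = ref - ref.mean()                    # center the reference
    return (xc @ rc) / (np.linalg.norm(xc, axis=1) * np.linalg.norm(rc))
```

On synthetic trajectories this behaves as the quoted passage describes: smoothly decaying curves score near +1, while erratic, trendless curves score much lower.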
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.