Beyond Loss Values: Robust Dynamic Pruning via Loss Trajectory Alignment
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-10 18:08 UTC · model grok-4.3
The pith
Tracking how loss evolves during training identifies noisy samples for pruning more reliably than their final loss value.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AlignPrune adds a Dynamic Alignment Score that ranks samples by how their loss curves align over time rather than by instantaneous loss magnitude, enabling more accurate removal of noisy examples during dynamic pruning and producing accuracy gains of up to 6.3 percent over prior methods on noisy benchmarks.
What carries the argument
Dynamic Alignment Score (DAS), a loss-trajectory-based ranking criterion that measures how closely each sample's loss sequence follows the pattern expected for clean data and is used to decide pruning order.
If this is right
- Existing dynamic pruning frameworks can be made robust to label noise by swapping their ranking criterion for DAS without changing model architecture or training loop.
- Accuracy under label noise improves consistently across multiple benchmarks and noise levels while pruning ratios remain unchanged.
- The method provides a generalizable way to handle pruning when training data contains mislabeled examples.
- No additional hyper-parameters or dataset-specific tuning are required beyond the original pruning schedule.
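The plug-and-play claim above can be made concrete with a sketch of the selection step that a module like AlignPrune would hook into. This is an illustrative assumption about the interface, not the paper's implementation; the function name and the toy criteria are ours.

```python
import numpy as np

def select_batch_indices(loss_history, keep_ratio, rank_fn):
    """Generic dynamic-pruning selection step over a (samples x steps)
    loss history. Existing frameworks effectively pass a last-loss
    criterion; an AlignPrune-style module would swap in a
    trajectory-based rank_fn, leaving this selection code, the model,
    and the training loop untouched."""
    h = np.asarray(loss_history, dtype=float)
    scores = rank_fn(h)                        # higher score = keep
    k = max(1, int(len(scores) * keep_ratio))
    return np.argsort(scores)[-k:]
```

With `rank_fn=lambda h: h[:, -1]` this reproduces instantaneous-loss ranking; a trajectory criterion is just a different `rank_fn` passed to the same step.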
Where Pith is reading between the lines
- Loss dynamics may capture sample difficulty and label quality more reliably than static loss snapshots in other noisy-label settings.
- The same trajectory idea could be tested for selecting data in semi-supervised or active learning pipelines.
- If trajectory alignment proves stable, it suggests pruning criteria should be defined over sequences rather than single training steps.
Load-bearing premise
Loss trajectories of noisy samples differ consistently from those of clean samples across different noise types and pruning ratios.
What would settle it
On a controlled noisy dataset, the central claim would be falsified if DAS retains noisy samples at rates no lower than single-loss ranking, or if accuracy does not rise when DAS replaces the baseline criterion.
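The proposed test boils down to comparing one number between criteria: the fraction of mislabeled samples that survive pruning. A hypothetical sketch of that metric (the function name and keep ratio are ours, not the paper's):

```python
import numpy as np

def noisy_retention_rate(scores, is_noisy, keep_ratio=0.7):
    """Fraction of the kept set that is mislabeled. The falsification
    test amounts to checking whether this number is lower under a
    DAS-style score than under single-loss ranking."""
    k = int(len(scores) * keep_ratio)
    kept = np.argsort(scores)[-k:]        # keep the k highest-scoring samples
    return np.asarray(is_noisy)[kept].mean()
```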
Original abstract
Existing dynamic data pruning methods often fail under noisy-label settings, as they typically rely on per-sample loss as the ranking criterion. This could mistakenly lead to preserving noisy samples due to their high loss values, resulting in significant performance drop. To address this, we propose AlignPrune, a noise-robust module designed to enhance the reliability of dynamic pruning under label noise. Specifically, AlignPrune introduces the Dynamic Alignment Score (DAS), which is a loss-trajectory-based criterion that enables more accurate identification of noisy samples, thereby improving pruning effectiveness. As a simple yet effective plug-and-play module, AlignPrune can be seamlessly integrated into state-of-the-art dynamic pruning frameworks, consistently outperforming them without modifying either the model architecture or the training pipeline. Extensive experiments on five widely-used benchmarks across various noise types and pruning ratios demonstrate the effectiveness of AlignPrune, boosting accuracy by up to 6.3% over state-of-the-art baselines. Our results offer a generalizable solution for pruning under noisy data, encouraging further exploration of learning in real-world scenarios. Code is available at: https://github.com/leonqin430/AlignPrune.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that existing dynamic pruning methods relying on per-sample loss fail under label noise by preserving high-loss noisy samples. It introduces AlignPrune as a plug-and-play module using a new Dynamic Alignment Score (DAS) derived from loss trajectories to more accurately identify noisy samples. The method integrates into existing frameworks without architecture or pipeline changes and reports accuracy gains of up to 6.3% over SOTA baselines across five benchmarks, various noise types, and pruning ratios.
Significance. If the DAS criterion demonstrably separates noisy from clean samples more reliably than loss alone without introducing dataset-specific biases or requiring extensive tuning, the work would offer a practical improvement for robust pruning in noisy real-world settings. The plug-and-play design and code release are positive for adoption, but the significance hinges on whether trajectory alignment provides independent, generalizable signal beyond what per-sample loss already captures.
major comments (4)
- [§3] §3 (Method), DAS definition: the abstract and high-level description do not specify the exact formula for computing alignment between loss trajectories (e.g., whether it uses cosine similarity, DTW, or another metric, and any normalization or windowing choices). This makes it impossible to assess whether DAS is truly parameter-free or reduces to a fitted quantity that could overfit to the observed training dynamics.
- [Experiments] Experiments section (results tables): the reported gains of up to 6.3% lack accompanying error bars, standard deviations across runs, or statistical tests. Without these, it is unclear whether the improvements over baselines are consistent or could be explained by variance in training or pruning ratio selection.
- [§4] §4 (Experiments), ablation studies: no ablations isolate the contribution of trajectory alignment from other factors such as the base pruning framework, hyperparameter choices for DAS, or simple alternatives like smoothed loss. This leaves the central claim that DAS enables 'more accurate identification of noisy samples' unverified against the skeptic concern of new selection biases at different pruning ratios.
- [§3.1–3.2] §3.1–3.2, noise-type invariance: the method assumes loss-trajectory shapes differ reliably between noisy and clean samples across symmetric and asymmetric noise without dataset-specific tuning. No theoretical characterization or targeted experiments (e.g., per-noise-type trajectory plots) are described to support this, risking that the reported gains are not generalizable.
minor comments (2)
- [Abstract] The abstract states 'Code is available at: https://github.com/leonqin430/AlignPrune' but does not indicate whether the released code includes the exact DAS implementation, hyperparameter settings used in the reported experiments, or scripts for reproducing the five-benchmark results.
- [§3] Notation for loss trajectories and alignment score should be introduced with explicit equations early in §3 to improve readability, rather than relying on prose descriptions.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will incorporate revisions to improve clarity, rigor, and completeness of the manuscript.
Point-by-point responses
-
Referee: [§3] §3 (Method), DAS definition: the abstract and high-level description do not specify the exact formula for computing alignment between loss trajectories (e.g., whether it uses cosine similarity, DTW, or another metric, and any normalization or windowing choices). This makes it impossible to assess whether DAS is truly parameter-free or reduces to a fitted quantity that could overfit to the observed training dynamics.
Authors: We thank the referee for pointing this out. We will revise §3 to explicitly provide the mathematical definition of the Dynamic Alignment Score (DAS), including the alignment metric used, normalization procedure, and windowing choices. This will clarify that it is computed directly from the loss trajectories without additional fitting or parameters that could lead to overfitting to the observed dynamics. revision: yes
-
Referee: [Experiments] Experiments section (results tables): the reported gains of up to 6.3% lack accompanying error bars, standard deviations across runs, or statistical tests. Without these, it is unclear whether the improvements over baselines are consistent or could be explained by variance in training or pruning ratio selection.
Authors: We agree that including measures of variability is important for assessing the reliability of the reported gains. In the revised version, we will add error bars (standard deviations from multiple runs) to the results tables and include statistical tests (e.g., t-tests) to confirm the significance of improvements over baselines. revision: yes
-
Referee: [§4] §4 (Experiments), ablation studies: no ablations isolate the contribution of trajectory alignment from other factors such as the base pruning framework, hyperparameter choices for DAS, or simple alternatives like smoothed loss. This leaves the central claim that DAS enables 'more accurate identification of noisy samples' unverified against the skeptic concern of new selection biases at different pruning ratios.
Authors: We will include additional ablation studies in the revised manuscript to isolate the contribution of the trajectory alignment component. This will include comparisons against smoothed loss alternatives, sensitivity to DAS hyperparameters, and analysis of selection at different pruning ratios to address potential biases. revision: yes
-
Referee: [§3.1–3.2] §3.1–3.2, noise-type invariance: the method assumes loss-trajectory shapes differ reliably between noisy and clean samples across symmetric and asymmetric noise without dataset-specific tuning. No theoretical characterization or targeted experiments (e.g., per-noise-type trajectory plots) are described to support this, risking that the reported gains are not generalizable.
Authors: Our experiments already demonstrate performance across symmetric and asymmetric noise without dataset-specific tuning. To further support the invariance, we will add per-noise-type loss trajectory visualization plots in the appendix and a discussion in §3.1–3.2. A complete theoretical characterization is not provided as the work is primarily empirical, but the added experiments will help verify generalizability. revision: partial
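The variance reporting promised in the second response could look like the following sketch. The per-seed accuracies are invented placeholders (not the paper's numbers); only the paired t-statistic computation is meant literally.

```python
import numpy as np

def paired_t(a, b):
    """Paired t-statistic over matched seeds; with n - 1 degrees of
    freedom, |t| > ~2.8 would make a 5-seed gain significant at p < 0.05."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))

# Hypothetical per-seed accuracies (placeholders, not reported results)
alignprune = [78.4, 77.9, 78.8, 78.1, 78.5]
baseline = [74.2, 73.8, 74.9, 74.0, 74.4]
t = paired_t(alignprune, baseline)
```

Reporting the mean gain with its standard deviation alongside such a statistic would address the referee's variance concern directly.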
Circularity Check
No significant circularity detected in DAS derivation or claims
Full rationale
The paper introduces AlignPrune and its Dynamic Alignment Score (DAS) as a novel loss-trajectory-based criterion for robust pruning under label noise. The abstract and description frame DAS as an independent plug-and-play module whose value is shown through empirical gains (up to 6.3%) on five benchmarks rather than by algebraic reduction to the input loss values or training trajectories. No equations, self-citations, or uniqueness theorems are presented that would make the central claim tautological or force the result by construction. The derivation chain remains self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
The relation between this paper passage and the cited Recognition theorem is unclear.
DAS_i^(t) = ρ(v_i^(t), v_ref^(t)), where ρ is the Pearson correlation; clean samples follow a monotonic loss decay aligning with the reference, while noisy samples exhibit erratic patterns (Sec. 3.2, Fig. 1c)
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
The relation between this paper passage and the cited Recognition theorem is unclear.
Plug-and-play replacement of loss-based ranking in InfoBatch/SeTa; no architecture or pipeline change (Algorithm 1)
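The DAS formula quoted above can be sketched directly. Taking the dataset-mean loss curve as v_ref is our assumption; this excerpt does not pin down the reference trajectory, and the window and normalization details remain unspecified.

```python
import numpy as np

def das(loss_hist):
    """DAS_i = Pearson correlation between sample i's loss trajectory
    v_i and a reference trajectory v_ref, taken here as the dataset-mean
    loss curve (an assumption; the excerpt does not specify v_ref)."""
    x = np.asarray(loss_hist, dtype=float)
    ref = x.mean(axis=0)
    xc = x - x.mean(axis=1, keepdims=True)   # center each trajectory
    rc = ref - ref.mean()                    # center the reference
    return (xc @ rc) / (np.linalg.norm(xc, axis=1) * np.linalg.norm(rc))
```

On synthetic trajectories this behaves as the quoted passage describes: smoothly decaying curves score near +1, while erratic, trendless curves score much lower.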
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.