SLAP: Stratified Loss-based Pruning for On-Policy Data-Efficient Instruction Tuning

Hao Chen; Jianhang Ding; Renshu Gu; Run Zou; Wen Wu; Yifan Ding

arxiv: 2605.23969 · v1 · pith:HU4V53SJnew · submitted 2026-05-13 · 💻 cs.CL

Pith reviewed 2026-06-30 22:00 UTC · model grok-4.3

classification 💻 cs.CL

keywords: instruction tuning, data pruning, batch selection, Hessian approximation, stratified sampling, data efficiency, large language models, loss-based pruning

The pith

SLAP selects entire batches of instruction data via Hessian-approximated gradients and stratified sampling, letting models reach or exceed full-dataset performance on dialogue, translation, and QA tasks with 20-40 percent less data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SLAP to reduce the data and compute demands of instruction tuning by shifting from per-example pruning to evaluation of whole batch compositions for learnability. It applies distribution-aware stratified sampling to maintain coverage, relative distance optimization to increase variety inside each batch, and dynamic selection driven by Hessian-approximated gradient signals. If these steps succeed, the resulting subsets produce stronger or equal results than the complete training set across LLaMA and ChatGLM models while cutting data volume substantially. A reader would care because current instruction tuning still relies on large fixed datasets that are expensive to collect and train on, and a reliable batch-level filter could lower that cost barrier without loss of capability.

Core claim

SLAP is a batch-aware data selection framework that evaluates the learnability of entire batch compositions rather than individual samples, ensures comprehensive data distribution coverage through distribution-aware stratified sampling while maximizing intra-batch diversity through relative distance optimization, and leverages Hessian-approximated gradient information for dynamic batch selection, achieving superior performance with 20-40% less training data compared to full dataset training while maintaining or improving model capabilities across multiple architectures and tasks.
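The stratified sampling step can be sketched from the Step 1 description in the Figure 1 caption: split a batch into K strata by loss, draw |S| examples with probability proportional to normalized exp(loss), and count how many draws land in each stratum. This is an illustrative sketch, not the authors' implementation. The function name `stratum_quotas`, the equal-size strata, and sampling without replacement are assumptions, since the source does not specify them.

```python
import math
import random


def stratum_quotas(losses, num_strata, budget, seed=0):
    """Return (stratum_of, quotas) for one batch.

    stratum_of[i] is the stratum index of example i (0 = lowest loss).
    quotas[s] is how many of the `budget` sampled examples fell in stratum s.
    Assumes budget <= len(losses).
    """
    rng = random.Random(seed)
    n = len(losses)

    # Assumption: equal-size strata formed by ranking examples on loss.
    order = sorted(range(n), key=lambda i: losses[i])
    stratum_of = [0] * n
    for rank, idx in enumerate(order):
        stratum_of[idx] = rank * num_strata // n

    # Normalized exp(loss); subtracting the max keeps exp() from overflowing.
    max_loss = max(losses)
    weights = [math.exp(loss - max_loss) for loss in losses]

    # Assumption: draw `budget` distinct examples, one at a time.
    remaining = list(range(n))
    quotas = [0] * num_strata
    for _ in range(budget):
        current = [weights[i] for i in remaining]
        pick = rng.choices(range(len(remaining)), weights=current, k=1)[0]
        quotas[stratum_of[remaining.pop(pick)]] += 1
    return stratum_of, quotas
```

Because high-loss examples receive larger weights, the quotas tilt toward harder strata, while sampling (rather than taking the top-|S| losses) leaves every stratum a chance of being represented.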

What carries the argument

The dynamic batch selection mechanism that scores learnability of complete batch compositions using Hessian-approximated gradient information.
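The source does not say how the Hessian is approximated. Purely as an illustration, the sketch below assumes a diagonal curvature proxy built from a running average of squared gradients (the second-moment estimate used in Adam-style optimizers) and uses the preconditioned gradient as a per-example feature vector. The names `update_second_moment` and `preconditioned_feature`, and the choice of proxy itself, are hypothetical and may differ from what the paper does.

```python
import math


def update_second_moment(v, grad, beta2=0.999):
    """Exponential moving average of squared gradients.

    Assumption: this average stands in for the Hessian diagonal.
    """
    return [beta2 * vi + (1.0 - beta2) * g * g for vi, g in zip(v, grad)]


def preconditioned_feature(grad, v, eps=1e-8):
    """Scale each gradient coordinate by the inverse square root of the proxy.

    The result is a feature vector for one example; eps avoids division by zero.
    """
    return [g / (math.sqrt(vi) + eps) for g, vi in zip(grad, v)]
```

Features of this kind place examples in a space where distances reflect how differently they would move the model, which is the property a distance-based diversity step needs.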

If this is right

SLAP-selected subsets outperform full datasets on multi-turn dialogue, multilingual translation, and question answering.
The gains hold across LLaMA and ChatGLM architectures.
Training data volume drops by 20-40% with no capability loss.
Overall computational cost of instruction tuning falls substantially.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The batch-level scoring could be applied to other fine-tuning settings such as preference alignment.
Further savings might appear if SLAP is combined with model compression techniques.
Limits of the method would become clearer through tests on models larger than those reported.
Practice could move toward repeated on-policy selection during training instead of one-time static pruning.

Load-bearing premise

That batch-level learnability scores from Hessian approximations reliably pick data compositions that generalize across model architectures and tasks.

What would settle it

An experiment on a held-out model or task where the SLAP-selected 60-80% subset produces statistically lower performance than the full dataset.
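One way to make "statistically lower" concrete is a paired test across random seeds. The sketch below is a minimal example, assuming the full-data and SLAP-subset runs share seeds and report one score per seed. It uses an exact one-sided sign-flip permutation test; the function name and the scores are placeholders invented for illustration, not results from the paper.

```python
from itertools import product


def paired_sign_flip_pvalue(full_scores, subset_scores):
    """Exact one-sided p-value for 'subset scores are lower than full scores'.

    Under the null hypothesis each paired difference is equally likely to
    have either sign, so all 2**n sign assignments are enumerated.
    """
    diffs = [f - s for f, s in zip(full_scores, subset_scores)]
    observed = sum(diffs)
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        flipped = sum(sign * d for sign, d in zip(signs, diffs))
        # Small tolerance guards against float rounding in the comparison.
        if flipped >= observed - 1e-12:
            at_least_as_extreme += 1
        total += 1
    return at_least_as_extreme / total


# Placeholder scores (made up for illustration), one per seed.
full = [72, 71, 73, 70, 72]
subset = [70, 70, 71, 69, 70]
p_value = paired_sign_flip_pvalue(full, subset)
```

With only five seeds the smallest attainable p-value is 1/32, so a real study would need more seeds or tasks before a small p-value could count as strong evidence.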

Figures

Figures reproduced from arXiv: 2605.23969 by Hao Chen, Jianhang Ding, Renshu Gu, Run Zou, Wen Wu, Yifan Ding.

**Figure 1.** The workflow of SLAP. Step 1: We divide a batch of data into K strata based on loss. Then, we select |S| data according to the probability of normalized exp(loss) and calculate the number of data in each stratum. Step 2: We calculate the Hessian-approximated gradient Ht of the data as features. Step 3: For stratum 1, we randomly initialize a point. We calculate the L2 distance to the first point and select…

**Figure 2.** Maximizing L2 Distance Within the Batch. Step 1: For stratum 1, randomly initialize a point and calculate the L2 distance from the points in the same stratum to the first point. Step 2-3: Select the point that is farthest from the first point as the second point, then update the minimum distance from the remaining points to the selected points and iteratively choose |Si| (e.g. 3) samples. Steps 4 and 7: F…

**Figure 3.** The data distribution under hard sampling, CCS, and SLAP.

**Figure 4.** Evaluation with different k on NetLit using ChatGLM3 model.

**Figure 6.** Evaluation on different datasets using ChatGLM3 model with pruning.

**Figure 7.** Evaluation with different pruning rates on NetLit using ChatGLM3 model.

**Figure 8.** Evaluation on LLaMaQA using ChatGLM3 and LLaMa3 with pruning.

**Figure 9.** GPT-4 and human evaluation scores for LLM generated responses on

**Figure 10.** Comparison of FLOPs for Pruning and Full data.
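The within-stratum selection described in the Figure 2 caption matches greedy farthest-point selection: start from a random point, then repeatedly add the point whose minimum L2 distance to the already selected points is largest. A minimal sketch follows, assuming feature vectors are plain sequences of floats. The function name `farthest_point_select` is hypothetical, and the caption is truncated in the source, so any later steps of the authors' procedure are not covered.

```python
import math
import random


def farthest_point_select(features, k, seed=0):
    """Greedily pick k indices that are mutually far apart in L2 distance."""
    n = len(features)
    if k <= 0 or n == 0:
        return []
    k = min(k, n)
    rng = random.Random(seed)

    # Random initial point, as in the caption.
    first = rng.randrange(n)
    selected = [first]
    chosen = {first}

    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(features[first], f) for f in features]

    while len(selected) < k:
        candidates = [i for i in range(n) if i not in chosen]
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        # Update the minimum distances with the newly selected point.
        for i in range(n):
            d = math.dist(features[nxt], features[i])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected
```

Calling this once per stratum, with k set to that stratum's quota |Si|, gives a batch that respects the stratum counts while spreading the chosen examples out in feature space.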

Original abstract

Instruction tuning has optimized the specialized capabilities of large language models (LLMs), but it often requires extensive datasets and prolonged training times. The challenge lies in developing specific capabilities by identifying useful data and efficiently fine-tuning. High-quality and diverse pruned data can help models achieve lossless performance at a lower cost. In this paper, we propose SLAP, a novel batch-aware data selection framework that evaluates the learnability of entire batch compositions rather than individual samples. SLAP ensures comprehensive data distribution coverage through distribution-aware stratified sampling while maximizing intra-batch diversity through relative distance optimization. By leveraging Hessian-approximated gradient information for dynamic batch selection, SLAP significantly outperforms existing state-of-the-art methods across multiple model architectures (LLaMA, ChatGLM) and diverse downstream tasks including multi-turn dialogue, multilingual translation, and question answering. Most notably, SLAP achieves superior performance with 20-40% less training data compared to full dataset training, substantially reducing computational costs while maintaining or improving model capabilities. These results establish SLAP as a powerful approach for efficient and effective instruction tuning of large language models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SLAP shifts data pruning to batch level with Hessian and stratification, but the 20-40% savings claim needs the full experiments to hold up.

The main thing to know is that this paper proposes SLAP, a batch-aware pruning method for instruction tuning. It scores entire batches for learnability using Hessian-approximated gradients, adds stratified sampling to keep distribution coverage, and optimizes relative distances inside batches for diversity. The reported outcome is that models reach or exceed full-dataset performance on LLaMA and ChatGLM with 20-40% less data across dialogue, translation, and QA tasks.

What the work does well is move the selection unit from single examples to batches, which aligns with how on-policy tuning actually runs. The three pieces fit together logically: the Hessian term supplies a signal for value, stratification prevents coverage holes, and the distance term pushes variety. That combination is a reasonable extension of existing pruning ideas and targets a practical pain point in LLM adaptation.

The soft spot is the experimental support for the headline numbers. Strong claims like consistent outperformance with large data cuts require clear ablations on each component, multiple random seeds, fair baseline implementations, and checks that the Hessian approximation stays stable across model scales. The abstract alone does not supply those, so the full paper must show them before the gains look reliable rather than setup-dependent.

This is for researchers focused on data-efficient fine-tuning and pruning methods. Anyone already working on Hessian-based selection or batch-level metrics would find the framing useful to compare against. It deserves a serious referee because the problem matters and the method is concrete enough to test and refine.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes SLAP, a batch-aware data selection framework for efficient instruction tuning of LLMs. It combines distribution-aware stratified sampling for coverage, relative distance optimization for intra-batch diversity, and Hessian-approximated gradient information for dynamic batch selection. The central claim is that SLAP outperforms existing SOTA methods on LLaMA and ChatGLM across tasks (multi-turn dialogue, multilingual translation, QA), achieving superior performance with 20-40% less training data than full-dataset training while maintaining or improving capabilities.

Significance. If the empirical claims were substantiated, the work could meaningfully advance data-efficient fine-tuning by reducing compute costs for LLM instruction tuning. However, the provided manuscript consists solely of an abstract with no experimental details, quantitative results, baselines, error bars, ablation studies, or methodology sections, so the significance cannot be assessed. The method's reliance on Hessian approximations and batch-level learnability for generalization across architectures and tasks remains unevaluated.

major comments (2)

[Abstract] The central performance claim (superior results with 20-40% less data across models and tasks) is stated without any supporting experimental evidence, tables, baselines, or implementation details, rendering the claim impossible to evaluate or reproduce from the manuscript.
[Abstract] The description of the dynamic batch selection mechanism (Hessian-approximated gradients combined with batch-level learnability) provides no procedure, approximation details, or pseudocode, so it is impossible to assess whether this component reliably identifies generalizable data compositions as claimed.

minor comments (2)

[Abstract] The title refers to 'Stratified Loss-based Pruning' and 'On-Policy' but the abstract describes a 'batch-aware data selection framework' without explaining the loss-based pruning aspect or the on-policy component.
[Abstract] The abstract asserts outperformance over 'existing state-of-the-art methods' but does not name those methods or indicate how they were implemented for comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the comments. We acknowledge that the submitted manuscript was limited to the abstract and contained no experimental sections, results, or methodological details. The revised version will address this by expanding to a full paper.

Referee: [Abstract] The central performance claim (superior results with 20-40% less data across models and tasks) is stated without any supporting experimental evidence, tables, baselines, or implementation details, rendering the claim impossible to evaluate or reproduce from the manuscript.

Authors: We agree that the abstract alone provides no evidence for the claims. The revised manuscript will include full experimental results with tables, baselines, error bars, ablation studies, and implementation details across the reported models and tasks. revision: yes

Referee: [Abstract] The description of the dynamic batch selection mechanism (Hessian-approximated gradients combined with batch-level learnability) provides no procedure, approximation details, or pseudocode, so it is impossible to assess whether this component reliably identifies generalizable data compositions as claimed.

Authors: We agree that the abstract lacks the necessary procedural details. The revised manuscript will add a dedicated methodology section with the exact procedure, Hessian approximation method, batch-level learnability formulation, and pseudocode. revision: yes

Circularity Check

0 steps flagged

No circularity: method relies on external Hessian approximation and stratified sampling without self-referential reduction

The abstract and description present SLAP as a batch-aware selection framework that applies Hessian-approximated gradients, distribution-aware stratified sampling, and relative distance optimization. No equations, derivation steps, or self-citations are supplied that would make any claimed prediction equivalent to its inputs by construction. The approach invokes standard external techniques (Hessian approximation) rather than defining quantities in terms of the target performance gains. The central claim of 20-40% data reduction therefore rests on empirical validation outside the method's own definitions, yielding a self-contained derivation with no detectable circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no concrete free parameters, axioms, or invented entities can be identified from the provided text.
