Large-scale data selection for instruction tuning. arXiv preprint arXiv:2503.01807

Ivison, H · 2025 · arXiv 2503.01807

7 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

From Instance Selection to Fixed-Pool Data Recipe Search for Supervised Fine-Tuning

cs.LG · 2026-05-13 · conditional · novelty 7.0

AutoSelection discovers data recipes from a 90K instruction pool that outperform full-data training and other selectors on reasoning tasks for SFT across multiple models.

Beyond Variance: Prompt-Efficient RLVR via Rare-Event Amplification and Bidirectional Pairing

cs.LG · 2026-02-03 · unverdicted · novelty 7.0

Positive-negative prompt pairing with weighted GRPO improves RLVR sample efficiency, raising AIME 2025 Pass@8 from 16.8 to 22.2 on Qwen2.5-Math-7B while matching large-scale training.

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

cs.LG · 2025-04-29 · accept · novelty 7.0

One training example via RLVR boosts LLM math reasoning from 17.6% to 35.7% average across six benchmarks.

Interaction-Aware Influence Functions for Group Attribution

cs.LG · 2026-05-15 · conditional · novelty 6.0

Extends influence functions with a second-order pairwise interaction term that improves group attribution accuracy over simple summation on multiple model-dataset pairs and instruction-tuning selection tasks.

HEALing Entropy Collapse: Enhancing Exploration in Few-Shot RLVR via Hybrid-Domain Entropy Dynamics Alignment

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

HEAL mitigates entropy collapse in few-shot RLVR by selectively adding general-domain data and aligning trajectory-level entropy dynamics, matching full-shot performance with 32 target samples.

GIST: Targeted Data Selection for Instruction Tuning via Coupled Optimization Geometry

cs.LG · 2026-02-20 · unverdicted · novelty 6.0

GIST recovers a task-specific low-dimensional subspace from validation gradients using SVD and scores training examples by their alignment within this coupled subspace for LoRA-based instruction tuning.

Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods

cs.LG · 2026-04-19 · unverdicted · novelty 5.0

ADAPT is an online reweighting framework for LLM training that outperforms offline data selection and mixing methods in cross-benchmark generalization under equal compute.

citing papers explorer

Showing 7 of 7 citing papers.

From Instance Selection to Fixed-Pool Data Recipe Search for Supervised Fine-Tuning

cs.LG · 2026-05-13 · conditional · none · ref 41

AutoSelection discovers data recipes from a 90K instruction pool that outperform full-data training and other selectors on reasoning tasks for SFT across multiple models.

Beyond Variance: Prompt-Efficient RLVR via Rare-Event Amplification and Bidirectional Pairing

cs.LG · 2026-02-03 · unverdicted · none · ref 9

Positive-negative prompt pairing with weighted GRPO improves RLVR sample efficiency, raising AIME 2025 Pass@8 from 16.8 to 22.2 on Qwen2.5-Math-7B while matching large-scale training.

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

cs.LG · 2025-04-29 · accept · none · ref 48

One training example via RLVR boosts LLM math reasoning from 17.6% to 35.7% average across six benchmarks.

Interaction-Aware Influence Functions for Group Attribution

cs.LG · 2026-05-15 · conditional · none · ref 27

Extends influence functions with a second-order pairwise interaction term that improves group attribution accuracy over simple summation on multiple model-dataset pairs and instruction-tuning selection tasks.

HEALing Entropy Collapse: Enhancing Exploration in Few-Shot RLVR via Hybrid-Domain Entropy Dynamics Alignment

cs.LG · 2026-04-20 · unverdicted · none · ref 92

HEAL mitigates entropy collapse in few-shot RLVR by selectively adding general-domain data and aligning trajectory-level entropy dynamics, matching full-shot performance with 32 target samples.

GIST: Targeted Data Selection for Instruction Tuning via Coupled Optimization Geometry

cs.LG · 2026-02-20 · unverdicted · none · ref 1

GIST recovers a task-specific low-dimensional subspace from validation gradients using SVD and scores training examples by their alignment within this coupled subspace for LoRA-based instruction tuning.

Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods

cs.LG · 2026-04-19 · unverdicted · none · ref 16

ADAPT is an online reweighting framework for LLM training that outperforms offline data selection and mixing methods in cross-benchmark generalization under equal compute.
