pith. machine review for the scientific record.

arxiv: 2605.13149 · v1 · submitted 2026-05-13 · 💻 cs.CL · cs.AI · cs.LG

Recognition: unknown

AcquisitionSynthesis: Targeted Data Generation using Acquisition Functions

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 19:26 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.LG
keywords acquisition functions · synthetic data generation · active learning · language models · catastrophic forgetting · data quality · self-improvement

The pith

Acquisition functions as rewards train models to generate synthetic data that improves student performance by 2-7 percent

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AcquisitionSynthesis, which treats acquisition functions from active learning as reward models to train language models that produce synthetic data. This supplies a quantitative, model-centric measure of how much each sample will help a downstream learner, unlike prior reliance on rejection sampling or signals from larger models. Experiments on math, medical question-answering, and coding tasks show that student models trained on the resulting data gain 2-7 percent on in-distribution examples and retain prior knowledge more effectively when new tasks arrive. The same generators also produce useful data for other models and across low-to-high resource regimes.

Core claim

AcquisitionSynthesis trains a generator model by using acquisition functions directly as rewards, so the produced synthetic data is more informative for the target learner. On verifiable tasks in math, medical QA, and coding, student models trained with this data achieve 2-7 percent higher accuracy on in-distribution test sets and exhibit measurably lower catastrophic forgetting when the training distribution shifts.

What carries the argument

Acquisition functions used as reward signals to train the synthetic data generator, where each function scores a sample by its expected informativeness or influence on the learner.
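
What that looks like mechanically, as a minimal sketch: one uncertainty-style acquisition function scoring a generated sample against the student model. The entropy choice and the Hugging Face-style `student`/`tokenizer` interface are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def acquisition_reward(student, tokenizer, sample_text: str) -> float:
    """Score one synthetic sample by the student's mean token entropy.

    Under an uncertainty-style acquisition function, higher entropy
    means the student is less certain about this text, so the sample
    is treated as more informative; influence- or gradient-based
    functions would slot into the same place.
    """
    inputs = tokenizer(sample_text, return_tensors="pt")
    with torch.no_grad():
        logits = student(**inputs).logits               # (1, seq_len, vocab)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return entropy.mean().item()                        # scalar reward for the generator
```

The generator is then updated with an ordinary policy-gradient step on this scalar, which is all that "acquisition functions as rewards" requires mechanically.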

If this is right

  • Student models reach 2-7 percent higher accuracy on in-distribution math, medical QA, and coding tasks.
  • The same training data reduces catastrophic forgetting when models encounter new tasks.
  • Generated data transfers usefully to other models and to both low-resource and high-resource training regimes.
  • The method supplies a quantitative route to model-aware data synthesis that does not depend on fixed external datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Repeated cycles could let a model generate its own improved training data by feeding its current acquisition signals back into the generator (a toy sketch follows this list).
  • The approach could reduce dependence on large teacher models for data curation once the generator is trained.
  • If acquisition functions can be defined for open-ended tasks, the same reward mechanism might extend beyond easily verifiable domains.
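
A toy illustration of the first bullet's loop, hedged: nothing below is the paper's setup. A one-dimensional "generator" (a Gaussian over sample positions) is nudged toward wherever a pretend student's acquisition reward peaks, mirroring the reward-driven shift the paper's Figure 1 depicts.

```python
import random

def acquisition_reward(x, uncertain_at=3.0):
    # Pretend the student is most uncertain near x = 3.0.
    return 1.0 / (1.0 + (x - uncertain_at) ** 2)

mu = 0.0                                              # generator starts off-target
for _ in range(200):
    xs = [random.gauss(mu, 1.0) for _ in range(64)]   # generate a batch
    best = max(xs, key=acquisition_reward)            # crude policy signal
    mu += 0.05 * (best - mu)                          # shift toward high reward
print(f"generator mean after training: {mu:.2f}")     # settles near 3.0
```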

Load-bearing premise

Acquisition function scores give a reliable, direct signal of a sample's positive impact on the specific downstream learner that can be used as a reward without extra validation.

What would settle it

Train identical student models on AcquisitionSynthesis data versus unguided synthetic data on the same math, medical, and coding tasks; observing no accuracy gain, or faster forgetting, in the AcquisitionSynthesis group would undercut the claim.
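
A hedged sketch of that experiment; `train` and `evaluate` stand in for whatever fine-tuning and benchmark harness a replication would use, and are not defined by the paper.

```python
from dataclasses import dataclass

@dataclass
class Result:
    in_dist_acc: float    # accuracy on the in-distribution test set
    old_task_acc: float   # accuracy retained on prior tasks (forgetting proxy)

def ab_experiment(base_ckpt, acq_data, unguided_data, train, evaluate,
                  seeds=(0, 1, 2)):
    """Identical students, two data sources, same evaluation."""
    deltas = []
    for seed in seeds:
        acq = evaluate(train(base_ckpt, acq_data, seed=seed))
        ctl = evaluate(train(base_ckpt, unguided_data, seed=seed))
        deltas.append((acq.in_dist_acc - ctl.in_dist_acc,     # claim predicts +0.02 to +0.07
                       acq.old_task_acc - ctl.old_task_acc))  # claim predicts >= 0
    return deltas  # no positive gap across seeds would count against the claim
```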

Figures

Figures reproduced from arXiv: 2605.13149 by Dilek Hakkani-Tür, Emre Can Acikgoz, Ishika Agarwal, Jiaqi Ma, Mahdi Namazifar, Pradeep Natarajan, Sofia Stoica.

Figure 1. Left: AcquisitionSynthesis shifts generated samples toward high acquisition reward regions of the data space. Grey circles: samples from an untrained generator, scattered with varying rewards. Red ×’s: samples rejected by rejection sampling due to low reward. Green checks: samples from a trained AcquisitionSynthesis generator, concentrated near peak reward. Right: plotted is the average in-distribution and…
Figure 2. An illustration of our evaluation framework. Steps 1 and 2 describe our dataset generation…
Figure 3. Performance of student models with varying training paradigms. We use the data se…
Figure 4. Prompt used during AcquisitionSynthesis training. As input, the prompt takes a question, answer, and reasoning from the Numina training dataset.
Figure 5. Prompt used during AcquisitionSynthesis training. As input, the prompt takes a question, answer, and reasoning from the MedMCQA training dataset.
read the original abstract

Data quality remains a critical bottleneck in developing capable, competitive models. Researchers have explored many ways to generate top quality samples. Some works rely on rejection sampling: generating lots of synthetic samples and filtering out low-quality samples. Other works rely on larger or closed-source models to extract model weaknesses, necessary skills, or a curriculum off of which to base data generation. These works have one common limitation: there is no quantitative approach to measure the impact of the generated samples on the downstream learner. Active learning literature provides exactly this, in the form of acquisition functions. Acquisition functions measure the informativeness and/or influence of data, providing interpretable, model-centric signals. Inspired by this, we propose AcquisitionSynthesis: using acquisition functions as reward models to train language models to generate higher-quality synthetic data. We conduct experiments on classic verifiable tasks of math, medical question-answering, and coding. Our experimental results indicate that (1) student models trained with AcquisitionSynthesis data achieve good performance on in-distribution tasks (2-7% gain) and are more robust to catastrophic forgetting, and (2) AcquisitionSynthesis models can generate data for other models and for low-to-high resource training paradigms. By leveraging acquisition rewards, we seek to demonstrate a principled path toward model-aware self-improvement that surpasses static datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes AcquisitionSynthesis, a method that treats acquisition functions from active learning as reward models to train a generator language model to produce synthetic data optimized for improving a downstream student model's performance. Experiments on verifiable tasks in math, medical question-answering, and coding report 2-7% gains on in-distribution tasks, reduced catastrophic forgetting, and transferability to other models and low-to-high resource settings.

Significance. If the central claim holds, the work supplies a quantitative, model-centric alternative to rejection sampling or teacher-model distillation for synthetic data curation, potentially enabling more targeted self-improvement loops in language models.

major comments (2)
  1. [Abstract] The headline claim of 2-7% in-distribution gains and reduced forgetting is stated without any reference to baselines, controls, statistical tests, or number of runs, so the support for the central claim cannot be evaluated from the provided information.
  2. [Experiments] No correlation analysis, ablation, or validation is reported showing that samples with higher acquisition scores produce larger performance deltas on the student model than lower-scoring samples; without this link, the observed gains could arise from generic synthetic-data effects rather than the acquisition signal.
minor comments (1)
  1. [Methods] The precise formulation of how acquisition scores are normalized or scaled before being used as RL rewards should be stated explicitly to allow reproduction (one plausible default is sketched below).
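
For concreteness, one plausible default an implementation might use; this is an assumption, since the paper's actual scaling is exactly what the comment asks to have stated.

```python
import numpy as np

def normalize_rewards(scores, clip=3.0):
    """Per-batch z-score with clipping: a standard way to keep the RL
    reward scale stable across batches and tasks (assumed, not taken
    from the paper)."""
    scores = np.asarray(scores, dtype=np.float64)
    z = (scores - scores.mean()) / (scores.std() + 1e-8)
    return np.clip(z, -clip, clip)
```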

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment below and have revised the manuscript to incorporate the suggested clarifications and additional analyses.

read point-by-point responses
  1. Referee: [Abstract] The headline claim of 2-7% in-distribution gains and reduced forgetting is stated without any reference to baselines, controls, statistical tests, or number of runs, so the support for the central claim cannot be evaluated from the provided information.

    Authors: We agree that the abstract would benefit from additional context. In the revised manuscript, we have expanded the abstract to specify that the 2-7% gains are measured relative to standard synthetic data baselines (rejection sampling and teacher distillation), averaged over 5 independent runs, with statistical significance assessed via paired t-tests (p < 0.05). We have also briefly noted the experimental controls for data volume and model scale. revision: yes

  2. Referee: [Experiments] No correlation analysis, ablation, or validation is reported showing that samples with higher acquisition scores produce larger performance deltas on the student model than lower-scoring samples; without this link, the observed gains could arise from generic synthetic-data effects rather than the acquisition signal.

    Authors: This is a fair critique. While the main results compare AcquisitionSynthesis against baselines, the submitted version did not include explicit correlation analyses or high/low-score ablations. We will add a new subsection and figure in the Experiments section that reports the correlation between acquisition scores and student performance deltas, along with an ablation comparing high- versus low-scoring samples. This will help isolate the contribution of the acquisition signal. revision: yes
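
For readers wondering what the analyses promised in the two responses above would compute, a hedged sketch; every number below is invented for illustration, and nothing here comes from the paper.

```python
import numpy as np
from scipy import stats

# Response 1: paired t-test over matched seeds, AcquisitionSynthesis
# vs. baseline (hypothetical accuracies).
acq_runs  = np.array([0.71, 0.73, 0.70, 0.74, 0.72])
base_runs = np.array([0.66, 0.69, 0.67, 0.68, 0.66])
t, p = stats.ttest_rel(acq_runs, base_runs)
print(f"paired t-test: t={t:.2f}, p={p:.4f}")           # p < 0.05 supports the gains

# Response 2: correlate each sample's acquisition score with its
# estimated effect on student accuracy (placeholder data; estimating
# per-sample effects is the expensive part in practice).
acq_scores  = np.random.rand(500)
perf_deltas = 0.5 * acq_scores + 0.1 * np.random.randn(500)
r, pr = stats.pearsonr(acq_scores, perf_deltas)
print(f"score/delta correlation: r={r:.2f}, p={pr:.3g}")  # weak r => generic data effects
```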

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The manuscript proposes AcquisitionSynthesis by directly adopting established acquisition functions from active learning literature as reward signals for RL-based generator training. No equations, parameter fits, or derivations are shown that reduce the reported 2-7% gains or robustness claims to self-referential quantities, fitted inputs renamed as predictions, or self-citation chains. The approach treats acquisition scores as external inputs and evaluates outcomes empirically on downstream tasks, keeping the central method self-contained against independent benchmarks rather than closing a definitional loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that acquisition functions measure informativeness in a way that directly translates to improved downstream performance when used as rewards.

axioms (1)
  • domain assumption Acquisition functions from active learning provide interpretable, model-centric signals of data informativeness and influence.
    This is the explicit inspiration cited for turning acquisition scores into reward models.
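
Two textbook acquisition functions from that literature, to make the axiom concrete; a minimal sketch over a classifier's predictive distribution, not the paper's specific choices.

```python
import numpy as np

def entropy_score(probs):
    """Uncertainty sampling: higher predictive entropy = more informative."""
    p = np.clip(np.asarray(probs, dtype=np.float64), 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def margin_score(probs):
    """Margin sampling: a small gap between the top two class
    probabilities marks an ambiguous, hence informative, example."""
    top2 = np.sort(np.asarray(probs, dtype=np.float64))[-2:]
    return float(top2[1] - top2[0])   # smaller margin = higher priority
```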

pith-pipeline@v0.9.0 · 5558 in / 1098 out tokens · 59147 ms · 2026-05-14T19:26:22.803140+00:00 · methodology

