pith. machine review for the scientific record.

arxiv: 2605.13149 · v1 · submitted 2026-05-13 · 💻 cs.CL · cs.AI · cs.LG

Recognition: unknown

AcquisitionSynthesis: Targeted Data Generation using Acquisition Functions

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 19:26 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.LG
keywords acquisition functions · synthetic data generation · active learning · language models · catastrophic forgetting · data quality · self-improvement

The pith

Acquisition functions as rewards train models to generate synthetic data that improves student performance by 2-7 percent

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AcquisitionSynthesis, which treats acquisition functions from active learning as reward models to train language models that produce synthetic data. This supplies a quantitative, model-centric measure of how much each sample will help a downstream learner, unlike prior reliance on rejection sampling or signals from larger models. Experiments on math, medical question-answering, and coding tasks show that student models trained on the resulting data gain 2-7 percent on in-distribution examples and retain prior knowledge more effectively when new tasks arrive. The same generators also produce useful data for other models and across low-to-high resource regimes.

Core claim

AcquisitionSynthesis trains a generator model by using acquisition functions directly as rewards, so the produced synthetic data is more informative for the target learner. On verifiable tasks in math, medical QA, and coding, student models trained with this data achieve 2-7 percent higher accuracy on in-distribution test sets and exhibit measurably lower catastrophic forgetting when the training distribution shifts.

What carries the argument

Acquisition functions used as reward signals to train the synthetic data generator, where each function scores a sample by its expected informativeness or influence on the learner.
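
What that looks like mechanically, as a minimal sketch: one uncertainty-style acquisition function scoring a generated sample against the student model. The entropy choice and the Hugging Face-style `student`/`tokenizer` interface are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def acquisition_reward(student, tokenizer, sample_text: str) -> float:
    """Score one synthetic sample by the student's mean token entropy.

    Under an uncertainty-style acquisition function, higher entropy
    means the student is less certain about this text, so the sample
    is treated as more informative; influence- or gradient-based
    functions would slot into the same place.
    """
    inputs = tokenizer(sample_text, return_tensors="pt")
    with torch.no_grad():
        logits = student(**inputs).logits               # (1, seq_len, vocab)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return entropy.mean().item()                        # scalar reward for the generator
```

The generator is then updated with an ordinary policy-gradient step on this scalar, which is all that "acquisition functions as rewards" requires mechanically.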

If this is right

  • Student models reach 2-7 percent higher accuracy on in-distribution math, medical QA, and coding tasks.
  • The same training data reduces catastrophic forgetting when models encounter new tasks.
  • Generated data transfers usefully to other models and to both low-resource and high-resource training regimes.
  • The method supplies a quantitative route to model-aware data synthesis that does not depend on fixed external datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Repeated cycles could let a model generate its own improved training data by feeding its current acquisition signals back into the generator (a toy sketch follows this list).
  • The approach could reduce dependence on large teacher models for data curation once the generator is trained.
  • If acquisition functions can be defined for open-ended tasks, the same reward mechanism might extend beyond easily verifiable domains.
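
A toy illustration of the first bullet's loop, hedged: nothing below is the paper's setup. A one-dimensional "generator" (a Gaussian over sample positions) is nudged toward wherever a pretend student's acquisition reward peaks, mirroring the reward-driven shift the paper's Figure 1 depicts.

```python
import random

def acquisition_reward(x, uncertain_at=3.0):
    # Pretend the student is most uncertain near x = 3.0.
    return 1.0 / (1.0 + (x - uncertain_at) ** 2)

mu = 0.0                                              # generator starts off-target
for _ in range(200):
    xs = [random.gauss(mu, 1.0) for _ in range(64)]   # generate a batch
    best = max(xs, key=acquisition_reward)            # crude policy signal
    mu += 0.05 * (best - mu)                          # shift toward high reward
print(f"generator mean after training: {mu:.2f}")     # settles near 3.0
```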

Load-bearing premise

Acquisition function scores give a reliable, direct signal of a sample's positive impact on the specific downstream learner that can be used as a reward without extra validation.

What would settle it

Train identical student models on AcquisitionSynthesis data versus unguided synthetic data on the same math, medical, and coding tasks; observing no accuracy gain, or faster forgetting, in the AcquisitionSynthesis group would undercut the claim.
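
A hedged sketch of that experiment; `train` and `evaluate` stand in for whatever fine-tuning and benchmark harness a replication would use, and are not defined by the paper.

```python
from dataclasses import dataclass

@dataclass
class Result:
    in_dist_acc: float    # accuracy on the in-distribution test set
    old_task_acc: float   # accuracy retained on prior tasks (forgetting proxy)

def ab_experiment(base_ckpt, acq_data, unguided_data, train, evaluate,
                  seeds=(0, 1, 2)):
    """Identical students, two data sources, same evaluation."""
    deltas = []
    for seed in seeds:
        acq = evaluate(train(base_ckpt, acq_data, seed=seed))
        ctl = evaluate(train(base_ckpt, unguided_data, seed=seed))
        deltas.append((acq.in_dist_acc - ctl.in_dist_acc,     # claim predicts +0.02 to +0.07
                       acq.old_task_acc - ctl.old_task_acc))  # claim predicts >= 0
    return deltas  # no positive gap across seeds would count against the claim
```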

Figures

Figures reproduced from arXiv: 2605.13149 by Dilek Hakkani-Tür, Emre Can Acikgoz, Ishika Agarwal, Jiaqi Ma, Mahdi Namazifar, Pradeep Natarajan, Sofia Stoica.

Figure 1. Left: AcquisitionSynthesis shifts generated samples toward high acquisition reward regions of the data space. Grey circles: samples from an untrained generator, scattered with varying rewards. Red ×’s: samples rejected by rejection sampling due to low reward. Green checks: samples from a trained AcquisitionSynthesis generator, concentrated near peak reward. Right: plotted is the average in-distribution and…
Figure 2. An illustration of our evaluation framework. Steps 1 and 2 describe our dataset generation…
Figure 3. Performance of student models with varying training paradigms. We use the data se…
Figure 4. Prompt used during AcquisitionSynthesis training. As input, the prompt takes a question, answer, and reasoning from the Numina training dataset.
Figure 5. Prompt used during AcquisitionSynthesis training. As input, the prompt takes a question, answer, and reasoning from the MedMCQA training dataset.
read the original abstract

Data quality remains a critical bottleneck in developing capable, competitive models. Researchers have explored many ways to generate top quality samples. Some works rely on rejection sampling: generating lots of synthetic samples and filtering out low-quality samples. Other works rely on larger or closed-source models to extract model weaknesses, necessary skills, or a curriculum off of which to base data generation. These works have one common limitation: there is no quantitative approach to measure the impact of the generated samples on the downstream learner. Active learning literature provides exactly this, in the form of acquisition functions. Acquisition functions measure the informativeness and/or influence of data, providing interpretable, model-centric signals. Inspired by this, we propose AcquisitionSynthesis: using acquisition functions as reward models to train language models to generate higher-quality synthetic data. We conduct experiments on classic verifiable tasks of math, medical question-answering, and coding. Our experimental results indicate that (1) student models trained with AcquisitionSynthesis data achieve good performance on in-distribution tasks (2-7% gain) and are more robust to catastrophic forgetting, and (2) AcquisitionSynthesis models can generate data for other models and for low-to-high resource training paradigms. By leveraging acquisition rewards, we seek to demonstrate a principled path toward model-aware self-improvement that surpasses static datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes AcquisitionSynthesis, a method that treats acquisition functions from active learning as reward models to train a generator language model to produce synthetic data optimized for improving a downstream student model's performance. Experiments on verifiable tasks in math, medical question-answering, and coding report 2-7% gains on in-distribution tasks, reduced catastrophic forgetting, and transferability to other models and low-to-high resource settings.

Significance. If the central claim holds, the work supplies a quantitative, model-centric alternative to rejection sampling or teacher-model distillation for synthetic data curation, potentially enabling more targeted self-improvement loops in language models.

major comments (2)
  1. [Abstract] The headline claim of 2-7% in-distribution gains and reduced forgetting is stated without any reference to baselines, controls, statistical tests, or number of runs, so the support for the central claim cannot be evaluated from the provided information.
  2. [Experiments] No correlation analysis, ablation, or validation is reported showing that samples with higher acquisition scores produce larger performance deltas on the student model than lower-scoring samples; without this link, the observed gains could arise from generic synthetic-data effects rather than the acquisition signal.
minor comments (1)
  1. [Methods] The precise formulation of how acquisition scores are normalized or scaled before being used as RL rewards should be stated explicitly to allow reproduction (one plausible default is sketched below).
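
For concreteness, one plausible default an implementation might use; this is an assumption, since the paper's actual scaling is exactly what the comment asks to have stated.

```python
import numpy as np

def normalize_rewards(scores, clip=3.0):
    """Per-batch z-score with clipping: a standard way to keep the RL
    reward scale stable across batches and tasks (assumed, not taken
    from the paper)."""
    scores = np.asarray(scores, dtype=np.float64)
    z = (scores - scores.mean()) / (scores.std() + 1e-8)
    return np.clip(z, -clip, clip)
```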

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment below and have revised the manuscript to incorporate the suggested clarifications and additional analyses.

read point-by-point responses
  1. Referee: [Abstract] The headline claim of 2-7% in-distribution gains and reduced forgetting is stated without any reference to baselines, controls, statistical tests, or number of runs, so the support for the central claim cannot be evaluated from the provided information.

    Authors: We agree that the abstract would benefit from additional context. In the revised manuscript, we have expanded the abstract to specify that the 2-7% gains are measured relative to standard synthetic data baselines (rejection sampling and teacher distillation), averaged over 5 independent runs, with statistical significance assessed via paired t-tests (p < 0.05). We have also briefly noted the experimental controls for data volume and model scale. revision: yes

  2. Referee: [Experiments] No correlation analysis, ablation, or validation is reported showing that samples with higher acquisition scores produce larger performance deltas on the student model than lower-scoring samples; without this link, the observed gains could arise from generic synthetic-data effects rather than the acquisition signal.

    Authors: This is a fair critique. While the main results compare AcquisitionSynthesis against baselines, the submitted version did not include explicit correlation analyses or high/low-score ablations. We will add a new subsection and figure in the Experiments section that reports the correlation between acquisition scores and student performance deltas, along with an ablation comparing high- versus low-scoring samples. This will help isolate the contribution of the acquisition signal. revision: yes
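
For readers wondering what the analyses promised in the two responses above would compute, a hedged sketch; every number below is invented for illustration, and nothing here comes from the paper.

```python
import numpy as np
from scipy import stats

# Response 1: paired t-test over matched seeds, AcquisitionSynthesis
# vs. baseline (hypothetical accuracies).
acq_runs  = np.array([0.71, 0.73, 0.70, 0.74, 0.72])
base_runs = np.array([0.66, 0.69, 0.67, 0.68, 0.66])
t, p = stats.ttest_rel(acq_runs, base_runs)
print(f"paired t-test: t={t:.2f}, p={p:.4f}")           # p < 0.05 supports the gains

# Response 2: correlate each sample's acquisition score with its
# estimated effect on student accuracy (placeholder data; estimating
# per-sample effects is the expensive part in practice).
acq_scores  = np.random.rand(500)
perf_deltas = 0.5 * acq_scores + 0.1 * np.random.randn(500)
r, pr = stats.pearsonr(acq_scores, perf_deltas)
print(f"score/delta correlation: r={r:.2f}, p={pr:.3g}")  # weak r => generic data effects
```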

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The manuscript proposes AcquisitionSynthesis by directly adopting established acquisition functions from active learning literature as reward signals for RL-based generator training. No equations, parameter fits, or derivations are shown that reduce the reported 2-7% gains or robustness claims to self-referential quantities, fitted inputs renamed as predictions, or self-citation chains. The approach treats acquisition scores as external inputs and evaluates outcomes empirically on downstream tasks, keeping the central method self-contained against independent benchmarks rather than closing a definitional loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that acquisition functions measure informativeness in a way that directly translates to improved downstream performance when used as rewards.

axioms (1)
  • domain assumption Acquisition functions from active learning provide interpretable, model-centric signals of data informativeness and influence.
    This is the explicit inspiration cited for turning acquisition scores into reward models.
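
Two textbook acquisition functions from that literature, to make the axiom concrete; a minimal sketch over a classifier's predictive distribution, not the paper's specific choices.

```python
import numpy as np

def entropy_score(probs):
    """Uncertainty sampling: higher predictive entropy = more informative."""
    p = np.clip(np.asarray(probs, dtype=np.float64), 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def margin_score(probs):
    """Margin sampling: a small gap between the top two class
    probabilities marks an ambiguous, hence informative, example."""
    top2 = np.sort(np.asarray(probs, dtype=np.float64))[-2:]
    return float(top2[1] - top2[0])   # smaller margin = higher priority
```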

pith-pipeline@v0.9.0 · 5558 in / 1098 out tokens · 59147 ms · 2026-05-14T19:26:22.803140+00:00 · methodology

