AcquisitionSynthesis uses acquisition functions as rewards to train generators that produce higher-quality synthetic data, delivering 2-7% gains on math, medical QA, and coding tasks with improved robustness to forgetting.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
years
2026 2verdicts
UNVERDICTED 2representative citing papers
Experiments on coding and deterministic tasks demonstrate that data gating is sufficient for self-play stability while reward variants are not, revealing the Grounded Proposer Paradox and a two-stage phase transition under continuous gate strictness.
citing papers explorer
-
AcquisitionSynthesis: Targeted Data Generation using Acquisition Functions
AcquisitionSynthesis uses acquisition functions as rewards to train generators that produce higher-quality synthetic data, delivering 2-7% gains on math, medical QA, and coding tasks with improved robustness to forgetting.
-
Survive or Collapse: The Asymmetric Roles of Data Gating and Reward Grounding in Self-Play RL
Experiments on coding and deterministic tasks demonstrate that data gating is sufficient for self-play stability while reward variants are not, revealing the Grounded Proposer Paradox and a two-stage phase transition under continuous gate strictness.