Experiments on coding and deterministic tasks demonstrate that data gating is sufficient for self-play stability while reward variants are not, revealing the Grounded Proposer Paradox and a two-stage phase transition under continuous gate strictness.
Towards Understanding Self-play for
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Survive or Collapse: The Asymmetric Roles of Data Gating and Reward Grounding in Self-Play RL
Experiments on coding and deterministic tasks demonstrate that data gating is sufficient for self-play stability while reward variants are not, revealing the Grounded Proposer Paradox and a two-stage phase transition under continuous gate strictness.