PPO in a new competitive game fails due to five implementation bugs and then competitive overfitting where self-play stays near 50% but generalization drops to 21.6%; mixing 20% random opponents restores generalization to 77.1%.
pink_action
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
Territory Paint Wars: Diagnosing and Mitigating Failure Modes in Competitive Multi-Agent PPO
PPO in a new competitive game fails due to five implementation bugs and then competitive overfitting where self-play stays near 50% but generalization drops to 21.6%; mixing 20% random opponents restores generalization to 77.1%.