CL-MARL uses an adaptive curriculum scheduler called FlexDiff and Counterfactual Group Relative Policy Advantage to break static-difficulty training in MARL and achieve higher win rates on hard StarCraft maps.
Google research football: A novel reinforcement learning environment
2 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 2representative citing papers
DAGS initializes policy-gradient self-play from human-derived intermediate states to reduce exploitability in challenging imperfect-information games, with a multi-task flag fix for resulting bias and new benchmark environments.
citing papers explorer
-
Overcoming Environmental Meta-Stationarity in MARL via Adaptive Curriculum and Counterfactual Group Advantage
CL-MARL uses an adaptive curriculum scheduler called FlexDiff and Counterfactual Group Relative Policy Advantage to break static-difficulty training in MARL and achieve higher win rates on hard StarCraft maps.
-
Data-Augmented Game Starts for Accelerating Self-Play Exploration in Imperfect Information Games
DAGS initializes policy-gradient self-play from human-derived intermediate states to reduce exploitability in challenging imperfect-information games, with a multi-task flag fix for resulting bias and new benchmark environments.