An AlphaZero variant with separate heads for the attacker and the defender in Tablut reaches a BayesElo rating of 1235 after 100 self-play iterations, with reduced policy entropy.
Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782):350–354, 2019.
Cited by 1 Pith paper (field: cs.LG; year: 2026; verdict: UNVERDICTED):
Reproducing AlphaZero on Tablut: Self-Play RL for an Asymmetric Board Game