The DeepMind JAX Ecosystem, 2020
1 Pith paper cites this work.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Reproducing AlphaZero on Tablut: Self-Play RL for an Asymmetric Board Game
AlphaZero modified with separate heads for attacker and defender in Tablut achieves a BayesElo rating of 1235 after 100 self-play iterations with reduced policy entropy.