MuZero matches or exceeds AlphaZero-level performance in Go, chess, and shogi, and sets a new state of the art on 57 Atari games, by learning a model that directly supports planning rather than reconstructing the full environment dynamics.
AlphaZero used the set of legal actions obtained from the simulator to mask the prior produced by the network everywhere in the search tree.
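The masking step described above can be illustrated with a minimal sketch. It assumes the prior is a plain list of probabilities indexed by action id and that the legal actions arrive as a collection of those ids; the function name `mask_prior` and the uniform fallback are hypothetical choices for illustration, not taken from AlphaZero or MuZero.

```python
def mask_prior(prior, legal_actions):
    """Zero out illegal actions in a policy prior and renormalize.

    prior: list of non-negative floats, one per action id (assumed layout).
    legal_actions: iterable of legal action ids (assumed to come from a simulator).
    """
    legal = set(legal_actions)
    if not legal:
        raise ValueError("at least one legal action is required")

    # Keep probability mass only on legal actions.
    masked = [p if a in legal else 0.0 for a, p in enumerate(prior)]
    total = sum(masked)

    if total <= 0.0:
        # Assumption: if the network put no mass on any legal action,
        # fall back to a uniform distribution over the legal ones.
        return [1.0 / len(legal) if a in legal else 0.0
                for a in range(len(prior))]

    # Renormalize so the masked prior sums to 1.
    return [p / total for p in masked]


if __name__ == "__main__":
    # Action 1 is illegal, so its mass is redistributed to actions 0 and 2.
    print(mask_prior([0.5, 0.25, 0.25], [0, 2]))
```

In a tree search, a function like this would be applied at every node where the set of legal actions is known, so that the search never expands an illegal move.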
1 Pith paper cites this work.
Fields: cs.LG (1)
Years: 2019 (1)
Verdicts: ACCEPT (1)
Representative citing papers
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model