Playing Go without Game Tree Search Using Convolutional Neural Networks
Pith reviewed 2026-05-25 10:43 UTC · model grok-4.3
The pith
A convolutional neural network plays Go at intermediate amateur level without any game tree search by learning directly from professional games.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors create a convolutional neural policy network that, after supervised training on professional games, surpasses intermediate amateur skill in Go while making moves with no game tree search at all. They propose non-rectangular convolutions to improve learning of local shapes and reinforcement learning on self-play games to strengthen the policy further, though the current result is achieved with supervised learning alone.
What carries the argument
Convolutional neural policy network that maps board positions directly to move probability distributions.
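To make the "board position in, move distribution out" idea concrete, here is a minimal sketch of the final step only. It assumes the network has already produced one raw score (logit) per board point; the names `move_distribution` and `choose_move`, the flat-list board encoding, and the simplified legality rule (any empty point is playable; ko and suicide are ignored) are illustrative assumptions, not details taken from the paper.

```python
import math

EMPTY, BLACK, WHITE = 0, 1, 2  # hypothetical point encoding

def move_distribution(logits, board):
    """Softmax over per-point scores, restricted to empty points.

    logits and board are equal-length flat lists, one entry per board point.
    Occupied points get probability 0. Ko and suicide rules are ignored here.
    """
    candidates = [i for i, stone in enumerate(board) if stone == EMPTY]
    if not candidates:
        return [0.0] * len(board)
    peak = max(logits[i] for i in candidates)  # subtract the max for numerical stability
    weights = {i: math.exp(logits[i] - peak) for i in candidates}
    total = sum(weights.values())
    return [weights[i] / total if i in weights else 0.0 for i in range(len(board))]

def choose_move(logits, board):
    """Pick the single most probable point; no look-ahead of any kind."""
    probs = move_distribution(logits, board)
    return max(range(len(probs)), key=probs.__getitem__)
```

The point of the sketch is that move selection is a single pass over the scores: nothing here expands or evaluates successor positions.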
If this is right
- The network generates legal moves far faster than any program that expands millions of positions per second.
- Long-range planning in Go can be stored in network weights learned from data rather than computed on the fly.
- Non-rectangular convolutions can be added to improve recognition of common board shapes.
- Reinforcement learning between network versions can raise performance without additional human games.
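One plausible reading of "non-rectangular convolution" is an ordinary square kernel with some entries masked out. The sketch below assumes a diamond-shaped mask (points within a fixed Manhattan distance of the centre) on a square board; the abstract does not specify the authors' actual kernel shapes, so `diamond_mask` and `masked_conv2d` are hypothetical names for illustration only.

```python
def diamond_mask(size):
    """1 where the Manhattan distance from the kernel centre is <= size // 2."""
    r = size // 2
    return [[1 if abs(i - r) + abs(j - r) <= r else 0 for j in range(size)]
            for i in range(size)]

def masked_conv2d(board, kernel, mask):
    """Zero-padded 'same' cross-correlation on a square board.

    Only kernel entries where mask is 1 contribute to the output.
    """
    n, k = len(board), len(kernel)
    r = k // 2
    out = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - r, x + j - r
                    if mask[i][j] and 0 <= yy < n and 0 <= xx < n:
                        acc += kernel[i][j] * board[yy][xx]
            out[y][x] = acc
    return out
```

With a 3x3 diamond mask each output depends only on a point and its four orthogonal neighbours, which are the points that determine liberties in Go; that is the intuition for why such a shape might suit the game, not a result reported in the paper.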
Where Pith is reading between the lines
- Pure policy networks might reach higher levels if scaled with more data and compute, reducing reliance on search.
- The same supervised approach could be tested on other perfect-information games with large branching factors.
- Success would imply that explicit value heads are not strictly required for competent play in some domains.
Load-bearing premise
Supervised training on professional game records plus the chosen CNN architecture is enough to capture the long-term strategic knowledge required for strong play without search or value estimation.
What would settle it
Measure the network's win rate in a large set of games against a pool of players rated at or above the intermediate amateur level; consistent losses would falsify the claim of having surpassed that skill level.
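A hedged sketch of how such a test could be scored: compute a Wilson score interval for the observed win rate and accept the claim only if the whole interval lies above the chosen threshold. The 95% level (`z=1.96`), the 50% threshold, and the function names are assumptions made for this example; the paper reports no evaluation protocol.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win probability (z = 1.96 gives roughly 95% coverage)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half

def surpasses(wins, games, threshold=0.5):
    """True only if the entire interval lies above the threshold win rate."""
    low, _ = wilson_interval(wins, games)
    return low > threshold
```

Under these assumptions, a 6-4 record over ten games would not count as evidence of surpassing the opponent pool, because the interval is far too wide; this is why the number of evaluated games matters as much as the raw win rate.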
Original abstract
The game of Go has a long history in East Asian countries, but the field of Computer Go has yet to catch up to humans until the past couple of years. While the rules of Go are simple, the strategy and combinatorics of the game are immensely complex. Even within the past couple of years, new programs that rely on neural networks to evaluate board positions still explore many orders of magnitude more board positions per second than a professional can. We attempt to mimic human intuition in the game by creating a convolutional neural policy network which, without any sort of tree search, should play the game at or above the level of most humans. We introduce three structures and training methods that aim to create a strong Go player: non-rectangular convolutions, which will better learn the shapes on the board, supervised learning, training on a data set of 53,000 professional games, and reinforcement learning, training on games played between different versions of the network. Our network has already surpassed the skill level of intermediate amateurs simply using supervised learning. Further training and implementation of non-rectangular convolutions and reinforcement learning will likely increase this skill level much further.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper describes a convolutional neural policy network for Go that plays without tree search. It trains via supervised learning on 53,000 professional games and asserts that this already reaches intermediate amateur strength; it also outlines plans to incorporate non-rectangular convolutions and reinforcement learning between network versions for further gains.
Significance. If the central claim holds with proper evaluation, the result would be significant: it would show that standard supervised next-move prediction on expert records can encode enough long-term strategy for amateur-level play without any search or value estimation, thereby reducing reliance on Monte Carlo tree search in Go AI.
major comments (1)
- [Abstract] Abstract: the assertion that 'Our network has already surpassed the skill level of intermediate amateurs simply using supervised learning' is presented without any supporting data (win rates, number of evaluated games, opponent ratings, or baseline comparisons). This is load-bearing because the manuscript's primary contribution rests on this unevidenced performance claim.
Simulated Author's Rebuttal
We thank the referee for the detailed review and the recommendation for major revision. We address the single major comment below.
- Referee: [Abstract] Abstract: the assertion that 'Our network has already surpassed the skill level of intermediate amateurs simply using supervised learning' is presented without any supporting data (win rates, number of evaluated games, opponent ratings, or baseline comparisons). This is load-bearing because the manuscript's primary contribution rests on this unevidenced performance claim.
Authors: We agree that the performance claim in the abstract is presented without supporting data or evaluation details in the current manuscript. The claim was intended to reflect preliminary internal testing, but this was not documented. We will revise the abstract to remove or qualify the claim and add a dedicated evaluation section describing the testing procedure, number of games, opponent ratings, win rates, and any baselines used.
Revision: yes
Circularity Check
No circularity: standard supervised move prediction on external records
The paper trains a convolutional policy network via cross-entropy loss on an external corpus of 53,000 professional games and then reports empirical playing strength. No equation or fitted quantity is defined in terms of the target win rate or amateur rating; the loss optimizes next-move prediction on professional moves (with accuracy checked on held-out positions), which is independent of the downstream claim that the resulting policy reaches intermediate-amateur level without search. No self-citation chain, uniqueness theorem, or ansatz is invoked to justify the architecture or the performance assertion. The result remains falsifiable by direct play against rated opponents.
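The independence argument is easier to see with the training objective written out. The sketch below assumes the standard next-move cross-entropy, i.e. the negative log of the probability the policy assigns to the professional's move; the function names and the `(probs, expert_move)` batch format are hypothetical.

```python
import math

def cross_entropy(probs, expert_move, eps=1e-12):
    """Negative log-probability assigned to the professional's move."""
    return -math.log(max(probs[expert_move], eps))  # eps guards against log(0)

def mean_loss_and_accuracy(batch):
    """batch is a list of (probs, expert_move) pairs.

    Returns the mean cross-entropy and the fraction of positions where the
    most probable move equals the professional's move.
    """
    if not batch:
        raise ValueError("batch must not be empty")
    total_loss = 0.0
    hits = 0
    for probs, expert_move in batch:
        total_loss += cross_entropy(probs, expert_move)
        top_move = max(range(len(probs)), key=probs.__getitem__)
        hits += int(top_move == expert_move)
    return total_loss / len(batch), hits / len(batch)
```

Neither quantity refers to game outcomes or opponent ratings, so a low loss or high move-prediction accuracy cannot by itself establish a playing-strength claim. For scale, a uniform policy over the 361 points of a 19x19 board has a loss of ln 361 per position.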