Playing Go without Game Tree Search Using Convolutional Neural Networks

Chuanbo Pan; Jeffrey Barratt

arxiv: 1907.04658 · v1 · pith:JFOLYTASnew · submitted 2019-07-02 · 💻 cs.AI · cs.LG

Pith reviewed 2026-05-25 10:43 UTC · model grok-4.3

classification 💻 cs.AI cs.LG

keywords Go, convolutional neural networks, policy network, supervised learning, reinforcement learning, game AI, board games, artificial intelligence

The pith

A convolutional neural network plays Go at intermediate amateur level without any game tree search by learning directly from professional games.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports that a convolutional neural policy network trained via supervised learning on 53,000 professional Go games can already exceed the strength of intermediate amateurs while outputting moves with no explicit search of future positions. This setup aims to capture human-like intuition in the network weights alone rather than relying on evaluating large numbers of board states. The authors introduce non-rectangular convolutions to better model board shapes and outline plans for reinforcement learning between successive network versions, but the reported performance comes from the supervised stage. If the approach scales, it would mean long-term strategic knowledge in Go can be encoded without search or separate value estimation.

Core claim

The authors create a convolutional neural policy network that, after supervised training on professional games, surpasses intermediate amateur skill in Go while making moves with no game tree search at all. They propose non-rectangular convolutions to improve learning of local shapes and reinforcement learning on self-play games to strengthen the policy further, though the current result is achieved with supervised learning alone.

What carries the argument

Convolutional neural policy network that maps board positions directly to move probability distributions.
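To make the search-free mechanism concrete, here is a minimal Python sketch of how one forward pass of a policy network could be turned into a move. It assumes the network emits one raw score per intersection on a 19 × 19 grid; the function name `select_move` and the board encoding (0 empty, 1 black, 2 white) are hypothetical, and legality is reduced to "the point is empty" (ko and suicide are ignored).

```python
import math

BOARD_SIZE = 19


def select_move(scores, board):
    """Choose a move from one forward pass of scores, with no search.

    scores: BOARD_SIZE x BOARD_SIZE nested list of raw network scores (logits).
    board:  BOARD_SIZE x BOARD_SIZE nested list, 0 = empty, 1 = black, 2 = white.
    Returns (row, col, probability) for the most probable empty point,
    or None when the board is full (the caller should then pass).
    """
    # Candidate moves: empty points only. Ko and suicide checks are omitted.
    legal = [(r, c) for r in range(BOARD_SIZE) for c in range(BOARD_SIZE)
             if board[r][c] == 0]
    if not legal:
        return None
    # Softmax restricted to the candidates; subtracting the max keeps exp() stable.
    top = max(scores[r][c] for r, c in legal)
    weights = {(r, c): math.exp(scores[r][c] - top) for r, c in legal}
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    return best[0], best[1], weights[best] / total


# Usage: the highest-scoring point is occupied, so the runner-up is chosen.
scores = [[0.0] * BOARD_SIZE for _ in range(BOARD_SIZE)]
scores[3][3], scores[15][15] = 5.0, 4.0
board = [[0] * BOARD_SIZE for _ in range(BOARD_SIZE)]
board[3][3] = 1
print(select_move(scores, board)[:2])  # (15, 15)
```

Because softmax preserves ordering, the argmax over raw scores picks the same point; the softmax is kept only so the sketch can report a probability.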

If this is right

The network generates legal moves far faster than any program that expands millions of positions per second.
Long-range planning in Go can be stored in network weights learned from data rather than computed on the fly.
Non-rectangular convolutions can be added to improve recognition of common board shapes.
Reinforcement learning between network versions can raise performance without additional human games.
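The reinforcement-learning stage is only outlined in the paper. One common way to train a policy on games between network versions, used for AlphaGo's RL policy network [17], is the REINFORCE policy gradient. The sketch below shows only its per-move gradient with respect to the output scores; the function name and the ±1 outcome convention are our assumptions, not the authors' design, and backpropagating this gradient into the convolutional weights is omitted.

```python
import math


def reinforce_logit_grad(logits, chosen, outcome):
    """REINFORCE gradient of outcome * log softmax(logits)[chosen] w.r.t. logits.

    logits:  raw scores for every candidate move (e.g. 361 board points).
    chosen:  index of the move the network actually played.
    outcome: +1 if that player went on to win the game, -1 if it lost.
    Following this gradient upward raises the probability of moves from won
    games and lowers it for moves from lost games.
    """
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [outcome * ((1.0 if i == chosen else 0.0) - e / total)
            for i, e in enumerate(exps)]


# Usage: four equally likely moves, move 0 was played and the game was won.
print(reinforce_logit_grad([0.0, 0.0, 0.0, 0.0], 0, +1))
# [0.75, -0.25, -0.25, -0.25]
```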

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Pure policy networks might reach higher levels if scaled with more data and compute, reducing reliance on search.
The same supervised approach could be tested on other perfect-information games with large branching factors.
Success would imply that explicit value heads are not strictly required for competent play in some domains.

Load-bearing premise

Supervised training on professional game records plus the chosen CNN architecture is enough to capture the long-term strategic knowledge required for strong play without search or value estimation.

What would settle it

Measure the network's win rate in a large set of games against a pool of players rated at or above the intermediate amateur level; consistent losses would falsify the claim of having surpassed that skill level.
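The test above can be made quantitative. A hedged sketch: with W wins in N games, a Wilson score interval bounds the true win rate. If its upper bound falls below 0.5 against the chosen pool, the claim fails; only a lower bound above 0.5 supports it. The 95% level and the opponent pool are assumptions, since the paper specifies no evaluation protocol.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win rate; z=1.96 gives roughly 95% coverage."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    z2 = z * z
    denom = 1.0 + z2 / games
    centre = (p + z2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z2 / (4 * games * games)) / denom
    return centre - half, centre + half


# Usage: 60 wins in 100 games against a hypothetical pool of intermediate amateurs.
low, high = wilson_interval(60, 100)
print(f"{low:.3f} to {high:.3f}")  # 0.502 to 0.691
```

Even a 60% score over 100 games only barely clears 0.5, which is why the "large set of games" matters.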

Figures

Figures reproduced from arXiv: 1907.04658 by Chuanbo Pan, Jeffrey Barratt.

**Figure 1.** Capturing stones. While the rules of Go are simple, mastering the game is not. Almost all commercial Go-playing programs rely heavily on some form of tree search to explore outcomes of certain moves in a game, most commonly Monte Carlo Tree Search (MCTS) [5]. However, these programs don’t seem to fully understand the game, instead relying on brute-force search to make good moves. In the past two years, man…

**Figure 2.** 5 × 5 convolutions with cross-widths of 1 and 2, respectively. In addition to rectangular convolution filters, we implemented a novel cross-shaped convolution filter as shown in Figure 2.
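A cross-shaped filter can be sketched as an ordinary square kernel whose weights outside a plus-shaped mask are forced to zero. How the paper defines a cross-width of 2 on a 5 × 5 kernel is not recoverable from the caption, so the `half_width` parameter below is our own assumption (half_width=0 gives one-cell-wide arms); `cross_mask` and `cross_conv2d` are hypothetical names.

```python
def cross_mask(size, half_width):
    """size x size 0/1 mask shaped like a plus sign.

    Assumed reading of "cross-width": an entry is kept when it lies within
    half_width rows of the centre row or within half_width columns of the
    centre column, so half_width=0 gives arms one cell wide.
    """
    c = size // 2
    return [[1 if abs(i - c) <= half_width or abs(j - c) <= half_width else 0
             for j in range(size)] for i in range(size)]


def cross_conv2d(image, kernel, half_width):
    """Stride-1, zero-padded 2-D cross-correlation (what deep-learning libraries
    call convolution) with every kernel weight outside the cross zeroed out."""
    k, pad = len(kernel), len(kernel) // 2
    mask = cross_mask(k, half_width)
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for col in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, col + j - pad
                    if mask[i][j] and 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[i][j] * image[rr][cc]
            out[r][col] = acc
    return out


# Usage: print the 5x5 mask with one-cell-wide arms.
for row in cross_mask(5, 0):
    print("".join("X" if v else "." for v in row))
```

The one-cell-wide 5 × 5 cross has 9 active weights instead of 25, yet reaches two points along each line from the centre, the direction in which shapes such as ladders propagate.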

**Figure 3.** An example ladder on a 7 × 7 board. These convolutions were motivated by patterns in Go…

**Figure 5.** A single cross convolutional layer. We did not incorporate our cross convolution directly into the network as its own individual layer. Instead, inspired by Google’s Inception module [20], we decided to incorporate it in a modular “cross layer”, as shown in Figure 5.
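An Inception-style "cross layer" runs several filter shapes in parallel on the same input and stacks their outputs as channels. The caption does not list the branches, so the pairing below (one rectangular 3 × 3 branch, one 5 × 5 cross branch, a single filter per branch, ReLU) is purely illustrative; real Inception modules use many filters per branch.

```python
def conv_same(image, kernel, mask=None):
    """Zero-padded, stride-1 cross-correlation of one channel with one kernel.

    If mask is given, kernel entries where mask is 0 are ignored.
    """
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if (mask is None or mask[i][j]) and 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[i][j] * image[rr][cc]
            out[r][c] = acc
    return out


def plus_mask(size):
    """One-cell-wide plus shape: the centre row and centre column of the kernel."""
    m = size // 2
    return [[1 if i == m or j == m else 0 for j in range(size)] for i in range(size)]


def cross_layer(image, branches):
    """Inception-style block: apply each (kernel, mask) branch to the same input,
    apply ReLU, and stack the results as output channels."""
    def relu(fmap):
        return [[max(0.0, v) for v in row] for row in fmap]
    return [relu(conv_same(image, kernel, mask)) for kernel, mask in branches]


def ones(n):
    """An n x n plane filled with 1.0."""
    return [[1.0] * n for _ in range(n)]


# Usage: one rectangular 3x3 branch and one 5x5 cross branch on an all-ones plane.
channels = cross_layer(ones(19), [(ones(3), None), (ones(5), plus_mask(5))])
print(len(channels), channels[0][9][9], channels[1][9][9])  # 2 9.0 9.0
```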

**Figure 6.** Our model. …convolutional filters with stride 1 and pad 1, each with 256 layers. The final layer is a 1 × 1 convolution with only 1 filter, used to flatten the board position into a 19 × 19 score vector. We previously had a fully connected layer to evaluate final board positions. However, we discovered that a significant portion of training time was spent at this layer. Therefore, we instead took the resu…
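The caption's reason for dropping the fully connected layer is easy to quantify. A back-of-the-envelope sketch, assuming the head reads a 256-channel 19 × 19 feature stack: the "256" comes from the caption, but whether the old fully connected layer had exactly this input is our assumption. The function names are hypothetical.

```python
BOARD_POINTS = 19 * 19  # 361 intersections


def conv1x1_params(in_channels, filters=1):
    """A 1x1 convolution: one weight per input channel per filter, plus a bias each."""
    return in_channels * filters + filters


def dense_params(in_channels, outputs=BOARD_POINTS):
    """A fully connected head on the flattened in_channels x 19 x 19 feature stack."""
    return in_channels * BOARD_POINTS * outputs + outputs


def conv1x1_scores(features, weights, bias):
    """Apply a single-filter 1x1 convolution: each point's score is a weighted
    sum of that point's channel values, so the board layout is preserved."""
    n = len(features[0])
    return [[sum(w * ch[r][c] for w, ch in zip(weights, features)) + bias
             for c in range(n)] for r in range(n)]


print(conv1x1_params(256))  # 257
print(dense_params(256))    # 33362537
```

Under these assumptions the 1 × 1 head needs 257 parameters against roughly 33 million for the dense layer, consistent with the caption's observation that training time concentrated there.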

**Figure 8.** Number of liberties a stone has. …layer would be “on” if a stone had two liberties, and so on. Any stone with more than eight liberties would have the eighth feature layer “on”. A similar idea holds for the number of liberties after playing at a certain position, which may well be our only “lookahead search”, as well as for the time since a move was played, which simply turns on the feature for the point th…
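The liberty encoding above can be sketched as one-hot feature planes. We assume liberties are counted per chain (a connected group of same-coloured stones), the usual Go convention, and that chains with eight or more liberties share the eighth plane; `group_and_liberties` and `liberty_planes` are hypothetical names.

```python
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def group_and_liberties(board, r, c):
    """Flood-fill the chain containing (r, c); return (stones, liberties) as sets."""
    colour, n = board[r][c], len(board)
    stones, liberties, stack = {(r, c)}, set(), [(r, c)]
    while stack:
        y, x = stack.pop()
        for dy, dx in NEIGHBOURS:
            ny, nx = y + dy, x + dx
            if not (0 <= ny < n and 0 <= nx < n):
                continue
            if board[ny][nx] == 0:
                liberties.add((ny, nx))
            elif board[ny][nx] == colour and (ny, nx) not in stones:
                stones.add((ny, nx))
                stack.append((ny, nx))
    return stones, liberties


def liberty_planes(board, num_planes=8):
    """One-hot liberty features: plane k is 1 on stones whose chain has k + 1
    liberties; chains with num_planes or more liberties share the last plane."""
    n = len(board)
    planes = [[[0] * n for _ in range(n)] for _ in range(num_planes)]
    seen = set()
    for r in range(n):
        for c in range(n):
            if board[r][c] == 0 or (r, c) in seen:
                continue
            stones, liberties = group_and_liberties(board, r, c)
            seen |= stones
            if not liberties:  # cannot occur in a legal position; skip defensively
                continue
            k = min(len(liberties), num_planes) - 1
            for y, x in stones:
                planes[k][y][x] = 1
    return planes


# Usage: a lone corner stone has 2 liberties, so it lights up plane index 1.
board = [[0] * 19 for _ in range(19)]
board[0][0] = 1
print(liberty_planes(board)[1][0][0])  # 1
```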

**Figure 9.** An example board position with all stones with 4…

**Figure 10.** An example position from game 1 of Ke Jie…

Original abstract

The game of Go has a long history in East Asian countries, but the field of Computer Go did not catch up to humans until the past couple of years. While the rules of Go are simple, the strategy and combinatorics of the game are immensely complex. Even within the past couple of years, new programs that rely on neural networks to evaluate board positions still explore many orders of magnitude more board positions per second than a professional can. We attempt to mimic human intuition in the game by creating a convolutional neural policy network which, without any sort of tree search, should play the game at or above the level of most humans. We introduce three structures and training methods that aim to create a strong Go player: non-rectangular convolutions, which will better learn the shapes on the board; supervised learning, training on a data set of 53,000 professional games; and reinforcement learning, training on games played between different versions of the network. Our network has already surpassed the skill level of intermediate amateurs simply using supervised learning. Further training and implementation of non-rectangular convolutions and reinforcement learning will likely increase this skill level much further.
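The supervised stage needs (position, next move) pairs extracted from the 53,000 game records. The reference list includes Sgfmill [23], an SGF library; the sketch below does not use Sgfmill but a standard-library regular expression, and it only extracts the move sequence. Replaying the moves with captures to build the board positions is left out, and `sgf_moves` is a hypothetical name.

```python
import re

# A move node: ';' then B or W then a two-letter coordinate, or "[]" for a pass.
MOVE_RE = re.compile(r";\s*([BW])\[([a-z]{2})?\]")


def sgf_moves(sgf_text):
    """Return (colour, row, col) tuples for the moves in a simple SGF record.

    Simplifications (assumptions, not Sgfmill's behaviour): the record has no
    variations, and B/W is the first property of each move node. Passes
    ("[]" or the legacy "[tt]" on 19x19) come back as (colour, None, None).
    SGF letters run 'a'..'s' on a 19x19 board; the first letter is the column.
    """
    moves = []
    for colour, coord in MOVE_RE.findall(sgf_text):
        if not coord or coord == "tt":
            moves.append((colour, None, None))
        else:
            col, row = ord(coord[0]) - ord("a"), ord(coord[1]) - ord("a")
            moves.append((colour, row, col))
    return moves


# Usage: a hypothetical four-move record ending in two passes.
print(sgf_moves("(;GM[1]SZ[19];B[pd];W[dp];B[];W[tt])"))
# [('B', 3, 15), ('W', 15, 3), ('B', None, None), ('W', None, None)]
```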

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper claims intermediate amateur Go strength from a search-free CNN after supervised training on pro games, but supplies no performance numbers or evaluations to back the assertion.

The one thing to know is that this paper presents a convolutional neural network for playing Go with no tree search at all, and claims it already plays at intermediate amateur level after supervised training on professional game records. They introduce non-rectangular convolutions to better fit board shapes and train on 53,000 pro games, with reinforcement learning noted as future work. The no-search rule is kept throughout. The non-rectangular convolution idea is a small, concrete change worth noting, and the motivation for avoiding search is laid out plainly. The paper follows the standard supervised policy network setup without adding new equations or fitting tricks. The central problem is the missing evidence. The abstract states the strength level has been reached, yet there are no win rates, opponent ratings, game counts, or baseline comparisons anywhere in the provided text. Supervised move prediction can capture local patterns without guaranteeing the global coordination that search normally supplies, and nothing here shows how the network bridges that gap. Readers experimenting with lightweight or search-free agents might pick up the convolution detail or the training scale. Anyone needing verified results or reproducible strength claims will find the work thin. I would not send this to peer review until the evaluations are added.

Referee Report

1 major / 0 minor

Summary. The paper describes a convolutional neural policy network for Go that plays without tree search. It trains via supervised learning on 53,000 professional games and asserts that this already reaches intermediate amateur strength; it also outlines plans to incorporate non-rectangular convolutions and reinforcement learning between network versions for further gains.

Significance. If the central claim holds with proper evaluation, the result would be significant: it would show that standard supervised next-move prediction on expert records can encode enough long-term strategy for amateur-level play without any search or value estimation, thereby reducing reliance on Monte Carlo tree search in Go AI.

major comments (1)

[Abstract] The assertion that 'Our network has already surpassed the skill level of intermediate amateurs simply using supervised learning' is presented without any supporting data (win rates, number of evaluated games, opponent ratings, or baseline comparisons). This is load-bearing because the manuscript's primary contribution rests on this unevidenced performance claim.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and the recommendation for major revision. We address the single major comment below.

Referee: [Abstract] The assertion that 'Our network has already surpassed the skill level of intermediate amateurs simply using supervised learning' is presented without any supporting data (win rates, number of evaluated games, opponent ratings, or baseline comparisons). This is load-bearing because the manuscript's primary contribution rests on this unevidenced performance claim.

Authors: We agree that the performance claim in the abstract is presented without supporting data or evaluation details in the current manuscript. The claim was intended to reflect preliminary internal testing, but this was not documented. We will revise the abstract to remove or qualify the claim and add a dedicated evaluation section describing the testing procedure, number of games, opponent ratings, win rates, and any baselines used. revision: yes

Circularity Check

0 steps flagged

No circularity: standard supervised move prediction on external records

The paper trains a convolutional policy network via cross-entropy loss on an external corpus of 53,000 professional games and then reports empirical playing strength. No equation or fitted quantity is defined in terms of the target win rate or amateur rating; the loss optimizes next-move accuracy on professional game records, which is independent of the downstream claim that the resulting policy reaches intermediate-amateur level without search. No self-citation chain, uniqueness theorem, or ansatz is invoked to justify the architecture or the performance assertion. The result remains falsifiable by direct play against rated opponents.
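The loss named in this audit can be written out per training position. A minimal sketch, assuming a plain softmax over the 361 points and a one-hot target at the professional's move; the paper's exact loss and move encoding are not given in the text reproduced here, and `move_cross_entropy` is a hypothetical name.

```python
import math

BOARD_POINTS = 19 * 19


def move_cross_entropy(logits, target_index):
    """Cross-entropy between softmax(logits) and a one-hot professional move.

    logits: flat list of BOARD_POINTS scores in row-major order
            (index = row * 19 + col).
    target_index: flat index of the move the professional actually played.
    Equals -log softmax(logits)[target_index], computed via log-sum-exp.
    """
    top = max(logits)
    log_partition = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_partition - logits[target_index]


# Usage: an untrained, uniform policy pays log(361), about 5.89 nats per move.
uniform = [0.0] * BOARD_POINTS
print(round(move_cross_entropy(uniform, 3 * 19 + 15), 2))  # 5.89
```

Nothing in this quantity refers to win rates or opponent ratings, which is the point of the audit: the training target and the reported strength are measured separately.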

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no explicit free parameters, axioms, or invented entities are stated. The non-rectangular convolution is presented as a modeling choice rather than a new entity.

pith-pipeline@v0.9.0 · 5726 in / 1008 out tokens · 32176 ms · 2026-05-25T10:43:21.940637+00:00 · methodology

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 4 internal anchors

[1] Sensei’s Library: Go Databases. http://senseis.xmp.net/?GoDatabases

[2] Sensei’s Library: Lee Sedol - Hong Chang Sik - ladder game. http://senseis.xmp.net/?LeeSedolHongChangSikLadderGame

[3] C. Clark and A. Storkey. Teaching deep convolutional neural networks to play Go, 2014.

[4] M. Enzenberger, M. Müller, B. Arneson, and R. Segal. Fuego: an open-source framework for board games and Go engine based on Monte Carlo tree search. IEEE Transactions on Computational Intelligence and AI in Games, 2(4):259–270, 2010.

[5] S. Gelly, L. Kocsis, M. Schoenauer, M. Sebag, D. Silver, C. Szepesvári, and O. Teytaud. The grand challenge of computer Go: Monte Carlo tree search and extensions. Communications of the ACM, 55(3):106–113, 2012.

[6] T. Graepel, M. Goutrié, M. Krüger, and R. Herbrich. Learning on Graphs in the Game of Go, pages 347–352. Springer Berlin Heidelberg, Berlin, Heidelberg, 2001.

[7] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.

[8] R. Herbrich. Machine learning in industry. http://mlss.tuebingen.mpg.de/2015/slides/herbrich/herbrich.pdf. 68–87.

[9] Z. Huang. Googles alpha go now has a serious game-playing rival from tencent. https://qz.com/936654/googles-alpha-go-now-has-a-serious-game-playing-rival-with-tencents-jueyi-or-fineart/

[10] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size, 2016.

[11] C.-S. Lee, M.-H. Wang, G. Chaslot, J.-B. Hoock, A. Rimmel, O. Teytaud, S.-R. Tsai, S.-C. Hsu, and T.-P. Hong. The computational intelligence of MoGo revealed in Taiwan’s computer Go tournaments. IEEE Transactions on Computational Intelligence and AI in Games, 1(1):73–89, 2009.

[12] J. Lewis. Playing Super Hexagon with convolutional neural networks (milestone).

[13] C. J. Maddison, A. Huang, I. Sutskever, and D. Silver. Move evaluation in Go using deep convolutional neural networks. CoRR, abs/1412.6564, 2014.

[14] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. A. Riedmiller. Playing Atari with deep reinforcement learning. CoRR, abs/1312.5602, 2013.

[15] B. Oshri and N. Khandwala. Predicting moves in chess using convolutional neural networks.

[16] A. Rimmel, F. Teytaud, and O. Teytaud. Biasing Monte-Carlo Simulations through RAVE Values, pages 59–68. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011.

[17] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(...

[18] S. Smith. Learning to play Stratego with convolutional neural networks.

[19] I. Sutskever and V. Nair. Mimicking Go Experts with Convolutional Neural Networks, pages 101–110. Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.

[20] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.

[21] Y. Tian and Y. Zhu. Better computer Go player with neural network and long-term prediction. CoRR, abs/1511.06410, 2015.

[22] M. H. Winands, Y. Björnsson, and J.-T. Saito. Monte-Carlo tree search solver. In Proceedings of the 6th International Conference on Computers and Games, CG ’08, pages 25–36, Berlin, Heidelberg, 2008. Springer-Verlag.

[23] M. Woodcraft. Sgfmill. https://github.com/mattheww/sgfmill, 2017.

[24] A. Zobrist. Feature extraction and representation for pattern recognition and the game of Go. 1970. 152.
