arxiv: 1907.04658 · v1 · pith:JFOLYTASnew · submitted 2019-07-02 · 💻 cs.AI · cs.LG

Playing Go without Game Tree Search Using Convolutional Neural Networks

Pith reviewed 2026-05-25 10:43 UTC · model grok-4.3

keywords Go · convolutional neural networks · policy network · supervised learning · reinforcement learning · game AI · board games · artificial intelligence

The pith

A convolutional neural network plays Go at intermediate amateur level without any game tree search by learning directly from professional games.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a convolutional neural policy network trained via supervised learning on 53,000 professional Go games can already exceed the strength of intermediate amateurs while outputting moves with no explicit search of future positions. This setup aims to capture human-like intuition in the network weights alone rather than relying on evaluating large numbers of board states. The authors introduce non-rectangular convolutions to better model board shapes and outline plans for reinforcement learning between successive network versions, but the reported performance comes from the supervised stage. If the approach scales, it would mean long-term strategic knowledge in Go can be encoded without search or separate value estimation.

Core claim

The authors create a convolutional neural policy network that, after supervised training on professional games, surpasses intermediate amateur skill in Go while making moves with no game tree search at all. They propose non-rectangular convolutions to improve learning of local shapes and reinforcement learning on self-play games to strengthen the policy further, though the current result is achieved with supervised learning alone.

What carries the argument

Convolutional neural policy network that maps board positions directly to move probability distributions.
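
As a rough illustration of what "maps board positions directly to move probability distributions" means at play time, the sketch below turns a grid of raw per-point scores into probabilities over legal points and picks the most likely one. The function names, the caller-supplied legality mask, and the greedy (argmax) choice are assumptions made for this example; the paper's own move-selection rule is not described on this page.

```python
import math

def masked_softmax(scores, legal):
    """Softmax over the legal points of a score grid.

    scores: 2D list of floats (one raw score per board point).
    legal:  2D list of bools, True where a move is allowed (assumed to be
            computed elsewhere by a rules engine).
    Returns a dict {(row, col): probability} covering legal points only.
    """
    points = [(r, c)
              for r in range(len(scores))
              for c in range(len(scores[r]))
              if legal[r][c]]
    if not points:
        return {}
    # Subtract the maximum score for numerical stability.
    top = max(scores[r][c] for r, c in points)
    exps = {(r, c): math.exp(scores[r][c] - top) for r, c in points}
    total = sum(exps.values())
    return {pt: e / total for pt, e in exps.items()}

def pick_move(scores, legal):
    """Greedy choice: the legal point with the highest probability (assumption)."""
    probs = masked_softmax(scores, legal)
    return max(probs, key=probs.get) if probs else None

# Tiny 2 x 2 stand-in for a 19 x 19 board.
scores = [[0.0, 2.0], [1.0, -1.0]]
legal = [[True, False], [True, True]]
print(pick_move(scores, legal))
```

No future position is expanded anywhere in this loop, which is the sense in which such a player uses no search.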

If this is right

  • The network generates legal moves far faster than any program that expands millions of positions per second.
  • Long-range planning in Go can be stored in network weights learned from data rather than computed on the fly.
  • Non-rectangular convolutions can be added to improve recognition of common board shapes.
  • Reinforcement learning between network versions can raise performance without additional human games.
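
One way to picture the non-rectangular (cross-shaped) convolutions mentioned above is as an ordinary square filter whose weights outside a centred cross are forced to zero. The sketch below builds such a mask. The reading of "cross-width" used here (a width of w keeps rows and columns within w - 1 cells of the centre) is an assumption made for illustration, and `cross_mask` and `apply_mask` are hypothetical helper names, not the authors' code.

```python
def cross_mask(size, cross_width):
    """0/1 mask for a size x size filter that keeps only a centred cross.

    ASSUMPTION: cross_width w keeps every row and column whose distance
    from the centre line is at most w - 1, so w = 1 is a one-cell-thick plus.
    """
    if size % 2 == 0 or cross_width < 1:
        raise ValueError("size must be odd and cross_width at least 1")
    mid = size // 2
    reach = cross_width - 1
    return [[1 if abs(r - mid) <= reach or abs(c - mid) <= reach else 0
             for c in range(size)]
            for r in range(size)]

def apply_mask(kernel, mask):
    """Zero out the filter weights that fall outside the cross."""
    return [[w * m for w, m in zip(k_row, m_row)]
            for k_row, m_row in zip(kernel, mask)]

# Print the two 5 x 5 masks as rows of 0s and 1s.
for width in (1, 2):
    for row in cross_mask(5, width):
        print("".join(str(v) for v in row))
    print()
```

Masking a square filter keeps the example framework-independent; an actual implementation could equally store only the cross-shaped weights.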

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Pure policy networks might reach higher levels if scaled with more data and compute, reducing reliance on search.
  • The same supervised approach could be tested on other perfect-information games with large branching factors.
  • Success would imply that explicit value heads are not strictly required for competent play in some domains.

Load-bearing premise

Supervised training on professional game records plus the chosen CNN architecture is enough to capture the long-term strategic knowledge required for strong play without search or value estimation.

What would settle it

Measure the network's win rate in a large set of games against a pool of players rated at or above the intermediate amateur level; consistent losses would falsify the claim of having surpassed that skill level.
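
A win-rate test of this kind needs an uncertainty estimate, since a small sample can look decisive by chance. The sketch below computes a Wilson score interval for a measured win rate; the choice of the Wilson interval, the function name, and the example numbers are assumptions for illustration and do not come from the paper.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a binomial win rate.

    z = 1.96 corresponds to roughly 95% confidence under a normal
    approximation. Returns (lower, upper).
    """
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    z2 = z * z
    denom = 1 + z2 / games
    centre = (p + z2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z2 / (4 * games * games)) / denom
    return centre - half, centre + half

# Hypothetical result: 60 wins in 100 games against rated opponents.
low, high = wilson_interval(wins=60, games=100)
print(f"interval: [{low:.3f}, {high:.3f}]")
```

Under this reading, the claim of exceeding a given opponent pool is supported only when the lower bound stays above 0.5, and contradicted when the upper bound falls below it.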

Figures

Figures reproduced from arXiv: 1907.04658 by Chuanbo Pan, Jeffrey Barratt.

Figure 1: Capturing stones. While the rules of Go are simple, mastering the game is not. Almost all commercial Go playing programs rely heavily on some form of tree search to explore outcomes of certain moves in a game, most commonly Monte Carlo Tree Search (MCTS) [5]. However, these programs don’t seem to fully understand the game, instead relying on brute force search to make good moves. In the past two years, man…
Figure 2: 5 × 5 convolutions with cross-widths of 1 and 2 respectively. In addition to rectangular convolution filters, we implemented a novel cross-shaped convolution filter as shown in …
Figure 3: An example ladder on a 7x7 board. These convolutions were motivated by patterns in Go …
Figure 5: A single cross convolutional layer. We did not incorporate our cross convolution directly in the network as its own individual layer. Instead, inspired by Google’s Inception Module as shown in [20], we decided to incorporate our model in a modular “cross layer” as shown in …
Figure 6: Our model. … convolutional filters with stride 1 and pad 1, each with 256 layers. The final layer is a 1 × 1 convolution with only 1 filter, used to flatten the board position into a 19 × 19 score vector. We previously had a fully connected layer to evaluate final board positions. However, we discovered that a significant portion of training was spent at this layer. Therefore, we instead took the resu…
Figure 8: Number of liberties a stone has. … layer would be “on” if a stone had two liberties, and so on and so forth. Any stone with more than eight liberties would have the eighth feature layer “on”. A similar idea holds for number of liberties after playing on a certain position, which may well be our only “lookahead search”, as well as times since move was played, which simply turns on the feature for the point th…
Figure 9: An example board position with all stones with 4 …
Figure 10: An example position from game 1 of Ke Jie …
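
The liberty encoding described in the Figure 8 caption can be sketched as a stack of eight binary planes, where plane k is "on" at a stone whose group has k liberties and the eighth plane also covers anything larger. The code below is a minimal pure-Python version under stated assumptions: the board is a list of strings with `.` for empty points, both colours share one set of planes, and a stone's liberties are those of its whole connected group. The helper names are hypothetical and the paper's actual feature layout may differ.

```python
def group_and_liberties(board, r, c):
    """Flood-fill the group containing (r, c); return (stones, liberties)."""
    colour = board[r][c]
    group, liberties, stack = {(r, c)}, set(), [(r, c)]
    while stack:
        y, x = stack.pop()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < len(board) and 0 <= nx < len(board[ny]):
                if board[ny][nx] == ".":
                    liberties.add((ny, nx))       # adjacent empty point
                elif board[ny][nx] == colour and (ny, nx) not in group:
                    group.add((ny, nx))           # same-colour neighbour
                    stack.append((ny, nx))
    return group, liberties

def liberty_planes(board, max_planes=8):
    """One-hot liberty planes: planes[k-1][r][c] == 1 for a stone with k liberties."""
    n = len(board)
    planes = [[[0] * n for _ in range(n)] for _ in range(max_planes)]
    seen = set()
    for r in range(n):
        for c in range(n):
            if board[r][c] == "." or (r, c) in seen:
                continue
            group, liberties = group_and_liberties(board, r, c)
            seen |= group
            k = min(len(liberties), max_planes)   # cap at the last plane
            if k == 0:
                continue                          # captured group; not expected on a legal board
            for y, x in group:
                planes[k - 1][y][x] = 1
    return planes

# 3 x 3 toy position: a three-stone X group and a single O stone.
board = [".X.",
         "XX.",
         "..O"]
planes = liberty_planes(board)
print(planes[4][0][1], planes[1][2][2])
```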
Original abstract

The game of Go has a long history in East Asian countries, but the field of Computer Go has yet to catch up to humans until the past couple of years. While the rules of Go are simple, the strategy and combinatorics of the game are immensely complex. Even within the past couple of years, new programs that rely on neural networks to evaluate board positions still explore many orders of magnitude more board positions per second than a professional can. We attempt to mimic human intuition in the game by creating a convolutional neural policy network which, without any sort of tree search, should play the game at or above the level of most humans. We introduce three structures and training methods that aim to create a strong Go player: non-rectangular convolutions, which will better learn the shapes on the board, supervised learning, training on a data set of 53,000 professional games, and reinforcement learning, training on games played between different versions of the network. Our network has already surpassed the skill level of intermediate amateurs simply using supervised learning. Further training and implementation of non-rectangular convolutions and reinforcement learning will likely increase this skill level much further.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper describes a convolutional neural policy network for Go that plays without tree search. It trains via supervised learning on 53,000 professional games and asserts that this already reaches intermediate amateur strength; it also outlines plans to incorporate non-rectangular convolutions and reinforcement learning between network versions for further gains.

Significance. If the central claim holds with proper evaluation, the result would be significant: it would show that standard supervised next-move prediction on expert records can encode enough long-term strategy for amateur-level play without any search or value estimation, thereby reducing reliance on Monte Carlo tree search in Go AI.

major comments (1)
  1. [Abstract] The assertion that 'Our network has already surpassed the skill level of intermediate amateurs simply using supervised learning' is presented without any supporting data (win rates, number of evaluated games, opponent ratings, or baseline comparisons). This is load-bearing because the manuscript's primary contribution rests on this unevidenced performance claim.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and the recommendation for major revision. We address the single major comment below.

  1. Referee: [Abstract] The assertion that 'Our network has already surpassed the skill level of intermediate amateurs simply using supervised learning' is presented without any supporting data (win rates, number of evaluated games, opponent ratings, or baseline comparisons). This is load-bearing because the manuscript's primary contribution rests on this unevidenced performance claim.

    Authors: We agree that the performance claim in the abstract is presented without supporting data or evaluation details in the current manuscript. The claim was intended to reflect preliminary internal testing, but this was not documented. We will revise the abstract to remove or qualify the claim and add a dedicated evaluation section describing the testing procedure, number of games, opponent ratings, win rates, and any baselines used. revision: yes

Circularity Check

0 steps flagged

No circularity: standard supervised move prediction on external records

The paper trains a convolutional policy network via cross-entropy loss on an external corpus of 53,000 professional games and then reports empirical playing strength. No equation or fitted quantity is defined in terms of the target win-rate or amateur rating; the loss optimizes next-move accuracy on held-out pro moves, which is independent of the downstream claim that the resulting policy reaches intermediate-amateur level without search. No self-citation chain, uniqueness theorem, or ansatz is invoked to justify the architecture or the performance assertion. The result remains falsifiable by direct play against rated opponents.
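
The cross-entropy objective referred to above can be written out for a single training position as follows. This is a minimal sketch: the flattening of the 19 × 19 board into 361 logits and the function name are assumptions for the example, not details taken from the paper.

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of the expert's move under a softmax.

    logits: list of raw scores, one per board point (361 for 19 x 19).
    target: index of the move the professional actually played.
    """
    top = max(logits)  # subtract the maximum for numerical stability
    log_sum = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_sum - logits[target]

# An untrained network with identical scores pays log(361) per move.
uniform = [0.0] * 361
print(cross_entropy(uniform, target=5), math.log(361))
```

The loss depends only on the recorded expert move, not on who eventually won or on any rating, which is why the circularity check treats the playing-strength claim as a separate, testable statement.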

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no explicit free parameters, axioms, or invented entities are stated. The non-rectangular convolution is presented as a modeling choice rather than a new entity.

pith-pipeline@v0.9.0 · 5726 in / 1008 out tokens · 32176 ms · 2026-05-25T10:43:21.940637+00:00 · methodology


Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 4 internal anchors

  [1] Sensei’s Library: Go Databases. http://senseis.xmp.net/?GoDatabases

  [2] Sensei’s Library: Lee Sedol - Hong Chang Sik - ladder game. http://senseis.xmp.net/?LeeSedolHongChangSikLadderGame

  [3] C. Clark and A. Storkey. Teaching deep convolutional neural networks to play go, 2014

  [4] M. Enzenberger, M. Muller, B. Arneson, and R. Segal. Fuego: an open-source framework for board games and go engine based on monte carlo tree search. IEEE Transactions on Computational Intelligence and AI in Games, 2(4):259–270, 2010

  [5] S. Gelly, L. Kocsis, M. Schoenauer, M. Sebag, D. Silver, C. Szepesvári, and O. Teytaud. The grand challenge of computer go: Monte carlo tree search and extensions. Communications of the ACM, 55(3):106–113, 2012

  [6] T. Graepel, M. Goutrié, M. Krüger, and R. Herbrich. Learning on Graphs in the Game of Go, pages 347–352. Springer Berlin Heidelberg, Berlin, Heidelberg, 2001

  [7] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015

  [8] R. Herbrich. Machine learning in industry. http://mlss.tuebingen.mpg.de/2015/slides/herbrich/herbrich.pdf. 68–87

  [9] Z. Huang. Googles alpha go now has a serious game-playing rival from tencent. https://qz.com/936654/googles-alpha-go-now-has-a-serious-game-playing-rival-with-tencents-jueyi-or-fineart/

  [10] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer. Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5mb model size, 2016

  [11] C.-S. Lee, M.-H. Wang, G. Chaslot, J.-B. Hoock, A. Rimmel, O. Teytaud, S.-R. Tsai, S.-C. Hsu, and T.-P. Hong. The computational intelligence of mogo revealed in taiwan’s computer go tournaments. IEEE Transactions on Computational Intelligence and AI in games, 1(1):73–89, 2009

  [12] J. Lewis. Playing super hexagon with convolutional neural networks (milestone)

  [13] C. J. Maddison, A. Huang, I. Sutskever, and D. Silver. Move evaluation in go using deep convolutional neural networks. CoRR, abs/1412.6564, 2014

  [14] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. A. Riedmiller. Playing atari with deep reinforcement learning. CoRR, abs/1312.5602, 2013

  [15] B. Oshri and N. Khandwala. Predicting moves in chess using convolutional neural networks

  [16] A. Rimmel, F. Teytaud, and O. Teytaud. Biasing Monte-Carlo Simulations through RAVE Values, pages 59–68. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011

  [17] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(...

  [18] S. Smith. Learning to play stratego with convolutional neural networks

  [19] I. Sutskever and V. Nair. Mimicking Go Experts with Convolutional Neural Networks, pages 101–110. Springer Berlin Heidelberg, Berlin, Heidelberg, 2008

  [20] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015

  [21] Y. Tian and Y. Zhu. Better computer go player with neural network and long-term prediction. CoRR, abs/1511.06410, 2015

  [22] M. H. Winands, Y. Björnsson, and J.-T. Saito. Monte-carlo tree search solver. In Proceedings of the 6th International Conference on Computers and Games, CG ’08, pages 25–36, Berlin, Heidelberg, 2008. Springer-Verlag

  [23] M. Woodcraft. Sgfmill. https://github.com/mattheww/sgfmill, 2017

  [24] A. Zobrist. Feature extraction and representation for pattern recognition and the game of go. 1970. 152.