Designing Game of Theorems

Yutaka Nagashima

arxiv: 1906.08549 · v1 · pith:CE6XBFH2new · submitted 2019-06-20 · 💻 cs.AI

Designing Game of Theorems

Yutaka Nagashima This is my paper

Pith reviewed 2026-05-25 19:50 UTC · model grok-4.3

classification 💻 cs.AI

keywords theorem provinggame of Goautomated reasoningmulti-agent systemsself-modifying agentsconjecture provingAlphaGo

0 comments

The pith

Theorem proving can be framed as a game in which multiple provers compete and revise themselves until user conjectures are proved.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether theorem proving shares enough structure with the game of Go to benefit from similar machine-learning techniques. It maps the shared traits such as strategic search and the differences such as the discrete nature of proofs. Building on that analysis, it outlines a system where multiple provers interact by competing to find proofs, evaluating each other's performance, and modifying their own methods over time. If successful, the system would allow users to submit conjectures and receive proofs from an evolving collective without manual configuration of a single prover.

Core claim

After comparing theorem proving to Go, the paper proposes a multi-prover system in which various provers keep competing against each other and changing themselves until they prove conjectures provided by users.

What carries the argument

The Game of Theorems architecture: a collection of competing and self-modifying theorem provers that drive one another toward proofs.

If this is right

Different provers can exploit complementary strengths on different problems through direct competition.
Continuous self-modification removes the need for repeated human tuning of search tactics.
User-supplied conjectures alone trigger the process and determine when the competition ends.
The architecture can incorporate any existing prover as a starting participant.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Existing single provers could serve as initial players rather than requiring entirely new code.
Success metrics could compare the rate of solved conjectures against standalone tools on shared benchmarks.
The same competition loop might later accept human-generated proof steps as additional moves.

Load-bearing premise

That competition and mutual self-modification among provers will succeed on conjectures where any single fixed prover would fail.

What would settle it

An experiment that runs the multi-prover system on a collection of open conjectures and finds that it proves no more of them than the strongest existing single prover.

read the original abstract

"Theorem proving is similar to the game of Go. So, we can probably improve our provers using deep learning, like DeepMind built the super-human computer Go program, AlphaGo." Such optimism has been observed among participants of AITP2017. But is theorem proving really similar to Go? In this paper, we first identify the similarities and differences between them and then propose a system in which various provers keep competing against each other and changing themselves until they prove conjectures provided by users.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a thin conceptual proposal that notes differences between Go and theorem proving but offers no argument or design for why a competitive multi-prover system would succeed.

read the letter

Quick take on Nagashima's 'Designing Game of Theorems': it's basically an idea paper suggesting we make theorem provers compete and evolve like AlphaGo players after noting some parallels and differences with the game. The useful part is the upfront comparison of the two tasks. It flags real differences—Go is closed and decisive, proofs are open and can be partial—which shows some care in not overclaiming the analogy. That's honest. But the proposal doesn't go anywhere with that. It just says 'let's have them compete and change themselves' without explaining the mechanism or why that would beat existing single-prover setups. No reward design, no complexity discussion, no toy implementation or even pseudocode. The stress-test note is right; the assumption that competition plus self-modification solves the hard parts is left unsupported. The paper acknowledges the differences but doesn't map them to any concrete design choices that would address the search space issues in theorem proving. This might spark a conversation in a workshop on AI for theorem proving, but as a standalone paper it doesn't deliver enough to be worth referee time. It reads more like a position statement or a call for future work than a completed piece of research. I'd pass on sending it out for review unless the authors add substantial technical content.

Referee Report

2 major / 1 minor

Summary. The paper identifies similarities and differences between theorem proving and the game of Go, then proposes a system in which multiple provers compete against each other and modify themselves until they prove user-provided conjectures, drawing an analogy to AlphaGo.

Significance. If a competitive, self-modifying multi-prover architecture could be realized and empirically shown to solve conjectures beyond the reach of existing single provers, the approach would be significant for automated theorem proving. The manuscript, however, supplies neither a concrete design nor any supporting argument or experiment.

major comments (2)

[Proposal description (following the comparison of Go and theorem proving)] The central proposal (that competing and self-modifying provers will succeed on conjectures where single provers fail) is asserted without any argument, reward-function sketch, complexity analysis, or preliminary experiment; this assumption is load-bearing for the entire contribution.
[Comparison section and subsequent proposal] Differences between Go (perfect information, fixed rules, terminal win/loss) and theorem proving (open conjecture space, partial proofs, undecidability) are acknowledged but never mapped to concrete architectural choices that would allow the proposed system to overcome the search-space and guidance limitations of current provers.

minor comments (1)

The manuscript is extremely brief and functions as an extended abstract; any revision should expand the proposal with at least an informal design of the competition mechanism and self-modification process.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We respond point-by-point to the major comments below.

read point-by-point responses

Referee: The central proposal (that competing and self-modifying provers will succeed on conjectures where single provers fail) is asserted without any argument, reward-function sketch, complexity analysis, or preliminary experiment; this assumption is load-bearing for the entire contribution.

Authors: The manuscript is a conceptual proposal that draws an analogy to AlphaGo and outlines a high-level multi-prover architecture. Its contribution is the identification of similarities and differences plus the suggestion of a competitive, self-modifying system as a research direction, rather than a fully specified implementation with reward functions or experiments. We do not claim empirical success but argue that the load-bearing assumption is worth exploring precisely because single-prover limitations are well-known. revision: no
Referee: Differences between Go (perfect information, fixed rules, terminal win/loss) and theorem proving (open conjecture space, partial proofs, undecidability) are acknowledged but never mapped to concrete architectural choices that would allow the proposed system to overcome the search-space and guidance limitations of current provers.

Authors: The proposal maps the differences at the architectural level: the competitive multi-prover setup is intended to address the open conjecture space and lack of guidance by allowing parallel exploration of different proof strategies, while self-modification is proposed to handle partial proofs and undecidability through ongoing adaptation. We agree that these mappings remain high-level and could be elaborated with more explicit design choices. revision: partial

Circularity Check

0 steps flagged

No circularity: conceptual proposal with no derivations or self-referential reductions

full rationale

The paper is a high-level idea piece that notes surface similarities between theorem proving and Go, acknowledges differences, and then proposes a competitive multi-prover architecture. No equations, fitted parameters, predictions, or load-bearing self-citations appear anywhere in the text. The central suggestion is presented as an unargued design idea rather than a derived result that reduces to its own inputs by construction. Absence of any derivation chain means no circularity can be exhibited.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

The paper is a high-level proposal; it introduces no fitted parameters, no explicit axioms beyond standard background knowledge of theorem proving and reinforcement learning, and one invented entity (the competitive evolving prover system) with no independent evidence.

invented entities (1)

competitive multi-prover evolving system no independent evidence
purpose: to prove user conjectures by having provers compete and adapt
Introduced in the abstract as the main proposal; no implementation or evidence supplied.

pith-pipeline@v0.9.0 · 5589 in / 1041 out tokens · 19641 ms · 2026-05-25T19:50:13.696881+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

propose a system in which various provers keep competing against each other and changing themselves until they prove conjectures
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

treat conjecturing and proof search as one problem... use these proved subgoals to judge the competence

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

3 extracted references · 3 canonical work pages

[1]

Gerwin Klein, Tobias Nipkow, Larry Paulson, and Rene Thi emann. 2004

work page 2004
[2]

David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driess- che, Julian Schrittwieser, Ioannis Antonoglou, Vedavyas P anneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Dem is Hassabis. Mast...

work page 2016
[3]

Mastering the game of go without human knowledge

David Silver, Julian Schrittwieser, Karen Simonyan, Io annis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yut ian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel , and Demis Hassabis. Mastering the game of go without human knowledge. Nature, 550:354 EP –, Oct 2017. Article

work page 2017

[1] [1]

Gerwin Klein, Tobias Nipkow, Larry Paulson, and Rene Thi emann. 2004

work page 2004

[2] [2]

David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driess- che, Julian Schrittwieser, Ioannis Antonoglou, Vedavyas P anneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Dem is Hassabis. Mast...

work page 2016

[3] [3]

Mastering the game of go without human knowledge

David Silver, Julian Schrittwieser, Karen Simonyan, Io annis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yut ian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel , and Demis Hassabis. Mastering the game of go without human knowledge. Nature, 550:354 EP –, Oct 2017. Article

work page 2017