Formal Verification of Minimax Algorithms

Huub van de Wetering; Kees Huizing; Wieger Wesselink

arxiv: 2509.20138 · v2 · submitted 2025-09-24 · 💻 cs.AI

Formal Verification of Minimax Algorithms

Wieger Wesselink , Kees Huizing , Huub van de Wetering This is my paper

Pith reviewed 2026-05-18 14:36 UTC · model grok-4.3

classification 💻 cs.AI

keywords formal verificationminimaxalpha-beta pruningtransposition tablesnegamaxDafnygame tree searchcorrectness proof

0 comments

The pith

One practical depth-limited minimax implementation satisfies a new witness-based correctness criterion while a second common variant violates it.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper uses the Dafny verification system to check minimax search algorithms that include alpha-beta pruning and transposition tables. It defines a witness-based correctness notion for depth-limited cases, requiring that every returned value be justifiable by an explicit expansion of the game tree. When this criterion is applied to two real variants of depth-limited negamax, one receives a fully machine-checked proof of correctness and the other is shown to admit a concrete counterexample. The work matters because these algorithms sit at the core of classical game-playing engines and because testing alone has repeatedly failed to surface their subtle errors.

Core claim

For depth-limited negamax with alpha-beta pruning and transposition tables the authors introduce a witness-based correctness criterion that accepts a returned value precisely when it can be justified by an explicit game-tree expansion. Under this criterion one practical variant admits a fully mechanized Dafny proof while a second variant is refuted by an explicit counterexample that demonstrates a violation of the witness property.

What carries the argument

The witness-based correctness criterion, which requires that every value returned by depth-limited search with transposition tables be accompanied by an explicit justifying game-tree expansion.

If this is right

Implementations of depth-limited search must choose transposition-table update rules carefully to preserve the witness property.
A mechanized proof becomes feasible once the witness criterion is adopted as the target property.
Counterexamples can be generated mechanically when the witness condition is violated.
Verification artifacts for both the proof and the counterexample are executable and publicly available.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same witness technique could be tried on other search enhancements such as iterative deepening or move ordering heuristics.
Engine authors may now treat the witness criterion as an explicit invariant to maintain during future optimizations.
The existence of a counterexample suggests that some widely used transposition-table implementations may silently return unjustified values in certain edge cases.

Load-bearing premise

The witness-based correctness criterion accurately captures what it means for a depth-limited search value to be acceptable when transposition tables are present.

What would settle it

Running the counterexample variant on the concrete position and move sequence supplied in the paper and checking whether the value it returns can or cannot be justified by a full tree expansion of the reported depth.

Figures

Figures reproduced from arXiv: 2509.20138 by Huub van de Wetering, Kees Huizing, Wieger Wesselink.

**Figure 1.** Figure 1: Game tree with white maximizer nodes and gray minimizer nodes, labeled with evaluation values. A game tree is a rooted directed acyclic graph representing the possible evolutions of a two-player game. Each node corresponds to a game position, and each edge to a legal move. Nodes are colored by the player to move: white for the maximizer and gray for the minimizer, as illustrated in Figure 1. Formally, a … view at source ↗

**Figure 2.** Figure 2: Schematic truncated game tree showing the visited/pruned nodes for [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Counterexample to NegamaxTTM. Node v (with double border) is searched twice. Call (b) stores a lower bound of 3 in the transposition table. Call (c) reuses this result in a shallower search with a wider window, leading to an unexpected return value. Dotted nodes are not visited during search. NegamaxTTW. This resulted in a hand-crafted counterexample, demonstrating that this variant does not satisfy our w… view at source ↗

read the original abstract

Minimax-based search algorithms with alpha-beta pruning and transposition tables are a central component of classical game-playing engines and remain widely used in practice. Despite their widespread use, these algorithms are subtle, highly optimized, and notoriously difficult to reason about, making non-obvious errors hard to detect by testing alone. Using the Dafny verification system, we formally verify a range of minimax search algorithms, including variants with alpha-beta pruning and transposition tables. For depth-limited search with transposition tables, we introduce a witness-based correctness criterion that captures when returned values can be justified by an explicit game-tree expansion. We apply this criterion to two practical variants of depth-limited negamax with alpha-beta pruning and transposition tables: for one variant, we obtain a fully mechanized correctness proof, while for the other we construct a concrete counterexample demonstrating a violation of the proposed correctness notion. All verification artifacts, including Dafny proofs and executable Python implementations, are publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

They give a mechanized Dafny proof for one practical variant of depth-limited negamax with alpha-beta and transposition tables under a new witness-based criterion, plus an explicit counterexample for the other variant.

read the letter

The main thing to know is that this paper supplies a machine-checked correctness proof in Dafny for one concrete variant of depth-limited negamax with alpha-beta pruning and transposition tables, using a witness-based criterion they introduce, and a clear counterexample showing the second variant violates that same criterion. All the Dafny artifacts and matching Python implementations are released publicly, which is the part that actually matters for anyone who wants to build on it or check the claims directly.

Referee Report

0 major / 2 minor

Summary. The paper claims to formally verify minimax search algorithms with alpha-beta pruning and transposition tables using Dafny. For depth-limited search, it introduces a witness-based correctness criterion (that a returned value is acceptable precisely when justified by explicit game-tree expansion). One practical variant of depth-limited negamax with alpha-beta pruning and transposition tables receives a fully mechanized Dafny correctness proof; the other is shown to violate the criterion via an explicit counterexample. All Dafny proofs and executable Python implementations are released publicly.

Significance. If the results hold, the work is significant for delivering machine-checked Dafny proofs and a concrete counterexample for subtle, widely deployed algorithms in game-playing engines. The release of verification artifacts together with the falsifiable witness-based criterion provides reproducible, high-assurance evidence that strengthens trust in these implementations and raises the standard for formal methods in AI search.

minor comments (2)

The introduction of the witness-based correctness criterion as a modeling choice (rather than a derived property) is stated clearly but would benefit from an early, self-contained illustrative example before the formal definitions.
Cross-references between the Dafny proof modules and the corresponding Python implementations could be added to the artifact repository README to ease independent reproduction.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for recommending acceptance. The provided summary accurately captures the paper's contributions, including the witness-based correctness criterion, the mechanized Dafny proof for one variant, and the counterexample for the other.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper explicitly introduces a witness-based correctness criterion as a modeling choice in the section describing the criterion, then delivers a mechanized Dafny proof that one practical variant of depth-limited negamax with alpha-beta pruning and transposition tables satisfies the criterion while constructing a concrete counterexample showing violation for the other variant. All results follow directly from the stated definitions, the Dafny semantics, and the explicitly modeled witness notion without any reduction to fitted parameters, self-citations that bear the central load, or equivalences by construction. The formalization is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The verification rests on standard mathematical properties of finite game trees and the operational semantics of the Dafny language; the only novel modeling choice is the witness-based criterion itself.

axioms (2)

domain assumption Game trees are finite and acyclic with perfect information.
Invoked when modeling the search as an explicit tree expansion in the witness criterion.
standard math Dafny's verification semantics correctly capture the operational behavior of the encoded algorithms.
Background assumption of any Dafny proof; stated in the verification setup.

invented entities (1)

witness-based correctness criterion no independent evidence
purpose: To define when a depth-limited search result with transposition tables is acceptable by requiring an explicit justifying game-tree expansion.
New concept introduced to handle the interaction of depth limits and memoization; no independent evidence outside the paper is provided.

pith-pipeline@v0.9.0 · 5691 in / 1516 out tokens · 40973 ms · 2026-05-18T14:36:32.226580+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1]

Bird, R.S., Hughes, J.: The alpha-beta algorithm: An exercise in program trans- formation. Inf. Process. Lett.24(1), 53–57 (1987). https://doi.org/10.1016/0020- 0190(87)90198-0

work page doi:10.1016/0020- 1987
[2]

Breuker, D.: Memory versus Search in Games. Ph.d. thesis, Maastricht University, Maastricht (1998). https://doi.org/10.26481/dis.19981016db, supervised by Hen- drik van den Herik

work page doi:10.26481/dis.19981016db 1998
[3]

SIGART Newsl.72, 29–31 (1980)

Fishburn, J.P.: An optimization of alpha-beta search. SIGART Newsl.72, 29–31 (1980). https://doi.org/10.1145/1056447.1056450

work page doi:10.1145/1056447.1056450 1980
[4]

SIGART Newsl.84, 37–38 (1983)

Fishburn, J.P.: Another optimization of alpha-beta search. SIGART Newsl.84, 37–38 (1983). https://doi.org/10.1145/1056623.1056628

work page doi:10.1145/1056623.1056628 1983
[5]

Knuth, D.E., Moore, R.W.: An analysis of alpha-beta pruning. Artif. Intell.6(4), 293–326 (1975). https://doi.org/10.1016/0004-3702(75)90019-3

work page doi:10.1016/0004-3702(75)90019-3 1975
[6]

In: Clarke, E.M., Voronkov, A

Leino, R.: Dafny: An automatic program verifier for functional correctness. In: 16th International Conference, LPAR-16, Dakar, Senegal. pp. 348–370. Springer Berlin Heidelberg (April 2010). https://doi.org/10.1007/978-3-642-17511-4_20

work page doi:10.1007/978-3-642-17511-4_20 2010
[7]

ICCA Journal9(1), 3–19 (1986)

Marsland, T.A.: A review of game-tree pruning. ICCA Journal9(1), 3–19 (1986). https://doi.org/10.3233/ICG-1986-9102

work page doi:10.3233/icg-1986-9102 1986
[8]

ACM Books, self-published (Jun 2024), https://functional-algorithms- verified.org/

Nipkow, T., et al.: Functional Data Structures and Algorithms: A Proof Assistant Approach. ACM Books, self-published (Jun 2024), https://functional-algorithms- verified.org/

work page 2024
[9]

Intell.12(2), 179–196 (1979)

Stockman, G.C.: A minimax algorithm better than alpha-beta? Artif. Intell.12(2), 179–196 (1979). https://doi.org/10.1016/0004-3702(79)90016-X

work page doi:10.1016/0004-3702(79)90016-x 1979
[10]

Wesselink, W.: Minimax verification.https://github.com/wiegerw/minimax (2025), accessed: 2025-07-16

work page 2025
[11]

https://en.wikipedia.org/wiki/Negamax (2025), [Online; accessed 09-June-2025]

Wikipedia contributors: Negamax — Wikipedia, the free encyclopedia. https://en.wikipedia.org/wiki/Negamax (2025), [Online; accessed 09-June-2025]

work page 2025
[12]

ACM Trans

Ying, M.: Floyd-Hoare logic for quantum programs. ACM Trans. Program. Lang. Syst.33(6), 19:1–19:49 (2011). https://doi.org/10.1145/2049706.2049708 10 W. Wesselink et al. Algorithm 3Transposition-table-based search from Wikipedia [11]. Input: A game treeu, an alpha-beta window(α, β)and a transposition tableT. Output: A valuevaluesuch thatis-negamax-tt-valu...

work page doi:10.1145/2049706.2049708 2011

[1] [1]

Bird, R.S., Hughes, J.: The alpha-beta algorithm: An exercise in program trans- formation. Inf. Process. Lett.24(1), 53–57 (1987). https://doi.org/10.1016/0020- 0190(87)90198-0

work page doi:10.1016/0020- 1987

[2] [2]

Breuker, D.: Memory versus Search in Games. Ph.d. thesis, Maastricht University, Maastricht (1998). https://doi.org/10.26481/dis.19981016db, supervised by Hen- drik van den Herik

work page doi:10.26481/dis.19981016db 1998

[3] [3]

SIGART Newsl.72, 29–31 (1980)

Fishburn, J.P.: An optimization of alpha-beta search. SIGART Newsl.72, 29–31 (1980). https://doi.org/10.1145/1056447.1056450

work page doi:10.1145/1056447.1056450 1980

[4] [4]

SIGART Newsl.84, 37–38 (1983)

Fishburn, J.P.: Another optimization of alpha-beta search. SIGART Newsl.84, 37–38 (1983). https://doi.org/10.1145/1056623.1056628

work page doi:10.1145/1056623.1056628 1983

[5] [5]

Knuth, D.E., Moore, R.W.: An analysis of alpha-beta pruning. Artif. Intell.6(4), 293–326 (1975). https://doi.org/10.1016/0004-3702(75)90019-3

work page doi:10.1016/0004-3702(75)90019-3 1975

[6] [6]

In: Clarke, E.M., Voronkov, A

Leino, R.: Dafny: An automatic program verifier for functional correctness. In: 16th International Conference, LPAR-16, Dakar, Senegal. pp. 348–370. Springer Berlin Heidelberg (April 2010). https://doi.org/10.1007/978-3-642-17511-4_20

work page doi:10.1007/978-3-642-17511-4_20 2010

[7] [7]

ICCA Journal9(1), 3–19 (1986)

Marsland, T.A.: A review of game-tree pruning. ICCA Journal9(1), 3–19 (1986). https://doi.org/10.3233/ICG-1986-9102

work page doi:10.3233/icg-1986-9102 1986

[8] [8]

ACM Books, self-published (Jun 2024), https://functional-algorithms- verified.org/

Nipkow, T., et al.: Functional Data Structures and Algorithms: A Proof Assistant Approach. ACM Books, self-published (Jun 2024), https://functional-algorithms- verified.org/

work page 2024

[9] [9]

Intell.12(2), 179–196 (1979)

Stockman, G.C.: A minimax algorithm better than alpha-beta? Artif. Intell.12(2), 179–196 (1979). https://doi.org/10.1016/0004-3702(79)90016-X

work page doi:10.1016/0004-3702(79)90016-x 1979

[10] [10]

Wesselink, W.: Minimax verification.https://github.com/wiegerw/minimax (2025), accessed: 2025-07-16

work page 2025

[11] [11]

https://en.wikipedia.org/wiki/Negamax (2025), [Online; accessed 09-June-2025]

Wikipedia contributors: Negamax — Wikipedia, the free encyclopedia. https://en.wikipedia.org/wiki/Negamax (2025), [Online; accessed 09-June-2025]

work page 2025

[12] [12]

ACM Trans

Ying, M.: Floyd-Hoare logic for quantum programs. ACM Trans. Program. Lang. Syst.33(6), 19:1–19:49 (2011). https://doi.org/10.1145/2049706.2049708 10 W. Wesselink et al. Algorithm 3Transposition-table-based search from Wikipedia [11]. Input: A game treeu, an alpha-beta window(α, β)and a transposition tableT. Output: A valuevaluesuch thatis-negamax-tt-valu...

work page doi:10.1145/2049706.2049708 2011