pith. machine review for the scientific record.

arxiv: 2604.15585 · v1 · submitted 2026-04-16 · 💻 cs.LG · cs.AI

Recognition: unknown

PAWN: Piece Value Analysis with Neural Networks

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 10:47 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords chess piece value prediction · neural networks · autoencoders · board state encoding · machine learning in games · multilayer perceptrons · position evaluation

The pith

Incorporating full chessboard context through CNN autoencoder latents improves MLP piece-value prediction accuracy by 16 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a chess piece's contribution cannot be predicted accurately from its type and location alone because its value depends on spatial relationships with every other piece. It shows that encoding the entire board state as a compact latent vector via a convolutional autoencoder and supplying that vector to a multilayer perceptron yields substantially lower error than context-free MLPs. Training uses more than 12 million piece-value pairs extracted from grandmaster games, with labels produced by Stockfish 17. The resulting model predicts relative value to within roughly 0.65 pawns on held-out positions. The authors conclude that explicit full-state encoding supplies useful inductive bias when the goal is to estimate the marginal contribution of any single component inside a large structured system.

Core claim

Latent position representations derived from a CNN-based autoencoder, when added as context to MLP architectures, reduce validation mean absolute error for piece-value prediction by 16 percent and achieve accuracy within approximately 0.65 pawns, outperforming context-independent baselines on a dataset of over 12 million examples labeled by Stockfish 17 from grandmaster games.
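As a sanity check on how the two headline numbers relate, the arithmetic can be sketched directly. The baseline figure below is illustrative, not a number reported in the paper:

```python
def mae(preds, labels):
    """Mean absolute error, here in pawns (1 pawn = 100 centipawns)."""
    return sum(abs(p - y) for p, y in zip(preds, labels)) / len(preds)

# Illustrative only: a context-free baseline around 0.77 pawns MAE would,
# after a 16% relative reduction, land near the reported ~0.65 pawns.
baseline_mae = 0.77
context_mae = baseline_mae * (1 - 0.16)
print(round(context_mae, 3))  # 0.647
```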

What carries the argument

The CNN-based autoencoder that compresses the full chessboard state into latent vectors, which are concatenated with piece-specific features before being passed to the MLP predictor.
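A minimal sketch of that data flow, with NumPy standing in for the trained networks. All dimensions here (12 one-hot piece planes, a 128-dim latent, 16 piece features, a 256-wide hidden layer) and the linear stand-in for the CNN encoder are assumptions for illustration; the weights are random rather than trained:

```python
import numpy as np

rng = np.random.default_rng(0)

# One-hot piece planes for a single position; 12 planes = 6 piece types x 2 colors.
board = rng.integers(0, 2, size=(12, 8, 8)).astype(np.float32)

LATENT, PIECE_FEATS, HIDDEN = 128, 16, 256  # assumed sizes

# Stand-in for the trained CNN encoder: a single linear map on the
# flattened board, enough to show the shapes involved.
W_enc = rng.normal(size=(12 * 8 * 8, LATENT)).astype(np.float32)
latent = np.tanh(board.reshape(-1) @ W_enc)               # (128,)

piece = rng.normal(size=PIECE_FEATS).astype(np.float32)   # per-piece features (assumed)
x = np.concatenate([piece, latent])                       # (144,) = piece + board context

# Two-layer MLP head predicting a scalar piece value in centipawns.
W1 = rng.normal(size=(x.size, HIDDEN)).astype(np.float32)
w2 = rng.normal(size=HIDDEN).astype(np.float32)
value_cp = float(np.maximum(x @ W1, 0.0) @ w2)

print(latent.shape, x.shape)  # (128,) (144,)
```

The point is only the concatenation step: the MLP sees the piece's own features plus a fixed-size summary of the entire board.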

If this is right

  • Piece-value models that receive explicit board-wide context outperform those that receive only local features.
  • The 0.65-pawn accuracy level is tight enough for practical use in position analysis or engine tuning.
  • Encoding the complete state as context improves prediction of any individual component's contribution inside interdependent systems.
  • The performance gap demonstrates that spatial relationships across the board carry predictive signal beyond isolated piece attributes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same latent-context approach could be tested on other turn-based games where component values vary with global configuration.
  • If the autoencoder latents capture position essence, they might also improve downstream tasks such as move recommendation or blunder detection.
  • The result suggests that any domain requiring marginal-contribution estimates may benefit from learning a compressed representation of the entire input configuration rather than treating components in isolation.

Load-bearing premise

Stockfish 17 evaluations supply reliable ground-truth labels for each piece's marginal contribution that remain valid across different model architectures.

What would settle it

Retraining the same architecture on labels produced by an independent engine such as Komodo or Leela Chess Zero and measuring whether the 16 percent error reduction persists on the same test positions.
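A hedged sketch of what that relabeling pipeline computes: a piece's label is its marginal contribution, the engine evaluation of the position minus the evaluation of the same position with the piece removed. The toy material counter below is a stand-in for a real UCI engine, and the example position is invented:

```python
# Toy stand-in for an engine: material balance in centipawns from White's view.
PIECE_CP = {"P": 100, "N": 300, "B": 300, "R": 500, "Q": 900}  # assumed values

def evaluate(board):
    score = 0
    for piece in board.values():
        cp = PIECE_CP.get(piece.upper(), 0)  # kings contribute 0 here
        score += cp if piece.isupper() else -cp
    return score

def marginal_value(board, square):
    """Label for the piece on `square`: eval with it minus eval without it."""
    without = {sq: p for sq, p in board.items() if sq != square}
    return evaluate(board) - evaluate(without)

# Invented illustrative position: uppercase = White, lowercase = Black.
board = {"e1": "K", "e8": "k", "d6": "N", "g6": "n", "a2": "P"}
print(marginal_value(board, "d6"))  # 300: removing White's knight costs 300 cp
print(marginal_value(board, "g6"))  # -300: removing Black's knight helps White
```

Swapping `evaluate` for calls to Stockfish, Komodo, or Leela (e.g. via python-chess) and comparing the resulting error reductions on identical test positions is the proposed experiment.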

Figures

Figures reproduced from arXiv: 2604.15585 by Ethan Tang, Hasan Davulcu, Jia Zou, Zhongju Zhang.

Figure 1: In this position [3] with White to move, our piece value predictor assigned the Nd6 a piece value of 703 cp, which is significantly larger than the -355 cp assigned to the ng6.
Figure 2: In this position [19] with White to move, the piece value of the rg6 cannot be calculated using our definition, since the Qe4 would attack the kh7.
Figure 3: Our piece value predictor assigns the bad bg7 (left), biting on White's granite of Pd4-e5-f4, a modest piece value of -453 cp. Meanwhile, the active bg7 (right), which acts as a key contributor to Black's attack on the Pc3, is assigned a significantly larger piece value of -950 cp.
Figure 4: A position in Dataset MC from one of GM Magnus Carlsen's online blitz games [18], where White (Carlsen) has chosen the non-standard plan of swapping his Q and K in the opening via Qa4-h4-e1 and Kd1.
Figure 5: Predicted piece values in the Jobava-Rapport system after 3...c5 4. e4! cxd4 ... 9. e6! fxe6 [17, 34]. In this position, White has sacrificed a pawn on e6 in exchange for a lead in development. Play in this position revolves around the question of whether Black can finish their development before White coordinates an attack against Black's backwards pe6/pe7. The best move in the position is 10...qb6, offe…
Original abstract

Predicting the relative value of any given chess piece in a position remains an open challenge, as a piece's contribution depends on its spatial relationships with every other piece on the board. We demonstrate that incorporating the state of the full chess board via latent position representations derived using a CNN-based autoencoder significantly improves accuracy for MLP-based piece value prediction architectures. Using a dataset of over 12 million piece-value pairs gathered from Grandmaster-level games, with ground-truth labels generated by Stockfish 17, our enhanced piece value predictor significantly outperforms context-independent MLP-based systems, reducing validation mean absolute error by 16% and predicting relative piece value within approximately 0.65 pawns. More generally, our findings suggest that encoding the full problem state as context provides useful inductive bias for predicting the contribution of any individual component.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The manuscript presents PAWN, a neural architecture that derives latent representations of full chessboard states via a CNN-based autoencoder and feeds them into an MLP to predict the relative value of individual pieces. Using over 12 million piece-value pairs extracted from Grandmaster games and labeled by Stockfish 17, the context-aware model is reported to reduce validation mean absolute error by 16% relative to context-independent MLPs while achieving approximately 0.65-pawn accuracy. The authors conclude that encoding the complete board state supplies useful inductive bias for predicting the contribution of any single component.

Significance. If the 16% MAE reduction is shown to arise specifically from the CNN-derived board context rather than differences in model capacity or training procedure, the result would illustrate how global state encoding can improve local component-value prediction in a structured domain like chess. The scale of the dataset (more than 12 million examples drawn from real Grandmaster play) constitutes a concrete empirical strength that would support broader claims about inductive bias if accompanied by proper controls and ablations.

major comments (3)
  1. [Abstract] Abstract: the reported 16% validation MAE reduction is presented as arising from 'incorporating the state of the full chess board via latent position representations,' yet no evidence is supplied that the context-independent MLP baseline was capacity-matched (e.g., by equating total parameters, adding dummy input dimensions, or using identical layer widths and training schedules). Without such controls the performance delta cannot be attributed to the CNN autoencoder rather than differences in effective model size or optimization.
  2. [Abstract] Abstract: the manuscript supplies no architecture diagrams, latent dimension, CNN or MLP layer sizes, training hyperparameters, validation split protocol, or ablation studies. These omissions render it impossible to determine whether the 0.65-pawn accuracy is reproducible or robust, directly undermining assessment of the central empirical claim.
  3. [Abstract] Abstract: Stockfish 17 evaluations are used as ground-truth labels for piece marginal contributions without discussion of potential engine-specific biases or cross-validation against other sources (alternative engines or expert annotations). Because the accuracy metric and the 16% improvement are measured against these labels, the assumption is load-bearing for interpreting the reported results.
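The capacity-matching control raised in comment 1 can be sketched as a parameter count: widen the context-free baseline's hidden layer until it matches the context model's head. The dimensions are assumptions (the paper does not report them), and this equates only the MLP head, not the encoder:

```python
def mlp_params(d_in, hidden, d_out=1):
    """Weights + biases of a one-hidden-layer MLP."""
    return (d_in * hidden + hidden) + (hidden * d_out + d_out)

PIECE_FEATS, LATENT, HIDDEN = 16, 128, 256       # assumed dimensions
full = mlp_params(PIECE_FEATS + LATENT, HIDDEN)  # head that also sees the latent

# Smallest baseline hidden width whose parameter count reaches the full head's:
h = HIDDEN
while mlp_params(PIECE_FEATS, h) < full:
    h += 1
print(full, h, mlp_params(PIECE_FEATS, h))  # 37377 2077 37387
```

Retraining such a widened baseline under identical schedules would show how much of the 16% delta survives once raw capacity is equalized.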

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript to incorporate the suggested improvements where feasible.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the reported 16% validation MAE reduction is presented as arising from 'incorporating the state of the full chess board via latent position representations,' yet no evidence is supplied that the context-independent MLP baseline was capacity-matched (e.g., by equating total parameters, adding dummy input dimensions, or using identical layer widths and training schedules). Without such controls the performance delta cannot be attributed to the CNN autoencoder rather than differences in effective model size or optimization.

    Authors: We agree that the original experiments did not include explicit capacity-matched controls for the baseline. The context-independent MLP uses only local piece features, while the full model adds parameters from the CNN-derived latent representation. In the revised manuscript we will add an ablation study that expands the baseline MLP (via increased hidden-layer widths or dummy input dimensions) to match the total parameter count of the PAWN model and retrains it under identical schedules. The results of this controlled comparison will be reported to isolate the contribution of board-state context. revision: yes

  2. Referee: [Abstract] Abstract: the manuscript supplies no architecture diagrams, latent dimension, CNN or MLP layer sizes, training hyperparameters, validation split protocol, or ablation studies. These omissions render it impossible to determine whether the 0.65-pawn accuracy is reproducible or robust, directly undermining assessment of the central empirical claim.

    Authors: We acknowledge that the current manuscript omits these implementation details. The revised version will include a new appendix containing: architecture diagrams for the CNN autoencoder and MLP; exact specifications (latent dimension of 128, CNN filter counts and kernel sizes, MLP layer widths); full training hyperparameters (optimizer, learning-rate schedule, batch size, epochs); the validation protocol (game-level split to avoid intra-game leakage); and additional ablation results. These additions will enable reproducibility and allow readers to assess the robustness of the reported 0.65-pawn MAE. revision: yes

  3. Referee: [Abstract] Abstract: Stockfish 17 evaluations are used as ground-truth labels for piece marginal contributions without discussion of potential engine-specific biases or cross-validation against other sources (alternative engines or expert annotations). Because the accuracy metric and the 16% improvement are measured against these labels, the assumption is load-bearing for interpreting the reported results.

    Authors: We recognize the need to address the choice of ground-truth labels. The revised manuscript will add a dedicated paragraph discussing potential engine-specific biases in Stockfish 17 (e.g., its evaluation heuristics for mobility and pawn structure). We will also report a limited cross-validation on a 100k-position subset re-labeled by an alternative engine to quantify consistency in both absolute MAE and the observed relative improvement. A full re-labeling of the 12-million-position corpus is computationally prohibitive at present, but the added discussion and partial validation will clarify the scope and limitations of the current results. revision: partial

Circularity Check

0 steps flagged

No circularity; empirical comparison is self-contained and falsifiable

Full rationale

The paper reports an empirical head-to-head experiment: an MLP piece-value predictor augmented with CNN-autoencoder latent board encodings is trained on >12M Stockfish-labeled examples and achieves 16% lower validation MAE than a context-free MLP baseline. No equations, derivations, or uniqueness theorems are invoked; the central claim is a measured performance delta on held-out data. This delta is independent of the input labels by construction and can be falsified by re-running the training with capacity-matched baselines or different seeds. No self-citations, fitted-input renamings, or ansatzes appear in the provided text, so the result does not reduce to its own inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The claim depends on the assumption that Stockfish labels are faithful proxies for piece contribution and that the autoencoder latent space captures causally relevant board context. Neural-network weights constitute a large set of fitted parameters whose values are not reported.

free parameters (2)
  • Autoencoder latent dimension and layer sizes
    Chosen to produce a compact board encoding; exact values not stated.
  • MLP architecture and training hyperparameters
    All weights and optimization settings fitted to the 12 M labeled positions.
axioms (1)
  • domain assumption: Stockfish 17 evaluations provide accurate, model-independent ground-truth piece values
    Labels are generated by Stockfish 17 and treated as reliable targets.

pith-pipeline@v0.9.0 · 5433 in / 1399 out tokens · 47145 ms · 2026-05-10T10:47:06.739790+00:00 · methodology


Reference graph

Works this paper leans on

37 extracted references · 10 canonical work pages

  1. [1]

    ChessBase: ChessBase Mega Database 2025 (Nov 2024), accessed: 2026-03-01

  2. [2]

    Chessgames.com: Vitiugov, N. – Ganguly, S.S.: Khanty-Mansiysk Olympiad, position after 19...Ra8 (2010), https://www.chessgames.com/perl/chessgame?gid=1593255, accessed: 2026-03-01

  3. [3]

    Chessgames.com: Artemiev, V. – Rozum, I.: 76th Russian Championship, position after 22...Nxg6 (2023), https://www.chessgames.com/perl/chessgame?gid=2579817, accessed: 2026-03-01

  4. [4]

    Chessgames.com: Nikitenko, M. – Mittal, A.: Pavlodar Open-A, position after 32. Kc2 (2023), https://www.chessgames.com/perl/chessgame?gid=1593255, accessed: 2026-03-01

  5. [5]

    Chessprogramming.org: Centipawns, https://www.chessprogramming.org/Centipawns, accessed: 2026-03-01

  6. [6]

    Chessprogramming.org: Evaluation, https://www.chessprogramming.org/Evaluation, accessed: 2026-03-01

  7. [7]

    Computer Chess Rating Lists: CCRL 40/15 Rating List, https://computerchess.org.uk/ccrl/4040/, accessed: 2026-03-01

  8. [8]

    Czarnul, P.: Benchmarking Parallel Chess Search in Stockfish on Intel Xeon and Intel Xeon Phi Processors. In: Computational Science – ICCS 2018: 18th International Conference, Wuxi, China, June 11–13, 2018, Proceedings, Part III, p. 457–464. Springer-Verlag, Berlin, Heidelberg (2018). https://doi.org/10.1007/978-3-319-93713-7_40, https://doi.org/10.1007/9...

  9. [9]

    Elo, A.E.: The Rating of Chessplayers, Past and Present. Batsford chess books, Batsford (1978), https://cir.nii.ac.jp/crid/1970586434859718946

  10. [10]

    Fischer, B.: Bobby Fischer Teaches Chess. Bantam Dell Publishing Group, New York, NY (Jan 1984)

  11. [11]

    Gupta, A., Maharaj, S., Polson, N., Sokolov, V.: On the Value of Chess Squares. Entropy 25(10), 1374 (Sep 2023). https://doi.org/10.3390/e25101374, http://dx.doi.org/10.3390/e25101374

  12. [12]

    Jennewein, D., Lee, J., Kurtz, C., Dizon, W., Shaeffer, I., Chapman, A., Chiquete, A., Burks, J., Carlson, A., Mason, N., Kobawala, A., Jagadeesan, T., Basani, P.B., Battelle, T., Belshe, R., McCaffrey, D., Brazil, M., Inumella, C., Kuznia, K., Yalim, J.: The Sol Supercomputer at Arizona State University. pp. 296–301 (07 2023). https://doi.org/10.1145/356...

  13. [13]

    Kaufman, L.: Advanced Piece Values. Chess.com Lessons, https://www.chess.com/lessons/advanced-piece-values, accessed: 2026-03-01

  14. [14]

    Kaufman, L.: The Evaluation of Material Imbalances (2018), https://www.danheisman.com/evaluation-of-material-imbalances.html, reprinted at above URL. Accessed: 2026-03-01

  15. [15]

    Klein, D.: NNUE – English translation of Yu Nasu's original NNUE paper. https://github.com/asdfjkl/nnue, accessed: 2026-03-01

  16. [16]

    Lasker, E.: Lasker's Chess Primer. Batsford, London, England, reprint edn. (Nov 1988), originally published 1934

  17. [17]

    Lichess.org: Lichess Master's Database: D01 Rapport-Jobava System - 3..c5 4. e4 cxd4 followed by 9. e6!, https://database.lichess.org/, accessed: 2026-03-01

  18. [18]

    Lichess.org: Carlsen, M. – Schneider, I.: Lichess Blitz Titled Arena, position after 6. Qe1! (2021), https://lichess.org/IiYBoLKL#11, accessed: 2026-03-01

  19. [19]

    Lichess.org: Ding, L. – Nepomniachtchi, I.: FIDE World Chess Championship Rapid Tiebreaks, Game 4, position after 46...Rg6! (2023), https://lichess.org/broadcast/fide-world-chess-championship-2023/tie-breaks/jCs1wd0E/8QvKR1zU, accessed: 2026-03-01

  20. [20]

    Campbell, M., Hoane, A.J., Hsu, F.: Deep Blue. Artificial Intelligence 134(1), 57–83 (2002). https://doi.org/10.1016/S0004-3702(01)00129-1, https://www.sciencedirect.com/science/article/pii/S0004370201001291

  21. [21]

    Nasu, Y.: Efficiently Updatable Neural-Network-based Evaluation Functions for Computer Shogi. The 28th World Computer Shogi Championship Appeal Document. Ziosoft Computer Shogi Club (2018), https://github.com/ynasu87/nnue

  22. [22]

    Pav, S.: Inferring Piece Value in Chess and Chess Variants (2025), https://arxiv.org/abs/2509.04691

  23. [23]

    Rowson, J.: Chess for Zebras. Gambit Publications, London, England (Oct 2005)

  24. [24]

    Ruoss, A., Delétang, G., Medapati, S., Grau-Moya, J., Wenliang, L.K., Catt, E., Reid, J., Lewis, C.A., Veness, J., Genewein, T.: Amortized Planning with Large-Scale Transformers: A Case Study on Chess (2024), https://arxiv.org/abs/2402.04494

  25. [25]

    Silman, J.: How to Reassess Your Chess. Siles Press, 4th edn. (Oct 2010)

  26. [26]

    Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., et al.: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv preprint arXiv:1712.01815 (2017)

  27. [27]

    Spinnato, F.: Towards Piece-by-Piece Explanations for Chess Positions with SHAP (2025), https://arxiv.org/abs/2510.25775

  28. [28]

    Stockfish Developers: WDL Model, https://github.com/official-stockfish/WDL_model, accessed: 2026-03-01

  29. [29]

    Stockfish Team: Stockfish 18 (2026), https://stockfishchess.org/blog/2026/stockfish-18/, accessed: 2026-03-01

  30. [30]

    Stockfish Wiki: Useful Data – Threading Efficiency and Elo Gain, https://official-stockfish.github.io/docs/stockfish-wiki/Useful-data.html#stc, accessed: 2026-03-01

  31. [31]

    Tomašev, N., Paquet, U., Hassabis, D., Kramnik, V.: Assessing Game Balance with AlphaZero: Exploring Alternative Rule Sets in Chess (2020), https://arxiv.org/abs/2009.04374

  32. [32]

    Turing, A.: Chess. In: Levy, D. (ed.) Computer Chess Compendium, p. 15. Springer Verlag, Berlin (1988), originally published 1953

  33. [33]

    Wikipedia: Chess piece relative value, https://en.wikipedia.org/wiki/Chess_piece_relative_value, accessed: 2026-03-01

  34. [34]

    Wikipedia: London System – Jobava London, https://en.wikipedia.org/wiki/London_System, accessed: 2026-03-01

  35. [35]

    White’sPa2/Ph2 are worth significantly less than any other White pawns. White’s advantage is largely dynamic in this position due to their lead in development, so removing thePa2/Ph2 does not heavily impact the evaluation of the position due to their removal activating either the Ra1/Rh1, allowing them to pressure thepa7/ph7 respectively

  36. [36]

    Black’s pawns are worth more on average than White’s. This is due to Black’s advantage being largely static (up +1p); losing any material would swing the position’s evaluation heavily in White’s favor due to White’s advantage being primarily dynamic and therefore not as dependent (see Section 5.2) on the static factor of material count

  37. [37]

    The bc6 is the most valuable minor piece in this position, even outvaluing therh8/Ra1/Rh1 despite it being understood that bishops are worth less than rooks in general valuation systems. The bc6 prevents White’s thematic idea ofNb5-Nc7 while all rooks in this position are inactive, leading to thebc6 being valued more highly in this position due to the imp...