arxiv: 2508.00712 · v2 · pith:CI756NCZnew · submitted 2025-08-01 · 💻 cs.LG · cs.AI

JSON-Bag: A generic game trajectory representation

Pith reviewed 2026-05-21 23:10 UTC · model grok-4.3

keywords game trajectory representation · JSON tokenization · bag-of-tokens · Jensen-Shannon distance · agent classification · tabletop games · prototype nearest neighbor · policy distance correlation

The pith

Tokenizing JSON game state descriptions into bags and comparing them with Jensen-Shannon distance classifies agents and game settings more accurately than hand-crafted features across six tabletop games.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces JSON-Bag as a generic way to turn full game trajectories into bags of tokens taken from their JSON state records. It measures similarity between these bags using Jensen-Shannon distance and tests the approach on three classification problems: identifying which agent produced a trajectory, which game parameters were used, or which random seed started the game. The tests cover six tabletop games including 7 Wonders, Dominion, and Connect4. A sympathetic reader would care because the method needs no game-specific feature engineering yet still beats a hand-crafted baseline in most tasks and reveals that distances between these simple bags track real differences in how agents play.

Core claim

JSON-Bag converts JSON descriptions of game states into unordered token collections and applies Jensen-Shannon distance to compare entire trajectories. When used inside a prototype-based nearest-neighbor classifier, the representation outperforms a hand-crafted feature baseline on the majority of agent, parameter, and seed classification tasks across the six games. The same prototypes also prove sample-efficient in N-shot settings. Treating the individual tokens as features for a Random Forest classifier further raises accuracy on tasks where the bag model alone underperformed. Finally, the Jensen-Shannon distances between agent-class prototypes correlate strongly with the distances between the agents' policies.
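
As a rough illustration of the conversion described above, the sketch below flattens JSON state records into tokens and counts them over a trajectory. The tokenization scheme (one `path=value` token per leaf), the function names `tokenize_state` and `trajectory_bag`, and the toy states are all assumptions made for this example; this summary does not specify the paper's exact tokenizer.

```python
import json
from collections import Counter


def tokenize_state(node, prefix=""):
    """Yield one 'path=value' token per leaf of a parsed JSON value.

    Hypothetical scheme chosen for illustration only.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from tokenize_state(value, f"{prefix}{key}.")
    elif isinstance(node, list):
        # List positions are dropped, so items in a list are treated as unordered.
        for value in node:
            yield from tokenize_state(value, prefix)
    else:
        yield f"{prefix.rstrip('.')}={json.dumps(node)}"


def trajectory_bag(json_states):
    """Count tokens over every JSON state string in a trajectory."""
    bag = Counter()
    for state in json_states:
        bag.update(tokenize_state(json.loads(state)))
    return bag


# Two made-up states standing in for a short trajectory.
states = [
    '{"turn": 1, "hand": ["gold", "wood"]}',
    '{"turn": 2, "hand": ["gold"]}',
]
bag = trajectory_bag(states)
```

Here `bag` holds five tokens in total, with `hand="gold"` counted twice because it appears in both states. Any other deterministic tokenizer could be substituted without changing the rest of the pipeline.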

What carries the argument

The JSON-Bag representation, which tokenizes JSON game-state strings into frequency bags and uses Jensen-Shannon distance to compare those bags without regard to token order or sequence structure.
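
For reference, the standard definition of the Jensen-Shannon distance between two normalized token-frequency distributions $P$ and $Q$ over a shared vocabulary $V$ can be written as follows. The base-2 logarithm, which bounds the distance in $[0, 1]$, is an assumption here; this summary does not state which base the paper uses.

```latex
M = \tfrac{1}{2}(P + Q), \qquad
D_{\mathrm{KL}}(P \,\|\, M) = \sum_{t \in V} P(t) \log_2 \frac{P(t)}{M(t)}, \qquad
\mathrm{JSD}(P, Q) = \sqrt{\tfrac{1}{2} D_{\mathrm{KL}}(P \,\|\, M) + \tfrac{1}{2} D_{\mathrm{KL}}(Q \,\|\, M)}
```

Terms with $P(t) = 0$ contribute zero to the sum, so tokens that appear in only one bag are handled without smoothing. The quantity depends only on token frequencies, which is why token order plays no role.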

If this is right

  • The bag representation classifies which agent generated a trajectory more accurately than hand-crafted features in most of the tested tasks.
  • Prototype vectors built from JSON-Bag allow sample-efficient N-shot classification of trajectory classes.
  • Individual tokens inside the bags can be fed directly to a Random Forest model to raise accuracy on tasks where the pure bag approach lagged.
  • Jensen-Shannon distance between agent prototypes tracks the distance between the agents' underlying policies across all six games.
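
The prototype-based nearest-neighbor idea in the points above can be sketched as follows. This is a minimal sketch under stated assumptions: a class prototype is taken to be the pooled, normalized token counts of that class's trajectories, the logarithm is base 2, and the agent labels and token bags are invented toy data. None of these details are confirmed by this summary.

```python
import math
from collections import Counter


def normalize(bag):
    """Turn token counts into a probability distribution."""
    total = sum(bag.values())
    return {token: count / total for token, count in bag.items()}


def js_distance(p, q):
    """Jensen-Shannon distance (base-2 logs, so the result lies in [0, 1])."""
    divergence = 0.0
    for token in set(p) | set(q):
        pt, qt = p.get(token, 0.0), q.get(token, 0.0)
        mt = 0.5 * (pt + qt)
        if pt > 0:
            divergence += 0.5 * pt * math.log2(pt / mt)
        if qt > 0:
            divergence += 0.5 * qt * math.log2(qt / mt)
    return math.sqrt(max(divergence, 0.0))


def build_prototypes(labelled_bags):
    """Assumed prototype: pooled token counts of a class, then normalized."""
    pooled = {}
    for label, bag in labelled_bags:
        pooled.setdefault(label, Counter()).update(bag)
    return {label: normalize(bag) for label, bag in pooled.items()}


def classify(bag, prototypes):
    """Return the label whose prototype is closest in Jensen-Shannon distance."""
    p = normalize(bag)
    return min(prototypes, key=lambda label: js_distance(p, prototypes[label]))


# Toy token bags for two hypothetical agents (one and two examples per class).
training = [
    ("random", Counter({"action=pass": 6, "action=buy": 4})),
    ("mcts", Counter({"action=pass": 1, "action=buy": 9})),
    ("mcts", Counter({"action=buy": 8, "action=pass": 2})),
]
prototypes = build_prototypes(training)
prediction = classify(Counter({"action=buy": 7, "action=pass": 3}), prototypes)
```

With these toy numbers the query bag lands closer to the `mcts` prototype. Because each class is summarized by a single distribution, classification cost grows with the number of classes rather than the number of training trajectories, which is one plausible reason such prototypes suit N-shot settings.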

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same token-bag approach could be tried on any domain that already produces structured JSON logs, such as robot control traces or simulation outputs, without new feature design.
  • Because the method compares trajectories rather than policies directly, it offers a route to rank or cluster black-box agents solely from observed play data.
  • The success of the unordered bag model suggests that many trajectory discrimination tasks may not require recurrent or graph-based sequence modeling at the first analysis stage.

Load-bearing premise

That the bag of tokens extracted from JSON state descriptions still carries enough information about game dynamics and player behavior for classification accuracy and policy-distance correlation to remain meaningful.

What would settle it

If JSON-Bag with Jensen-Shannon distance produced lower classification accuracy than the hand-crafted baseline on a majority of new games or tasks, or if the correlation between prototype distances and actual policy distances fell near zero, the central claims would be falsified.
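
The second falsification condition amounts to a correlation check over agent pairs, which might look like the sketch below. The distance values are invented for illustration, the use of Pearson correlation is an assumption (this summary does not say which coefficient the paper reports), and the 0.2 cut-off for "near zero" is arbitrary.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, one entry per pair of agents (not taken from the paper).
prototype_distances = [0.12, 0.35, 0.41, 0.58, 0.73]
policy_distances = [0.10, 0.30, 0.45, 0.50, 0.80]

r = pearson(prototype_distances, policy_distances)

# Arbitrary stand-in threshold for "near zero".
claim_survives = abs(r) > 0.2
```

A rank-based coefficient such as Spearman's could be swapped in if the relationship is expected to be monotone rather than linear.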

Figures

Figures reproduced from arXiv: 2508.00712 by Diego Perez-Liebana, Dien Nguyen, Simon Lucas.

Figure 1: Sea Salt and Paper Confusion Matrices with P-NNS. From left to right, classification of agents, game parameters, and game seeds. Top row shows results for hand-crafted features, bottom row for the JSON-Bag model. Darker shades in the diagonal represent higher classification accuracies.
Figure 2: Dots and boxes agents confusion matrices with RF. … Stop. This suggests JSON-Bag is extracting information that the hand-crafted features do not have. On the other hand, using hand-crafted features significantly outperforms JSON-Bag in classifying Dots and boxes agents with just 8 features. Using only turn count as a feature already reaches an accuracy of 61%, compared to JSON-Bag’s 51%. The same information…
Figure 3: JSON-Bag prototype distance vs. Policy Distance between agent…
Original abstract

We introduce JSON Bag-of-Tokens model (JSON-Bag) as a method to generically represent game trajectories by tokenizing their JSON descriptions and apply Jensen-Shannon distance (JSD) as distance metric for them. Using a prototype-based nearest-neighbor search (P-NNS), we evaluate the validity of JSON-Bag with JSD on six tabletop games: 7 Wonders, Dominion, Sea Salt and Paper, Can't Stop, Connect4, Dots and boxes; each over three game trajectory classification tasks: classifying the playing agents, game parameters, or game seeds that were used to generate the trajectories. Our approach outperforms a baseline using hand-crafted features in the majority of tasks. Evaluating on N-shot classification suggests using JSON-Bag prototype to represent game trajectory classes is also sample efficient. Additionally, we demonstrate JSON-Bag ability for automatic feature extraction by treating tokens as individual features to be used in Random Forest to solve the tasks above, which significantly improves accuracy on underperforming tasks. Finally, we show that, across all six games, the JSD between JSON-Bag prototypes of agent classes highly correlates with the distances between agents' policies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces JSON-Bag, a generic representation for game trajectories obtained by tokenizing JSON state descriptions into bags of tokens and employing Jensen-Shannon distance (JSD) as a metric. Through prototype-based nearest-neighbor search (P-NNS), it evaluates this on six tabletop games for three classification tasks (agent identity, game parameters, game seeds), reporting outperformance versus a hand-crafted feature baseline in the majority of cases, sample efficiency in N-shot settings, further gains when using tokens as features in Random Forest classifiers, and a high correlation between JSD of agent-class prototypes and distances between the agents' policies.

Significance. If the empirical results hold under more rigorous statistical scrutiny, JSON-Bag provides a simple, domain-agnostic trajectory representation that sidesteps manual feature engineering, which could be useful for agent comparison, behavior clustering, and policy analysis in game AI and imitation learning. The multi-game, multi-task evaluation and the reported JSD–policy-distance correlation are positive elements; the manuscript also benefits from using concrete, reproducible games rather than abstract claims.

major comments (2)
  1. [§4] §4 (Experimental results): The accuracy tables and figures report point estimates for JSON-Bag versus the hand-crafted baseline without error bars, standard deviations across runs, or any statistical significance tests; this directly affects the central claim of outperformance 'in the majority of tasks' and the N-shot results, as variability due to seeds or sampling cannot be assessed.
  2. [§3] §3 (JSON-Bag construction): The method extracts tokens from successive JSON states but collapses them into an unordered multiset before applying JSD; no ablation compares this bag representation against order-preserving or structure-aware alternatives (e.g., sequential models or tree kernels on the JSON), leaving untested whether marginal token frequencies alone suffice to capture policy differences in path-dependent games such as Connect4 and Dominion.
minor comments (2)
  1. [Abstract] Abstract and §2: The phrasing 'JSON Bag-of-Tokens model (JSON-Bag)' is slightly inconsistent; a single defined term would improve readability.
  2. [§4.3] §4.3 (Random Forest experiments): The description of how individual tokens are turned into features for the classifier lacks detail on vocabulary size, handling of rare tokens, or cross-validation protocol.
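
The concern in major comment 2 can be made concrete with a small sketch: two move sequences that differ only in order collapse to the same bag. The per-state tokens below are hypothetical and deliberately simplified; real JSON states may encode part of the history (for example the full board), so the actual information loss may be smaller than this toy case suggests.

```python
from collections import Counter

# Hypothetical per-state tokens for two Connect4-like move sequences.
trajectory_a = [["col=3", "player=1"], ["col=4", "player=2"], ["col=3", "player=1"]]
trajectory_b = [["col=3", "player=1"], ["col=3", "player=1"], ["col=4", "player=2"]]


def to_bag(trajectory):
    """Collapse per-state token lists into one unordered multiset."""
    return Counter(token for state in trajectory for token in state)


# The bags match even though the sequences do not.
same_bag = to_bag(trajectory_a) == to_bag(trajectory_b)
same_sequence = trajectory_a == trajectory_b
```

Any distance computed on the bags alone, Jensen-Shannon included, is therefore zero for this pair, which is the case an order-preserving ablation would probe.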

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on experimental reporting and the design assumptions of JSON-Bag. We address each major comment below and describe the changes we will make to the manuscript.

  1. Referee: [§4] §4 (Experimental results): The accuracy tables and figures report point estimates for JSON-Bag versus the hand-crafted baseline without error bars, standard deviations across runs, or any statistical significance tests; this directly affects the central claim of outperformance 'in the majority of tasks' and the N-shot results, as variability due to seeds or sampling cannot be assessed.

    Authors: We agree that reporting only point estimates limits the strength of the outperformance claims. In the revised version we will rerun all experiments with multiple random seeds for prototype selection, N-shot sampling, and classifier training. We will report mean accuracies together with standard deviations in the tables, add error bars to the figures, and include statistical significance tests (e.g., paired Wilcoxon tests) for the main comparisons. revision: yes

  2. Referee: [§3] §3 (JSON-Bag construction): The method extracts tokens from successive JSON states but collapses them into an unordered multiset before applying JSD; no ablation compares this bag representation against order-preserving or structure-aware alternatives (e.g., sequential models or tree kernels on the JSON), leaving untested whether marginal token frequencies alone suffice to capture policy differences in path-dependent games such as Connect4 and Dominion.

    Authors: The unordered bag-of-tokens is an intentional design choice that prioritizes simplicity and domain-agnosticism. The strong correlation we report between JSD of agent prototypes and actual policy distances (across all six games, including Connect4 and Dominion) indicates that state-visit frequencies already capture policy-relevant differences. Nevertheless, we acknowledge that an explicit ablation against sequential or tree-based alternatives is absent. We will add a dedicated paragraph in the discussion section that motivates the bag representation, cites the policy-distance correlation as supporting evidence, and explicitly lists the lack of order-preserving ablations as a limitation for future work. revision: partial
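
The reporting promised in the first response (means, standard deviations, and a paired Wilcoxon test) could be sketched as below. The accuracy values are invented per-seed percentages, and `wilcoxon_exact` is a hypothetical helper written for this example: it computes an exact two-sided signed-rank p-value by enumerating sign assignments, which is only practical for a small number of paired runs.

```python
import itertools
import statistics


def wilcoxon_exact(xs, ys):
    """Two-sided exact Wilcoxon signed-rank p-value for small paired samples.

    Zero differences are dropped and tied magnitudes share their average rank.
    """
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    ordered = sorted(abs(d) for d in diffs)
    # Average 1-based rank of each distinct magnitude.
    rank = {v: statistics.mean(i + 1 for i, o in enumerate(ordered) if o == v)
            for v in set(ordered)}
    ranks = [rank[abs(d)] for d in diffs]
    observed = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2
    # Enumerate every sign assignment, all equally likely under the null.
    extreme = total = 0
    for signs in itertools.product((0, 1), repeat=len(ranks)):
        w_plus = sum(r for r, s in zip(ranks, signs) if s)
        total += 1
        if abs(w_plus - centre) >= abs(observed - centre) - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical accuracies (%) from six paired runs; not results from the paper.
runs = {
    "json_bag": [81, 79, 84, 80, 83, 86],
    "baseline": [75, 78, 76, 77, 74, 73],
}
summary = {name: (statistics.mean(acc), statistics.stdev(acc))
           for name, acc in runs.items()}
p_value = wilcoxon_exact(runs["json_bag"], runs["baseline"])
```

With six pairs that all favor one method, the smallest attainable two-sided p-value is 2/64, so more seeds would be needed before stronger significance claims become possible.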

Circularity Check

0 steps flagged

No circularity: empirical method with external validation

The paper introduces JSON-Bag as a tokenization-based representation for game trajectories and validates it through direct empirical tasks: prototype-based nearest-neighbor classification of agents/parameters/seeds, N-shot sample efficiency, Random Forest feature extraction, and JSD correlation with policy distances across six games. All reported results are obtained by applying the representation to held-out trajectory data and comparing against an independent hand-crafted baseline; no equations, fitted parameters, or self-citations are used to derive the accuracies or correlations from the inputs themselves. The derivation chain consists solely of data processing followed by standard distance and classifier evaluation, remaining fully self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The method rests on the domain assumption that JSON serializations capture the state information needed for behavioral classification and that bag-of-tokens plus JSD is sufficient to recover agent and parameter distinctions.

axioms (1)
  • domain assumption: JSON descriptions of game states contain the information necessary to distinguish agents, parameters, and seeds via token frequencies alone
    Invoked by the choice to tokenize JSON directly without additional state engineering or sequential modeling.

pith-pipeline@v0.9.0 · 5729 in / 1330 out tokens · 47268 ms · 2026-05-21T23:10:13.185189+00:00 · methodology
