Automated Playtesting of Matching Tile Games

Christoffer Holmgård; Fernando de Mesentier Silva; Julian Togelius; Luvneesh Mugrai

arxiv: 1907.06570 · v1 · pith:2MU7APKWnew · submitted 2019-07-15 · 💻 cs.AI

Pith reviewed 2026-05-24 21:22 UTC · model grok-4.3

classification 💻 cs.AI

keywords Match-3 games · automated playtesting · procedural personas · Monte Carlo Tree Search · evolutionary algorithms · game AI · user study

The pith

Evolving the utility function of Monte Carlo Tree Search agents generates procedural personas that approximate different human playstyles for automated playtesting of Match-3 games.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to build an automated playtesting system for Match-3 games by producing procedural personas that stand in for varied human playstyles. These personas arise when an evolutionary process tunes the utility function inside a Monte Carlo Tree Search agent. The resulting agents are measured against a standard Monte Carlo Tree Search agent and a random agent, their effects on level design are noted, and a user study checks how closely their traces match real human play. If the method works, designers could probe many playstyles and design choices without recruiting large numbers of human testers.

Core claim

Procedural personas realized through evolving the utility function for the Monte Carlo Tree Search agent can approximate different human playstyles in Match-3 games, thereby creating an automated playtesting system.

What carries the argument

The evolved utility function inside the Monte Carlo Tree Search agent, which encodes priorities that produce distinct behavioral personas.
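
To make the mechanism concrete, here is a minimal sketch of the idea, assuming the utility is a weighted sum of board features and the weights are tuned by a simple elitist evolutionary loop. The feature names, the `CANDIDATES` table, and `toy_fitness` are hypothetical stand-ins chosen for illustration; the paper's actual features, variation operators, and fitness are not given on this page, and a real fitness would play full games with the MCTS agent rather than score a fixed table.

```python
import random

# Hypothetical state features; these names are illustrative, not from the paper.
FEATURES = ("score", "legal_moves", "cascade_length")

def utility(weights, features):
    """Weighted sum of state features; the weights are what evolution tunes."""
    return sum(weights[name] * features[name] for name in FEATURES)

def mutate(weights, rng, sigma=0.1):
    """Return a copy of the weights with Gaussian noise added to each one."""
    return {name: w + rng.gauss(0.0, sigma) for name, w in weights.items()}

def evolve(fitness, rng, generations=50, population=10):
    """Elitist loop: keep the fitter half, refill with mutated copies of it."""
    pop = [{name: rng.uniform(-1.0, 1.0) for name in FEATURES}
           for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: population // 2]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(population - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

# Toy stand-in for "play the game with this persona": the agent picks the
# candidate move with the highest utility, and fitness rewards keeping many
# legal moves available (a "maximize moves" style persona, assumed here).
CANDIDATES = [
    {"score": 30, "legal_moves": 2, "cascade_length": 1},
    {"score": 10, "legal_moves": 9, "cascade_length": 0},
    {"score": 20, "legal_moves": 5, "cascade_length": 2},
]

def toy_fitness(weights):
    chosen = max(CANDIDATES, key=lambda f: utility(weights, f))
    return chosen["legal_moves"]

best_weights = evolve(toy_fitness, random.Random(0))
```

Swapping `toy_fitness` for a different objective (for example, one that rewards raw score) would yield a different weight vector from the same loop, which is the sense in which distinct fitness targets could give distinct personas.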

If this is right

Evolved agents produce different performance and move patterns from both vanilla Monte Carlo Tree Search and random selection.
The agents allow direct observation of effects on game design choices and on the overall design workflow.
A user study can measure how closely the agents' traces align with collected human play traces.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same evolutionary tuning of utility functions could be applied to other matching-tile or simple puzzle games.
Designers might run batches of these personas early to surface levels that frustrate or bore particular player types.
Repeated evolution runs could be used to map how small rule changes shift the range of reachable playstyles.

Load-bearing premise

That differences among the evolved utility functions will yield agent behaviors that match meaningfully distinct human playstyles rather than arbitrary variations.

What would settle it

A user study in which participants cannot reliably tell the evolved agents' play traces apart from one another or from human traces when asked to identify playstyle differences.

Figures

Figures reproduced from arXiv: 1907.06570 by Christoffer Holmgård, Fernando de Mesentier Silva, Julian Togelius, Luvneesh Mugrai.

**Figure 2.** A few possible scenarios to make matches by swapping the two pieces in any of the highlighted colored squares.

**Figure 5.** Experiment 3. Maximizing average number of moves.

**Figure 6.** Experiment 4. Minimizing average number of moves.
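
The Figure 2 caption describes moves as swaps of two adjacent pieces that create a match. A minimal sketch of enumerating such moves follows, assuming a rectangular board stored as a list of lists, matches defined as horizontal or vertical runs of three or more identical tiles, and swaps restricted to orthogonal neighbours. The function names and board encoding are hypothetical and are not taken from the paper's implementation.

```python
def has_match(board, r, c):
    """True if the tile at (r, c) lies in a horizontal or vertical run of 3+."""
    rows, cols = len(board), len(board[0])
    tile = board[r][c]
    for dr, dc in ((0, 1), (1, 0)):          # horizontal axis, then vertical axis
        run = 1
        for sign in (1, -1):                 # walk both directions along the axis
            rr, cc = r + sign * dr, c + sign * dc
            while 0 <= rr < rows and 0 <= cc < cols and board[rr][cc] == tile:
                run += 1
                rr, cc = rr + sign * dr, cc + sign * dc
        if run >= 3:
            return True
    return False

def legal_swaps(board):
    """List every adjacent swap ((r1, c1), (r2, c2)) that creates a match."""
    rows, cols = len(board), len(board[0])
    moves = []
    for r in range(rows):
        for c in range(cols):
            # Only look right and down so each pair is considered once.
            for dr, dc in ((0, 1), (1, 0)):
                r2, c2 = r + dr, c + dc
                if r2 >= rows or c2 >= cols or board[r][c] == board[r2][c2]:
                    continue
                # Swap, test both moved tiles, then undo the swap.
                board[r][c], board[r2][c2] = board[r2][c2], board[r][c]
                if has_match(board, r, c) or has_match(board, r2, c2):
                    moves.append(((r, c), (r2, c2)))
                board[r][c], board[r2][c2] = board[r2][c2], board[r][c]
    return moves
```

The length of the list returned by `legal_swaps` is one way to obtain a "number of available moves" quantity of the kind the Figure 5 and Figure 6 captions refer to, though how the paper computes that measure is not stated on this page.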

Original abstract

Matching tile games are an extremely popular game genre. Arguably the most popular iteration, Match-3 games, are simple to understand puzzle games, making them great benchmarks for research. In this paper, we propose developing different procedural personas for Match-3 games in order to approximate different human playstyles to create an automated playtesting system. The procedural personas are realized through evolving the utility function for the Monte Carlo Tree Search agent. We compare the performance and results of the evolution agents with the standard Vanilla Monte Carlo Tree Search implementation as well as to a random move-selection agent. We then observe the impacts on both the game's design and the game design process. Lastly, a user study is performed to compare the agents to human play traces.
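
For orientation, the selection step of a vanilla MCTS agent is commonly implemented with the UCB1 rule of Kocsis and Szepesvári (reference [12] in the list further down). The sketch below shows that rule only; whether the paper uses exactly this form or this exploration constant is an assumption, and the function names and the `(value_sum, visits)` child encoding are hypothetical. In a persona setting, the values being averaged would come from the evolved utility function rather than from raw game score.

```python
import math

def ucb1(value_sum, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: mean value plus an exploration bonus for rarely tried children."""
    if visits == 0:
        return math.inf                      # always try unvisited children first
    mean = value_sum / visits
    return mean + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children, parent_visits):
    """children is a list of (value_sum, visits); return the index to descend into."""
    return max(range(len(children)),
               key=lambda i: ucb1(children[i][0], children[i][1], parent_visits))
```

A larger `c` favours exploration of less visited moves, while a smaller `c` favours the move with the best average so far.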

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Evolving MCTS utility functions to generate Match-3 playstyle personas is a direct application of existing techniques, but the paper supplies no results or metrics to show the agents actually approximate distinct human behaviors.

The main point is that the authors evolve the utility function inside an MCTS agent to produce different procedural personas for Match-3 games, with the aim of using those agents for automated playtesting. They compare the evolved agents against vanilla MCTS and random selection, note effects on game design, and mention a user study against human traces. That combination for this specific genre is the new element; the components themselves are standard.

The setup is clear enough and the baseline comparisons make sense as a way to ground the work. The user study is the right move to check whether the personas track real playstyles.

The soft spot is that none of the validation is shown. The description stops at the plan for the study without metrics, statistical tests, move-sequence comparisons, or any outcomes from the evolution runs. This leaves the central assumption, that utility differences will produce meaningfully human-like behavioral differences, unsupported in the text. The stress-test note correctly flags this as the least secure step.

The paper is aimed at people building practical AI tools for casual puzzle games rather than anyone looking for broad advances in search or evolution methods. A reader working on agent-based testing might pick up the framing as a starting point. It deserves peer review so the full experiments and results can be examined; the idea is workable if the data holds up.

Referee Report

2 major / 1 minor

Summary. The paper proposes an automated playtesting system for Match-3 games by evolving the utility function of Monte Carlo Tree Search agents to generate procedural personas that approximate distinct human playstyles. It compares the evolved agents to vanilla MCTS and random agents, examines effects on game design, and reports performing a user study to align agent behaviors with human play traces.

Significance. If the validation holds, the approach could supply a practical method for generating diverse, human-like AI testers in a popular puzzle genre, reducing reliance on manual playtesting during design iteration.

major comments (2)

[Abstract] The central claim that evolved utility functions produce procedural personas approximating different human playstyles rests on an unshown user study; no methodology, metrics (move-sequence similarity, score distributions, statistical tests), participant details, or results are supplied, leaving the key approximation assumption unverified and load-bearing.
[User study description, throughout] Without reported fitness-function alignment to human behavioral distributions or quantitative comparison results, it cannot be determined whether observed agent variations correspond to meaningfully distinct playstyles rather than arbitrary parameter-induced differences.

minor comments (1)

[Abstract] The abstract would be strengthened by briefly naming the fitness function and evolution parameters used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their review and the opportunity to clarify the user study aspects of our work. We address each major comment below.

Referee: [Abstract] The central claim that evolved utility functions produce procedural personas approximating different human playstyles rests on an unshown user study; no methodology, metrics (move-sequence similarity, score distributions, statistical tests), participant details, or results are supplied, leaving the key approximation assumption unverified and load-bearing.

Authors: The abstract provides a high-level summary of the contributions. Detailed description of the user study, including methodology and participant details, appears in Section 5 of the manuscript. However, we agree that the abstract would benefit from including key results and metrics to support the central claim. In the revision, we will update the abstract to mention the metrics (e.g., move-sequence similarity, score distributions) and statistical findings from the user study.

Revision: yes

Referee: [User study description, throughout] Without reported fitness-function alignment to human behavioral distributions or quantitative comparison results, it cannot be determined whether observed agent variations correspond to meaningfully distinct playstyles rather than arbitrary parameter-induced differences.

Authors: The fitness functions are evolved to target different behavioral emphases (detailed in Section 3), producing the procedural personas. The user study then provides the alignment to human play by comparing agent traces to human traces. We acknowledge the need for explicit quantitative results on this alignment. We will add quantitative comparison results and any statistical analyses to the revised manuscript to demonstrate that the variations reflect distinct playstyles.

Revision: yes

Circularity Check

0 steps flagged

No circularity; evolutionary method and user-study validation are independent of fitted inputs

The paper describes an evolutionary process to tune MCTS utility functions, producing agents whose behaviors are then compared to human traces via a planned user study. No equations, fitted parameters, or self-citations are presented in the provided text that would make any claimed result equivalent to its own inputs by construction. The central claim rests on external validation (user study) rather than on renaming or re-deriving the fitness function itself. This is the normal non-circular case for an optimization-plus-validation pipeline.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; all such elements remain unknown.

pith-pipeline@v0.9.0 · 5658 in / 1042 out tokens · 21400 ms · 2026-05-24T21:22:00.429864+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel

Relation between the paper passage and the cited Recognition theorem: unclear

Paper passage: The procedural personas are realized through evolving the utility function for the Monte Carlo Tree Search agent... fitness... overall score after making a total of 20 turns... average length of legal available moves

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction

Relation between the paper passage and the cited Recognition theorem: unclear

Paper passage: We compare the performance... with the standard Vanilla Monte Carlo Tree Search implementation as well as to a random move-selection agent

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 1 internal anchor

[1] C. Holmgård, M. C. Green, A. Liapis, and J. Togelius, “Automated playtesting with procedural personas through MCTS with evolved heuristics,” CoRR, vol. abs/1802.06881, 2018

[2] C. Holmgård, A. Liapis, J. Togelius, and G. N. Yannakakis, “Evolving personas for player decision modeling,” in 2014 IEEE Conference on Computational Intelligence and Games (CIG), Aug 2014, pp. 1–8

[3] B. Tastan and G. Sukthankar, “Learning policies for first person shooter games using inverse reinforcement learning,” in Proceedings of the Seventh AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, ser. AIIDE’11. AAAI Press, 2011, pp. 85–90. [Online]. Available: http://dl.acm.org/citation.cfm?id=3014589.3014604

[4] A. Tychsen and A. Canossa, “Defining personas in games using metrics,” in Proceedings of the 2008 Conference on Future Play: Research, Play, Share, ser. Future Play ’08. New York, NY, USA: ACM, 2008, pp. 73–

[5] [Online]. Available: http://doi.acm.org/10.1145/1496984.1496997

[6] A. Drachen and A. Canossa, “Patterns of play: Play-personas in user-centred game development,” in Proceedings of DiGRA 2009. DIGRA, 2009

[7] C. Holmgård, A. Liapis, J. Togelius, and G. N. Yannakakis, “Generative agents for player decision modeling in games,” in Poster Proceedings of the 9th Conference on the Foundations of Digital Games (FDG), 2014

[8] C. Browne, E. Powley, D. Whitehouse, S. Lucas, P. Cowling, P. Rohlfshagen, S. Tavener, D. Perez Liebana, S. Samothrakis, and S. Colton, “A survey of monte carlo tree search methods,” IEEE Transactions on Computational Intelligence and AI in Games (TCIAIG), vol. 4, no. 1, pp. 1–43, Mar. 2012

[9] A. Benbassat and M. Sipper, “Evomcts: Enhancing mcts-based players through genetic programming,” in 2013 IEEE Conference on Computational Intelligence in Games (CIG), Aug 2013, pp. 1–8

[10] T. Cazenave, “Evolving monte-carlo tree search algorithms,” Dept. Inf., Univ. Paris, 2007

[11] G. N. Yannakakis and J. Togelius, Artificial Intelligence and Games. Springer, 2018, http://gameaibook.org

[12] L. Kocsis and C. Szepesvári, “Bandit based monte-carlo planning,” in Machine Learning: ECML 2006, J. Fürnkranz, T. Scheffer, and M. Spiliopoulou, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006, pp. 282–293
