pith. sign in

arxiv: 1907.02548 · v1 · pith:R33OME5Snew · submitted 2019-07-04 · 💻 cs.AI

Procedural Generation of Initial States of Sokoban

Pith reviewed 2026-05-25 09:08 UTC · model grok-4.3

classification 💻 cs.AI
keywords Sokobanprocedural generationhardness metricspattern database heuristicsnovelty searchplanning systemsinitial state generation
0
0 comments X

The pith

Beta generates solvable Sokoban initial states harder for a specialized solver than those designed by human experts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a procedural method to create initial states for Sokoban that are both solvable and difficult for planning systems. It defines hardness metrics using pattern database heuristics to estimate solution cost and incorporates novelty to guide search toward unexplored configurations. A system named Beta applies these metrics to produce candidate states. Experiments evaluate the generated states against human-designed ones by measuring the effort required by a specialized solver. This supports uses in training learners and testing planners on challenging instances.

Core claim

Beta, which employs hardness metrics based on pattern database heuristics and the use of novelty to improve exploration, generates initial states of Sokoban that are harder to solve by a specialized solver than those designed by human experts.

What carries the argument

Hardness metrics based on pattern database heuristics combined with novelty to guide search for initial states.

If this is right

  • Automatically generated hard solvable states support training of both human players and machine learning systems for Sokoban.
  • Planning systems can be evaluated using procedurally created test cases that exceed the difficulty of manually designed ones.
  • Novelty search improves the exploration needed to locate hard yet solvable initial states.
  • The approach scales the production of evaluation instances beyond what human designers can produce manually.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same metrics could be tested on other state-space search problems such as different puzzle domains to check transferability.
  • If the solver-based hardness correlates with human player experience, the generated states could serve as training levels in educational games.
  • Training AI agents on Beta states might yield planners that generalize better to unseen hard instances.

Load-bearing premise

The hardness metrics derived from pattern database heuristics and novelty search provide a valid proxy for the actual difficulty experienced by the specialized solver used in the evaluation experiments.

What would settle it

Running the specialized solver on matched sets of Beta-generated and human-designed states and finding no significant increase in solving time or expanded nodes for the Beta states would falsify the claim.

Figures

Figures reproduced from arXiv: 1907.02548 by Andr\'e G. Pereira, D\^amaris S. Bento, Levi H. S. Lelis.

Figure 1
Figure 1. Figure 1: Example adapted from Pommerening et al. [2013]. Pe has three variables that can be assigned a value in {0, 1, 2, 3, 4}. The initial state assigns 0 to all variables, and the goal state has all variables with the value of 3. Pe has two types of actions incvi x and jumpvi . Actions incvi x change the value of the variable vi from x to x + 1. Actions jumpvi change the value of variable vi from 0 to 3, as long… view at source ↗
Figure 2
Figure 2. Figure 2: Sokoban problem. We hypothesize that novelty can improve the exploration of the search performed by β, thus allowing it to visit a di￾verse set of states from which one can select initial states that maximize solution length and number of conflicts. 6 Difficulty Metrics for Humans in Sokoban Jarusek and Pel ˇ anek ´ [2010] performed an extensive user study showing strong positive correlations between metri… view at source ↗
Figure 3
Figure 3. Figure 3: Initial states of Sokoban generated by our best variant of [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
read the original abstract

Procedural generation of initial states of state-space search problems have applications in human and machine learning as well as in the evaluation of planning systems. In this paper we deal with the task of generating hard and solvable initial states of Sokoban puzzles. We propose hardness metrics based on pattern database heuristics and the use of novelty to improve the exploration of search methods in the task of generating initial states. We then present a system called Beta that uses our hardness metrics and novelty to generate initial states. Experiments show that Beta is able to generate initial states that are harder to solve by a specialized solver than those designed by human experts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes hardness metrics for Sokoban initial states based on pattern-database heuristics combined with novelty search to guide procedural generation. It introduces the Beta system that applies these metrics to produce solvable yet difficult puzzles and claims that experiments demonstrate the generated states require more effort from a specialized solver than human-designed levels.

Significance. If the metrics are shown to correlate with actual solver difficulty, the work could supply a useful method for creating challenging benchmarks and training instances in automated planning and puzzle-solving domains.

major comments (2)
  1. [Experiments] Experiments section (as described in the abstract): the central claim that Beta states are harder for the specialized solver rests on the unvalidated assumption that PDB-heuristic plus novelty scores predict the solver's node expansions or runtime; no scatter plot, rank correlation, or held-out validation relating metric values to solver performance is reported.
  2. [Abstract] Abstract and evaluation description: metric definitions, the precise implementation of the specialized solver, choice of statistical tests, and controls against post-hoc selection of generated instances are not provided, leaving the reported superiority over human-designed states without a reproducible basis.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight important aspects of validation and reproducibility that will strengthen the paper. We address each major comment below and will incorporate the necessary changes.

read point-by-point responses
  1. Referee: [Experiments] Experiments section (as described in the abstract): the central claim that Beta states are harder for the specialized solver rests on the unvalidated assumption that PDB-heuristic plus novelty scores predict the solver's node expansions or runtime; no scatter plot, rank correlation, or held-out validation relating metric values to solver performance is reported.

    Authors: The manuscript reports a direct empirical comparison showing that Beta-generated states require more node expansions and runtime from the specialized solver than human-designed levels. We agree, however, that an explicit validation of the metrics themselves (e.g., correlation with solver effort) is absent. In the revision we will add a dedicated validation subsection containing scatter plots, Spearman rank correlations, and held-out evaluation relating the combined PDB+novelty scores to solver performance. revision: yes

  2. Referee: [Abstract] Abstract and evaluation description: metric definitions, the precise implementation of the specialized solver, choice of statistical tests, and controls against post-hoc selection of generated instances are not provided, leaving the reported superiority over human-designed states without a reproducible basis.

    Authors: We accept that the current manuscript lacks sufficient detail for full reproducibility. The revised version will expand the methods section with precise definitions of the pattern-database heuristics and novelty measure, a complete description of the specialized solver (including heuristic implementation and search parameters), the exact statistical tests applied, and an account of the generation pipeline that includes safeguards against post-hoc instance selection. revision: yes

Circularity Check

0 steps flagged

No significant circularity; experimental claim independent of generation metrics

full rationale

The paper defines hardness metrics (pattern database heuristics plus novelty) to guide procedural generation of Sokoban states via the Beta system, then evaluates those states separately with a specialized solver's runtime and node expansions. No equation or step equates the final measured outcome to the generation metrics by construction; the evaluation uses an external solver whose behavior is not defined in terms of the PDB/novelty scores. No self-citations, fitted parameters renamed as predictions, or uniqueness theorems are invoked in a load-bearing way. The central result is an empirical comparison against human-designed puzzles, which remains falsifiable and does not reduce to a tautology within the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated in the provided text.

pith-pipeline@v0.9.0 · 5632 in / 1138 out tokens · 30373 ms · 2026-05-25T09:08:40.180663+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 1 internal anchor

  1. [1]

    Intelligent tutoring systems

    [Anderson et al., 1985] John R Anderson, C Franklin Boyle, and Brian J Reiser. Intelligent tutoring systems. Science, 228:456– 462,

  2. [2]

    Learning heuristic functions for large state spaces

    [Arfaee et al., 2011] Shahab Jabbari Arfaee, Sandra Zilles, and Robert Craig Holte. Learning heuristic functions for large state spaces. Artificial Intelligence, 175:2075–2098,

  3. [3]

    Culberson and Jonathan Schaeffer

    [Culberson and Schaeffer, 1996] Joseph C. Culberson and Jonathan Schaeffer. Searching with pattern databases. In Canadian Con- ference on Artificial Intelligence, pages 402–416,

  4. [4]

    Culberson

    [Culberson, 1999] Joseph C. Culberson. Sokoban is PSPACE- Complete. In Fun With Algorithms, pages 65–76,

  5. [5]

    Ex- periments with the graph traverser program

    [Doran and Michie, 1966] Jim E Doran and Donald Michie. Ex- periments with the graph traverser program. In Royal Society of London A, volume 294, pages 235–259. The Royal Society,

  6. [6]

    Planning with pattern databases

    [Edelkamp, 2001] Stefan Edelkamp. Planning with pattern databases. In European Conference on Planning , pages 13–24,

  7. [7]

    Korf, and Sarit Hanan

    [Felner et al., 2004] Ariel Felner, Richard E. Korf, and Sarit Hanan. Additive pattern database heuristics. Journal of Artificial Intelli- gence Research, 22:279–318,

  8. [8]

    The ff planning system: Fast plan generation through heuristic search

    [Hoffmann and Nebel, 2001] J¨org Hoffmann and Bernhard Nebel. The ff planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research , 14(1):253– 302,

  9. [9]

    Holte, Ariel Felner, Jack Newton, Ram Meshulam, and David Furcy

    [Holte et al., 2006] Robert C. Holte, Ariel Felner, Jack Newton, Ram Meshulam, and David Furcy. Maximizing over multiple pat- tern databases speeds up heuristic search. Artificial Intelligence, 170(16–17):1123–1136,

  10. [10]

    Bidirectional search that is guaranteed to meet in the middle

    [Holte et al., 2016] Robert C Holte, Ariel Felner, Guni Sharon, and Nathan R Sturtevant. Bidirectional search that is guaranteed to meet in the middle. InAAAI Conference on Artificial Intelligence, volume 16, pages 3411–3417,

  11. [11]

    Dif- ficulty rating of sokoban puzzle

    [Jaruˇsek and Pel´anek, 2010] Petr Jaruˇsek and Radek Pel ´anek. Dif- ficulty rating of sokoban puzzle. In Starting AI Researcher Sym- posium, pages 140–150,

  12. [12]

    Sokoban: Enhancing general single-agent search methods using domain knowledge

    [Junghanns and Schaeffer, 2001] Andreas Junghanns and Jonathan Schaeffer. Sokoban: Enhancing general single-agent search methods using domain knowledge. Artificial Intelligence , 129:219–251,

  13. [13]

    Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation

    [Justesen et al., 2018] Niels Justesen, Ruben Rodriguez Torrado, Philip Bontrager, Ahmed Khalifa, Julian Togelius, and Sebastian Risi. Procedural level generation improves generality of deep re- inforcement learning. CoRR, abs/1806.10729,

  14. [14]

    Data-driven sokoban puzzle generation with monte carlo tree search

    [Kartal et al., 2016] Bilal Kartal, Nick Sohre, and Stephen J Guy. Data-driven sokoban puzzle generation with monte carlo tree search. In AAAI Conference on Artificial Intelligence and In- teractive Digital Entertainment,

  15. [15]

    [Korf, 1985] Richard E. Korf. Depth-first iterative-deepening: An optimal admissible tree earch. Artificial Intelligence, 27(1):97– 109,

  16. [16]

    [Lelis et al., 2013] Levi H. S. Lelis, Sandra Zilles, and Robert C. Holte. Predicting the Size of IDA*’s Search Tree. Artificial In- telligence, pages 53–76,

  17. [17]

    [Lelis et al., 2016] Levi H. S. Lelis, Roni Tzvi Stern, Shahab Jab- bari Arfaee, Sandra Zilles, Ariel Felner, and Robert Craig Holte. Predicting optimal solution costs with bidirectional stratified sampling in regular search spaces.Artificial Intelligence, 230:51– 73,

  18. [18]

    Width and serialization of classical planning prob- lems

    [Lipovetzky and Geffner, 2012] Nir Lipovetzky and H ´ector Geffner. Width and serialization of classical planning prob- lems. In European Conference on Artificial Intelligence , pages 540–545. IOS Press,

  19. [19]

    Best-first width search: Exploration and exploitation in classical planning

    [Lipovetzky and Geffner, 2017] Nir Lipovetzky and Hector Geffner. Best-first width search: Exploration and exploitation in classical planning. In AAAI Conference on Artificial Intelligence, pages 3590–3596,

  20. [20]

    H Mari ˜no and Levi H

    [Mari˜no and Lelis, 2016] Julian R. H Mari ˜no and Levi H. S. Lelis. A computational model based on symmetry for generating vi- sually pleasing maps of platform games. In AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment , pages 65–71,

  21. [21]

    Automatic making of sokoban problems

    [Murase et al., 1996] Yoshio Murase, Hitoshi Matsubara, and Yuzuru Hiraga. Automatic making of sokoban problems. PRI- CAI: Topics in Artificial Intelligence, pages 592–600,

  22. [22]

    [Orseau et al., 2018] Laurent Orseau, Levi H. S. Lelis, Tor Latti- more, and Th ´eophane Weber. Single-agent policy tree search with guarantees. In Advances in Neural Information Processing Systems, pages 3205–3215,

  23. [23]

    Optimal sokoban solving using pattern databases with specific domain knowledge

    [Pereira et al., 2015] Andr´e G Pereira, Marcus Ritt, and Luciana S Buriol. Optimal sokoban solving using pattern databases with specific domain knowledge. Artificial Intelligence, 227:52–70,

  24. [24]

    Smith, Luke Zettlemoyer, Sumit Gulwani, and Zoran Popovic

    [Polozov et al., 2015] Oleksandr Polozov, Eleanor O’Rourke, Adam M. Smith, Luke Zettlemoyer, Sumit Gulwani, and Zoran Popovic. Personalized mathematical word problem generation. In International Joint Conference on Artificial Intelligence , pages 381–388. AAAI,

  25. [25]

    Getting the most out of pattern databases for classical planning

    [Pommerening et al., 2013] Florian Pommerening, Gabriele R¨oger, and Malte Helmert. Getting the most out of pattern databases for classical planning. In International Joint Conference on Artificial Intelligence,

  26. [26]

    Imagination-augmented agents for deep reinforcement learning

    [Racani`ere et al., 2017] S´ebastien Racani `ere, Th ´eophane Weber, David Reichert, Lars Buesing, Arthur Guez, Danilo Jimenez Rezende, Adri `a Puigdom `enech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, et al. Imagination-augmented agents for deep reinforcement learning. In Advances in Neural Information Pro- cessing Systems, pages 5690–5701,

  27. [27]

    Smith and Michael Mateas

    [Smith and Mateas, 2011] Adam M. Smith and Michael Mateas. Answer set programming for procedural content generation: A design space approach. IEEE Transactions on Computational In- telligence and AI in Games, 3(3):187–200,

  28. [28]

    Player movement models for video game level genera- tion

    [Snodgrass and Onta˜n´on, 2017] Sam Snodgrass and Santiago Onta˜n´on. Player movement models for video game level genera- tion. In International Joint Conference on Artificial Intelligence, pages 757–763,

  29. [29]

    Sturtevant and Matheus Jun Ota

    [Sturtevant and Ota, 2018] Nathan R. Sturtevant and Matheus Jun Ota. Exhaustive and semi-exhaustive procedural content genera- tion. In AAAI Conference on Artificial Intelligence and Interac- tive Digital Entertainment, pages 109–115,

  30. [30]

    Proce- dural generation of sokoban levels

    [Taylor and Parberry, 2011] Joshua Taylor and Ian Parberry. Proce- dural generation of sokoban levels. In International North Amer- ican Conference on Intelligent Games and Simulation , pages 5– 12, 2011