Procedural Generation of Initial States of Sokoban
Pith reviewed 2026-05-25 09:08 UTC · model grok-4.3
The pith
Beta generates solvable Sokoban initial states harder for a specialized solver than those designed by human experts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Beta, which employs hardness metrics based on pattern database heuristics and the use of novelty to improve exploration, generates initial states of Sokoban that are harder to solve by a specialized solver than those designed by human experts.
What carries the argument
Hardness metrics based on pattern database heuristics combined with novelty to guide search for initial states.
If this is right
- Automatically generated hard solvable states support training of both human players and machine learning systems for Sokoban.
- Planning systems can be evaluated using procedurally created test cases that exceed the difficulty of manually designed ones.
- Novelty search improves the exploration needed to locate hard yet solvable initial states.
- The approach scales the production of evaluation instances beyond what human designers can produce manually.
Where Pith is reading between the lines
- The same metrics could be tested on other state-space search problems such as different puzzle domains to check transferability.
- If the solver-based hardness correlates with human player experience, the generated states could serve as training levels in educational games.
- Training AI agents on Beta states might yield planners that generalize better to unseen hard instances.
Load-bearing premise
The hardness metrics derived from pattern database heuristics and novelty search provide a valid proxy for the actual difficulty experienced by the specialized solver used in the evaluation experiments.
What would settle it
Running the specialized solver on matched sets of Beta-generated and human-designed states and finding no significant increase in solving time or expanded nodes for the Beta states would falsify the claim.
Figures
read the original abstract
Procedural generation of initial states of state-space search problems have applications in human and machine learning as well as in the evaluation of planning systems. In this paper we deal with the task of generating hard and solvable initial states of Sokoban puzzles. We propose hardness metrics based on pattern database heuristics and the use of novelty to improve the exploration of search methods in the task of generating initial states. We then present a system called Beta that uses our hardness metrics and novelty to generate initial states. Experiments show that Beta is able to generate initial states that are harder to solve by a specialized solver than those designed by human experts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes hardness metrics for Sokoban initial states based on pattern-database heuristics combined with novelty search to guide procedural generation. It introduces the Beta system that applies these metrics to produce solvable yet difficult puzzles and claims that experiments demonstrate the generated states require more effort from a specialized solver than human-designed levels.
Significance. If the metrics are shown to correlate with actual solver difficulty, the work could supply a useful method for creating challenging benchmarks and training instances in automated planning and puzzle-solving domains.
major comments (2)
- [Experiments] Experiments section (as described in the abstract): the central claim that Beta states are harder for the specialized solver rests on the unvalidated assumption that PDB-heuristic plus novelty scores predict the solver's node expansions or runtime; no scatter plot, rank correlation, or held-out validation relating metric values to solver performance is reported.
- [Abstract] Abstract and evaluation description: metric definitions, the precise implementation of the specialized solver, choice of statistical tests, and controls against post-hoc selection of generated instances are not provided, leaving the reported superiority over human-designed states without a reproducible basis.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments highlight important aspects of validation and reproducibility that will strengthen the paper. We address each major comment below and will incorporate the necessary changes.
read point-by-point responses
-
Referee: [Experiments] Experiments section (as described in the abstract): the central claim that Beta states are harder for the specialized solver rests on the unvalidated assumption that PDB-heuristic plus novelty scores predict the solver's node expansions or runtime; no scatter plot, rank correlation, or held-out validation relating metric values to solver performance is reported.
Authors: The manuscript reports a direct empirical comparison showing that Beta-generated states require more node expansions and runtime from the specialized solver than human-designed levels. We agree, however, that an explicit validation of the metrics themselves (e.g., correlation with solver effort) is absent. In the revision we will add a dedicated validation subsection containing scatter plots, Spearman rank correlations, and held-out evaluation relating the combined PDB+novelty scores to solver performance. revision: yes
-
Referee: [Abstract] Abstract and evaluation description: metric definitions, the precise implementation of the specialized solver, choice of statistical tests, and controls against post-hoc selection of generated instances are not provided, leaving the reported superiority over human-designed states without a reproducible basis.
Authors: We accept that the current manuscript lacks sufficient detail for full reproducibility. The revised version will expand the methods section with precise definitions of the pattern-database heuristics and novelty measure, a complete description of the specialized solver (including heuristic implementation and search parameters), the exact statistical tests applied, and an account of the generation pipeline that includes safeguards against post-hoc instance selection. revision: yes
Circularity Check
No significant circularity; experimental claim independent of generation metrics
full rationale
The paper defines hardness metrics (pattern database heuristics plus novelty) to guide procedural generation of Sokoban states via the Beta system, then evaluates those states separately with a specialized solver's runtime and node expansions. No equation or step equates the final measured outcome to the generation metrics by construction; the evaluation uses an external solver whose behavior is not defined in terms of the PDB/novelty scores. No self-citations, fitted parameters renamed as predictions, or uniqueness theorems are invoked in a load-bearing way. The central result is an empirical comparison against human-designed puzzles, which remains falsifiable and does not reduce to a tautology within the paper's own inputs.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
[Anderson et al., 1985] John R Anderson, C Franklin Boyle, and Brian J Reiser. Intelligent tutoring systems. Science, 228:456– 462,
work page 1985
-
[2]
Learning heuristic functions for large state spaces
[Arfaee et al., 2011] Shahab Jabbari Arfaee, Sandra Zilles, and Robert Craig Holte. Learning heuristic functions for large state spaces. Artificial Intelligence, 175:2075–2098,
work page 2011
-
[3]
Culberson and Jonathan Schaeffer
[Culberson and Schaeffer, 1996] Joseph C. Culberson and Jonathan Schaeffer. Searching with pattern databases. In Canadian Con- ference on Artificial Intelligence, pages 402–416,
work page 1996
- [4]
-
[5]
Ex- periments with the graph traverser program
[Doran and Michie, 1966] Jim E Doran and Donald Michie. Ex- periments with the graph traverser program. In Royal Society of London A, volume 294, pages 235–259. The Royal Society,
work page 1966
-
[6]
Planning with pattern databases
[Edelkamp, 2001] Stefan Edelkamp. Planning with pattern databases. In European Conference on Planning , pages 13–24,
work page 2001
-
[7]
[Felner et al., 2004] Ariel Felner, Richard E. Korf, and Sarit Hanan. Additive pattern database heuristics. Journal of Artificial Intelli- gence Research, 22:279–318,
work page 2004
-
[8]
The ff planning system: Fast plan generation through heuristic search
[Hoffmann and Nebel, 2001] J¨org Hoffmann and Bernhard Nebel. The ff planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research , 14(1):253– 302,
work page 2001
-
[9]
Holte, Ariel Felner, Jack Newton, Ram Meshulam, and David Furcy
[Holte et al., 2006] Robert C. Holte, Ariel Felner, Jack Newton, Ram Meshulam, and David Furcy. Maximizing over multiple pat- tern databases speeds up heuristic search. Artificial Intelligence, 170(16–17):1123–1136,
work page 2006
-
[10]
Bidirectional search that is guaranteed to meet in the middle
[Holte et al., 2016] Robert C Holte, Ariel Felner, Guni Sharon, and Nathan R Sturtevant. Bidirectional search that is guaranteed to meet in the middle. InAAAI Conference on Artificial Intelligence, volume 16, pages 3411–3417,
work page 2016
-
[11]
Dif- ficulty rating of sokoban puzzle
[Jaruˇsek and Pel´anek, 2010] Petr Jaruˇsek and Radek Pel ´anek. Dif- ficulty rating of sokoban puzzle. In Starting AI Researcher Sym- posium, pages 140–150,
work page 2010
-
[12]
Sokoban: Enhancing general single-agent search methods using domain knowledge
[Junghanns and Schaeffer, 2001] Andreas Junghanns and Jonathan Schaeffer. Sokoban: Enhancing general single-agent search methods using domain knowledge. Artificial Intelligence , 129:219–251,
work page 2001
-
[13]
Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation
[Justesen et al., 2018] Niels Justesen, Ruben Rodriguez Torrado, Philip Bontrager, Ahmed Khalifa, Julian Togelius, and Sebastian Risi. Procedural level generation improves generality of deep re- inforcement learning. CoRR, abs/1806.10729,
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[14]
Data-driven sokoban puzzle generation with monte carlo tree search
[Kartal et al., 2016] Bilal Kartal, Nick Sohre, and Stephen J Guy. Data-driven sokoban puzzle generation with monte carlo tree search. In AAAI Conference on Artificial Intelligence and In- teractive Digital Entertainment,
work page 2016
-
[15]
[Korf, 1985] Richard E. Korf. Depth-first iterative-deepening: An optimal admissible tree earch. Artificial Intelligence, 27(1):97– 109,
work page 1985
-
[16]
[Lelis et al., 2013] Levi H. S. Lelis, Sandra Zilles, and Robert C. Holte. Predicting the Size of IDA*’s Search Tree. Artificial In- telligence, pages 53–76,
work page 2013
-
[17]
[Lelis et al., 2016] Levi H. S. Lelis, Roni Tzvi Stern, Shahab Jab- bari Arfaee, Sandra Zilles, Ariel Felner, and Robert Craig Holte. Predicting optimal solution costs with bidirectional stratified sampling in regular search spaces.Artificial Intelligence, 230:51– 73,
work page 2016
-
[18]
Width and serialization of classical planning prob- lems
[Lipovetzky and Geffner, 2012] Nir Lipovetzky and H ´ector Geffner. Width and serialization of classical planning prob- lems. In European Conference on Artificial Intelligence , pages 540–545. IOS Press,
work page 2012
-
[19]
Best-first width search: Exploration and exploitation in classical planning
[Lipovetzky and Geffner, 2017] Nir Lipovetzky and Hector Geffner. Best-first width search: Exploration and exploitation in classical planning. In AAAI Conference on Artificial Intelligence, pages 3590–3596,
work page 2017
-
[20]
[Mari˜no and Lelis, 2016] Julian R. H Mari ˜no and Levi H. S. Lelis. A computational model based on symmetry for generating vi- sually pleasing maps of platform games. In AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment , pages 65–71,
work page 2016
-
[21]
Automatic making of sokoban problems
[Murase et al., 1996] Yoshio Murase, Hitoshi Matsubara, and Yuzuru Hiraga. Automatic making of sokoban problems. PRI- CAI: Topics in Artificial Intelligence, pages 592–600,
work page 1996
-
[22]
[Orseau et al., 2018] Laurent Orseau, Levi H. S. Lelis, Tor Latti- more, and Th ´eophane Weber. Single-agent policy tree search with guarantees. In Advances in Neural Information Processing Systems, pages 3205–3215,
work page 2018
-
[23]
Optimal sokoban solving using pattern databases with specific domain knowledge
[Pereira et al., 2015] Andr´e G Pereira, Marcus Ritt, and Luciana S Buriol. Optimal sokoban solving using pattern databases with specific domain knowledge. Artificial Intelligence, 227:52–70,
work page 2015
-
[24]
Smith, Luke Zettlemoyer, Sumit Gulwani, and Zoran Popovic
[Polozov et al., 2015] Oleksandr Polozov, Eleanor O’Rourke, Adam M. Smith, Luke Zettlemoyer, Sumit Gulwani, and Zoran Popovic. Personalized mathematical word problem generation. In International Joint Conference on Artificial Intelligence , pages 381–388. AAAI,
work page 2015
-
[25]
Getting the most out of pattern databases for classical planning
[Pommerening et al., 2013] Florian Pommerening, Gabriele R¨oger, and Malte Helmert. Getting the most out of pattern databases for classical planning. In International Joint Conference on Artificial Intelligence,
work page 2013
-
[26]
Imagination-augmented agents for deep reinforcement learning
[Racani`ere et al., 2017] S´ebastien Racani `ere, Th ´eophane Weber, David Reichert, Lars Buesing, Arthur Guez, Danilo Jimenez Rezende, Adri `a Puigdom `enech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, et al. Imagination-augmented agents for deep reinforcement learning. In Advances in Neural Information Pro- cessing Systems, pages 5690–5701,
work page 2017
-
[27]
[Smith and Mateas, 2011] Adam M. Smith and Michael Mateas. Answer set programming for procedural content generation: A design space approach. IEEE Transactions on Computational In- telligence and AI in Games, 3(3):187–200,
work page 2011
-
[28]
Player movement models for video game level genera- tion
[Snodgrass and Onta˜n´on, 2017] Sam Snodgrass and Santiago Onta˜n´on. Player movement models for video game level genera- tion. In International Joint Conference on Artificial Intelligence, pages 757–763,
work page 2017
-
[29]
Sturtevant and Matheus Jun Ota
[Sturtevant and Ota, 2018] Nathan R. Sturtevant and Matheus Jun Ota. Exhaustive and semi-exhaustive procedural content genera- tion. In AAAI Conference on Artificial Intelligence and Interac- tive Digital Entertainment, pages 109–115,
work page 2018
-
[30]
Proce- dural generation of sokoban levels
[Taylor and Parberry, 2011] Joshua Taylor and Ian Parberry. Proce- dural generation of sokoban levels. In International North Amer- ican Conference on Intelligent Games and Simulation , pages 5– 12, 2011
work page 2011
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.