SokoBench: Evaluating long-horizon planning and reasoning in large language models. arXiv preprint arXiv:2601.20856

Gianni Pellegrini, Sebastiano Monti, Carlo Nicolini et al. · arXiv 2601.20856

1 Pith paper cites this work.

representative citing papers

The Complexity Ceiling Benchmark: A Multi-Domain Evaluation of Sequential Reasoning Under Depth Scaling

cs.AI · 2026-06-28 · unverdicted · novelty 7.0 · ref 6

The Complexity Ceiling Benchmark demonstrates geometric per-step decay in LLM sequential reasoning with domain-specific performance ceilings and introduces a trace metric showing incorrect intermediate steps in some correct final answers.

