The Complexity Ceiling Benchmark demonstrates geometric per-step decay in LLM sequential reasoning with domain-specific performance ceilings and introduces a trace metric showing incorrect intermediate steps in some correct final answers.
SokoBench: Evaluating long-horizon planning and reasoning in large language models.arXiv preprint arXiv:2601.20856,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
The Complexity Ceiling Benchmark: A Multi-Domain Evaluation of Sequential Reasoning Under Depth Scaling
The Complexity Ceiling Benchmark demonstrates geometric per-step decay in LLM sequential reasoning with domain-specific performance ceilings and introduces a trace metric showing incorrect intermediate steps in some correct final answers.