Large language models still can’t plan (a benchmark for llms on planning and reasoning about change)

Karthik Valmeekam, Alberto Olmo, Sarath Sreedharan, Subbarao Kambhampati · 2022

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling

cs.CL · 2026-01-09 · accept · novelty 7.0

OPT-Engine shows pure-text chain-of-thought reasoning in LLMs loses robustness as optimization complexity grows, external tools fix only local arithmetic, and solver-integrated methods are bottlenecked by automated constraint formulation.

Temporal Reasoning Is Not the Bottleneck: A Probabilistic Inconsistency Framework for Neuro-Symbolic QA

cs.AI · 2026-05-05 · unverdicted · novelty 6.0

Temporal reasoning is not the core bottleneck for LLMs on time-based QA; the real issue is unstructured text-to-event mapping, addressed by a neuro-symbolic system with PIS that reaches 100% accuracy on benchmarks when representations are correct.

Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding

cs.AI · 2026-05-10 · unverdicted · novelty 3.0

Advanced language representations shape LLMs' schemas to improve knowledge activation and problem-solving.

citing papers explorer

Showing 3 of 3 citing papers.

OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling cs.CL · 2026-01-09 · accept · none · ref 12
OPT-Engine shows pure-text chain-of-thought reasoning in LLMs loses robustness as optimization complexity grows, external tools fix only local arithmetic, and solver-integrated methods are bottlenecked by automated constraint formulation.
Temporal Reasoning Is Not the Bottleneck: A Probabilistic Inconsistency Framework for Neuro-Symbolic QA cs.AI · 2026-05-05 · unverdicted · none · ref 82
Temporal reasoning is not the core bottleneck for LLMs on time-based QA; the real issue is unstructured text-to-event mapping, addressed by a neuro-symbolic system with PIS that reaches 100% accuracy on benchmarks when representations are correct.
Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding cs.AI · 2026-05-10 · unverdicted · none · ref 90
Advanced language representations shape LLMs' schemas to improve knowledge activation and problem-solving.

Large language models still can’t plan (a benchmark for llms on planning and reasoning about change)

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer