LLMs Still Can’t Plan; Can LRMs? A Preliminary Evaluation of OpenAI’s O1 on Plan- Bench

Karthik Valmeekam, Kaya Stechly, Subbarao Kambhampati · 2024 · arXiv 2409.13373

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling

cs.CL · 2026-01-09 · accept · novelty 7.0

OPT-Engine shows pure-text chain-of-thought reasoning in LLMs loses robustness as optimization complexity grows, external tools fix only local arithmetic, and solver-integrated methods are bottlenecked by automated constraint formulation.

SYMBOLIZER: Symbolic Model-free Task Planning with VLMs

cs.RO · 2026-04-20 · unverdicted · novelty 6.0

SYMBOLIZER grounds symbolic states from images via VLMs using only lifted predicates and solves long-horizon tasks with goal-count and width-based heuristic search, outperforming direct VLM planning and matching VLM-heuristic baselines on ProDG and ViPlan benchmarks.

Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling

cs.LG · 2025-08-22 · unverdicted · novelty 6.0

In a cellular automata rule-inference task designed to block memorization, neural models achieve high next-step accuracy but accuracy falls sharply with longer reasoning chains; depth, recurrence, memory, and test-time compute extend the reachable depth but do not remove the bound.

A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions

cs.AI · 2025-01-27 · unverdicted · novelty 5.0

A survey of 87 agents for computer use and 33 datasets that introduces a three-dimensional taxonomy across domain, interaction, and agent perspectives and identifies six research gaps.

Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

cs.AI · 2025-01-16 · unverdicted · novelty 3.0

The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.

citing papers explorer

Showing 5 of 5 citing papers.

OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling cs.CL · 2026-01-09 · accept · none · ref 32
OPT-Engine shows pure-text chain-of-thought reasoning in LLMs loses robustness as optimization complexity grows, external tools fix only local arithmetic, and solver-integrated methods are bottlenecked by automated constraint formulation.
SYMBOLIZER: Symbolic Model-free Task Planning with VLMs cs.RO · 2026-04-20 · unverdicted · none · ref 18
SYMBOLIZER grounds symbolic states from images via VLMs using only lifted predicates and solves long-horizon tasks with goal-count and width-based heuristic search, outperforming direct VLM planning and matching VLM-heuristic baselines on ProDG and ViPlan benchmarks.
Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling cs.LG · 2025-08-22 · unverdicted · none · ref 66
In a cellular automata rule-inference task designed to block memorization, neural models achieve high next-step accuracy but accuracy falls sharply with longer reasoning chains; depth, recurrence, memory, and test-time compute extend the reachable depth but do not remove the bound.
A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions cs.AI · 2025-01-27 · unverdicted · none · ref 157
A survey of 87 agents for computer use and 33 datasets that introduces a three-dimensional taxonomy across domain, interaction, and agent perspectives and identifies six research gaps.
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models cs.AI · 2025-01-16 · unverdicted · none · ref 146
The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.

LLMs Still Can’t Plan; Can LRMs? A Preliminary Evaluation of OpenAI’s O1 on Plan- Bench

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer