Frontier LLMs show uneven zero-shot performance on goal recognition in PDDL domains: some scale with accumulating evidence toward landmark-based accuracy while others stay anchored to world-knowledge priors.
hub
Llms can’t plan, but can help planning in llm-modulo frameworks
11 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 11roles
background 2polarities
background 2representative citing papers
Spatial-Gym benchmark shows the best tested model solves only 16% of pathfinding tasks versus 98% for humans, with step-by-step and backtracking formats producing mixed effects across model strengths.
Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and reasoning datasets.
Novelty estimation via LLM prompts enables pruning in Tree-of-Thought search, reducing overall token usage on language planning benchmarks.
LLMs generally fail to maintain stable worldviews under adversarial conversational pressure, indicating they lack core beliefs akin to those in human cognition.
U-Define improves user control in LLM planning by letting people define hard rules and soft preferences in natural language with matching verification methods, raising usefulness and satisfaction scores.
ValuePlanner is a hierarchical architecture that uses LLMs to generate value-based subgoals and PDDL planners to produce executable actions, enabling self-directed behavior in embodied agents.
A hybrid system augments LLMs with an automated external RDF/OWL ontology layer for long-term memory, SHACL/OWL validation, and improved multi-step reasoning on tasks like Tower of Hanoi.
An end-to-end LLM framework refines natural language into valid PDDL domains and problems via hardcoded and dynamic agents, generates plans with standard engines, and returns readable output.
LLM-TALE steers RL exploration using LLM-generated plans at task and affordance levels with online suboptimality correction, improving sample efficiency and success rates on pick-and-place tasks without human supervision.
This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.
citing papers explorer
-
Zero-Shot Goal Recognition with Large Language Models
Frontier LLMs show uneven zero-shot performance on goal recognition in PDDL domains: some scale with accumulating evidence toward landmark-based accuracy while others stay anchored to world-knowledge priors.
-
Mind the Gap Between Spatial Reasoning and Acting! Step-by-Step Evaluation of Agents With Spatial-Gym
Spatial-Gym benchmark shows the best tested model solves only 16% of pathfinding tasks versus 98% for humans, with step-by-step and backtracking formats producing mixed effects across model strengths.
-
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and reasoning datasets.
-
Novelty-based Tree-of-Thought Search for LLM Reasoning and Planning
Novelty estimation via LLM prompts enables pruning in Tree-of-Thought search, reducing overall token usage on language planning benchmarks.
-
Do LLMs have core beliefs?
LLMs generally fail to maintain stable worldviews under adversarial conversational pressure, indicating they lack core beliefs akin to those in human cognition.
-
U-Define: Designing User Workflows for Hard and Soft Constraints in LLM-Based Planning
U-Define improves user control in LLM planning by letting people define hard rules and soft preferences in natural language with matching verification methods, raising usefulness and satisfaction scores.
-
Bridging Values and Behavior: A Hierarchical Framework for Proactive Embodied Agents
ValuePlanner is a hierarchical architecture that uses LLMs to generate value-based subgoals and PDDL planners to produce executable actions, enabling self-directed behavior in embodied agents.
-
Automatic Ontology Construction Using LLMs as an External Layer of Memory, Verification, and Planning for Hybrid Intelligent Systems
A hybrid system augments LLMs with an automated external RDF/OWL ontology layer for long-term memory, SHACL/OWL validation, and improved multi-step reasoning on tasks like Tower of Hanoi.
-
End-to-end PDDL Planning with Hardcoded and Dynamic Agents
An end-to-end LLM framework refines natural language into valid PDDL domains and problems via hardcoded and dynamic agents, generates plans with standard engines, and returns readable output.
-
LLM-Guided Task- and Affordance-Level Exploration in Reinforcement Learning
LLM-TALE steers RL exploration using LLM-generated plans at task and affordance levels with online suboptimality correction, improving sample efficiency and success rates on pick-and-place tasks without human supervision.
-
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.