hub Canonical reference

Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks

Lutfi Eren Erdogan, Nicholas Lee, Sehoon Kim, Suhong Moon, Hiroki Furuta, Gopala Anumanchipalli · 2025 · cs.CL · arXiv 2503.09572

Canonical reference. 92% of citing Pith papers cite this work as background.

18 Pith papers citing it

Background 92% of classified citations

open full Pith review browse 18 citing papers arXiv PDF

abstract

Large language models (LLMs) have shown remarkable advancements in enabling language agents to tackle simple tasks. However, applying them for complex, multi-step, long-horizon tasks remains a challenge. Recent work have found success by separating high-level planning from low-level execution, which enables the model to effectively balance high-level planning objectives and low-level execution details. However, generating accurate plans remains difficult since LLMs are not inherently trained for this task. To address this, we propose Plan-and-Act, a novel framework that incorporates explicit planning into LLM-based agents and introduces a scalable method to enhance plan generation through a novel synthetic data generation method. Plan-and-Act consists of a Planner model which generates structured, high-level plans to achieve user goals, and an Executor model that translates these plans into environment-specific actions. To train the Planner effectively, we introduce a synthetic data generation method that annotates ground-truth trajectories with feasible plans, augmented with diverse and extensive examples to enhance generalization. We evaluate Plan-and-Act using web navigation as a representative long-horizon planning environment, demonstrating a state-of-the-art 57.58% success rate on the WebArena-Lite benchmark as well as a text-only state-of-the-art 81.36% success rate on WebVoyager.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 12 baseline 1

citation-polarity summary

background 12 baseline 1

representative citing papers

Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

cs.AI · 2026-05-14 · unverdicted · novelty 7.0 · 2 refs

A survey that unifies prior work on multi-agent LLM systems via the LIFE framework, mapping dependencies across collaboration, failure attribution, and autonomous self-evolution while identifying cross-stage challenges.

Remember Your Trace: Memory-Guided Long-Horizon Agentic Framework for Consistent and Hierarchical Repository-Level Code Documentation

cs.SE · 2026-05-14 · unverdicted · novelty 7.0

MemDocAgent generates consistent hierarchical repository-level code documentation by combining dependency-aware traversal with memory-guided agent interactions that accumulate work traces.

State-Centric Decision Process

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

SDP constructs a task-induced state space from raw text by having agents commit to and certify natural-language predicates as states, enabling structured planning and analysis in unstructured language environments.

Learning Agentic Policy from Action Guidance

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

ActGuide-RL uses human action data as plan-style guidance in mixed-policy RL to overcome exploration barriers in LLM agents, matching SFT+RL performance on search benchmarks without cold-start training.

FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems

cs.CR · 2026-05-12 · unverdicted · novelty 7.0

FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.

From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company

cs.AI · 2026-04-24 · unverdicted · novelty 7.0

OMC framework turns multi-agent AI into self-organizing companies with Talents, Talent Market, and E²R search, achieving 84.67% success on PRDBench (15.48 points above prior art).

ReCodeAgent: A Multi-Agent Workflow for Language-agnostic Translation and Validation of Large-scale Repositories

cs.SE · 2026-04-08 · unverdicted · novelty 7.0

ReCodeAgent uses a multi-agent system to translate and validate large code repositories across multiple programming languages, achieving 60.8% higher test pass rates than prior neuro-symbolic and agentic methods on 118 real-world projects.

Security Considerations for Multi-agent Systems

cs.CR · 2026-03-09 · unverdicted · novelty 6.0

No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.

HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents

cs.AI · 2026-03-01 · unverdicted · novelty 6.0

HiMAC decomposes LLM agent tasks into macro planning and micro execution using critic-free hierarchical RL and iterative co-evolution, outperforming baselines on ALFWorld, WebShop, and Sokoban.

KGLAMP: Knowledge Graph-guided Language model for Adaptive Multi-robot Planning and Replanning

cs.RO · 2026-02-04 · unverdicted · novelty 6.0

KGLAMP uses a dynamically updated knowledge graph to guide LLMs in creating and replanning PDDL specifications for heterogeneous multi-robot teams, reporting at least 25.3% better performance than LLM-only or classical PDDL baselines on the MAT-THOR benchmark.

Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

cs.CL · 2025-11-25 · unverdicted · novelty 6.0

Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and reasoning datasets.

AlphaCast: A Human Wisdom-LLM Intelligence Co-Reasoning Framework for Interactive Time Series Forecasting

cs.AI · 2025-11-12 · conditional · novelty 6.0

AlphaCast is a training-free LLM framework that performs interactive multi-stage reasoning for time series forecasting by integrating feature extraction, knowledge bases, case libraries, and contextual pools.

Code as Agent Harness

cs.CL · 2026-05-18 · accept · novelty 5.0

A survey that organizes existing work on LLM-based agents around code as the central harness, structured in three layers of interfaces, mechanisms, and multi-agent scaling, with applications across domains and listed open challenges.

StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction

cs.CL · 2026-05-07 · unverdicted · novelty 5.0

StraTA improves LLM agent success rates to 93.1% on ALFWorld and 84.2% on WebShop by sampling a compact initial strategy and training it jointly with action execution via hierarchical GRPO-style rollouts.

GUI Agents with Reinforcement Learning: Toward Digital Inhabitants

cs.AI · 2026-04-30 · unverdicted · novelty 5.0

The paper delivers the first comprehensive overview of RL for GUI agents, organizing methods into offline, online, and hybrid strategies while analyzing trends in rewards, efficiency, and deliberation to outline a future roadmap.

From Coarse to Fine: Self-Adaptive Hierarchical Planning for LLM Agents

cs.AI · 2026-04-25 · unverdicted · novelty 5.0

AdaPlan-H enables LLM agents to generate self-adaptive hierarchical plans that adjust detail level to task difficulty, improving success rates in multi-step tasks.

End-to-end PDDL Planning with Hardcoded and Dynamic Agents

cs.AI · 2025-12-10 · unverdicted · novelty 5.0

An end-to-end LLM framework refines natural language into valid PDDL domains and problems via hardcoded and dynamic agents, generates plans with standard engines, and returns readable output.

Agentic Reasoning for Large Language Models

cs.AI · 2026-01-18 · unverdicted · novelty 4.0

The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.

citing papers explorer

Showing 18 of 18 citing papers.

Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems cs.AI · 2026-05-14 · unverdicted · none · ref 126 · 2 links · internal anchor
A survey that unifies prior work on multi-agent LLM systems via the LIFE framework, mapping dependencies across collaboration, failure attribution, and autonomous self-evolution while identifying cross-stage challenges.
Remember Your Trace: Memory-Guided Long-Horizon Agentic Framework for Consistent and Hierarchical Repository-Level Code Documentation cs.SE · 2026-05-14 · unverdicted · none · ref 17 · internal anchor
MemDocAgent generates consistent hierarchical repository-level code documentation by combining dependency-aware traversal with memory-guided agent interactions that accumulate work traces.
State-Centric Decision Process cs.AI · 2026-05-12 · unverdicted · none · ref 11 · internal anchor
SDP constructs a task-induced state space from raw text by having agents commit to and certify natural-language predicates as states, enabling structured planning and analysis in unstructured language environments.
Learning Agentic Policy from Action Guidance cs.CL · 2026-05-12 · unverdicted · none · ref 16 · internal anchor
ActGuide-RL uses human action data as plan-style guidance in mixed-policy RL to overcome exploration barriers in LLM agents, matching SFT+RL performance on search benchmarks without cold-start training.
FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems cs.CR · 2026-05-12 · unverdicted · none · ref 14 · internal anchor
FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.
From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company cs.AI · 2026-04-24 · unverdicted · none · ref 10 · internal anchor
OMC framework turns multi-agent AI into self-organizing companies with Talents, Talent Market, and E²R search, achieving 84.67% success on PRDBench (15.48 points above prior art).
ReCodeAgent: A Multi-Agent Workflow for Language-agnostic Translation and Validation of Large-scale Repositories cs.SE · 2026-04-08 · unverdicted · none · ref 17 · internal anchor
ReCodeAgent uses a multi-agent system to translate and validate large code repositories across multiple programming languages, achieving 60.8% higher test pass rates than prior neuro-symbolic and agentic methods on 118 real-world projects.
Security Considerations for Multi-agent Systems cs.CR · 2026-03-09 · unverdicted · none · ref 114 · internal anchor
No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.
HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents cs.AI · 2026-03-01 · unverdicted · none · ref 14 · internal anchor
HiMAC decomposes LLM agent tasks into macro planning and micro execution using critic-free hierarchical RL and iterative co-evolution, outperforming baselines on ALFWorld, WebShop, and Sokoban.
KGLAMP: Knowledge Graph-guided Language model for Adaptive Multi-robot Planning and Replanning cs.RO · 2026-02-04 · unverdicted · none · ref 17 · internal anchor
KGLAMP uses a dynamically updated knowledge graph to guide LLMs in creating and replanning PDDL specifications for heterogeneous multi-robot teams, reporting at least 25.3% better performance than LLM-only or classical PDDL baselines on the MAT-THOR benchmark.
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory cs.CL · 2025-11-25 · unverdicted · none · ref 152 · internal anchor
Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and reasoning datasets.
AlphaCast: A Human Wisdom-LLM Intelligence Co-Reasoning Framework for Interactive Time Series Forecasting cs.AI · 2025-11-12 · conditional · none · ref 6 · internal anchor
AlphaCast is a training-free LLM framework that performs interactive multi-stage reasoning for time series forecasting by integrating feature extraction, knowledge bases, case libraries, and contextual pools.
Code as Agent Harness cs.CL · 2026-05-18 · accept · none · ref 146 · internal anchor
A survey that organizes existing work on LLM-based agents around code as the central harness, structured in three layers of interfaces, mechanisms, and multi-agent scaling, with applications across domains and listed open challenges.
StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction cs.CL · 2026-05-07 · unverdicted · none · ref 8 · internal anchor
StraTA improves LLM agent success rates to 93.1% on ALFWorld and 84.2% on WebShop by sampling a compact initial strategy and training it jointly with action execution via hierarchical GRPO-style rollouts.
GUI Agents with Reinforcement Learning: Toward Digital Inhabitants cs.AI · 2026-04-30 · unverdicted · none · ref 23 · internal anchor
The paper delivers the first comprehensive overview of RL for GUI agents, organizing methods into offline, online, and hybrid strategies while analyzing trends in rewards, efficiency, and deliberation to outline a future roadmap.
From Coarse to Fine: Self-Adaptive Hierarchical Planning for LLM Agents cs.AI · 2026-04-25 · unverdicted · none · ref 6 · internal anchor
AdaPlan-H enables LLM agents to generate self-adaptive hierarchical plans that adjust detail level to task difficulty, improving success rates in multi-step tasks.
End-to-end PDDL Planning with Hardcoded and Dynamic Agents cs.AI · 2025-12-10 · unverdicted · none · ref 8 · internal anchor
An end-to-end LLM framework refines natural language into valid PDDL domains and problems via hardcoded and dynamic agents, generates plans with standard engines, and returns readable output.
Agentic Reasoning for Large Language Models cs.AI · 2026-01-18 · unverdicted · none · ref 96 · internal anchor
The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.

Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer