{"paper":{"title":"Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Plan-and-Act improves LLM agent performance on long-horizon tasks by separating planning from execution and training the planner with synthetic data.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Amir Gholami, Gopala Anumanchipalli, Hiroki Furuta, Kurt Keutzer, Lutfi Eren Erdogan, Nicholas Lee, Sehoon Kim, Suhong Moon","submitted_at":"2025-03-12T17:40:52Z","abstract_excerpt":"Large language models (LLMs) have shown remarkable advancements in enabling language agents to tackle simple tasks. However, applying them for complex, multi-step, long-horizon tasks remains a challenge. Recent work have found success by separating high-level planning from low-level execution, which enables the model to effectively balance high-level planning objectives and low-level execution details. However, generating accurate plans remains difficult since LLMs are not inherently trained for this task. To address this, we propose Plan-and-Act, a novel framework that incorporates explicit p"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We evaluate Plan-and-Act using web navigation as a representative long-horizon planning environment, demonstrating a state-of-the-art 57.58% success rate on the WebArena-Lite benchmark as well as a text-only state-of-the-art 81.36% success rate on WebVoyager.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That annotating ground-truth trajectories with feasible plans and augmenting with diverse synthetic examples will produce plans that generalize to unseen tasks and environments without overfitting to the annotation process.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Plan-and-Act trains a dedicated Planner on synthetic plan-annotated trajectories to generate high-level plans that an Executor follows, reaching 57.58% success on WebArena-Lite and 81.36% on WebVoyager.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Plan-and-Act improves LLM agent performance on long-horizon tasks by separating planning from execution and training the planner with synthetic data.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"f93b97b200c11641ae18624ee628ff1b621961ec9ae96c613c38f2cdf579e437"},"source":{"id":"2503.09572","kind":"arxiv","version":3},"verdict":{"id":"4b2293e7-20f9-4b37-a66e-7a16f12cf8a1","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T21:27:57.581461Z","strongest_claim":"We evaluate Plan-and-Act using web navigation as a representative long-horizon planning environment, demonstrating a state-of-the-art 57.58% success rate on the WebArena-Lite benchmark as well as a text-only state-of-the-art 81.36% success rate on WebVoyager.","one_line_summary":"Plan-and-Act trains a dedicated Planner on synthetic plan-annotated trajectories to generate high-level plans that an Executor follows, reaching 57.58% success on WebArena-Lite and 81.36% on WebVoyager.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That annotating ground-truth trajectories with feasible plans and augmenting with diverse synthetic examples will produce plans that generalize to unseen tasks and environments without overfitting to the annotation process.","pith_extraction_headline":"Plan-and-Act improves LLM agent performance on long-horizon tasks by separating planning from execution and training the planner with synthetic data."},"references":{"count":101,"sample":[{"doi":"","year":2024,"title":"Agent-e: From autonomous web navigation to foundational design principles in agentic systems.ArXiv, abs/2407.13032","work_id":"0405533c-9300-42d1-89d0-d7d509ce813c","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Digirl: Training in-the-wild device-control agents with autonomous reinforcement learning","work_id":"fbe0b5bd-e928-4828-a7f5-fe7d0135ed13","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Web agents with world models: Learning and leveraging environment dynamics in web navigation","work_id":"ed121023-5e3f-4d27-bb09-1bf5b716c294","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Mind2web: Towards a generalist agent for the web","work_id":"18a218ea-52fc-4c84-b313-95d24b142f4c","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"E., Lee, N., Jha, S., Kim, S., Tabrizi, R., Moon, S., Hooper, C., Anumanchipalli, G., Keutzer, K., and Gholami, A","work_id":"cb7a8776-c507-490d-bef3-1b3be9936552","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":101,"snapshot_sha256":"aef6f3ea6a8b37d3a3af553e7f5f38ccd5a36b4ce569229864800c5f0f2a23c2","internal_anchors":15},"formal_canon":{"evidence_count":2,"snapshot_sha256":"f1b50e12109b64e6f363b7c24a2df44cf84269388518ae6bbb1d6c7bd639a553"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}