{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:63CLJGZZRI6Y3TGVIEFSI6W2DD","short_pith_number":"pith:63CLJGZZ","schema_version":"1.0","canonical_sha256":"f6c4b49b398a3d8dccd5410b247ada18cd11833be778faa25494b09179006849","source":{"kind":"arxiv","id":"2503.09572","version":3},"attestation_state":"computed","paper":{"title":"Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Plan-and-Act improves LLM agent performance on long-horizon tasks by separating planning from execution and training the planner with synthetic data.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Amir Gholami, Gopala Anumanchipalli, Hiroki Furuta, Kurt Keutzer, Lutfi Eren Erdogan, Nicholas Lee, Sehoon Kim, Suhong Moon","submitted_at":"2025-03-12T17:40:52Z","abstract_excerpt":"Large language models (LLMs) have shown remarkable advancements in enabling language agents to tackle simple tasks. However, applying them for complex, multi-step, long-horizon tasks remains a challenge. Recent work have found success by separating high-level planning from low-level execution, which enables the model to effectively balance high-level planning objectives and low-level execution details. However, generating accurate plans remains difficult since LLMs are not inherently trained for this task. To address this, we propose Plan-and-Act, a novel framework that incorporates explicit p"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2503.09572","kind":"arxiv","version":3},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CL","submitted_at":"2025-03-12T17:40:52Z","cross_cats_sorted":[],"title_canon_sha256":"e77e753965af9169f84a8430407792b6b1e43be96c8734ee8fc1b07e007e1c78","abstract_canon_sha256":"22c921f9cc8001db27ca5315ea28b031eece5810756a71c39c3e02c79adf1d3b"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:13.023188Z","signature_b64":"ISImmjQglJyklHWSG4vLGlk1Kn+keyCxeGZ4GZaRTn7/TiUeAPgK5XVwpQDgyOUf6M+5S4WuNUkY3zvACmLuAw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"f6c4b49b398a3d8dccd5410b247ada18cd11833be778faa25494b09179006849","last_reissued_at":"2026-05-17T23:38:13.022511Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:13.022511Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Plan-and-Act improves LLM agent performance on long-horizon tasks by separating planning from execution and training the planner with synthetic data.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Amir Gholami, Gopala Anumanchipalli, Hiroki Furuta, Kurt Keutzer, Lutfi Eren Erdogan, Nicholas Lee, Sehoon Kim, Suhong Moon","submitted_at":"2025-03-12T17:40:52Z","abstract_excerpt":"Large language models (LLMs) have shown remarkable advancements in enabling language agents to tackle simple tasks. However, applying them for complex, multi-step, long-horizon tasks remains a challenge. Recent work have found success by separating high-level planning from low-level execution, which enables the model to effectively balance high-level planning objectives and low-level execution details. However, generating accurate plans remains difficult since LLMs are not inherently trained for this task. To address this, we propose Plan-and-Act, a novel framework that incorporates explicit p"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We evaluate Plan-and-Act using web navigation as a representative long-horizon planning environment, demonstrating a state-of-the-art 57.58% success rate on the WebArena-Lite benchmark as well as a text-only state-of-the-art 81.36% success rate on WebVoyager.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That annotating ground-truth trajectories with feasible plans and augmenting with diverse synthetic examples will produce plans that generalize to unseen tasks and environments without overfitting to the annotation process.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Plan-and-Act trains a dedicated Planner on synthetic plan-annotated trajectories to generate high-level plans that an Executor follows, reaching 57.58% success on WebArena-Lite and 81.36% on WebVoyager.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Plan-and-Act improves LLM agent performance on long-horizon tasks by separating planning from execution and training the planner with synthetic data.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"f93b97b200c11641ae18624ee628ff1b621961ec9ae96c613c38f2cdf579e437"},"source":{"id":"2503.09572","kind":"arxiv","version":3},"verdict":{"id":"4b2293e7-20f9-4b37-a66e-7a16f12cf8a1","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T21:27:57.581461Z","strongest_claim":"We evaluate Plan-and-Act using web navigation as a representative long-horizon planning environment, demonstrating a state-of-the-art 57.58% success rate on the WebArena-Lite benchmark as well as a text-only state-of-the-art 81.36% success rate on WebVoyager.","one_line_summary":"Plan-and-Act trains a dedicated Planner on synthetic plan-annotated trajectories to generate high-level plans that an Executor follows, reaching 57.58% success on WebArena-Lite and 81.36% on WebVoyager.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That annotating ground-truth trajectories with feasible plans and augmenting with diverse synthetic examples will produce plans that generalize to unseen tasks and environments without overfitting to the annotation process.","pith_extraction_headline":"Plan-and-Act improves LLM agent performance on long-horizon tasks by separating planning from execution and training the planner with synthetic data."},"references":{"count":101,"sample":[{"doi":"","year":2024,"title":"Agent-e: From autonomous web navigation to foundational design principles in agentic systems.ArXiv, abs/2407.13032","work_id":"0405533c-9300-42d1-89d0-d7d509ce813c","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Digirl: Training in-the-wild device-control agents with autonomous reinforcement learning","work_id":"fbe0b5bd-e928-4828-a7f5-fe7d0135ed13","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Web agents with world models: Learning and leveraging environment dynamics in web navigation","work_id":"ed121023-5e3f-4d27-bb09-1bf5b716c294","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Mind2web: Towards a generalist agent for the web","work_id":"18a218ea-52fc-4c84-b313-95d24b142f4c","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"E., Lee, N., Jha, S., Kim, S., Tabrizi, R., Moon, S., Hooper, C., Anumanchipalli, G., Keutzer, K., and Gholami, A","work_id":"cb7a8776-c507-490d-bef3-1b3be9936552","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":101,"snapshot_sha256":"aef6f3ea6a8b37d3a3af553e7f5f38ccd5a36b4ce569229864800c5f0f2a23c2","internal_anchors":15},"formal_canon":{"evidence_count":2,"snapshot_sha256":"f1b50e12109b64e6f363b7c24a2df44cf84269388518ae6bbb1d6c7bd639a553"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2503.09572","created_at":"2026-05-17T23:38:13.022638+00:00"},{"alias_kind":"arxiv_version","alias_value":"2503.09572v3","created_at":"2026-05-17T23:38:13.022638+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2503.09572","created_at":"2026-05-17T23:38:13.022638+00:00"},{"alias_kind":"pith_short_12","alias_value":"63CLJGZZRI6Y","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"63CLJGZZRI6Y3TGV","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"63CLJGZZ","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":17,"internal_anchor_count":17,"sample":[{"citing_arxiv_id":"2511.08947","citing_title":"AlphaCast: A Human Wisdom-LLM Intelligence Co-Reasoning Framework for Interactive Time Series Forecasting","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2601.12538","citing_title":"Agentic Reasoning for Large Language Models","ref_index":96,"is_internal_anchor":true},{"citing_arxiv_id":"2512.09629","citing_title":"End-to-end PDDL Planning with Hardcoded and Dynamic Agents","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2602.04129","citing_title":"KGLAMP: Knowledge Graph-guided Language model for Adaptive Multi-robot Planning and Replanning","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2603.00977","citing_title":"HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2603.09002","citing_title":"Security Considerations for Multi-agent Systems","ref_index":114,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14892","citing_title":"Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems","ref_index":126,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14563","citing_title":"Remember Your Trace: Memory-Guided Long-Horizon Agentic Framework for Consistent and Hierarchical Repository-Level Code Documentation","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2511.20857","citing_title":"Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory","ref_index":152,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12755","citing_title":"State-Centric Decision Process","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12004","citing_title":"Learning Agentic Policy from Action Guidance","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11514","citing_title":"FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2604.27955","citing_title":"GUI Agents with Reinforcement Learning: Toward Digital Inhabitants","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2604.23194","citing_title":"From Coarse to Fine: Self-Adaptive Hierarchical Planning for LLM Agents","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06642","citing_title":"StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2604.22446","citing_title":"From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2604.07341","citing_title":"ReCodeAgent: A Multi-Agent Workflow for Language-agnostic Translation and Validation of Large-scale Repositories","ref_index":17,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/63CLJGZZRI6Y3TGVIEFSI6W2DD","json":"https://pith.science/pith/63CLJGZZRI6Y3TGVIEFSI6W2DD.json","graph_json":"https://pith.science/api/pith-number/63CLJGZZRI6Y3TGVIEFSI6W2DD/graph.json","events_json":"https://pith.science/api/pith-number/63CLJGZZRI6Y3TGVIEFSI6W2DD/events.json","paper":"https://pith.science/paper/63CLJGZZ"},"agent_actions":{"view_html":"https://pith.science/pith/63CLJGZZRI6Y3TGVIEFSI6W2DD","download_json":"https://pith.science/pith/63CLJGZZRI6Y3TGVIEFSI6W2DD.json","view_paper":"https://pith.science/paper/63CLJGZZ","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2503.09572&json=true","fetch_graph":"https://pith.science/api/pith-number/63CLJGZZRI6Y3TGVIEFSI6W2DD/graph.json","fetch_events":"https://pith.science/api/pith-number/63CLJGZZRI6Y3TGVIEFSI6W2DD/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/63CLJGZZRI6Y3TGVIEFSI6W2DD/action/timestamp_anchor","attest_storage":"https://pith.science/pith/63CLJGZZRI6Y3TGVIEFSI6W2DD/action/storage_attestation","attest_author":"https://pith.science/pith/63CLJGZZRI6Y3TGVIEFSI6W2DD/action/author_attestation","sign_citation":"https://pith.science/pith/63CLJGZZRI6Y3TGVIEFSI6W2DD/action/citation_signature","submit_replication":"https://pith.science/pith/63CLJGZZRI6Y3TGVIEFSI6W2DD/action/replication_record"}},"created_at":"2026-05-17T23:38:13.022638+00:00","updated_at":"2026-05-17T23:38:13.022638+00:00"}