{"paper":{"title":"Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Plan-and-solve prompting divides tasks into subtasks before solving them to cut missing-step errors in zero-shot chain-of-thought reasoning.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Ee-Peng Lim, Lei Wang, Roy Ka-Wei Lee, Wanyu Xu, Yihuai Lan, Yunshi Lan, Zhiqiang Hu","submitted_at":"2023-05-06T16:34:37Z","abstract_excerpt":"Large language models (LLMs) have recently been shown to deliver impressive performance in various NLP tasks. To tackle multi-step reasoning tasks, few-shot chain-of-thought (CoT) prompting includes a few manually crafted step-by-step reasoning demonstrations which enable LLMs to explicitly generate reasoning steps and improve their reasoning task accuracy. To eliminate the manual effort, Zero-shot-CoT concatenates the target problem statement with \"Let's think step by step\" as an input prompt to LLMs. 
Despite the success of Zero-shot-CoT, it still suffers from three pitfalls: calculation erro"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"our proposed zero-shot prompting consistently outperforms Zero-shot-CoT across all datasets by a large margin, is comparable to or exceeds Zero-shot-Program-of-Thought Prompting, and has comparable performance with 8-shot CoT prompting on the math reasoning problem.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the observed gains arise specifically from the plan-then-solve structure rather than from increased prompt length, additional instructions, or other uncontrolled prompt-engineering factors.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Plan-and-Solve prompting improves zero-shot LLM reasoning by first creating an explicit plan then executing subtasks, outperforming simple 'think step by step' prompts across ten datasets.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Plan-and-solve prompting divides tasks into subtasks before solving them to cut missing-step errors in zero-shot chain-of-thought reasoning.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"8a80f66e3f1f9c1d7f1e6df22f2072c7b98d11e28e71f28b4759a3313000f6ff"},"source":{"id":"2305.04091","kind":"arxiv","version":3},"verdict":{"id":"6b747126-1475-4eac-bbaa-f08dac9ff3c0","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T08:40:06.485739Z","strongest_claim":"our proposed zero-shot prompting consistently outperforms Zero-shot-CoT across all datasets by a large margin, is comparable to or exceeds Zero-shot-Program-of-Thought Prompting, and has comparable performance 
with 8-shot CoT prompting on the math reasoning problem.","one_line_summary":"Plan-and-Solve prompting improves zero-shot LLM reasoning by first creating an explicit plan then executing subtasks, outperforming simple 'think step by step' prompts across ten datasets.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the observed gains arise specifically from the plan-then-solve structure rather than from increased prompt length, additional instructions, or other uncontrolled prompt-engineering factors.","pith_extraction_headline":"Plan-and-solve prompting divides tasks into subtasks before solving them to cut missing-step errors in zero-shot chain-of-thought reasoning."},"references":{"count":25,"sample":[{"doi":"","year":2022,"title":"On the advance of making language models better reasoners. arXiv preprint arXiv:2206.02336, 2","work_id":"43823128-e3c9-4d2e-8e37-8e23c025e9d0","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"LLM+P: Empowering Large Language Models with Optimal Planning Proficiency","work_id":"7226ffd7-2787-4b30-823a-785e230e85d1","ref_index":2,"cited_arxiv_id":"2304.11477","is_internal_anchor":true},{"doi":"","year":2022,"title":"Measuring and Narrowing the Compositionality Gap in Language Models","work_id":"79aa4add-ff4a-4871-9c74-6f473b0579c1","ref_index":3,"cited_arxiv_id":"2210.03350","is_internal_anchor":true},{"doi":"","year":2022,"title":"LaMDA: Language Models for Dialog Applications","work_id":"1b66d0a5-f6ae-4332-8025-c662dc64b238","ref_index":4,"cited_arxiv_id":"2201.08239","is_internal_anchor":true},{"doi":"","year":null,"title":"Convert A cents to 
dollars","work_id":"20a509f4-362a-4a5c-9eed-0ef99405d4f6","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":25,"snapshot_sha256":"59d79be762bbc139cdb3f6637b76a61f5f6aa1629319a82a1837ed2383f76b4f","internal_anchors":3},"formal_canon":{"evidence_count":2,"snapshot_sha256":"f9983cbdb2bfc9b3fcf2048cba1b69baaa29a8d995e597ded4203c6dd90f6717"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}