{"work":{"id":"7e58c111-4666-4996-b5ad-1c8efd433083","openalex_id":null,"doi":null,"arxiv_id":"2205.10625","raw_key":null,"title":"Least-to-Most Prompting Enables Complex Reasoning in Large Language Models","authors":null,"authors_text":"Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang","year":2022,"venue":"cs.AI","abstract":"Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks that require solving problems harder than the exemplars shown in the prompts. To overcome this challenge of easy-to-hard generalization, we propose a novel prompting strategy, least-to-most prompting. The key idea in this strategy is to break down a complex problem into a series of simpler subproblems and then solve them in sequence. Solving each subproblem is facilitated by the answers to previously solved subproblems. Our experimental results on tasks related to symbolic manipulation, compositional generalization, and math reasoning reveal that least-to-most prompting is capable of generalizing to more difficult problems than those seen in the prompts. A notable finding is that when the GPT-3 code-davinci-002 model is used with least-to-most prompting, it can solve the compositional generalization benchmark SCAN in any split (including length split) with an accuracy of at least 99% using just 14 exemplars, compared to only 16% accuracy with chain-of-thought prompting. This is particularly noteworthy because neural-symbolic models in the literature that specialize in solving SCAN are trained on the entire training set containing over 15,000 examples. 
We have included prompts for all the tasks in the Appendix.","external_url":"https://arxiv.org/abs/2205.10625","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-06-29T12:43:25.779858+00:00","pith_arxiv_id":"2205.10625","created_at":"2026-05-09T01:29:32.504922+00:00","updated_at":"2026-06-29T12:43:25.779858+00:00","title_quality_ok":true,"display_title":"Least-to-Most Prompting Enables Complex Reasoning in Large Language Models","render_title":"Least-to-Most Prompting Enables Complex Reasoning in Large Language Models"},"hub":{"state":{"work_id":"7e58c111-4666-4996-b5ad-1c8efd433083","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":92,"external_cited_by_count":null,"distinct_field_count":12,"first_pith_cited_at":"2022-06-15T17:32:01+00:00","last_pith_cited_at":"2026-05-31T11:20:00+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-29T13:38:55.771228+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":14},{"context_role":"method","n":4},{"context_role":"dataset","n":1},{"context_role":"other","n":1}],"polarity_counts":[{"context_polarity":"background","n":12},{"context_polarity":"use_method","n":4},{"context_polarity":"support","n":2},{"context_polarity":"unclear","n":1},{"context_polarity":"use_dataset","n":1}],"runs":{"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T14:01:11.993160+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"Self-Consistency Improves Chain of Thought Reasoning in Language Models","work_id":"8c6d5a6b-b5cc-4105-9c84-9c34bb9375bb","shared_citers":23},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs 
via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":14},{"title":"Training Verifiers to Solve Math Word Problems","work_id":"acab1aa8-b4d6-40e0-a3ee-25341701dca2","shared_citers":13},{"title":"Chain-of-Thought Prompting Elicits Reasoning in Large Language Models","work_id":"d1cf6693-a082-403c-ada9-dac7b96341f9","shared_citers":11},{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":11},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":11},{"title":"PaLM: Scaling Language Modeling with Pathways","work_id":"a94f3ef7-2c49-4445-93fe-6ec16aafd966","shared_citers":11},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":11},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":10},{"title":"Large Language Models are Zero-Shot Reasoners","work_id":"d9b7eb1a-7165-46ff-9f06-d2f0b9d6f95d","shared_citers":10},{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":10},{"title":"Measuring Mathematical Problem Solving With the MATH Dataset","work_id":"50652ac6-fb7c-4675-a2c2-159c241feb17","shared_citers":10},{"title":"ReAct: Synergizing Reasoning and Acting in Language Models","work_id":"407a2351-25f1-497d-b611-f77d0292a8e6","shared_citers":9},{"title":"Training Compute-Optimal Large Language Models","work_id":"b2faf28d-86b7-429c-bc42-469458efc246","shared_citers":9},{"title":"Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks","work_id":"618aa44c-a6c6-425c-abce-8aa8aa842921","shared_citers":8},{"title":"OpenAI o1 System Card","work_id":"68d3c334-0fc9-49e3-b7b0-a69afae933e2","shared_citers":7},{"title":"Program Synthesis with Large Language 
Models","work_id":"fd241a05-03b9-4de2-9588-9d77ce176125","shared_citers":7},{"title":"Scaling Laws for Neural Language Models","work_id":"b7dd8749-9c45-4977-ab9b-64478dce1ae8","shared_citers":7},{"title":"Show Your Work: Scratchpads for Intermediate Computation with Language Models","work_id":"a05b1e60-8e76-4f26-9bea-28927a5f8620","shared_citers":7},{"title":"Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models","work_id":"bb63abb3-0d50-4362-b97c-b5e725b03b39","shared_citers":6},{"title":"LaMDA: Language Models for Dialog Applications","work_id":"1b66d0a5-f6ae-4332-8025-c662dc64b238","shared_citers":6},{"title":"Scaling Language Models: Methods, Analysis & Insights from Training Gopher","work_id":"47ce8be9-e500-407d-af41-ac2d132215eb","shared_citers":6},{"title":"Training language models to follow instructions with human feedback","work_id":"52aff42f-4fa9-4fcf-bdb3-1459b9bebf65","shared_citers":6},{"title":"arXiv preprint arXiv:2402.16837 , year=","work_id":"4ab404e2-abd3-493f-8e58-856cfffbf35f","shared_citers":5}],"time_series":[{"n":3,"year":2022},{"n":6,"year":2023},{"n":3,"year":2024},{"n":3,"year":2025},{"n":37,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-14T14:01:10.101191+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T14:01:16.277406+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"Least-to-Most Prompting Enables Complex Reasoning in Large Language Models","claims":[{"claim_text":"Chain-of-thought prompting has demonstrated remarkable 
performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks that require solving problems harder than the exemplars shown in the prompts. To overcome this challenge of easy-to-hard generalization, we propose a novel prompting strategy, least-to-most prompting. The key idea in this strategy is to break down a complex problem into a series of simpler subproblems and then solve them in sequence. Solving each subproblem is facilitated by the answers to previously solved subproblems. Our experimental ","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Least-to-Most Prompting Enables Complex Reasoning in Large Language Models because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T14:01:22.995727+00:00"}},"summary":{"title":"Least-to-Most Prompting Enables Complex Reasoning in Large Language Models","claims":[{"claim_text":"Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks that require solving problems harder than the exemplars shown in the prompts. To overcome this challenge of easy-to-hard generalization, we propose a novel prompting strategy, least-to-most prompting. The key idea in this strategy is to break down a complex problem into a series of simpler subproblems and then solve them in sequence. Solving each subproblem is facilitated by the answers to previously solved subproblems. 
Our experimental ","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Least-to-Most Prompting Enables Complex Reasoning in Large Language Models because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"Self-Consistency Improves Chain of Thought Reasoning in Language Models","work_id":"8c6d5a6b-b5cc-4105-9c84-9c34bb9375bb","shared_citers":23},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":14},{"title":"Training Verifiers to Solve Math Word Problems","work_id":"acab1aa8-b4d6-40e0-a3ee-25341701dca2","shared_citers":13},{"title":"Chain-of-Thought Prompting Elicits Reasoning in Large Language Models","work_id":"d1cf6693-a082-403c-ada9-dac7b96341f9","shared_citers":11},{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":11},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":11},{"title":"PaLM: Scaling Language Modeling with Pathways","work_id":"a94f3ef7-2c49-4445-93fe-6ec16aafd966","shared_citers":11},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":11},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":10},{"title":"Large Language Models are Zero-Shot Reasoners","work_id":"d9b7eb1a-7165-46ff-9f06-d2f0b9d6f95d","shared_citers":10},{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":10},{"title":"Measuring Mathematical Problem Solving With the MATH Dataset","work_id":"50652ac6-fb7c-4675-a2c2-159c241feb17","shared_citers":10},{"title":"ReAct: Synergizing Reasoning and Acting in Language 
Models","work_id":"407a2351-25f1-497d-b611-f77d0292a8e6","shared_citers":9},{"title":"Training Compute-Optimal Large Language Models","work_id":"b2faf28d-86b7-429c-bc42-469458efc246","shared_citers":9},{"title":"Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks","work_id":"618aa44c-a6c6-425c-abce-8aa8aa842921","shared_citers":8},{"title":"OpenAI o1 System Card","work_id":"68d3c334-0fc9-49e3-b7b0-a69afae933e2","shared_citers":7},{"title":"Program Synthesis with Large Language Models","work_id":"fd241a05-03b9-4de2-9588-9d77ce176125","shared_citers":7},{"title":"Scaling Laws for Neural Language Models","work_id":"b7dd8749-9c45-4977-ab9b-64478dce1ae8","shared_citers":7},{"title":"Show Your Work: Scratchpads for Intermediate Computation with Language Models","work_id":"a05b1e60-8e76-4f26-9bea-28927a5f8620","shared_citers":7},{"title":"Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models","work_id":"bb63abb3-0d50-4362-b97c-b5e725b03b39","shared_citers":6},{"title":"LaMDA: Language Models for Dialog Applications","work_id":"1b66d0a5-f6ae-4332-8025-c662dc64b238","shared_citers":6},{"title":"Scaling Language Models: Methods, Analysis & Insights from Training Gopher","work_id":"47ce8be9-e500-407d-af41-ac2d132215eb","shared_citers":6},{"title":"Training language models to follow instructions with human feedback","work_id":"52aff42f-4fa9-4fcf-bdb3-1459b9bebf65","shared_citers":6},{"title":"arXiv preprint arXiv:2402.16837 , year=","work_id":"4ab404e2-abd3-493f-8e58-856cfffbf35f","shared_citers":5}],"time_series":[{"n":3,"year":2022},{"n":6,"year":2023},{"n":3,"year":2024},{"n":3,"year":2025},{"n":37,"year":2026}],"dependency_candidates":[]},"authors":[]}}