Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

27 Pith papers cite this work. Polarity classification is still indexing.

abstract

Large language models (LLMs) have recently been shown to deliver impressive performance in various NLP tasks. To tackle multi-step reasoning tasks, few-shot chain-of-thought (CoT) prompting includes a few manually crafted step-by-step reasoning demonstrations which enable LLMs to explicitly generate reasoning steps and improve their reasoning task accuracy. To eliminate the manual effort, Zero-shot-CoT concatenates the target problem statement with "Let's think step by step" as an input prompt to LLMs. Despite the success of Zero-shot-CoT, it still suffers from three pitfalls: calculation errors, missing-step errors, and semantic misunderstanding errors. To address the missing-step errors, we propose Plan-and-Solve (PS) Prompting. It consists of two components: first, devising a plan to divide the entire task into smaller subtasks, and then carrying out the subtasks according to the plan. To address the calculation errors and improve the quality of generated reasoning steps, we extend PS prompting with more detailed instructions and derive PS+ prompting. We evaluate our proposed prompting strategy on ten datasets across three reasoning problems. The experimental results over GPT-3 show that our proposed zero-shot prompting consistently outperforms Zero-shot-CoT across all datasets by a large margin, is comparable to or exceeds Zero-shot-Program-of-Thought Prompting, and has comparable performance with 8-shot CoT prompting on the math reasoning problem. The code can be found at https://github.com/AGI-Edgerunners/Plan-and-Solve-Prompting.
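As a rough illustration of the prompting idea in the abstract, the sketch below builds a Zero-shot-CoT prompt and a Plan-and-Solve style prompt by concatenating a problem statement with a trigger sentence. Only the Zero-shot-CoT trigger is quoted from the abstract. The `build_prompt` helper, the `Q:`/`A:` layout, and the wording of `PS_TRIGGER_SKETCH` are assumptions made for illustration: the PS wording is a paraphrase of the two components described above (devise a plan, then carry out the subtasks), not the paper's exact prompt.

```python
# Trigger sentence quoted in the abstract for Zero-shot-CoT.
ZERO_SHOT_COT_TRIGGER = "Let's think step by step"

# ASSUMPTION: an illustrative paraphrase of the two PS components named in
# the abstract. This is not the exact trigger text used in the paper.
PS_TRIGGER_SKETCH = (
    "Let's first devise a plan that divides the task into smaller subtasks. "
    "Then let's carry out the subtasks according to the plan."
)


def build_prompt(problem: str, trigger: str) -> str:
    """Concatenate a problem statement with a trigger sentence.

    The Q:/A: layout is a hypothetical formatting choice for this sketch.
    """
    return f"Q: {problem.strip()}\nA: {trigger}"


if __name__ == "__main__":
    question = "A shop sells 3 pens for 4 dollars. How much do 12 pens cost?"
    print(build_prompt(question, ZERO_SHOT_COT_TRIGGER))
    print()
    print(build_prompt(question, PS_TRIGGER_SKETCH))
```

The resulting string would be sent to an LLM as the input prompt. Swapping the trigger is the only difference between the two strategies in this sketch, which is what makes the approach zero-shot: no hand-written demonstrations are included.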

citation-role summary

background 3 · method 1

citation-polarity summary

background 4

representative citing papers

LoopTrap: Termination Poisoning Attacks on LLM Agents

cs.CR · 2026-05-07 · unverdicted · novelty 6.0

LoopTrap is an automated red-teaming framework that crafts termination-poisoning prompts to amplify LLM agent steps by 3.57x on average (up to 25x) across 8 agents.

PrismaDV: Automated Task-Aware Data Unit Test Generation

cs.LG · 2026-04-23 · unverdicted · novelty 6.0

PrismaDV generates task-aware data unit tests by jointly analyzing downstream code and dataset profiles, outperforming task-agnostic baselines on new benchmarks spanning 60 tasks, with SIFTA enabling automatic prompt optimization that beats hand-written prompts.

QuiLL: An LLM-Based Vulnerability Assessment Framework for the Wild

cs.CR · 2025-10-05 · unverdicted · novelty 6.0

QuiLL is a new evaluation pipeline that uses optimized LLM prompts, dynamic in-context learning from an NVD vector store, and a novel accuracy-plus-reasoning metric to benchmark vulnerability detection in real code.

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

cs.CL · 2024-10-30 · unverdicted · novelty 6.0

OS-Atlas, trained on the largest open-source cross-platform GUI grounding corpus of 13 million elements, outperforms prior open-source models on six benchmarks across mobile, desktop, and web platforms.

Commander-GPT: Dividing and Routing for Multimodal Sarcasm Detection

cs.AI · 2025-06-24 · unverdicted · novelty 5.0

Commander-GPT is a multi-agent routing framework that assigns sub-tasks in multimodal sarcasm detection to specialized LLMs coordinated by different commander models, reporting average F1 gains of 4.4% and 11.7% on MMSD and MMSD 2.0.

Understanding the planning of LLM agents: A survey

cs.AI · 2024-02-05 · accept · novelty 4.0

A survey that provides a taxonomy of methods for improving planning in LLM-based agents across task decomposition, plan selection, external modules, reflection, and memory.
