A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT

Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert · 2023 · cs.SE · arXiv 2302.11382

37 Pith papers cite this work. Polarity classification is still indexing.

abstract

Prompt engineering is an increasingly important skill set needed to converse effectively with large language models (LLMs), such as ChatGPT. Prompts are instructions given to an LLM to enforce rules, automate processes, and ensure specific qualities (and quantities) of generated output. Prompts are also a form of programming that can customize the outputs and interactions with an LLM. This paper describes a catalog of prompt engineering techniques presented in pattern form that have been applied to solve common problems when conversing with LLMs. Prompt patterns are a knowledge transfer method analogous to software patterns since they provide reusable solutions to common problems faced in a particular context, i.e., output generation and interaction when working with LLMs. This paper provides the following contributions to research on prompt engineering that apply LLMs to automate software development tasks. First, it provides a framework for documenting patterns for structuring prompts to solve a range of problems so that they can be adapted to different domains. Second, it presents a catalog of patterns that have been applied successfully to improve the outputs of LLM conversations. Third, it explains how prompts can be built from multiple patterns and illustrates prompt patterns that benefit from combination with other prompt patterns.

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

Robotics-Inspired Guardrails for Foundation Models in Socially Sensitive Domains

cs.AI · 2026-05-19 · unverdicted · novelty 7.0

Introduces the Grounded Observer framework that applies robotics-inspired formal constructs for runtime constraint enforcement on foundation model interaction trajectories in socially sensitive domains.

CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

CA-SQL achieves 51.72% execution accuracy on the challenging tier of the BIRD benchmark using GPT-4o-mini by scaling exploration breadth according to estimated task difficulty, evolutionary prompt seeding, and candidate voting.

When Prompt Under-Specification Improves Code Correctness: An Exploratory Study of Prompt Wording and Structure Effects on LLM-Based Code Generation

cs.SE · 2026-04-27 · unverdicted · novelty 7.0

Structurally rich task descriptions make LLMs robust to prompt under-specification, and under-specification can enhance code correctness by disrupting misleading lexical or structural cues.

Figures as Interfaces: Toward LLM-Native Artifacts for Scientific Discovery

cs.HC · 2026-04-09 · unverdicted · novelty 7.0

LLM-native figures embed provenance and enable direct LLM interaction with scientific visualizations to accelerate discovery and improve reproducibility.

Architecture Without Architects: How AI Coding Agents Shape Software Architecture

cs.SE · 2026-04-05 · unverdicted · novelty 7.0

AI coding agents perform vibe architecting by making prompt-driven architectural choices that produce structurally different systems for identical tasks.

Compass vs Railway Tracks: Unpacking User Mental Models for Communicating Long-Horizon Work to Humans vs. AI

cs.HC · 2026-01-17 · unverdicted · novelty 7.0

Users treat human delegation for long tasks as a flexible compass but AI delegation as rigid railway tracks due to perceived AI limitations in inference and judgment.

Automated Design of Agentic Systems

cs.AI · 2024-08-15 · conditional · novelty 7.0

Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.

Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via interventions.

From Natural Language to Verified Code: Toward AI Assisted Problem-to-Code Generation with Dafny-Based Formal Verification

cs.SE · 2026-04-24 · unverdicted · novelty 6.0

Open-weight LLMs reach 81-91% success generating formally verified Dafny code for complex algorithmic problems when given structural signatures and self-healing verifier feedback.

SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

cs.CR · 2026-02-24 · unverdicted · novelty 6.0

The paper systematizes agentic skills beyond tool use, providing design pattern and representation-scope taxonomies plus security analysis of malicious skill infiltration in agent marketplaces.

Language Model Goal Selection Differs from Humans' in a Self-Directed Learning Task

cs.CL · 2026-02-06 · unverdicted · novelty 6.0

LLMs diverge from human goal selection in self-directed learning by exploiting single solutions with low variability across instances.

Can GPT-4o Evaluate Usability Like Human Experts? A Comparative Study on Issue Identification in Heuristic Evaluation

cs.HC · 2025-06-19 · unverdicted · novelty 6.0

GPT-4o identified only 21.2% of the usability issues found by human experts in heuristic evaluation, while discovering 27 additional issues and exhibiting difficulties with certain heuristics and generating false positives.

PatchTrack: A Comprehensive Analysis of ChatGPT's Influence on Pull Request Outcomes

cs.SE · 2025-05-12 · conditional · novelty 6.0

Empirical analysis of 338 PRs with self-admitted ChatGPT usage shows low full integration (median 25%), selective adaptation patterns, and broader influence on developer reasoning during reviews.

From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification

cs.AR · 2025-04-28 · conditional · novelty 6.0

UVM^2 is an LLM-driven system that generates and refines UVM testbenches for RTL verification, reporting substantial time savings and average code/function coverage of 87.44%/89.58% on designs up to 1.6K lines, outperforming prior methods.

Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting

cs.CL · 2023-10-17 · conditional · novelty 6.0

LLMs are highly sensitive to prompt formatting in few-shot settings, with accuracy varying by up to 76 points across formats; FormatSpread samples formats to report performance intervals without model weights.

Towards SocratiCode: Designing a Generative AI-Based Programming Tutor for K-12 Students through a 4-Week Participatory Design Study

cs.HC · 2026-05-18 · unverdicted · novelty 5.0

A participatory design study with two K-12 students iteratively refined a generative AI Python tutor toward Socratic questioning, reflection prompts, and incremental hints, with preliminary observations of better clarity and engagement when combined with human guidance.

Making OpenAPI Documentation Agent-Ready: Detecting Documentation and REST Smells with a Multi-Agent LLM System

cs.SE · 2026-05-14 · unverdicted · novelty 5.0

Hermes uses multi-agent LLMs to detect 2450 documentation and REST smells across 600 OpenAPI endpoints, demonstrating that structurally valid microservice APIs are often not semantically ready for agent consumption.

The Readability Spectrum: Patterns, Issues, and Prompt Effects in LLM-Generated Code

cs.SE · 2026-05-13 · unverdicted · novelty 5.0

LLM-generated code matches human-written code in overall readability but exhibits different issue patterns, and prompt engineering has limited impact on improving it.

User Reviews as a Source for Usability Requirements: A Precursor Study on Using Large Language Models

cs.SE · 2026-05-12 · conditional · novelty 5.0

LLMs can detect usability content in user reviews with F-scores comparable to humans, though performance depends strongly on prompt design.

Benchmarking LLM-Based Static Analysis for Secure Smart Contract Development: Reliability, Limitations, and Potential Hybrid Solutions

cs.CR · 2026-05-11 · unverdicted · novelty 5.0

LLMs for smart contract security analysis show lexical bias from identifier names causing high false positives, with prompting creating precision-recall trade-offs, positioning them as complements rather than replacements for static analysis tools.

Conventional Commit Classification using Large Language Models and Prompt Engineering

cs.SE · 2026-05-03 · unverdicted · novelty 5.0

Few-shot prompting with the 32B DeepSeek-R1 model achieves the highest accuracy on a balanced set of 3,200 conventional commits mined from InfluxDB, while chain-of-thought adds no benefit and larger model scale improves results.

Enhanced Self-Learning with Epistemologically-Informed LLM Dialogue

cs.HC · 2026-04-12 · unverdicted · novelty 5.0

CausaDisco integrates Aristotle's Four Causes into LLM prompts to produce more engaging, exploratory, and multifaceted self-learning dialogues, as evidenced by controlled user studies.

STaR-DRO: Stateful Tsallis Reweighting for Group-Robust Structured Prediction

cs.LG · 2026-04-09 · unverdicted · novelty 5.0

STaR-DRO applies momentum-smoothed Tsallis reweighting to focus learning on hard groups in structured prediction, yielding F1 gains on clinical label extraction.

LLM2Manim: Pedagogy-Aware AI Generation of STEM Animations

cs.MM · 2026-04-07 · unverdicted · novelty 5.0

LLM2Manim pipeline generates pedagogy-aware Manim animations for STEM, producing slightly better student post-test scores (83% vs 78%), learning gains (d=0.67), and engagement than PowerPoint in a controlled study.

citing papers explorer

Showing 37 of 37 citing papers.

Robotics-Inspired Guardrails for Foundation Models in Socially Sensitive Domains cs.AI · 2026-05-19 · unverdicted · none · ref 49 · internal anchor
Introduces the Grounded Observer framework that applies robotics-inspired formal constructs for runtime constraint enforcement on foundation model interaction trajectories in socially sensitive domains.
CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation cs.CL · 2026-05-08 · unverdicted · none · ref 24 · internal anchor
CA-SQL achieves 51.72% execution accuracy on the challenging tier of the BIRD benchmark using GPT-4o-mini by scaling exploration breadth according to estimated task difficulty, evolutionary prompt seeding, and candidate voting.
When Prompt Under-Specification Improves Code Correctness: An Exploratory Study of Prompt Wording and Structure Effects on LLM-Based Code Generation cs.SE · 2026-04-27 · unverdicted · none · ref 41 · internal anchor
Structurally rich task descriptions make LLMs robust to prompt under-specification, and under-specification can enhance code correctness by disrupting misleading lexical or structural cues.
Figures as Interfaces: Toward LLM-Native Artifacts for Scientific Discovery cs.HC · 2026-04-09 · unverdicted · none · ref 94 · internal anchor
LLM-native figures embed provenance and enable direct LLM interaction with scientific visualizations to accelerate discovery and improve reproducibility.
Architecture Without Architects: How AI Coding Agents Shape Software Architecture cs.SE · 2026-04-05 · unverdicted · none · ref 10 · internal anchor
AI coding agents perform vibe architecting by making prompt-driven architectural choices that produce structurally different systems for identical tasks.
Compass vs Railway Tracks: Unpacking User Mental Models for Communicating Long-Horizon Work to Humans vs. AI cs.HC · 2026-01-17 · unverdicted · none · ref 80 · internal anchor
Users treat human delegation for long tasks as a flexible compass but AI delegation as rigid railway tracks due to perceived AI limitations in inference and judgment.
Automated Design of Agentic Systems cs.AI · 2024-08-15 · conditional · none · ref 29 · internal anchor
Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.
Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space cs.CL · 2026-05-12 · unverdicted · none · ref 149 · internal anchor
LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via interventions.
From Natural Language to Verified Code: Toward AI Assisted Problem-to-Code Generation with Dafny-Based Formal Verification cs.SE · 2026-04-24 · unverdicted · none · ref 90 · internal anchor
Open-weight LLMs reach 81-91% success generating formally verified Dafny code for complex algorithmic problems when given structural signatures and self-healing verifier feedback.
SoK: Agentic Skills -- Beyond Tool Use in LLM Agents cs.CR · 2026-02-24 · unverdicted · none · ref 27 · internal anchor
The paper systematizes agentic skills beyond tool use, providing design pattern and representation-scope taxonomies plus security analysis of malicious skill infiltration in agent marketplaces.
Language Model Goal Selection Differs from Humans' in a Self-Directed Learning Task cs.CL · 2026-02-06 · unverdicted · none · ref 20 · internal anchor
LLMs diverge from human goal selection in self-directed learning by exploiting single solutions with low variability across instances.
Can GPT-4o Evaluate Usability Like Human Experts? A Comparative Study on Issue Identification in Heuristic Evaluation cs.HC · 2025-06-19 · unverdicted · none · ref 49 · internal anchor
GPT-4o identified only 21.2% of the usability issues found by human experts in heuristic evaluation, while discovering 27 additional issues and exhibiting difficulties with certain heuristics and generating false positives.
PatchTrack: A Comprehensive Analysis of ChatGPT's Influence on Pull Request Outcomes cs.SE · 2025-05-12 · conditional · none · ref 30 · internal anchor
Empirical analysis of 338 PRs with self-admitted ChatGPT usage shows low full integration (median 25%), selective adaptation patterns, and broader influence on developer reasoning during reviews.
From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification cs.AR · 2025-04-28 · conditional · none · ref 37 · internal anchor
UVM^2 is an LLM-driven system that generates and refines UVM testbenches for RTL verification, reporting substantial time savings and average code/function coverage of 87.44%/89.58% on designs up to 1.6K lines, outperforming prior methods.
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting cs.CL · 2023-10-17 · conditional · none · ref 64 · internal anchor
LLMs are highly sensitive to prompt formatting in few-shot settings, with accuracy varying by up to 76 points across formats; FormatSpread samples formats to report performance intervals without model weights.
Towards SocratiCode: Designing a Generative AI-Based Programming Tutor for K-12 Students through a 4-Week Participatory Design Study cs.HC · 2026-05-18 · unverdicted · none · ref 35 · internal anchor
A participatory design study with two K-12 students iteratively refined a generative AI Python tutor toward Socratic questioning, reflection prompts, and incremental hints, with preliminary observations of better clarity and engagement when combined with human guidance.
Making OpenAPI Documentation Agent-Ready: Detecting Documentation and REST Smells with a Multi-Agent LLM System cs.SE · 2026-05-14 · unverdicted · none · ref 24 · internal anchor
Hermes uses multi-agent LLMs to detect 2450 documentation and REST smells across 600 OpenAPI endpoints, demonstrating that structurally valid microservice APIs are often not semantically ready for agent consumption.
The Readability Spectrum: Patterns, Issues, and Prompt Effects in LLM-Generated Code cs.SE · 2026-05-13 · unverdicted · none · ref 102 · internal anchor
LLM-generated code matches human-written code in overall readability but exhibits different issue patterns, and prompt engineering has limited impact on improving it.
User Reviews as a Source for Usability Requirements: A Precursor Study on Using Large Language Models cs.SE · 2026-05-12 · conditional · none · ref 14 · internal anchor
LLMs can detect usability content in user reviews with F-scores comparable to humans, though performance depends strongly on prompt design.
Benchmarking LLM-Based Static Analysis for Secure Smart Contract Development: Reliability, Limitations, and Potential Hybrid Solutions cs.CR · 2026-05-11 · unverdicted · none · ref 29 · internal anchor
LLMs for smart contract security analysis show lexical bias from identifier names causing high false positives, with prompting creating precision-recall trade-offs, positioning them as complements rather than replacements for static analysis tools.
Conventional Commit Classification using Large Language Models and Prompt Engineering cs.SE · 2026-05-03 · unverdicted · none · ref 14 · internal anchor
Few-shot prompting with the 32B DeepSeek-R1 model achieves the highest accuracy on a balanced set of 3,200 conventional commits mined from InfluxDB, while chain-of-thought adds no benefit and larger model scale improves results.
Enhanced Self-Learning with Epistemologically-Informed LLM Dialogue cs.HC · 2026-04-12 · unverdicted · none · ref 109 · internal anchor
CausaDisco integrates Aristotle's Four Causes into LLM prompts to produce more engaging, exploratory, and multifaceted self-learning dialogues, as evidenced by controlled user studies.
STaR-DRO: Stateful Tsallis Reweighting for Group-Robust Structured Prediction cs.LG · 2026-04-09 · unverdicted · none · ref 21 · internal anchor
STaR-DRO applies momentum-smoothed Tsallis reweighting to focus learning on hard groups in structured prediction, yielding F1 gains on clinical label extraction.
LLM2Manim: Pedagogy-Aware AI Generation of STEM Animations cs.MM · 2026-04-07 · unverdicted · none · ref 52 · internal anchor
LLM2Manim pipeline generates pedagogy-aware Manim animations for STEM, producing slightly better student post-test scores (83% vs 78%), learning gains (d=0.67), and engagement than PowerPoint in a controlled study.
The PICCO Framework for Large Language Model Prompting: A Taxonomy and Reference Architecture for Prompt Structure cs.CL · 2026-04-03 · accept · none · ref 13 · internal anchor
PICCO is a five-element reference architecture (Persona, Instructions, Context, Constraints, Output) for structuring LLM prompts, derived from synthesizing prior frameworks along with a taxonomy distinguishing prompt concepts.
When the Loop Closes: Architectural Limits of In-Context Isolation, Metacognitive Co-option, and the Two-Target Design Problem in Human-LLM Systems cs.HC · 2026-03-14 · unverdicted · none · ref 32 · 2 links · internal anchor
A single-subject autoethnographic study documents rapid loss of decision-making agency in an LLM-based cognitive externalization system caused by context contamination and metacognitive co-option, with recovery only after physical interruption.
Trustworthy Agent Network: Trust in Agent Networks Must Be Baked In, Not Bolted On cs.AI · 2026-05-18 · unverdicted · none · ref 64 · internal anchor
Argues that trustworthiness in Agent-to-Agent networks requires a new conceptual framework with four design pillars baked in from the beginning, as retrofitting existing single-agent methods is insufficient.
From Text to DSL: Evaluating Grammar-Based Model Generation Using Open LLMs cs.SE · 2026-05-15 · unverdicted · none · ref 1 · internal anchor
Compact open-source LLMs can produce syntactically valid, semantically complete, and inter-model consistent DSL models from text via few-shot prompting, with some 7B-12B models matching much larger ones in quality.
Transparent and Controllable Recommendation Filtering via Multimodal Multi-Agent Collaboration cs.IR · 2026-04-19 · unverdicted · none · ref 30 · internal anchor
A multi-agent multimodal system with fact-grounded adjudication and a dynamic two-tier preference graph cuts false positives in content filtering by 74.3% and nearly doubles F1-score versus text-only baselines while supporting user-driven Delta adjustments.
Nanomentoring: Investigating How Quickly People Can Help People Learn Feature-Rich Software cs.HC · 2026-04-15 · unverdicted · none · ref 41 · internal anchor
Experts can deliver helpful advice on over half of short 'nanoquestions' about feature-rich software in under one minute.
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges cs.AI · 2025-10-27 · unverdicted · none · ref 56 · internal anchor
A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.
From System 1 to System 2: A Survey of Reasoning Large Language Models cs.AI · 2025-02-24 · accept · none · ref 86 · internal anchor
The survey organizes the shift of LLMs toward deliberate System 2 reasoning, covering model construction techniques, performance on math and coding benchmarks, and future research directions.
Understanding the Human-LLM Dynamic: A Literature Survey of LLM Use in Programming Tasks cs.SE · 2024-10-01 · unverdicted · none · ref 104 · internal anchor
A survey of user studies on LLM use in programming that identifies interaction behaviors, mixed benefits and weaknesses, and factors influencing human and task performance.
Dr. Jekyll and Mr. Hyde: Two Faces of LLMs cs.CR · 2023-12-06 · unverdicted · none · ref 21 · internal anchor
Impersonating complex misaligned personas via biographies and role-play bypasses safety in ChatGPT, Gemini, and Deepseek, succeeding on 38-40 out of 40 illicit questions across tested models.
Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages cs.CL · 2026-05-16 · unverdicted · none · ref 221 · internal anchor
A tutorial synthesizing foundations, recent models such as PALO and Maya, and low-cost methods for tri-modal multilingual AI in resource-constrained settings.
LLMs in Qualitative Research: Opportunities, Limitations, and Practical Considerations cs.HC · 2026-05-15 · unverdicted · none · ref 59 · internal anchor
The paper outlines opportunities, limitations, and practical parameters for integrating LLMs into qualitative research while aligning with epistemological commitments like reflexivity and interpretive judgment.
Rethinking the A in STEAM: Insights from and for AI Literacy Education cs.CY · 2024-05-28 · unverdicted · none · ref 57 · internal anchor
Advocates robust inclusion of arts in STEAM education to support holistic AI literacy in K-12 by addressing media representations, anthropomorphism, societal biases, and generative AI impacts.
