pith. machine review for the scientific record.

arxiv: 2406.06608 · v6 · submitted 2024-06-06 · 💻 cs.CL · cs.AI

Recognition: 2 theorem links · Lean Theorem

The Prompt Report: A Systematic Survey of Prompt Engineering Techniques

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 02:11 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords: prompt engineering · taxonomy · large language models · generative AI · prompting techniques · best practices · meta-analysis · vocabulary

The pith

A survey organizes prompt engineering into a taxonomy of 58 LLM techniques and 40 for other modalities, supported by a 33-term vocabulary and best practices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to resolve conflicting terminology and fragmented understanding in prompt engineering for generative AI by building a shared taxonomy and vocabulary. It gathers and categorizes techniques that developers use to guide large language models and other systems. This structure clarifies how different prompting approaches affect outputs across research and industry uses. The authors review applications, supply guidelines for models such as ChatGPT, and include a meta-analysis of prefix-prompting studies. If the taxonomy holds, it gives practitioners a clearer map for choosing and refining prompts.
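The "prefix-prompting" the survey meta-analyzes means prepending natural-language instructions or worked exemplars to the model input. A minimal illustration of two widely surveyed techniques, few-shot prompting and zero-shot chain-of-thought (prompt construction only, no model call; all example content is invented):

```python
# Two common prefix-prompting techniques: few-shot prompting
# (prepend input/output exemplars) and zero-shot chain-of-thought
# (append a reasoning trigger). All text here is illustrative.

def few_shot_prompt(examples, query):
    """Prepend worked input/output exemplars before the real query."""
    prefix = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{prefix}\nInput: {query}\nOutput:"

def zero_shot_cot_prompt(query):
    """Zero-shot chain-of-thought: append a reasoning trigger phrase."""
    return f"Q: {query}\nA: Let's think step by step."

prompt = few_shot_prompt([("2+2", "4"), ("3+5", "8")], "7+6")
```

The resulting string would be sent verbatim to any LLM completion endpoint; the taxonomy's value is in naming and separating such variations so they can be compared.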

Core claim

The authors establish a structured understanding of prompt engineering by assembling a taxonomy of 58 LLM prompting techniques and 40 techniques for other modalities, a vocabulary of 33 terms, best practices and guidelines for prompting state-of-the-art models, and a meta-analysis of the literature on natural language prefix-prompting, presenting this collection as the most comprehensive survey to date.

What carries the argument

The taxonomy of prompting techniques, which classifies 58 LLM methods and 40 others to clarify what constitutes an effective prompt.

If this is right

  • A shared vocabulary reduces conflicting descriptions of the same prompting approach across papers and tools.
  • The taxonomy helps developers select suitable techniques for specific tasks instead of relying on trial and error.
  • Guidelines for state-of-the-art models improve output quality and consistency when applied to systems like ChatGPT.
  • The meta-analysis highlights trends in how prefix-prompting research has evolved over time.
  • Clear categories support more systematic testing of which techniques work best for given domains.
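The last bullet, systematic testing of which techniques work best, amounts to running the same dataset through each taxonomy category and comparing scores. A hedged sketch of such a harness (`call_model` is a stub standing in for any LLM API; the techniques and data are placeholders):

```python
# Sketch of a per-technique evaluation harness. `call_model` is a
# stub so the loop runs; swap in a real LLM API call in practice.
def call_model(prompt):
    return "4" if "2+2" in prompt else ""

TECHNIQUES = {
    "zero_shot": lambda q: q,
    "cot":       lambda q: f"{q}\nLet's think step by step.",
}

def score(technique_fn, dataset):
    """Fraction of items where the model output matches the gold answer."""
    hits = sum(call_model(technique_fn(q)).strip() == gold
               for q, gold in dataset)
    return hits / len(dataset)

dataset = [("What is 2+2?", "4")]
results = {name: score(fn, dataset) for name, fn in TECHNIQUES.items()}
```

With a real model and a larger dataset, `results` becomes exactly the kind of per-category comparison the bullet envisions.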

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The taxonomy could serve as a base layer for automated tools that suggest or optimize prompts for new tasks.
  • Researchers might test whether the same categories apply to emerging modalities such as video generation or code interpreters.
  • Standard terms could enable consistent benchmarks that compare prompting performance across different models.
  • Extending the survey periodically would track how new techniques fit into or expand the existing structure.
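The first extension above, using the taxonomy as a base layer for prompt-suggestion tools, presupposes a machine-readable encoding of the categories. A minimal sketch (category and technique names are illustrative, not the paper's actual 58-entry taxonomy):

```python
# A machine-readable slice of a prompting-technique taxonomy.
# Categories and entries are illustrative, not the paper's own.
TAXONOMY = {
    "in_context_learning": ["few_shot", "exemplar_selection"],
    "thought_generation":  ["chain_of_thought", "zero_shot_cot"],
    "decomposition":       ["least_to_most", "plan_and_solve"],
}

def suggest(task_tags):
    """Return candidate techniques for a task described by category tags."""
    return [t for tag in task_tags for t in TAXONOMY.get(tag, [])]

print(suggest(["thought_generation", "decomposition"]))
# ['chain_of_thought', 'zero_shot_cot', 'least_to_most', 'plan_and_solve']
```

A suggestion tool built on the real taxonomy would map task features to categories the same way, then rank the returned techniques empirically.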

Load-bearing premise

The literature search and categorization process captured a representative and exhaustive set of prompting techniques without significant omissions or misclassifications.

What would settle it

An independent literature search that identifies a substantial number of distinct prompting techniques absent from the taxonomy or placed in incorrect categories would falsify the claim of comprehensiveness.

read the original abstract

Generative Artificial Intelligence (GenAI) systems are increasingly being deployed across diverse industries and research domains. Developers and end-users interact with these systems through the use of prompting and prompt engineering. Although prompt engineering is a widely adopted and extensively researched area, it suffers from conflicting terminology and a fragmented ontological understanding of what constitutes an effective prompt due to its relatively recent emergence. We establish a structured understanding of prompt engineering by assembling a taxonomy of prompting techniques and analyzing their applications. We present a detailed vocabulary of 33 vocabulary terms, a taxonomy of 58 LLM prompting techniques, and 40 techniques for other modalities. Additionally, we provide best practices and guidelines for prompt engineering, including advice for prompting state-of-the-art (SOTA) LLMs such as ChatGPT. We further present a meta-analysis of the entire literature on natural language prefix-prompting. As a culmination of these efforts, this paper presents the most comprehensive survey on prompt engineering to date.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper surveys prompt engineering for generative AI systems. It assembles a 33-term vocabulary, a taxonomy of 58 LLM prompting techniques plus 40 multimodal techniques, best-practice guidelines (including for SOTA models such as ChatGPT), and a meta-analysis of the natural-language prefix-prompting literature, claiming to deliver the most comprehensive survey to date.

Significance. If the taxonomy and counts are reproducible and exhaustive, the work would provide a much-needed organizing framework for a rapidly growing but terminologically fragmented subfield, reducing duplication of effort and supplying practitioners with a consolidated reference.

major comments (2)
  1. [Methods / Taxonomy Construction] The manuscript supplies no PRISMA-style flow diagram, search strings, database list (arXiv, ACL Anthology, etc.), date cutoffs, inclusion/exclusion criteria, or inter-annotator agreement statistics for the categorization that produced the specific counts of 58 LLM and 40 multimodal techniques. Without these details the central claim of exhaustiveness cannot be evaluated.
  2. [Meta-analysis] The meta-analysis of prefix-prompting literature is asserted but no quantitative aggregation method, effect-size extraction protocol, or study-selection criteria are described, making it impossible to assess whether reported trends rest on a representative sample or on the authors' sampling frame.
minor comments (3)
  1. [Abstract] The abstract states that a meta-analysis was performed yet reports none of its quantitative findings or key trends.
  2. [Taxonomy] The boundary between the 58 LLM techniques and the 40 multimodal techniques is not explicitly justified; several techniques (e.g., certain chain-of-thought variants) appear to straddle both categories.
  3. [Best Practices] Concrete prompting examples for ChatGPT and other SOTA models are given but are not cross-referenced to the numbered taxonomy entries, reducing traceability.
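The inter-annotator agreement statistics requested in major comment 1 are conventionally reported as Cohen's kappa over the two annotators' category assignments. A self-contained sketch of the computation on hypothetical labels:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical category labels from two annotators for six techniques.
a = ["cot", "cot", "few_shot", "decomp", "cot", "few_shot"]
b = ["cot", "few_shot", "few_shot", "decomp", "cot", "few_shot"]
kappa = cohens_kappa(a, b)  # ≈ 0.739
```

Reporting a value like this for the taxonomy assignments would directly answer the referee's reproducibility concern.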

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The comments highlight important areas for improving methodological transparency in our survey. We address each point below and will incorporate revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Methods / Taxonomy Construction] The manuscript supplies no PRISMA-style flow diagram, search strings, database list (arXiv, ACL Anthology, etc.), date cutoffs, inclusion/exclusion criteria, or inter-annotator agreement statistics for the categorization that produced the specific counts of 58 LLM and 40 multimodal techniques. Without these details the central claim of exhaustiveness cannot be evaluated.

    Authors: We acknowledge that the current manuscript does not include a dedicated methods section with these details. This omission limits the ability to fully assess reproducibility and exhaustiveness. In the revised version, we will add a new 'Systematic Review Methodology' section that provides: (1) a PRISMA-style flow diagram, (2) the exact search strings used, (3) the list of databases queried (arXiv, ACL Anthology, Google Scholar, and others), (4) date cutoffs (literature collected through May 2024), (5) explicit inclusion/exclusion criteria, and (6) inter-annotator agreement statistics for the taxonomy categorization process. These additions will directly support the claims of comprehensiveness. revision: yes

  2. Referee: [Meta-analysis] The meta-analysis of prefix-prompting literature is asserted but no quantitative aggregation method, effect-size extraction protocol, or study-selection criteria are described, making it impossible to assess whether reported trends rest on a representative sample or on the authors' sampling frame.

    Authors: We agree that the meta-analysis section lacks sufficient methodological specification. The analysis was based on a systematic collection of papers focused on natural-language prefix-prompting, with trends summarized through counts and qualitative synthesis of reported performance improvements. In revision, we will expand this section to explicitly state the study-selection criteria, the protocol for extracting trends and any quantitative metrics (such as reported accuracy deltas), and clarify that the aggregation is a narrative meta-summary with frequency counts rather than a formal statistical meta-analysis with effect sizes. This will make the sampling frame and methods transparent. revision: yes
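The "narrative meta-summary with frequency counts" the authors describe could, in its simplest form, look like the following (the paper records and accuracy deltas are invented for illustration):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical extracted records: (technique, reported accuracy delta
# vs. baseline, in percentage points). Invented for illustration.
records = [
    ("chain_of_thought", 12.0), ("chain_of_thought", 8.5),
    ("few_shot", 4.0), ("few_shot", -1.0), ("self_consistency", 6.2),
]

by_technique = defaultdict(list)
for tech, delta in records:
    by_technique[tech].append(delta)

summary = {
    tech: {"n_studies": len(ds), "mean_delta": round(mean(ds), 2)}
    for tech, ds in by_technique.items()
}
# e.g. summary["chain_of_thought"] == {"n_studies": 2, "mean_delta": 10.25}
```

This is frequency counting plus descriptive averaging, not a formal effect-size meta-analysis, which is exactly the distinction the revised section would need to make explicit.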

Circularity Check

0 steps flagged

No significant circularity in survey aggregation and taxonomy construction

full rationale

The paper is a systematic literature survey that assembles a taxonomy and vocabulary from external sources without presenting new quantitative derivations, fitted parameters, or predictions. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the derivation chain. The central claim of comprehensiveness rests on literature search and categorization rather than any equation or definition that reduces to its own inputs by construction. This is the expected outcome for an honest survey paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the assumption that the collected papers adequately represent the prompt-engineering literature and that the proposed taxonomy categories are both exhaustive and mutually exclusive.

axioms (1)
  • domain assumption: The literature search methodology captured a representative sample of prompt-engineering research.
    Survey validity depends on comprehensive coverage of the field.

pith-pipeline@v0.9.0 · 5593 in / 1057 out tokens · 44412 ms · 2026-05-15T02:11:47.649500+00:00 · methodology

discussion (0)


Forward citations

Cited by 21 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data

    cs.AI 2026-04 unverdicted novelty 7.0

    TADI shows that domain-specialized tools orchestrated by an LLM over dual structured and semantic databases can convert heterogeneous wellsite data into evidence-grounded drilling intelligence, with tool design matter...

  2. Can Vision Language Models Judge Action Quality? An Empirical Evaluation

    cs.CV 2026-04 conditional novelty 7.0

    Vision-language models perform only marginally above random on action quality assessment and retain systematic biases even after targeted prompting and contrastive reformulation.

  3. Automated Design of Agentic Systems

    cs.AI 2024-08 conditional novelty 7.0

    Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across...

  4. Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits

    cs.LG 2026-05 unverdicted novelty 6.0

    Adapting multi-objective pure-exploration bandits enables efficient Pareto prompt set recovery and best feasible prompt identification for LLMs, with linear-case guarantees and empirical gains over baselines.

  5. Alignment has a Fantasia Problem

    cs.AI 2026-04 unverdicted novelty 6.0

    AI alignment must move beyond assuming users have fully formed goals and instead provide active cognitive support to help form and refine intent over time.

  6. From Craft to Kernel: A Governance-First Execution Architecture and Semantic ISA for Agentic Computers

    cs.CR 2026-04 unverdicted novelty 6.0

    Arbiter-K is a new execution architecture that treats LLMs as probabilistic processors inside a neuro-symbolic kernel with a semantic ISA to enable deterministic security enforcement and unsafe trajectory interdiction...

  7. LLMs for Qualitative Data Analysis Fail on Security-specific Comments in Human Experiments

    cs.SE 2026-04 unverdicted novelty 6.0

    LLMs improve with detailed code descriptions but remain insufficient to replace human annotators for security-specific qualitative coding.

  8. User Reviews as a Source for Usability Requirements: A Precursor Study on Using Large Language Models

    cs.SE 2026-05 conditional novelty 5.0

    LLMs can detect usability content in user reviews with F-scores comparable to humans, though performance depends strongly on prompt design.

  9. LLARS: Enabling Domain Expert & Developer Collaboration for LLM Prompting, Generation and Evaluation

    cs.AI 2026-05 unverdicted novelty 5.0

    LLARS is a new integrated platform that combines collaborative prompt authoring, cost-controlled batch generation, and hybrid evaluation to help domain experts and developers jointly build and assess LLM systems.

  10. U-Define: Designing User Workflows for Hard and Soft Constraints in LLM-Based Planning

    cs.AI 2026-05 unverdicted novelty 5.0

    U-Define improves user control in LLM planning by letting people define hard rules and soft preferences in natural language with matching verification methods, raising usefulness and satisfaction scores.

  11. Looking Into the Past: Eye Movements Characterize Elements of Autobiographical Recall in Interviews with Holocaust Survivors

    cs.MM 2026-04 unverdicted novelty 5.0

    Eye movements during Holocaust survivor interviews vary by episodic, semantic, affective and temporal memory dimensions, with pre-onset gaze sufficient to predict sentence temporal context.

  12. OOPrompt: Reifying Intents into Structured Artifacts for Modular and Iterative Prompting

    cs.HC 2026-04 unverdicted novelty 5.0

    OOPrompt reifies user intents into structured manipulable artifacts to enable modular and iterative prompting in LLM-based interactive systems.

  13. Agent Mentor: Framing Agent Knowledge through Semantic Trajectory Analysis

    cs.AI 2026-04 unverdicted novelty 5.0

    Agent Mentor analyzes semantic trajectories in agent logs to identify undesired behaviors and derives corrective prompt instructions, yielding measurable accuracy gains on benchmark tasks across three agent setups.

  14. Confidence Without Competence in AI-Assisted Knowledge Work

    cs.HC 2026-04 unverdicted novelty 5.0

    Standard LLM chats produce high perceived understanding but low objective learning in students, while future-self explanations best align confidence with actual gains and guided hints maximize learning with moderate workload.

  15. The PICCO Framework for Large Language Model Prompting: A Taxonomy and Reference Architecture for Prompt Structure

    cs.CL 2026-04 accept novelty 5.0

    PICCO is a five-element reference architecture (Persona, Instructions, Context, Constraints, Output) for structuring LLM prompts, derived from synthesizing prior frameworks along with a taxonomy distinguishing prompt ...

  16. Self-Describing Structured Data with Dual-Layer Guidance: A Lightweight Alternative to RAG for Precision Retrieval in Large-Scale LLM Knowledge Navigation

    cs.CL 2026-03 unverdicted novelty 5.0

    SDSR places human metadata at file primacy and combines it with prompt routing rules to reach 100% primary category accuracy on a 119-category benchmark, far above the 65% no-guidance baseline.

  17. Characterizing Students' LLM Usage Behaviors and Their Association with Learning in Critical Thinking Tasks

    cs.HC 2026-05 unverdicted novelty 4.0

    Refined bottom-up categories of LLM usage in critical thinking homework, labeled by student initiative, are examined for associations with midterm performance across two course offerings.

  18. Hint-Writing with Deferred AI Assistance: Fostering Critical Engagement in Data Science Education

    cs.HC 2026-04 unverdicted novelty 4.0

    In a randomized experiment with 97 graduate students, deferred AI assistance produced the highest-quality hints and helped students spot more code mistakes than independent writing or immediate AI help.

  19. Prompt Engineering Strategies for LLM-based Qualitative Coding of Psychological Safety in Software Engineering Communities: A Controlled Empirical Study

    cs.SE 2026-05 unverdicted novelty 3.0

    Multi-shot prompting raises agreement with humans for Claude Haiku but not DeepSeek-Chat or Gemini 2.5 Flash, with models showing different stability and a consistent bias toward over-labeling negative feedback.

  20. CLaC at SemEval-2026 Task 6: Response Clarity Detection in Political Discourse

    cs.CL 2026-05 unverdicted novelty 3.0

    An LLM ensemble reached 80 macro-F1 on 3-class clarity detection and 59 on 9-class evasion detection, with partial layer unfreezing and multilingual ensembles improving encoder results while enriched context helped only LLMs.

  21. A Reproducibility Study of Metacognitive Retrieval-Augmented Generation

    cs.IR 2026-04 unverdicted novelty 3.0

    MetaRAG is only partially reproducible with lower absolute scores than originally reported, gains substantially from reranking, and shows greater robustness than SIM-RAG under extended retrieval features.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages · cited by 21 Pith papers · 3 internal anchors

  1. [1]

    arXiv preprint arXiv:2404.11018

    Many-shot in-context learning. arXiv preprint arXiv:2404.11018. Sweta Agrawal, Chunting Zhou, Mike Lewis, Luke Zettlemoyer, and Marjan Ghazvininejad. 2023. In- context examples selection for machine translation. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8857–8873, Toronto, Canada. Association for Computational Linguisti...

  2. [2]

    Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, and Fahad Shah- baz Khan

    BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer. Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, and Fahad Shah- baz Khan. 2023. Foundational models defining a new era in vision: A survey and outlook. Abhijeet Awasthi, Nitish Gupta, Bidisha Samanta, Shachi Da...

  3. [3]

    In Proceedings of the 17th Conference of the European Chapter of the As- sociation for Computational Linguistics, pages 2455– 2467, Dubrovnik, Croatia

    Bootstrapping multilingual semantic parsers using large language models. In Proceedings of the 17th Conference of the European Chapter of the As- sociation for Computational Linguistics, pages 2455– 2467, Dubrovnik, Croatia. Association for Computa- tional Linguistics. Yushi Bai, Xin Lv, Jiajie Zhang, Hongchang Lyu, Jiankai Tang, Zhidian Huang, Zhengxiao ...

  4. [4]

    Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale. In ACL. Omer Bar-Tal, Dolev Ofri-Amar, Rafail Fridman, Yoni Kasten, and Tali Dekel. 2022. Text2live: Text-driven layered image and video editing. Amanda Bertsch, Maor Ivgi, Uri Alon, Jonathan Berant, Matthew R Gormley, and Graham Neubig. 2024. I...

  5. [5]

    Sparks of Artificial General Intelligence: Early experiments with GPT-4

    Language models are few-shot learners. Sébastien Bubeck, Varun Chandrasekaran, Ronen El- dan, John A. Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuan-Fang Li, Scott M. Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, and Yi Zhang. 2023. Sparks of artificial general intelligence: Early experiments with gpt-4. ArXiv, abs/2303.12712. ...

  6. [6]

    How is chatgpt’s behav- ior changing over time? arXiv preprint arXiv:2307.09009, 2023

    Chateval: Towards better LLM-based eval- uators through multi-agent debate. In The Twelfth International Conference on Learning Representa- tions. Ernie Chang, Pin-Jie Lin, Yang Li, Sidd Srinivasan, Gael Le Lan, David Kant, Yangyang Shi, Forrest Iandola, and Vikas Chandra. 2023. In-context prompt editing for conditional audio generation. Harrison Chase. 2...

  7. [7]

    GPTScore: Evaluate as You Desire , publisher =

    Template-based named entity recognition us- ing bart. Findings of the Association for Computa- tional Linguistics: ACL-IJCNLP 2021. Hai Dang, Lukas Mecke, Florian Lehmann, Sven Goller, and Daniel Buschek. 2022. How to prompt? opportu- nities and challenges of zero- and few-shot learning for human-ai interaction in creative applications of generative model...

  8. [8]

    In NeurIPS

    ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings. In NeurIPS. Hangfeng He, Hongming Zhang, and Dan Roth. 2023a. Socreval: Large language models with the so- cratic method for reference-free reasoning evaluation. arXiv preprint arXiv:2310.00074. Zhiwei He, Tian Liang, Wenxiang Jiao, Zhuosheng Zhang, Yujiu Yang, Rui Wang,...

  9. [9]

    Measuring Massive Multitask Language Un- derstanding. In ICLR. Amr Hendy, Mohamed Gomaa Abdelrehim, Amr Sharaf, Vikas Raunak, Mohamed Gabr, Hitokazu Matsushita, Young Jin Kim, Mohamed Afify, and Hany Hassan Awadalla. 2023. How good are gpt models at machine translation? a comprehensive evaluation. ArXiv, abs/2302.09210. Amir Hertz, Ron Mokady, Jay Tenenba...

  10. [10]

    DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

    A comprehensive study of vision transformers in image classification tasks. Omar Khattab, Keshav Santhanam, Xiang Lisa Li, David Hall, Percy Liang, Christopher Potts, and Matei Zaharia. 2022. Demonstrate-search-predict: Composing retrieval and language models for knowledge-intensive nlp. Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Kesh...

  11. [11]

    Natalie Kiesler and Daniel Schiffner

    Decomposed prompting: A modular approach for solving complex tasks. Natalie Kiesler and Daniel Schiffner. 2023. Large lan- guage models in introductory programming educa- tion: Chatgpt’s performance and implications for assessments. arXiv preprint arXiv:2308.08572. Hwichan Kim and Mamoru Komachi. 2023. Enhancing few-shot cross-lingual transfer with target...

  12. [12]

    51 Soochan Lee and Gunhee Kim

    Euclidreamer: Fast and high-quality texturing for 3d models with stable diffusion depth. 51 Soochan Lee and Gunhee Kim. 2023. Recursion of thought: A divide-and-conquer approach to multi- context reasoning with language models. Alina Leidinger, Robert van Rooij, and Ekaterina Shutova. 2023. The language of prompting: What linguistic properties make a prom...

  13. [13]

    Yaoyiran Li, Anna Korhonen, and Ivan Vuli ´c

    Oscar: Object-semantics aligned pre-training for vision-language tasks. Yaoyiran Li, Anna Korhonen, and Ivan Vuli ´c. 2023h. On bilingual lexicon induction with large language models. Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, and Weizhu Chen. 2023i. Making language models better reasoners with step-aware verifier. In Proceedin...

  14. [14]

    Albert Lu, Hongxin Zhang, Yanzhe Zhang, Xuezhi Wang, and Diyi Yang

    Att3d: Amortized text-to-3d object synthesis. Albert Lu, Hongxin Zhang, Yanzhe Zhang, Xuezhi Wang, and Diyi Yang. 2023a. Bounding the capabili- ties of large language models in open text generation with prompt constraints. Hongyuan Lu, Haoyang Huang, Dongdong Zhang, Hao- ran Yang, Wai Lam, and Furu Wei. 2023b. Chain- of-dictionary prompting elicits transl...

  15. [15]

    arXiv preprint arXiv:2303.15621 , year=

    Chatgpt as a factual inconsistency evaluator for abstractive text summarization. arXiv preprint arXiv:2303.15621. Jiaxi Lv, Yi Huang, Mingfu Yan, Jiancheng Huang, Jianzhuang Liu, Yifan Liu, Yafei Wen, Xiaoxin Chen, and Shifeng Chen. 2023. Gpt4motion: Script- ing physical motions in text-to-video generation via blender-oriented gpt planning. Qing Lyu, Shre...

  16. [16]

    gradient descent

    Suicide crisis syndrome: A systematic review. Suicide and Life-Threatening Behavior. February 27, online ahead of print. Fanxu Meng, Haotong Yang, Yiding Wang, and Muhan Zhang. 2023. Chain of images for intuitively reason- ing. B. Meskó. 2023. Prompt engineering as an impor- tant emerging skill for medical professionals: Tuto- rial. Journal of Medical Int...

  17. [17]

    Conversation style transfer using few-shot learning. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Vol- ume 1: Long Papers) , pages 119–143, Nusa Dua, Bali. Association for Computational Linguistics. Ohad Rubin, J...

  18. [18]

    In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Tech- nologies

    Learning to retrieve prompts for in-context learning. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Tech- nologies. Association for Computational Linguistics. Runway. 2023. Gen-2 prompt tips. https: //help.runwayml.com/hc/en-us/articles/ 17329337959699-Gen-2-Prompt-Tips...

  19. [19]

    Shubhra Kanti Karmaker Santu and Dongji Feng

    Lost at c: A user study on the security implica- tions of large language model code assistants. Shubhra Kanti Karmaker Santu and Dongji Feng. 2023. Teler: A general taxonomy of llm prompts for bench- marking complex tasks. Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 20...

  20. [20]

    Chenglei Si, Dan Friedman, Nitish Joshi, Shi Feng, Danqi Chen, and He He

    Reflexion: Language agents with verbal rein- forcement learning. Chenglei Si, Dan Friedman, Nitish Joshi, Shi Feng, Danqi Chen, and He He. 2023a. Measuring induc- tive biases of in-context learning with underspecified demonstrations. In Association for Computational Linguistics (ACL). Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, Jianfeng Wang, Jor...

  21. [21]

    In Proceed- ings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Pa- pers), pages 819–862, Dublin, Ireland

    An information-theoretic approach to prompt engineering without ground truth labels. In Proceed- ings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Pa- pers), pages 819–862, Dublin, Ireland. Association for Computational Linguistics. Andrea Sottana, Bin Liang, Kai Zou, and Zheng Yuan

  22. [22]

    arXiv preprint arXiv:2310.13800

    Evaluation metrics in the era of gpt-4: Reli- ably evaluating large language models on sequence to sequence tasks. arXiv preprint arXiv:2310.13800. Michal Štefánik and Marek Kadl ˇcík. 2023. Can in- context learners learn a reasoning concept from demonstrations? In Proceedings of the 1st Work- shop on Natural Language Reasoning and Structured Explanations...

  23. [23]

    Eshaan Tanwar, Subhabrata Dutta, Manish Borthakur, and Tanmoy Chakraborty

    Towards training-free open-world segmenta- tion via image prompting foundation models. Eshaan Tanwar, Subhabrata Dutta, Manish Borthakur, and Tanmoy Chakraborty. 2023. Multilingual LLMs are better cross-lingual in-context learners with align- ment. In Proceedings of the 61st Annual Meeting of 57 the Association for Computational Linguistics (Vol- ume 1: L...

  24. [24]

    Jason Weston and Sainbayar Sukhbaatar

    Large language models are better reasoners with self-verification. Jason Weston and Sainbayar Sukhbaatar. 2023. System 2 attention (is something you might need too). Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C. Schmidt. 2023. A prompt pattern catalog to enhance prompt ...

  25. [25]

    Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs

    Ai chains: Transparent and controllable human-ai interaction by chaining large language model prompts. CHI Conference on Human Factors in Computing Systems. Xiaodong Wu, Ran Duan, and Jianbing Ni. 2023c. Un- veiling security, privacy, and ethical concerns of chat- gpt. Journal of Information and Intelligence. 59 Congying Xia, Chen Xing, Jiangshu Du, Xinyi...

  26. [26]

    The dawn of lmms: Preliminary explorations with gpt-4v (ision)

    Re-reading improves reasoning in language models. Tianci Xue, Ziqi Wang, Zhenhailong Wang, Chi Han, Pengfei Yu, and Heng Ji. 2023. Rcot: Detecting and rectifying factual inconsistency in reasoning by reversing chain-of-thought. Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V . Le, Denny Zhou, and Xinyun Chen. 2023a. Large language models as opt...

  27. [27]

    slots to fill

    Thread of thought unraveling chaotic contexts. Xizhou Zhu, Yuntao Chen, Hao Tian, Chenxin Tao, Wei- jie Su, Chenyu Yang, Gao Huang, Bin Li, Lewei Lu, Xiaogang Wang, Yu Qiao, Zhaoxiang Zhang, and Jifeng Dai. 2023. Ghost in the minecraft: Gener- ally capable agents for open-world environments via large language models with text-based knowledge and memory. Z...