pith. machine review for the scientific record.

arxiv: 2604.14197 · v1 · submitted 2026-04-03 · 💻 cs.CL · cs.AI

Recognition: 2 theorem links

· Lean Theorem

The PICCO Framework for Large Language Model Prompting: A Taxonomy and Reference Architecture for Prompt Structure

Authors on Pith · no claims yet

Pith reviewed 2026-05-13 19:56 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords prompt engineering · large language models · reference architecture · taxonomy · PICCO · prompt structure · LLM prompting

The pith

PICCO provides a five-element reference architecture for structuring prompts to large language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper synthesizes eleven prior prompting frameworks into a taxonomy that separates prompt frameworks, elements, generation, techniques, and engineering as distinct concepts. From this synthesis it extracts a reference architecture called PICCO that breaks prompt construction into Persona, Instructions, Context, Constraints, and Output. The stated purpose is to replace inconsistent ad-hoc prompt writing with a shared structure that makes design decisions explicit and comparable. A reader would care because clearer organization of prompts can reduce trial-and-error and produce more predictable behavior from the same models.
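
To make the five-part structure concrete, here is a minimal sketch of a prompt assembled from labeled PICCO elements. The element names come from the paper; the field labels, ordering, and example content are illustrative assumptions rather than a format the paper prescribes.

```python
from dataclasses import dataclass

@dataclass
class PiccoPrompt:
    """Illustrative container for the five PICCO elements.

    The slot names follow the paper; the label strings and assembly order in
    render() are assumptions of this sketch, not a prescribed format.
    """
    persona: str       # who the model should act as
    instructions: str  # the task to perform
    context: str       # background material the task depends on
    constraints: str   # limits on scope, style, or content
    output: str        # required shape of the response

    def render(self) -> str:
        # Concatenate the labeled elements into a single prompt string.
        return "\n\n".join([
            f"Persona: {self.persona}",
            f"Instructions: {self.instructions}",
            f"Context: {self.context}",
            f"Constraints: {self.constraints}",
            f"Output: {self.output}",
        ])

prompt = PiccoPrompt(
    persona="You are an experienced medical educator.",
    instructions="Summarize the attached study for first-year students.",
    context="<abstract text pasted here>",
    constraints="Stay under 200 words; do not speculate beyond the abstract.",
    output="Three bullet points followed by a one-sentence takeaway.",
)
print(prompt.render())
```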

Core claim

The analysis yields a taxonomy distinguishing prompt frameworks from prompt elements, prompt generation, prompting techniques, and prompt engineering. It then derives a five-element reference architecture for prompt generation: Persona, Instructions, Context, Constraints, and Output. For each element the paper defines function, scope, and interrelationships, with the explicit goal of improving conceptual clarity and supporting systematic prompt design without claiming empirical performance gains.

What carries the argument

The PICCO reference architecture, which decomposes prompt generation into five named elements (Persona, Instructions, Context, Constraints, and Output) and thereby supplies a common structure for specification and comparison.

If this is right

  • Prompts become describable and comparable using a shared five-part vocabulary rather than free-form text.
  • Each element can be refined independently during iterative prompt engineering.
  • Standard techniques such as zero-shot, few-shot, chain-of-thought, and self-critique map onto specific PICCO slots (see the sketch after this list).
  • Responsible prompting practices around bias, privacy, and security can be applied element by element.
  • Future work can extend the architecture to new domains while preserving the same five-part skeleton.
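
A hedged sketch of the technique-to-slot mapping mentioned above: few-shot exemplars land in Context, a chain-of-thought cue extends Instructions, and a self-critique pass attaches to Output. These assignments are an editorial reading of how the techniques could populate PICCO slots, not a mapping the paper states.

```python
# One plausible mapping from common prompting techniques to PICCO slots.
# The slot assignments below are an editorial assumption, not the paper's own mapping.

def apply_few_shot(picco: dict, exemplars: list[tuple[str, str]]) -> dict:
    """Few-shot: worked examples are treated here as part of the Context element."""
    demos = "\n".join(f"Q: {q}\nA: {a}" for q, a in exemplars)
    picco["context"] = (picco.get("context", "") + "\n\nExamples:\n" + demos).strip()
    return picco

def apply_chain_of_thought(picco: dict) -> dict:
    """Chain-of-thought: a reasoning cue is treated here as extending Instructions."""
    picco["instructions"] += " Think through the problem step by step before answering."
    return picco

def apply_self_critique(picco: dict) -> dict:
    """Self-critique: a revision pass is treated here as part of the Output element."""
    picco["output"] += " After drafting, review the answer for errors and correct them."
    return picco

base = {
    "persona": "You are a careful support analyst.",
    "instructions": "Classify each ticket as billing, technical, or other.",
    "context": "",
    "constraints": "Use only the three labels provided.",
    "output": "Return one label per line.",
}
structured = apply_self_critique(
    apply_chain_of_thought(apply_few_shot(base, [("Card was charged twice", "billing")]))
)
```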

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Automated prompt generators could be built to populate each PICCO slot from a task description (a sketch follows this list).
  • The structure might reveal gaps when applied to multimodal or agentic prompts that current frameworks overlook.
  • Teams could adopt PICCO as an internal standard to reduce variance in prompt quality across different engineers.
  • Security reviews could focus on the Constraints element to surface hidden risks more systematically.
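
The first point above imagines an automated generator; a hypothetical sketch follows. The meta-prompt wording and the `complete` function standing in for an LLM API call are assumptions of this sketch, not anything the paper specifies.

```python
import json

# Hypothetical meta-prompt asking a model to fill every PICCO slot from a task description.
META_PROMPT = """Draft a prompt for the task below as JSON with the keys
"persona", "instructions", "context", "constraints", and "output".

Task description:
{task}
"""

def complete(prompt: str) -> str:
    """Placeholder for a call to whichever LLM backend is available."""
    raise NotImplementedError("wire this to an actual model endpoint")

def generate_picco(task: str) -> dict:
    # Ask the model to populate every PICCO slot, then check that none were left out.
    slots = json.loads(complete(META_PROMPT.format(task=task)))
    missing = {"persona", "instructions", "context", "constraints", "output"} - slots.keys()
    if missing:
        raise ValueError(f"generator left PICCO slots unfilled: {sorted(missing)}")
    return slots
```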

Load-bearing premise

A synthesis of eleven published prompting frameworks is sufficient to produce a general reference architecture that improves clarity for all users without requiring separate empirical validation.

What would settle it

A controlled study that measures output consistency or task success rates for prompts written with explicit PICCO elements versus unstructured prompts of similar length would directly test whether the architecture delivers the claimed clarity.
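
As a rough illustration of what such a study could measure, the sketch below scores response consistency as agreement with the modal answer over repeated runs and compares PICCO-structured prompts against unstructured prompts for the same tasks. The metric and design are assumptions of this sketch, not a protocol the paper proposes.

```python
from collections import Counter

def complete(prompt: str) -> str:
    """Placeholder for a call to the model under test."""
    raise NotImplementedError

def consistency(prompt: str, runs: int = 10) -> float:
    """Fraction of repeated runs that agree with the most common response."""
    answers = [complete(prompt) for _ in range(runs)]
    return Counter(answers).most_common(1)[0][1] / runs

def picco_consistency_gap(tasks: list[tuple[str, str]]) -> float:
    """tasks holds (picco_prompt, unstructured_prompt) pairs for the same task.

    Returns the mean consistency difference; a positive value favors PICCO.
    """
    gaps = [consistency(picco) - consistency(plain) for picco, plain in tasks]
    return sum(gaps) / len(gaps)
```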

read the original abstract

Large language model (LLM) performance depends heavily on prompt design, yet prompt construction is often described and applied inconsistently. Our purpose was to derive a reference framework for structuring LLM prompts. This paper presents PICCO, a framework derived through a rigorous synthesis of 11 previously published prompting frameworks identified through a multi-database search. The analysis yields two main contributions. First, it proposes a taxonomy that distinguishes prompt frameworks, prompt elements, prompt generation, prompting techniques, and prompt engineering as related but non-equivalent concepts. Second, it derives a five-element reference architecture for prompt generation: Persona, Instructions, Context, Constraints, and Output (PICCO). For each element, we define its function, scope, and relationship to other elements, with the goal of improving conceptual clarity and supporting more systematic prompt design. Finally, to support application of the framework, we outline key concepts relevant to implementation, including prompting techniques (e.g., zero-shot, few-shot, chain-of-thought, ensembling, decomposition, and self-critique, with selected variants), human and automated approaches to iterative prompt engineering, responsible prompting considerations such as security, privacy, bias, and trust, and priorities for future research. This work is a conceptual and methodological contribution: it formalizes a common structure for prompt specification and comparison, but does not claim empirical validation of PICCO as an optimization method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 3 minor

Summary. The paper proposes the PICCO framework for structuring prompts in large language models. It derives a taxonomy that differentiates prompt frameworks, elements, generation, techniques, and engineering as related but non-equivalent concepts. From a synthesis of 11 prior frameworks identified via multi-database search, it presents a five-element reference architecture: Persona, Instructions, Context, Constraints, and Output (PICCO), defining each element's function, scope, and interrelationships. The work also outlines implementation concepts including prompting techniques (zero-shot, few-shot, chain-of-thought, etc.), iterative human/automated prompt engineering, responsible considerations (security, privacy, bias), and future research priorities, explicitly positioning the contribution as conceptual without empirical validation of performance gains.

Significance. If the synthesis holds, the taxonomy and PICCO architecture would provide a valuable standardized reference for prompt specification and comparison in the LLM literature, where terminology remains inconsistent. The explicit scoping as non-empirical synthesis, combined with the transparent derivation from prior frameworks, supports its utility for systematic prompt design and future empirical work. This is a methodological contribution that formalizes common structure without overclaiming optimization results.

major comments (1)
  1. [Methods] Methods section: The multi-database search and selection process for the 11 frameworks is described at a high level; to substantiate the central claim of a 'rigorous synthesis' yielding the PICCO architecture, explicit inclusion/exclusion criteria, search strings, and the mapping procedure from source elements to the five PICCO components should be provided (e.g., in a supplementary table or appendix).
minor comments (3)
  1. [Figure 1] Figure 1 (taxonomy diagram): The visual relationships among prompt frameworks, elements, generation, techniques, and engineering would benefit from explicit edge labels or a legend to clarify distinctions.
  2. [Section 4] Section 4 (PICCO elements): While definitions are provided, adding one concrete prompt example per element (or a combined example) would improve accessibility without altering the conceptual scope.
  3. [Discussion] Discussion of prompting techniques: The list of techniques (zero-shot, few-shot, chain-of-thought, ensembling, etc.) is useful but would be strengthened by a brief comparison table of their alignment with specific PICCO elements.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment and constructive suggestion. We agree that greater transparency in the methods will strengthen the claim of rigorous synthesis and will revise the manuscript to include the requested details.

read point-by-point responses
  1. Referee: [Methods] Methods section: The multi-database search and selection process for the 11 frameworks is described at a high level; to substantiate the central claim of a 'rigorous synthesis' yielding the PICCO architecture, explicit inclusion/exclusion criteria, search strings, and the mapping procedure from source elements to the five PICCO components should be provided (e.g., in a supplementary table or appendix).

    Authors: We accept this point. In the revised manuscript we will expand the Methods section to report the exact search strings used across the databases, the full inclusion/exclusion criteria applied to candidate frameworks, and a supplementary table that maps each source framework's elements to the five PICCO components, including the rationale for any consolidation or re-labeling decisions. These additions will be placed in a new Appendix A and referenced from the main text. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is a non-empirical conceptual synthesis that identifies 11 external prompting frameworks via a multi-database search and integrates them into a taxonomy plus the PICCO reference architecture (Persona, Instructions, Context, Constraints, Output). No equations, fitted parameters, or derivations are present. The central claims rest on transparent aggregation of prior published work by other authors; no self-citation chains, self-definitional loops, or renamings that reduce the output to the paper's own inputs occur. The work explicitly disclaims empirical validation and presents the result as an organizational contribution, leaving the derivation self-contained rather than tested against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that the selected 11 frameworks adequately represent the space of prompting approaches to support a general reference architecture.

axioms (1)
  • domain assumption The 11 previously published prompting frameworks identified through a multi-database search represent a sufficient basis for deriving a general reference architecture.
    Invoked to justify the PICCO elements as a reference structure applicable beyond the sampled frameworks.

pith-pipeline@v0.9.0 · 5546 in / 1250 out tokens · 40539 ms · 2026-05-13T19:56:52.390737+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · 10 internal anchors

  1. [1]

    Macy Foundation Innovation Report Part I: Current Landscape of Artificial Intelligence in Medical Education

    Boscardin CK, Abdulnour RE, Gin BC. Macy Foundation Innovation Report Part I: Current Landscape of Artificial Intelligence in Medical Education. Acad Med. 2025;100:S15-s21

  2. [2]

Josiah Macy Jr. Foundation Conference on Artificial Intelligence in Medical Education: Proceedings and Recommendations

    Josiah Macy Jr. Foundation. Josiah Macy Jr. Foundation Conference on Artificial Intelligence in Medical Education: Proceedings and Recommendations. Acad Med. 2025;100:S4-s14

  3. [3]

    The Prompt Report: A Systematic Survey of Prompt Engineering Techniques

    Schulhoff S, Ilie M, Balepur N, Kahadze K, Liu A, Si C, et al. The Prompt Report: A Systematic Survey of Prompt Engineering Techniques. arXiv e-prints. 2024:arXiv:2406.06608

  4. [4]

    The AI Workshop: Your Complete Beginner’s Guide to AI Prompts: An A-Z Guide to AI Prompt Engineering for Life, Work, and Business: Funtacular Books; 2025

    Foster M. The AI Workshop: Your Complete Beginner’s Guide to AI Prompts: An A-Z Guide to AI Prompt Engineering for Life, Work, and Business: Funtacular Books; 2025

  5. [5]

    AI Prompt Engineering Bible: The Author; 2025

    Dylik T . AI Prompt Engineering Bible: The Author; 2025

  6. [6]

    Prompt Engineering for Generative AI: Future-Proof Inputs for Reliable AI Outputs: O'Reilly Media; 2024

    Phoenix J, Taylor M. Prompt Engineering for Generative AI: Future-Proof Inputs for Reliable AI Outputs: O'Reilly Media; 2024

  7. [7]

    Prompt Engineering Playbook

    GovTech Data Science & AI Division. Prompt Engineering Playbook. Singapore: Singapore Government Developer Portal; 2023

  8. [8]

    The RICECO Prompt Formula: The Simple Framework to 10x Your AI Results

Anh M. The RICECO Prompt Formula: The Simple Framework to 10x Your AI Results. Produced by AI Fire; 2025. Available at: www.aifire.co/p/the-riceco-prompting-framework-a-guide-to-a-better-ai-prompt. Accessed 15 Nov 2025

  9. [9]

    LearnPrompting: Basic Prompt Structure and Key Parts

    Kuka V. LearnPrompting: Basic Prompt Structure and Key Parts. Produced by Learn Prompting; 2025. Available at: https://learnprompting.org/docs/basics/prompt_structure. Accessed 3 December 2025

  10. [10]

    Prompt Engineering Paradigms for Medical Applications: Scoping Review

    Zaghir J, Naguib M, Bjelogrlic M, Névéol A, Tannier X, Lovis C. Prompt Engineering Paradigms for Medical Applications: Scoping Review. J Med Internet Res. 2024;26:e60501

  11. [11]

    A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications

    Sahoo P , Singh AK, Saha S, Jain V , Mondal S, Chadha A. A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications. arXiv e-prints. 2024:arXiv:2402.07927

  12. [12]

    Prompt engineering in higher education: a systematic review to help inform curricula

    Lee D, Palmer E. Prompt engineering in higher education: a systematic review to help inform curricula. International Journal of Educational Technology in Higher Education. 2025;22:7

  13. [13]

    A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT

    White J, Fu Q, Hays S, Sandborn M, Olea C, Gilbert H, et al. A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT . arXiv e-prints. 2023:arXiv:2302.11382

  14. [14]

    Prompting Frameworks for Large Language Models: A Survey

Liu X, Wang J, Sun J, Yuan X, Dong G, Di P, et al. Prompting Frameworks for Large Language Models: A Survey. arXiv e-prints. 2023:arXiv:2311.12785

  15. [15]

    Prompt Engineering in Clinical Practice: Tutorial for Clinicians

    Liu J, Liu F , Wang C, Liu S. Prompt Engineering in Clinical Practice: Tutorial for Clinicians. J Med Internet Res. 2025;27:e72644

  16. [16]

    Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

    Zhao C, Tan Z, Ma P , Li D, Jiang B, Wang Y, et al. Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens. arXiv. 2025:arXiv:2508.01191

  17. [17]

    Lost in the Middle: How Language Models Use Long Contexts

    Liu NF , Lin K, Hewitt J, Paranjape A, Bevilacqua M, Petroni F , et al. Lost in the Middle: How Language Models Use Long Contexts. arXiv. 2023:arXiv:2307.03172

  18. [18]

    Prompting Science Report 1: Prompt Engineering is Complicated and Contingent

    Meincke L, Mollick E, Mollick L, Shapiro D. Prompting Science Report 1: Prompt Engineering is Complicated and Contingent. arXiv. 2025:arXiv:2503.04818

  19. [19]

    Serial Position Effects of Large Language Models

    Guo X, Vosoughi S. Serial Position Effects of Large Language Models. arXiv. 2024:arXiv:2406.15981

  20. [20]

Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design

    Sclar M, Choi Y, Tsvetkov Y, Suhr A. Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting. arXiv. 2023:arXiv:2310.11324

  21. [21]

    Self-Consistency Falls Short! The Adverse Effects of Positional Bias on Long-Context Problems

    Byerly A, Khashabi D. Self-Consistency Falls Short! The Adverse Effects of Positional Bias on Long-Context Problems. arXiv. 2024:arXiv:2411.01101

  22. [22]

    Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning

    Cobbina K, Zhou T . Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning. arXiv. 2025:arXiv:2507.22887

  23. [23]

    Do LLMs "know" internally when they follow instructions? arXiv

    Heo J, Heinze-Deml C, Elachqar O, Chan KHR, Ren S, Nallasamy U, et al. Do LLMs "know" internally when they follow instructions? arXiv. 2024:arXiv:2410.14516

  24. [24]

    What Makes a Good Order of Examples in In-Context Learning

    Guo Q, Wang L, Wang Y, Ye W, Zhang S. What Makes a Good Order of Examples in In-Context Learning. Findings of the Association for Computational Linguistics: ACL 2024. 2024:14892- 14904

  25. [25]

    OptiSeq: Ordering Examples On-The-Fly for In-Context Learning

    Bhope RA, Venkateswaran P , Jayaram KR, Isahagian V, Muthusamy V, Venkatasubramanian N. OptiSeq: Ordering Examples On-The-Fly for In-Context Learning. arXiv. 2025:arXiv:2501.15030

  26. [26]

    Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

    Hsieh C-Y, Chuang Y-S, Li C-L, Wang Z, Le LT , Kumar A, et al. Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization. arXiv. 2024:arXiv:2406.16008

  27. [27]

    Eliminating Position Bias of Language Models: A Mechanistic Approach

    Wang Z, Zhang H, Li X, Huang K-H, Han C, Ji S, et al. Eliminating Position Bias of Language Models: A Mechanistic Approach. arXiv. 2024:arXiv:2407.01100

  28. [28]

    5C Prompt Contracts: A Minimalist, Creative-Friendly, Token-Efficient Design Framework for Individual and SME LLM Usage

    Ari U. 5C Prompt Contracts: A Minimalist, Creative-Friendly, Token-Efficient Design Framework for Individual and SME LLM Usage. ArXiv. 2025:arXiv:2507.07045

  29. [29]

    Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models

    Murthy R, Zhu M, Yang L, Qiu J, Tan J, Heinecke S, et al. Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models. ArXiv. 2025:arXiv:2507.14241

  30. [30]

    LangGPT: Rethinking Structured Reusable Prompt Design Framework for LLMs from the Programming Language

    Wang M, Liu Y, Liang X, Li S, Huang Y, Zhang X, et al. LangGPT: Rethinking Structured Reusable Prompt Design Framework for LLMs from the Programming Language. ArXiv. 2024:arXiv:2402.16929

  31. [31]

    CRISPE - ChatGPT Prompt Engineering Framework

    Dinkevych D. CRISPE - ChatGPT Prompt Engineering Framework. Produced by Medium

  32. [32]

    Accessed 15 Nov 2025

Available at: https://sourcingdenis.medium.com/crispe-prompt-engineering-framework-e47eaaf83611. Accessed 15 Nov 2025

  33. [33]

    How to use ChatGPT properly using the RISEN framework (Youtube video)

Balmer K. How to use ChatGPT properly using the RISEN framework (Youtube video). 2024. Available at: www.youtube.com/shorts/kkQKF5Zonw8. Accessed 15 Nov 2025

  34. [34]

    How to turn any prompt into a super prompt (Youtube video)

    Hutson K. How to turn any prompt into a super prompt (Youtube video). Produced by Futurepedia; 2025. Available at: www.youtube.com/watch?v=X7YjqKk-7Y0. Accessed 15 Nov 2025

  35. [35]

    The Prompt Engineering Life Cycle, Using Analytics with AI

Penn C. The Prompt Engineering Life Cycle, Using Analytics with AI. Produced by Trust Insights; 2024. Available at: www.trustinsights.ai/blog/2024/04/inbox-insights-april-17-2024-the-prompt-engineering-life-cycle-using-analytics-with-ai/. Accessed 15 Nov 2025

  36. [36]

    How to turn any prompt into a super prompt

    Kremb M. How to turn any prompt into a super prompt. Produced by The Prompt Warrior

  37. [37]

    Accessed 15 Nov 2025

    Available at: www.thepromptwarrior.com/p/turn-prompt-super-prompt. Accessed 15 Nov 2025

  38. [38]

    A Prompting Framework to Enhance Language Model Output

    Ratnayake H, Wang C. A Prompting Framework to Enhance Language Model Output. 2024; Singapore: 66-81

  39. [39]

Better Zero-Shot Reasoning with Role-Play Prompting

    Kong A, Zhao S, Chen H, Li Q, Qin Y, Sun R, et al. Better Zero-Shot Reasoning with Role-Play Prompting. arXiv. 2023:arXiv:2308.07702

  40. [40]

    Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

    Lewis P , Perez E, Piktus A, Petroni F, Karpukhin V, Goyal N, et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv e-prints. 2020:arXiv:2005.11401

  41. [41]

    Large Language Models Can Be Easily Distracted by Irrelevant Context

    Shi F , Chen X, Misra K, Scales N, Dohan D, Chi E, et al. Large Language Models Can Be Easily Distracted by Irrelevant Context. arXiv. 2023:arXiv:2302.00093

  42. [42]

    Best practices for prompt engineering with the OpenAI API

OpenAI. Best practices for prompt engineering with the OpenAI API. 2025. Available at: https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api. Accessed 15 December 2025

  43. [43]

    Exploring LLM Prompting Strategies for Joint Essay Scoring and Feedback Generation

    Stahl M, Biermann L, Nehring A, Wachsmuth H. Exploring LLM Prompting Strategies for Joint Essay Scoring and Feedback Generation. arXiv e-prints. 2024:arXiv:2404.15845

  44. [44]

    Order Matters: Rethinking Prompt Construction in In-Context Learning

    Li W, Wang Y, Wang Z, Shang J. Order Matters: Rethinking Prompt Construction in In-Context Learning. arXiv. 2025:arXiv:2511.09700

  45. [45]

    Call Me A Jerk: Persuading AI to Comply with Objectionable Requests: Wharton School Research Paper (https://ssrn.com/abstract=5357179); 2025

    Meincke L, Shapiro D, Duckworth A, Mollick ER, Mollick L, Cialdini R. Call Me A Jerk: Persuading AI to Comply with Objectionable Requests: Wharton School Research Paper (https://ssrn.com/abstract=5357179); 2025

  46. [46]

    Prompting Science Report 3: I'll pay you or I'll kill you -- but will you care? arXiv

    Meincke L, Mollick E, Mollick L, Shapiro D. Prompting Science Report 3: I'll pay you or I'll kill you -- but will you care? arXiv. 2025:arXiv:2508.00614

  47. [47]

    Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

    Min S, Lyu X, Holtzman A, Artetxe M, Lewis M, Hajishirzi H, et al. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022:11048-11064

  48. [48]

    Revisiting Chain-of-Thought Prompting: Zero-shot Can Be Stronger than Few-shot

    Cheng X, Pan C, Zhao M, Li D, Liu F, Zhang X, et al. Revisiting Chain-of-Thought Prompting: Zero-shot Can Be Stronger than Few-shot. arXiv. 2025:arXiv:2506.14641

  49. [49]

    Self-Generated In-Context Learning: Leveraging Auto-regressive Language Models as a Demonstration Generator

    Kim HJ, Cho H, Kim J, Kim T , Yoo KM, Lee S-g. Self-Generated In-Context Learning: Leveraging Auto-regressive Language Models as a Demonstration Generator. arXiv. 2022:arXiv:2206.08082

  50. [50]

    InstructEval: Systematic Evaluation of Instruction Selection Methods

    Ajith A, Pan C, Xia M, Deshpande A, Narasimhan K. InstructEval: Systematic Evaluation of Instruction Selection Methods. arXiv; 2023:arXiv:2307.00259. Available at: https://ui.adsabs.harvard.edu/abs/2023arXiv230700259A. Accessed July 01, 2023

  51. [51]

Instruction Tuning Vs. In-Context Learning: Revisiting Large Language Models in Few-Shot Computational Social Science

Wang T, Xu X, Wang Y, Jiang Y. Instruction Tuning Vs. In-Context Learning: Revisiting Large Language Models in Few-Shot Computational Social Science. arXiv; 2024:arXiv:2409.14673. Available at: https://ui.adsabs.harvard.edu/abs/2024arXiv240914673W. Accessed September 01, 2024

  52. [52]

    Teach Better or Show Smarter? On Instructions and Exemplars in Automatic Prompt Optimization

    Wan X, Sun R, Nakhost H, Arik SO. Teach Better or Show Smarter? On Instructions and Exemplars in Automatic Prompt Optimization. arXiv; 2024:arXiv:2406.15708. Available at: https://ui.adsabs.harvard.edu/abs/2024arXiv240615708W. Accessed June 01, 2024

  53. [53]

    The Few-shot Dilemma: Over-prompting Large Language Models

    Tang Y, Tuncel D, Koerner C, Runkler T . The Few-shot Dilemma: Over-prompting Large Language Models. arXiv. 2025:arXiv:2509.13196

  54. [54]

    Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

    Wei J, Wang X, Schuurmans D, Bosma M, Ichter B, Xia F, et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. 2022:arXiv:2201.11903. Available at: https://ui.adsabs.harvard.edu/abs/2022arXiv220111903W. Accessed January 01, 2022

  55. [55]

    Large Language Models are Zero-Shot Reasoners

    Kojima T , Gu SS, Reid M, Matsuo Y, Iwasawa Y. Large Language Models are Zero-Shot Reasoners. 2022:arXiv:2205.11916. Available at: https://ui.adsabs.harvard.edu/abs/2022arXiv220511916K. Accessed May 01, 2022

  56. [56]

Prompting Science Report 2: The Decreasing Value of Chain of Thought in Prompting

    Meincke L, Mollick E, Mollick L, Shapiro D. Prompting Science Report 2: The Decreasing Value of Chain of Thought in Prompting. arXiv. 2025:arXiv:2506.07142

  57. [57]

    Evaluating education innovations rapidly with build-measure-learn: Applying lean startup to health professions education

    Cook DA, Bikkani A, Poterucha Carter MJ. Evaluating education innovations rapidly with build-measure-learn: Applying lean startup to health professions education. Med Teach. 2023;45:167-178

  58. [58]

    Developing and Testing Changes in Delivery of Care

    Berwick DM. Developing and Testing Changes in Delivery of Care. Ann Intern Med. 1998;128:651-656

  59. [59]

    Large Language Model Instruction Following: A Survey of Progresses and Challenges

    Lou R, Zhang K, Yin W. Large Language Model Instruction Following: A Survey of Progresses and Challenges. Computational Linguistics. 2024;50:1053-1095

  60. [60]

    LearnPrompting: Advanced Techniques

    Schulhoff S. LearnPrompting: Advanced Techniques. Produced by Learn Prompting; 2025. Available at: https://learnprompting.org/docs/advanced/introduction. Accessed 25 November 2025

  61. [61]

    Ethical and social risks of harm from Language Models

    Weidinger L, Mellor J, Rauh M, Griffin C, Uesato J, Huang P-S, et al. Ethical and social risks of harm from Language Models. arXiv. 2021:arXiv:2112.04359

  62. [62]

    Macy Foundation Innovation Report Part II: From Hype to Reality: Innovators' Visions for Navigating AI Integration Challenges in Medical Education

    Gin BC, LaForge K, Burk-Rafel J, Boscardin CK. Macy Foundation Innovation Report Part II: From Hype to Reality: Innovators' Visions for Navigating AI Integration Challenges in Medical Education. Acad Med. 2025;100:S22-s29

  63. [63]

    A General Language Assistant as a Laboratory for Alignment

    Askell A, Bai Y, Chen A, Drain D, Ganguli D, Henighan T , et al. A General Language Assistant as a Laboratory for Alignment. arXiv. 2021:arXiv:2112.00861

  64. [64]

    Large language model alignment: A survey

    Shen T , Jin R, Huang Y, Liu C, Dong W, Guo Z, et al. Large Language Model Alignment: A Survey. arXiv. 2023:arXiv:2309.15025

  65. [65]

    Prompt Engineering Guide (IBM.com)

    Gadesha V. Prompt Engineering Guide (IBM.com). Produced by IBM; 2025. Available at: https://www.ibm.com/think/topics/prompt-engineering. Accessed 23 December 2025

  66. [66]

A Survey on Human-AI Collaboration with Large Foundation Models

Vats V, Binta Nizam M, Liu M, Wang Z, Ho R, Sai Prasad M, et al. A Survey on Human-AI Collaboration with Large Foundation Models. arXiv; 2024:arXiv:2403.04931. Available at: https://ui.adsabs.harvard.edu/abs/2024arXiv240304931V. Accessed March 01, 2024

  67. [67]

    thoughts

    ask LLM to complete each sub-task in sequence, often using exemplars showing how to complete each sub-task. • Understand-Plan-Act-Reflect (UPAR): aims to mirror human reasoning (Kantian philosophy) by guiding LLM to: 1) Understand – answer 4 questions about relevant entities, constraints, and relationships; 2) Plan – propose a solution (similar to generic...

  68. [68]

    in-context learning

    ask LLM to convert each question + response into a single statement and hide part of original question; 3) ask LLM to predict the hidden part. The response from step 1 that matches with correct prediction in step 3 is the final answer . 38 The difference between Chain-of-Verification and Self-Verification is that the former asks about the correctness of t...