Making Prompts First-Class Citizens for Adaptive LLM Pipelines
Pith reviewed 2026-05-19 00:54 UTC · model grok-4.3
The pith
SPEAR elevates prompts to first-class status in LLM pipelines with versioned views, runtime refinement, and when-then policy rules.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SPEAR treats prompts as first-class citizens in the execution model. It provides structured prompt management through versioned views for introspection and provenance reasoning, adaptive prompt refinement that evolves prompts based on runtime feedback during execution, and policy-driven control that specifies automatic refinement as when-then rules.
What carries the argument
SPEAR (Structured Prompt Execution and Adaptive Refinement), which maintains versioned prompt views and applies when-then rules to drive adaptive refinement from runtime feedback.
If this is right
- Prompts gain introspection capabilities and traceable provenance across pipeline runs.
- Prompts evolve automatically during execution without separate manual tuning steps.
- When-then rules allow declarative specification of refinement behavior under varying conditions.
- The model unlocks reuse and optimization opportunities in error correction and agent coordination.
- It plays a complementary role to existing prompt optimization frameworks and semantic query engines.
Where Pith is reading between the lines
- Versioned prompt views could support querying and analysis of prompt changes over time in a manner similar to database views.
- Combining this approach with semantic query processing engines could enable joint optimization of data access and prompt logic.
- Longer-term pipelines might maintain quality with reduced human intervention if refinement policies prove stable.
Load-bearing premise
Runtime feedback signals can be mapped reliably to prompt edits that improve downstream performance without causing instability, high latency, or needing extensive human oversight.
What would settle it
An experiment on a multi-step LLM pipeline in which feedback-triggered prompt changes either reduce task accuracy, increase latency beyond acceptable bounds, or require repeated manual corrections to stabilize.
Figures
read the original abstract
Modern LLM pipelines increasingly resemble complex data-centric applications: they retrieve data, correct errors, call external tools, and coordinate interactions between agents. Yet, the central element controlling this entire process -- the prompt -- remains a brittle, opaque string that is entirely disconnected from the surrounding program logic. This disconnect fundamentally limits opportunities for reuse, optimization, and runtime adaptivity. In this paper, we describe our vision and an initial design of SPEAR (Structured Prompt Execution and Adaptive Refinement), a new approach to prompt management that treats prompts as first-class citizens in the execution model. Specifically, SPEAR enables: (1) structured prompt management, with prompts organized into versioned views to support introspection and reasoning about provenance; (2) adaptive prompt refinement, whereby prompts can evolve dynamically during execution based on runtime feedback; and (3) policy-driven control, a mechanism for the specification of automatic prompt refinement logic as when-then rules. By tackling the problem of runtime prompt refinement, SPEAR plays a complementary role in the vast ecosystem of existing prompt optimization frameworks and semantic query processing engines. We describe a number of related optimization opportunities unlocked by the SPEAR model, and our preliminary results demonstrate the strong potential of this approach.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a vision and initial design for SPEAR (Structured Prompt Execution and Adaptive Refinement), which treats prompts as first-class citizens in LLM pipelines. It enables (1) structured prompt management with versioned views supporting introspection and provenance, (2) adaptive prompt refinement that evolves prompts dynamically based on runtime feedback, and (3) policy-driven control via when-then rules for automatic prompt evolution. The work positions SPEAR as complementary to existing prompt optimization frameworks and semantic query processing engines, noting preliminary results that demonstrate strong potential.
Significance. If the adaptive mechanisms can be realized with appropriate stability properties, SPEAR would address a genuine gap by integrating prompt management into the execution model of data-centric LLM applications, enabling better reuse, provenance tracking, and runtime optimization in complex pipelines that combine retrieval, tool use, and agent coordination.
major comments (2)
- Abstract (adaptive prompt refinement paragraph): The central claim that runtime feedback can be mapped to prompt edits via when-then rules to support automatic evolution lacks any formal semantics for the feedback language, boundedness guarantees on the edit operators, or analysis of convergence/oscillation under noisy signals such as error rates or token usage. This directly bears on the asserted support for stable, automatic prompt evolution without excessive latency or human oversight.
- Abstract (SPEAR execution model description): The proposal introduces an execution model with versioned views and policy-driven control but provides no concrete definitions, examples, or invariants showing how these mechanisms integrate with surrounding program logic, leaving the feasibility of introspection and provenance claims ungrounded.
minor comments (1)
- Abstract: The reference to 'preliminary results' demonstrating strong potential is stated without any quantitative details, error analysis, or baseline comparisons; adding a pointer to the relevant section or table would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential of SPEAR to address a gap in prompt management for data-centric LLM applications. The major comments correctly highlight the need for stronger grounding of the vision in concrete mechanisms and stability considerations. We address each point below and will revise the manuscript to incorporate additional detail and discussion while preserving the paper's focus as a vision and initial design.
read point-by-point responses
-
Referee: Abstract (adaptive prompt refinement paragraph): The central claim that runtime feedback can be mapped to prompt edits via when-then rules to support automatic evolution lacks any formal semantics for the feedback language, boundedness guarantees on the edit operators, or analysis of convergence/oscillation under noisy signals such as error rates or token usage. This directly bears on the asserted support for stable, automatic prompt evolution without excessive latency or human oversight.
Authors: We agree that formal semantics, boundedness guarantees, and convergence analysis are essential for credible claims about stable automatic evolution. The current manuscript presents a high-level vision and initial design without these elements, as the work is positioned as complementary to existing optimization frameworks rather than a complete formal system. In revision we will add a new subsection discussing preliminary approaches to bounded edit operators (e.g., restricting changes to template slots or token budgets) and referencing techniques from adaptive query processing for handling noisy feedback signals. We will also explicitly state that full formal semantics and convergence proofs remain future work. This provides the requested grounding without overstating current results. revision: partial
-
Referee: Abstract (SPEAR execution model description): The proposal introduces an execution model with versioned views and policy-driven control but provides no concrete definitions, examples, or invariants showing how these mechanisms integrate with surrounding program logic, leaving the feasibility of introspection and provenance claims ungrounded.
Authors: We acknowledge that the absence of concrete definitions and examples leaves the integration claims insufficiently grounded. In the revised version we will insert a dedicated subsection containing (1) pseudocode definitions for versioned prompt views and when-then policy rules, (2) a worked example showing integration with a retrieval-augmented generation pipeline, and (3) a short list of invariants (e.g., view immutability after commit and provenance chain preservation) that ensure introspection remains feasible. These additions will directly support the feasibility of the provenance and introspection features. revision: yes
Circularity Check
No circularity: vision paper proposes system architecture without derivations or self-referential reductions
full rationale
The paper is a forward-looking system design proposal for SPEAR that outlines structured prompt management, adaptive refinement via runtime feedback, and when-then policy rules. No equations, fitted parameters, predictions, or derivation chains appear in the provided text. Central claims describe enabling features in an execution model and do not reduce to quantities defined by the authors' own prior results or self-citations. The design sketch is self-contained as an architectural vision whose correctness depends on future implementation and evaluation rather than tautological mappings or imported uniqueness theorems.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Runtime feedback can be translated into prompt edits that measurably improve task outcomes without excessive instability or human intervention.
- domain assumption Prompts can be organized into versioned views while preserving their original expressiveness and utility for LLM calls.
invented entities (1)
-
SPEAR execution model
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
SPEAR defines a prompt algebra that governs how prompts are constructed and adapted... core operators RET, GEN, REF, CHECK, MERGE, DELEGATE
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanLogicNat induction and recovery unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
policy-driven control... when-then rules... automatic prompt refinement logic
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Tri Dao, Dan Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. 2022. Flashat- tention: Fast and memory-efficient exact attention with io-awareness. Advances in neural information processing systems 35 (2022), 16344–16359
work page 2022
-
[2]
In Gim, Guojun Chen, Seung-seob Lee, Nikhil Sarda, Anurag Khandelwal, and Lin Zhong. 2024. Prompt cache: Modular attention reuse for low-latency inference. Proceedings of Machine Learning and Systems 6 (2024), 325–338
work page 2024
-
[3]
Alec Go, Richa Bhayani, and Lei Huang. 2009. Sentiment140 dataset. https: //www.kaggle.com/datasets/kazanova/sentiment140
work page 2009
-
[4]
Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, and Christopher Potts
Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav San- thanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, and Christopher Potts. 2024. DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines. The Twelfth International Conference on Learning Representations
work page 2024
-
[5]
LangGraph. 2023. LangGraph. https://www.langchain.com/langgraph
work page 2023
-
[6]
Chunwei Liu, Matthew Russo, Michael Cafarella, Lei Cao, Peter Baille Chen, Zui Chen, Michael Franklin, Tim Kraska, Samuel Madden, Rana Shahout, and Gerardo Vitagliano. 2025. Palimpzest: Optimizing AI-Powered Analytics with Declarative Query Processing. In CIDR
work page 2025
-
[7]
Harsha Nori, Yin Tat Lee, Sheng Zhang, Dean Carignan, Richard Edgar, Ni- colo Fusi, Nicholas King, Jonathan Larson, Yuanzhi Li, Weishung Liu, Renqian Luo, Scott Mayer McKinney, Robert Osazuwa Ness, Hoifung Poon, Tao Qin, Naoto Usuyama, Chris White, and Eric Horvitz. 2023. Can Generalist Foun- dation Models Outcompete Special-Purpose Tuning? Case Study in ...
- [8]
-
[9]
Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Brad- bury, Jonathan Heek, Kefan Xiao, Shivani Agrawal, and Jeff Dean. 2023. Ef- ficiently scaling transformer inference. Proceedings of machine learning and systems (2023)
work page 2023
- [10]
-
[11]
Shreya Shankar, Tristan Chambers, Tarak Shah, Aditya G. Parameswaran, and Eugene Wu. 2024. DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing. arXiv:2410.12189 [cs.DB]
-
[12]
Shreya Shankar, Haotian Li, Parth Asawa, Madelon Hulsebos, Yiming Lin, J. D. Zamfirescu-Pereira, Harrison Chase, Will Fu-Hinthorn, Aditya G. Parameswaran, and Eugene Wu. 2024. spade: Synthesizing Data Quality Assertions for Large Language Model Pipelines. Proc. VLDB Endow. 17, 12 (Aug. 2024), 4173–4186. doi:10.14778/3685800.3685835
-
[13]
Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. Reflexion: Language agents with verbal reinforcement learn- ing. Advances in Neural Information Processing Systems 36 (2023), 8634–8652
work page 2023
- [14]
-
[15]
Sonish Sivarajkumar, Mark Kelley, Alyssa Samolyk-Mazzanti, Shyam Visweswaran, and Yanshan Wang. 2024. An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing: Algorithm Development and Validation Study. JMIR Medical Informatics 12 (2024), e55318. doi:10.2196/55318
-
[16]
vLLM. 2024. Automatic Prefix Caching. https://docs.vllm.ai/en/v0.9.2/design/ automatic_prefix_caching.html
work page 2024
-
[17]
Li Wang, Xi Chen, XiangWen Deng, Hao Wen, MingKe You, WeiZhi Liu, Qi Li, and Jian Li. 2024. Prompt Engineering in Consistency and Reliability with the Evidence-Based Guideline for LLMs. NPJ Digital Medicine 7, 1 (2024), 41. doi:10.1038/s41746-024-01046-y
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.