TEMPERA: Test-Time Prompting via Reinforcement Learning

Dale Schuurmans; Denny Zhou; Joseph E. Gonzalez; Tianjun Zhang; Xuezhi Wang

arxiv: 2211.11890 · v1 · pith:BOC52NC6new · submitted 2022-11-21 · 💻 cs.CL · cs.AI

keywords: prompt, design, learning, methods, tempera, achieves, compared, editing

Careful prompt design is critical to the use of large language models in zero-shot or few-shot learning. As a consequence, there is growing interest in automated methods for designing optimal prompts. In this work, we propose Test-time Prompt Editing using Reinforcement learning (TEMPERA). In contrast to prior prompt generation methods, TEMPERA can efficiently leverage prior knowledge, adapts to different queries, and provides an interpretable prompt for every query. To achieve this, we design a novel action space that allows flexible editing of the initial prompts, covering a wide set of commonly used components such as instructions, few-shot exemplars, and verbalizers. The proposed method achieves significant gains over recent state-of-the-art approaches such as prompt tuning, AutoPrompt, and RLPrompt across a variety of tasks, including sentiment analysis, topic classification, natural language inference, and reading comprehension. On average, our method achieves a 5.33x improvement in sample efficiency compared with traditional fine-tuning methods.
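
To make the idea of an editable prompt more concrete, the following is a minimal sketch of a prompt composed of the three components the abstract names (an instruction, few-shot exemplars, and verbalizers) together with a few discrete edit operations. It is an illustration only, not the authors' implementation: the `Prompt` class, the function names, and the rendering template are all hypothetical, and the reinforcement learning policy that would choose which edit to apply for a given query is omitted.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class Prompt:
    """Hypothetical container for the three editable prompt components."""
    instruction: str
    exemplars: Tuple[Tuple[str, str], ...]  # (input text, label) pairs
    verbalizers: Dict[str, str]             # label -> word shown to the model


def set_instruction(prompt: Prompt, text: str) -> Prompt:
    """Edit action: replace the instruction, leaving other parts untouched."""
    return replace(prompt, instruction=text)


def swap_exemplars(prompt: Prompt, i: int, j: int) -> Prompt:
    """Edit action: exchange the positions of two few-shot exemplars."""
    items = list(prompt.exemplars)
    items[i], items[j] = items[j], items[i]
    return replace(prompt, exemplars=tuple(items))


def set_verbalizer(prompt: Prompt, label: str, word: str) -> Prompt:
    """Edit action: change the word used to express one label."""
    updated = dict(prompt.verbalizers)
    updated[label] = word
    return replace(prompt, verbalizers=updated)


def render(prompt: Prompt, query: str) -> str:
    """Turn the structured prompt plus a query into plain text.

    The 'Input: ... Label: ...' template is an assumption made for this sketch.
    """
    lines = [prompt.instruction]
    for text, label in prompt.exemplars:
        lines.append(f"Input: {text} Label: {prompt.verbalizers[label]}")
    lines.append(f"Input: {query} Label:")
    return "\n".join(lines)


if __name__ == "__main__":
    base = Prompt(
        instruction="Classify the sentiment.",
        exemplars=(("good film", "pos"), ("dull plot", "neg")),
        verbalizers={"pos": "great", "neg": "terrible"},
    )
    # Apply two edits in sequence; each returns a new Prompt.
    edited = set_verbalizer(swap_exemplars(base, 0, 1), "neg", "bad")
    print(render(edited, "fine"))
```

Because every edit returns a new `Prompt` and the result is rendered as ordinary text, the prompt used for any particular query remains readable, which is one way to picture the interpretability the abstract mentions.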

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Adapting Generalist Robot Policies with Semantic Reinforcement Learning
cs.RO · 2026-06 · unverdicted · novelty 7.0
SARL optimizes language prompt inputs to generalist vision-language-action policies through online RL to solve complex long-horizon tasks by composing existing skills.

PEEM: Prompt Engineering Evaluation Metrics for Interpretable Joint Evaluation of Prompts and Responses
cs.CL · 2026-03 · unverdicted · novelty 7.0
PEEM is a multi-criteria LLM-based evaluator for prompts and responses that aligns with standard accuracy while enabling zero-shot prompt optimization via feedback.

CRAFT: Cost-aware Refinement And Front-aware Tuning of Prompts
cs.CL · 2026-06 · unverdicted · novelty 6.0
CRAFT is a Pareto-front prompt optimizer that allocates scarce LLM validation calls to candidates near the current front using accuracy- and cost-oriented generators plus NSGA-II retention.

Why Prompt Optimization Works, and Why It Sometimes Doesn't: A Causal-Inspired Edit-Level Analysis
cs.CL · 2026-05 · unverdicted · novelty 6.0
Observational causal-inspired analysis finds prompt optimization failures arise from systematic interactions between edit families and task characteristics rather than random artifacts.

The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning
cs.LG · 2025-05 · unverdicted · novelty 6.0
Entropy minimization on self-generated outputs elicits strong reasoning in pretrained LLMs, matching or exceeding supervised RL methods on benchmarks.