Rlprompt: Optimizing discrete text prompts with reinforcement learning,

· 2022 · arXiv 2205.12548

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

dataset 1

citation-polarity summary

use dataset 1

representative citing papers

Learning, Fast and Slow: Towards LLMs That Adapt Continually

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Fast-Slow Training uses context optimization as fast weights alongside parameter updates as slow weights to achieve up to 3x better sample efficiency, higher performance, and less catastrophic forgetting than standard RL in continual LLM learning.

PEEM: Prompt Engineering Evaluation Metrics for Interpretable Joint Evaluation of Prompts and Responses

cs.CL · 2026-03-11 · unverdicted · novelty 7.0

PEEM is a multi-criteria LLM-based evaluator for prompts and responses that aligns with standard accuracy while enabling zero-shot prompt optimization via feedback.

PIAST: Rapid Prompting with In-context Augmentation for Scarce Training data

cs.CL · 2025-12-11 · conditional · novelty 7.0

PIAST iteratively optimizes few-shot examples in prompts via Monte Carlo Shapley value estimation, outperforming prior automatic prompting methods and setting new SOTA on classification, simplification, and GSM8K with modest compute.

Large Language Models as Optimizers

cs.LG · 2023-09-07 · unverdicted · novelty 7.0

Large language models can optimize by being prompted with histories of past solutions and scores to propose better ones, producing prompts that raise accuracy up to 8% on GSM8K and 50% on Big-Bench Hard over human-designed baselines.

Robust Adaptation of Foundation Models with Black-Box Visual Prompting

cs.CV · 2024-07-04 · unverdicted · novelty 6.0

BlackVIP adapts foundation models via a Coordinator for input-dependent visual prompts and SPSA-GC for gradient estimation, enabling robust transfer on 19 datasets with low memory use and a link to randomized smoothing robustness.

Prompt Optimization for LLM Code Generation via Reinforcement Learning

cs.SE · 2026-05-18 · unverdicted · novelty 5.0

A PPO agent with hybrid actions and test-driven rewards optimizes prompts for code LLMs, raising strict Pass@1 scores on MBPP+, HumanEval+, and APPS over prior methods.

citing papers explorer

Showing 6 of 6 citing papers.

Learning, Fast and Slow: Towards LLMs That Adapt Continually cs.LG · 2026-05-12 · unverdicted · none · ref 12 · 2 links
Fast-Slow Training uses context optimization as fast weights alongside parameter updates as slow weights to achieve up to 3x better sample efficiency, higher performance, and less catastrophic forgetting than standard RL in continual LLM learning.
PEEM: Prompt Engineering Evaluation Metrics for Interpretable Joint Evaluation of Prompts and Responses cs.CL · 2026-03-11 · unverdicted · none · ref 32
PEEM is a multi-criteria LLM-based evaluator for prompts and responses that aligns with standard accuracy while enabling zero-shot prompt optimization via feedback.
PIAST: Rapid Prompting with In-context Augmentation for Scarce Training data cs.CL · 2025-12-11 · conditional · none · ref 53
PIAST iteratively optimizes few-shot examples in prompts via Monte Carlo Shapley value estimation, outperforming prior automatic prompting methods and setting new SOTA on classification, simplification, and GSM8K with modest compute.
Large Language Models as Optimizers cs.LG · 2023-09-07 · unverdicted · none · ref 7
Large language models can optimize by being prompted with histories of past solutions and scores to propose better ones, producing prompts that raise accuracy up to 8% on GSM8K and 50% on Big-Bench Hard over human-designed baselines.
Robust Adaptation of Foundation Models with Black-Box Visual Prompting cs.CV · 2024-07-04 · unverdicted · none · ref 34
BlackVIP adapts foundation models via a Coordinator for input-dependent visual prompts and SPSA-GC for gradient estimation, enabling robust transfer on 19 datasets with low memory use and a link to randomized smoothing robustness.
Prompt Optimization for LLM Code Generation via Reinforcement Learning cs.SE · 2026-05-18 · unverdicted · none · ref 8
A PPO agent with hybrid actions and test-driven rewards optimizes prompts for code LLMs, raising strict Pass@1 scores on MBPP+, HumanEval+, and APPS over prior methods.

Rlprompt: Optimizing discrete text prompts with reinforcement learning,

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer