pith. machine review for the scientific record.

arxiv: 2511.01016 · v9 · submitted 2025-11-02 · 💻 cs.CL

Recognition: unknown

Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning

keywords llms · prompt-r1 · framework · large-scale · complex · end-to-end · interaction · learning

Recently, advanced large language models (LLMs) have emerged at an increasingly rapid pace. However, when faced with complex problems, most users are often unable to provide accurate and effective prompts to interact with LLMs, thus limiting the performance of LLMs. To address this challenge, we propose Prompt-R1, an end-to-end reinforcement learning framework that uses a small-scale LLM to collaborate with large-scale LLMs, replacing user interaction to solve problems better. This collaboration is cast as a multi-turn prompt interaction, where the small-scale LLM thinks and generates prompts, and the large-scale LLM performs complex reasoning. A dual-constrained reward is designed to optimize for correctness, generation quality, and reasoning accuracy. Prompt-R1 provides a plug-and-play framework that supports both inference and training with various large-scale LLMs. Experiments on multiple public datasets show that Prompt-R1 significantly outperforms baseline models across tasks. Our code is publicly available at https://github.com/QwenQKing/Prompt-R1.
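
The multi-turn collaboration described in the abstract can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `collaborate`, `toy_agent`, `toy_reasoner`, the `ANSWER:` stop convention, and the `max_turns` budget are all hypothetical names and choices introduced here, and the two models are replaced by stub functions so the example runs without any API. The reinforcement learning step and the dual-constrained reward are not shown, since the abstract does not specify them.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces, not taken from the Prompt-R1 codebase.
# The small-scale LLM sees the question and the history so far, and returns
# either the next prompt or a final answer marked with FINAL_PREFIX.
History = List[Tuple[str, str]]            # list of (prompt, response) pairs
PromptAgent = Callable[[str, History], str]
Reasoner = Callable[[str], str]            # large-scale LLM: prompt -> response

FINAL_PREFIX = "ANSWER:"                   # assumed stop convention


def collaborate(question: str, agent: PromptAgent, reasoner: Reasoner,
                max_turns: int = 4) -> Tuple[str, History]:
    """Run the prompt/response loop until the agent answers or turns run out."""
    history: History = []
    for _ in range(max_turns):
        action = agent(question, history)
        if action.startswith(FINAL_PREFIX):
            # The agent decided it has enough information to answer.
            return action[len(FINAL_PREFIX):].strip(), history
        # Otherwise the action is a prompt for the large-scale model.
        response = reasoner(action)
        history.append((action, response))
    # Turn budget exhausted: fall back to the last response, if any.
    fallback = history[-1][1] if history else ""
    return fallback, history


def toy_agent(question: str, history: History) -> str:
    """Stub agent: ask once, then report the reasoner's reply as the answer."""
    if not history:
        return f"Solve step by step: {question}"
    return f"{FINAL_PREFIX} {history[-1][1]}"


def toy_reasoner(prompt: str) -> str:
    """Stub reasoner standing in for a large-scale LLM call."""
    return "4"


if __name__ == "__main__":
    answer, history = collaborate("What is 2 + 2?", toy_agent, toy_reasoner)
    print(answer, len(history))
```

Passing the two models as plain callables mirrors the plug-and-play claim in the abstract: in this sketch, swapping the large-scale LLM only means supplying a different `reasoner` function.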

This paper has not been read by Pith yet.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Administrative Decentralization in Edge-Cloud Multi-Agent for Mobile Automation

    cs.DC 2026-04 unverdicted novelty 6.0

    AdecPilot decentralizes administration in edge-cloud multi-agent frameworks by using a UI-agnostic cloud designer and a bimodal edge team with a Hierarchical Implicit Termination protocol, yielding 21.7% higher task s...