SCOPE: Prompt Evolution for Enhancing Agent Effectiveness

Bei Yu; Hui-Ling Zhen; Mingxuan Yuan; Shixiong Kai; Sinno Jialin Pan; Yunhe Wang; Zehua Pei

arxiv: 2512.15374 · v2 · pith:U2KTZYW4new · submitted 2025-12-17 · 💻 cs.AI

Zehua Pei, Hui-Ling Zhen, Shixiong Kai, Sinno Jialin Pan, Yunhe Wang, Mingxuan Yuan, Bei Yu

classification 💻 cs.AI

keywords scope · context · optimization · prompt · agent · agents · evolution · guidelines

Large Language Model (LLM) agents are increasingly deployed in environments that generate massive, dynamic contexts. However, a critical bottleneck remains: while agents have access to this context, their static prompts lack the mechanisms to manage it effectively, leading to recurring Corrective and Enhancement failures. To address this capability gap, we introduce Self-evolving Context Optimization via Prompt Evolution (SCOPE). SCOPE frames context management as an online optimization problem, synthesizing guidelines from execution traces to automatically evolve the agent's prompt. We propose a Dual-Stream mechanism that routes guidelines between tactical memory (immediate error correction) and strategic memory, which is continuously refined through conflict resolution, subsumption pruning, and consolidation. To maximize strategy coverage, Perspective-Driven Exploration evolves multiple parallel prompts guided by distinct optimization perspectives. Experiments on the HLE benchmark show that SCOPE improves task success rates from 14.23% to 38.64% without human intervention. We make our code publicly available at https://github.com/JarvisPei/SCOPE.
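
The Dual-Stream routing described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the routing criteria or the refinement procedure, so every name here (`Guideline`, `DualStreamMemory`, `prune_subsumed`, the `general` flag) is hypothetical. It assumes tactical memory is cleared at the end of each task, and it uses a simple word-subset heuristic as a stand-in for subsumption pruning. Conflict resolution and consolidation are left out.

```python
from dataclasses import dataclass, field


def _words(rule: str) -> frozenset:
    """Lower-cased word set of a rule, used by the toy subsumption check."""
    return frozenset(rule.lower().split())


def prune_subsumed(rules: list) -> list:
    """Drop duplicates and any rule whose words are a strict subset of another's.

    Assumption: a crude proxy for subsumption pruning, for illustration only.
    """
    unique = list(dict.fromkeys(rules))  # de-duplicate, keep insertion order
    return [r for r in unique
            if not any(_words(r) < _words(other) for other in unique)]


@dataclass
class Guideline:
    text: str
    general: bool  # hypothetical flag: should the rule outlive the current task?


@dataclass
class DualStreamMemory:
    tactical: list = field(default_factory=list)   # assumed task-scoped
    strategic: list = field(default_factory=list)  # persists across tasks

    def route(self, g: Guideline) -> None:
        """Send general guidelines to strategic memory, the rest to tactical."""
        if g.general:
            self.strategic = prune_subsumed(self.strategic + [g.text])
        else:
            self.tactical.append(g.text)

    def end_task(self) -> None:
        """Discard task-specific corrections once the task is finished."""
        self.tactical.clear()

    def render_prompt(self, base_prompt: str) -> str:
        """Append current guidelines to the base prompt (the 'evolved' prompt)."""
        rules = self.strategic + self.tactical
        if not rules:
            return base_prompt
        return base_prompt + "\n\nGuidelines:\n" + "\n".join(f"- {r}" for r in rules)


mem = DualStreamMemory()
mem.route(Guideline("verify units", general=True))
mem.route(Guideline("verify units before answering", general=True))
mem.route(Guideline("retry the failed search", general=False))
print(mem.render_prompt("You are an agent."))
```

In this sketch the shorter rule "verify units" is pruned because the longer rule covers it, and calling `end_task()` would drop the tactical entry while keeping the strategic one. A real system would presumably rely on an LLM rather than word overlap to judge subsumption and conflicts.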

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

RewardHarness: Self-Evolving Agentic Post-Training
cs.AI · 2026-05 · unverdicted · novelty 7.0

RewardHarness self-evolves a tool-and-skill library from 100 preference examples to reach 47.4% accuracy on image-edit evaluation, beating GPT-5, and yields stronger RL-tuned models.

MemDLM: Memory-Enhanced DLM Training
cs.CL · 2026-03 · unverdicted · novelty 7.0

MemDLM embeds a simulated denoising trajectory into DLM training via bi-level optimization, creating a parametric memory that improves convergence and long-context performance even when the memory is dropped at test time.

FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning
cs.CL · 2026-05 · unverdicted · novelty 6.0

FocuSFT uses an inner optimization loop to adapt fast-weight parameters into a parametric memory that sharpens attention on relevant content, then conditions outer-loop supervised fine-tuning on this representation, y...