Escaping the Context Bottleneck: Active Context Curation for LLM Agents via Reinforcement Learning

· 2026 · cs.AI · arXiv 2604.11462

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

Large Language Models (LLMs) struggle with long-horizon tasks due to the "context bottleneck" and the "lost-in-the-middle" phenomenon, where accumulated noise from verbose environments degrades reasoning over multi-turn interactions. To address this issue, we introduce a symbiotic framework that decouples context management from task execution. Our architecture pairs a lightweight, specialized policy model, ContextCurator, with a powerful frozen foundation model, TaskExecutor. Trained via reinforcement learning, ContextCurator actively reduces information entropy in the working memory. It aggressively prunes environmental noise while preserving reasoning anchors, that is, sparse data points that are critical for future deductions. On WebArena, our framework improves the success rate of Gemini-3.0-flash from 36.4% to 41.2% while reducing token consumption by 8.8% (from 47.4K to 43.3K). On DeepSearch, it achieves a 57.1% success rate, compared with 53.9%, while reducing token consumption by a factor of 8. Remarkably, a 7B ContextCurator matches the context management performance of GPT-4o, providing a scalable and computationally efficient paradigm for autonomous long-horizon agents.

representative citing papers

What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents

cs.AI · 2026-05-19 · unverdicted · novelty 6.0

SERL selectively reweights learning using task success and environment feedback to reach 90.0% success on ALFWorld and 80.1% on WebShop, outperforming RL and distillation baselines.

citing papers explorer

Showing 1 of 1 citing paper.

What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents cs.AI · 2026-05-19 · unverdicted · none · ref 16 · internal anchor
SERL selectively reweights learning using task success and environment feedback to reach 90.0% success on ALFWorld and 80.1% on WebShop, outperforming RL and distillation baselines.

Escaping the Context Bottleneck: Active Context Curation for LLM Agents via Reinforcement Learning

fields

years

verdicts

representative citing papers

citing papers explorer