Scaling llm multi-turn rl with end-to-end summarization-based context management

Miao Lu, Weiwei Sun, Weihua Du, Zhan Ling, Xuesong Yao, Kang Liu, Jiecao Chen · 2025 · arXiv 2510.06727

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

read on arXiv browse 9 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

SAM: State-Adaptive Memory for Long-Horizon Reasoning Agent

cs.AI · 2026-05-23 · unverdicted · novelty 6.0

SAM is a standalone memory framework for long-horizon LLM agents that creates state-adaptive cues from interactions, preserves raw trajectories for intent-driven recall, and optimizes the module via expert supervision and RL, outperforming baselines on BrowseComp and related benchmarks.

Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents

cs.MA · 2026-05-09 · unverdicted · novelty 6.0

Slipstream uses asynchronous compaction with trajectory-grounded judge validation to improve long-horizon agent accuracy by up to 8.8 percentage points and reduce latency by up to 39.7%.

LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

Context-ReAct enables agents to dynamically manage context via five atomic operations, and LongSeeker fine-tuned on 10k trajectories achieves 61.5% and 62.5% on BrowseComp benchmarks, outperforming prior agents.

SWE-MeM: Learning Adaptive Memory Management for Long-Horizon Coding Agents

cs.SE · 2026-06-26 · unverdicted · novelty 5.0

SWE-MeM introduces adaptive memory management for coding agents via synthesized trajectories and Memory-aware GRPO, reporting 43.4% and 60.2% resolve rates on SWE-Bench Verified for 4B and 30B models while beating baselines on performance and token use.

When Does Overlap Help? OSU-Mem and a Cell-Conditional Analysis of Trajectory Memory for LLM Agents

cs.IR · 2026-06-19 · unverdicted · novelty 5.0

OSU-Mem shows overlapping memory helps retrieval when evidence shares tools or entities but hurts when steps are heterogeneous, with benefits on synthetic benchmarks vanishing on mixed real ones due to query mixing.

Context Pruning for Coding Agents via Multi-Rubric Latent Reasoning

cs.AI · 2026-05-14 · unverdicted · novelty 5.0

LaMR decomposes code context pruning into two rubrics using dedicated CRFs, a mixture-of-experts gate, and AST-derived labels to filter noise and often match or beat full-context baselines on coding benchmarks.

R$^2$-Searcher: Calibrating Retrieval and Reasoning Boundaries for Agentic Search

cs.IR · 2026-06-26 · unverdicted · novelty 4.0

R²-Searcher introduces fine-grained evidence modeling, retrieval reflection, and R²PO RL to calibrate retrieval-reasoning boundaries and improve multi-hop QA performance.

Rethinking Agentic Reinforcement Learning In Large Language Models

cs.AI · 2026-04-30 · unverdicted · novelty 3.0 · 3 refs

The paper reviews conceptual foundations, methodological innovations, effective designs, critical challenges, and future directions for LLM-based Agentic Reinforcement Learning.

ScrapMem: A Bio-inspired Framework for On-device Personalized Agent Memory via Optical Forgetting

cs.AI · 2026-05-05

citing papers explorer

Showing 9 of 9 citing papers.

SAM: State-Adaptive Memory for Long-Horizon Reasoning Agent cs.AI · 2026-05-23 · unverdicted · none · ref 20
SAM is a standalone memory framework for long-horizon LLM agents that creates state-adaptive cues from interactions, preserves raw trajectories for intent-driven recall, and optimizes the module via expert supervision and RL, outperforming baselines on BrowseComp and related benchmarks.
Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents cs.MA · 2026-05-09 · unverdicted · none · ref 81
Slipstream uses asynchronous compaction with trajectory-grounded judge validation to improve long-horizon agent accuracy by up to 8.8 percentage points and reduce latency by up to 39.7%.
LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents cs.AI · 2026-05-06 · unverdicted · none · ref 6
Context-ReAct enables agents to dynamically manage context via five atomic operations, and LongSeeker fine-tuned on 10k trajectories achieves 61.5% and 62.5% on BrowseComp benchmarks, outperforming prior agents.
SWE-MeM: Learning Adaptive Memory Management for Long-Horizon Coding Agents cs.SE · 2026-06-26 · unverdicted · none · ref 26
SWE-MeM introduces adaptive memory management for coding agents via synthesized trajectories and Memory-aware GRPO, reporting 43.4% and 60.2% resolve rates on SWE-Bench Verified for 4B and 30B models while beating baselines on performance and token use.
When Does Overlap Help? OSU-Mem and a Cell-Conditional Analysis of Trajectory Memory for LLM Agents cs.IR · 2026-06-19 · unverdicted · none · ref 6
OSU-Mem shows overlapping memory helps retrieval when evidence shares tools or entities but hurts when steps are heterogeneous, with benefits on synthetic benchmarks vanishing on mixed real ones due to query mixing.
Context Pruning for Coding Agents via Multi-Rubric Latent Reasoning cs.AI · 2026-05-14 · unverdicted · none · ref 30
LaMR decomposes code context pruning into two rubrics using dedicated CRFs, a mixture-of-experts gate, and AST-derived labels to filter noise and often match or beat full-context baselines on coding benchmarks.
R$^2$-Searcher: Calibrating Retrieval and Reasoning Boundaries for Agentic Search cs.IR · 2026-06-26 · unverdicted · none · ref 29
R²-Searcher introduces fine-grained evidence modeling, retrieval reflection, and R²PO RL to calibrate retrieval-reasoning boundaries and improve multi-hop QA performance.
Rethinking Agentic Reinforcement Learning In Large Language Models cs.AI · 2026-04-30 · unverdicted · none · ref 54 · 3 links
The paper reviews conceptual foundations, methodological innovations, effective designs, critical challenges, and future directions for LLM-based Agentic Reinforcement Learning.
ScrapMem: A Bio-inspired Framework for On-device Personalized Agent Memory via Optical Forgetting cs.AI · 2026-05-05 · unreviewed · ref 71

Scaling llm multi-turn rl with end-to-end summarization-based context management

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer