Framing LLM agent loops as a Context Gathering Decision Process POMDP yields a predicate-based belief state that boosts multi-hop reasoning up to 11.4% and an exhaustion gate that cuts token use up to 39% with no performance loss.
Lost in the maze: Overcoming context limitations in long-horizon agentic search
4 Pith papers cite this work. Polarity classification is still indexing.
years
2026 4representative citing papers
VISTA supplies LLM agents with a visible proprioceptive dashboard of typed context blocks, enabling untrained self-management that lifts performance on long-horizon tool-use benchmarks across multiple model scales.
SlimSearcher reduces tool-call rounds by 17-58% on GAIA, BrowseComp and XBenchDeepSearch while maintaining accuracy via Pareto filtration in SFT and Adaptive Reward Gating in RL.
Longer action horizons bottleneck LLM agent training through instability, but training with reduced horizons stabilizes learning and enables better generalization to longer horizons.
citing papers explorer
-
LLM Agents Are Latent Context Managers: Eliciting Self-Managed Context via a Proprioceptive Dashboard
VISTA supplies LLM agents with a visible proprioceptive dashboard of typed context blocks, enabling untrained self-management that lifts performance on long-horizon tool-use benchmarks across multiple model scales.
-
SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating
SlimSearcher reduces tool-call rounds by 17-58% on GAIA, BrowseComp and XBenchDeepSearch while maintaining accuracy via Pareto filtration in SFT and Adaptive Reward Gating in RL.
-
On Training Large Language Models for Long-Horizon Tasks: An Empirical Study of Horizon Length
Longer action horizons bottleneck LLM agent training through instability, but training with reduced horizons stabilizes learning and enables better generalization to longer horizons.