arxiv: 2510.00615 · v3 · pith:HDGSQCNEnew · submitted 2025-10-01 · 💻 cs.AI · cs.CL

ACON: Optimizing Context Compression for Long-horizon LLM Agents

keywords: acon, compression, context, agents, long-horizon, agent, critical, existing

Large language models (LLMs) are increasingly deployed as agents in dynamic real-world environments, where success depends on maintaining precise records of actions and observations. However, the resulting unbounded context growth in long-horizon agentic tasks creates two critical bottlenecks: prohibitive inference memory costs and reasoning degradation due to irrelevant information. Existing compression methods fail to fully address these bottlenecks, often relying on brittle heuristics or requiring parameter updates that are impractical for proprietary or large-scale LLMs. We introduce Agent Context Optimization (ACON), a unified framework that optimally compresses both observations and history into concise, informative representations. Unlike prior work, ACON performs optimization in natural language space: it iteratively refines compression guidelines based on failure analysis of the agent, ensuring that critical state information is preserved without model fine-tuning. To further minimize computational overhead, we distill the optimized compressor into smaller models. Experiments on AppWorld, OfficeBench, and Multi-objective QA demonstrate that ACON reduces peak token usage by 26-54% while improving task success over existing compression baselines. Notably, it enables smaller LMs to function effectively as long-horizon agents, achieving up to 46% performance improvement by mitigating context distraction. Our code is available at https://github.com/microsoft/acon.
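
The abstract describes refining compression guidelines from failure analysis without giving the procedure, so the following is only a minimal sketch of what such a loop might look like, not the authors' implementation. All names here (`optimize_guideline`, `run_agent`, `refine`, and the toy stand-ins) are hypothetical; in a real setup `run_agent` would presumably run the agent on a task with its context compressed under the guideline, and `refine` would ask an LLM to revise the guideline given the failed tasks.

```python
from typing import Callable, List, Tuple


def optimize_guideline(
    guideline: str,
    tasks: List[str],
    run_agent: Callable[[str, str], bool],   # hypothetical: (task, guideline) -> task succeeded?
    refine: Callable[[str, List[str]], str],  # hypothetical: (guideline, failed tasks) -> new guideline
    rounds: int = 3,
) -> Tuple[str, float]:
    """Iteratively revise a natural-language guideline; return the best one seen."""
    if not tasks:
        raise ValueError("need at least one task")
    best, best_rate = guideline, -1.0
    current = guideline
    for step in range(rounds + 1):
        # Evaluate the current guideline and record which tasks fail under it.
        failed = [t for t in tasks if not run_agent(t, current)]
        rate = 1.0 - len(failed) / len(tasks)
        if rate > best_rate:
            best, best_rate = current, rate
        if not failed or step == rounds:
            break
        # Failure analysis: revise the guideline text; no model weights change.
        current = refine(current, failed)
    return best, best_rate


# Toy stand-ins (assumptions for illustration only): a task "succeeds" when the
# guideline mentions the piece of state that the task depends on.
def toy_agent(task: str, guideline: str) -> bool:
    return task in guideline


def toy_refine(guideline: str, failed: List[str]) -> str:
    return guideline + " Keep: " + ", ".join(failed) + "."


best, rate = optimize_guideline(
    "Summarize briefly.", ["auth_token", "file_path"], toy_agent, toy_refine
)
print(rate)
print(best)
```

In this toy run the initial guideline fails both tasks, a single refinement adds the missing state items, and the loop stops once no failures remain. The search happens entirely over guideline text, which is the property the abstract emphasizes for proprietary or large-scale LLMs.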

Forward citations

Cited by 17 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty

    cs.CL 2026-05 unverdicted novelty 8.0

    Agent-BRACE improves LLM agent performance on long-horizon partially observable tasks by 5.3-14.5% through a decoupled belief state of verbalized atomic claims with certainty labels that keeps context length constant.

  2. Remember Your Trace: Memory-Guided Long-Horizon Agentic Framework for Consistent and Hierarchical Repository-Level Code Documentation

    cs.SE 2026-05 unverdicted novelty 7.0

    MemDocAgent generates consistent hierarchical repository-level code documentation by combining dependency-aware traversal with memory-guided agent interactions that accumulate work traces.

  3. SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States

    cs.CL 2026-05 unverdicted novelty 7.0

    SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.

  4. OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory

    cs.CL 2026-04 unverdicted novelty 7.0

    OCR-Memory encodes agent trajectories as images with visual anchors and retrieves verbatim text via locate-and-transcribe, yielding gains on long-horizon benchmarks under strict context limits.

  5. MatClaw: An Autonomous Code-First LLM Agent for End-to-End Materials Exploration

    cond-mat.mtrl-sci 2026-04 conditional novelty 7.0

    MatClaw is a code-first LLM agent that autonomously executes end-to-end materials workflows by generating and running Python scripts on remote clusters, achieving reliable code generation via memory architecture and R...

  6. MatClaw: An Autonomous Code-First LLM Agent for End-to-End Materials Exploration

    cond-mat.mtrl-sci 2026-04 conditional novelty 7.0

    MatClaw shows a code-first LLM agent autonomously generating and executing workflows for ML force field training, Curie temperature prediction, and parameter search on CuInP2S6, succeeding on code but requiring interv...

  7. Parallel Context Compaction for Long-Horizon LLM Agent Serving

    cs.AI 2026-05 unverdicted novelty 6.0

    Parallel compaction for LLM agent context management provides predictable volume control and reduces wall time versus sequential baselines on HotpotQA and LoCoMo.

  8. From Patches to Trajectories: Privileged Process Supervision for Software-Engineering Agents

    cs.SE 2026-05 unverdicted novelty 6.0

    P2T distills reference patches into a latent process graph and uses it to select shortest effective trajectory segments from teacher rollouts, yielding up to 10.8 point Pass@1 gains on SWE-bench Verified with 15% lowe...

  9. PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents

    cs.AI 2026-05 unverdicted novelty 6.0

    PEEK maintains a constant-sized context map via a programmable cache policy to give LLM agents persistent orientation knowledge about recurring external contexts, yielding 6-34% gains and lower cost than prior prompt-...

  10. Elastic-dLLM: Position Preserving Context Compression and Augmentation of Diffusion LLMs

    cs.LG 2026-05 unverdicted novelty 6.0

    Position-preserving MASK token compression reduces redundancy in diffusion LLMs to accelerate parallel decoding and enable context folding for longer sequences.

  11. PAAC: Privacy-Aware Agentic Device-Cloud Collaboration

    cs.LG 2026-05 unverdicted novelty 6.0

    PAAC aligns planner-executor decomposition with the device-cloud boundary via typed placeholders and on-device sanitization, delivering 15-36% higher accuracy and 2-6x lower leakage than prior device-cloud baselines o...

  12. Group of Skills: Group-Structured Skill Retrieval for Agent Skill Libraries

    cs.CL 2026-05 unverdicted novelty 6.0

    GoSkills converts flat skill lists into role-labeled execution contexts via anchor-centered groups and graph expansion, preserving coverage and improving rewards on SkillsBench and ALFWorld under small skill budgets.

  13. HARBOR: Automated Harness Optimization

    cs.LG 2026-04 unverdicted novelty 6.0

    HARBOR formalizes harness optimization as constrained noisy Bayesian optimization over mixed-variable spaces and reports a case study where it outperforms manual tuning on a production coding agent.

  14. AgentSPEX: An Agent SPecification and EXecution Language

    cs.CL 2026-04 unverdicted novelty 6.0

    AgentSPEX is a new language and harness for explicitly specifying and running structured LLM-agent workflows with typed steps, control flow, parallel execution, and a visual editor.

  15. Latent Action Reparameterization for Efficient Agent Inference

    cs.AI 2026-05 unverdicted novelty 5.0

    LAR learns a compact latent action space from trajectories that shortens the effective decision horizon for LLM agents, reducing token count and inference time while preserving task success.

  16. Context Pruning for Coding Agents via Multi-Rubric Latent Reasoning

    cs.AI 2026-05 unverdicted novelty 5.0

    LaMR decomposes code context pruning into two rubrics using dedicated CRFs, a mixture-of-experts gate, and AST-derived labels to filter noise and often match or beat full-context baselines on coding benchmarks.

  17. The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project

    cs.LG 2026-03 unverdicted novelty 5.0

    The Workload-Router-Pool architecture is a 3D framework for LLM inference optimization that synthesizes prior vLLM work into a 3x3 interaction matrix and proposes 21 research directions at the intersections.