hub Canonical reference

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Qizheng Zhang, Changran Hu, Shubhangi Upasani, Boyuan Ma, Fenglu Hong, Vamsidhar Kamanuru · 2025 · cs.LG · arXiv 2510.04618

Canonical reference. 90% of citing Pith papers cite this work as background.

51 Pith papers citing it

Background 90% of classified citations

open full Pith review browse 51 citing papers arXiv PDF

abstract

Large language model (LLM) applications such as agents and domain-specific reasoning increasingly rely on context adaptation: modifying inputs with instructions, strategies, or evidence, rather than weight updates. Prior approaches improve usability but often suffer from brevity bias, which drops domain insights for concise summaries, and from context collapse, where iterative rewriting erodes details over time. We introduce ACE (Agentic Context Engineering), a framework that treats contexts as evolving playbooks that accumulate, refine, and organize strategies through a modular process of generation, reflection, and curation. ACE prevents collapse with structured, incremental updates that preserve detailed knowledge and scale with long-context models. Across agent and domain-specific benchmarks, ACE optimizes contexts both offline (e.g., system prompts) and online (e.g., agent memory), consistently outperforming strong baselines: +10.6% on agents and +8.6% on finance, while significantly reducing adaptation latency and rollout cost. Notably, ACE could adapt effectively without labeled supervision and instead by leveraging natural execution feedback. On the AppWorld leaderboard, ACE matches the top-ranked production-level agent on the overall average and surpasses it on the harder test-challenge split, despite using a smaller open-source model. These results show that comprehensive, evolving contexts enable scalable, efficient, and self-improving LLM systems with low overhead.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 9 method 1

citation-polarity summary

background 9 use method 1

representative citing papers

SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

cs.AI · 2026-05-10 · accept · novelty 8.0 · 2 refs

SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.

Agentic Abstention: Do Agents Know When to Stop Instead of Act?

cs.AI · 2026-06-27 · unverdicted · novelty 7.0

LLM agents often fail to abstain at the right time in uncertain multi-turn tasks, and the CONVOLVE context engineering method raises timely abstention rates on WebShop from 26.7 to 57.4 without parameter updates.

CyberEvolver: Structured Self-Evolution for Cybersecurity Agents On the Fly

cs.CR · 2026-05-25 · unverdicted · novelty 7.0

CyberEvolver introduces a four-layer self-evolving agent architecture with trace-to-diagnosis and population beam search that raises seed agent success rates by 13.6% on CTF, exploitation, and penetration tasks across four LLMs.

Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents

cs.AI · 2026-05-21 · unverdicted · novelty 7.0

Life-Harness evolves reusable interventions from training trajectories to enhance frozen LLM agents on unseen tasks across seven deterministic environments, yielding 88.5% average relative improvement in 116 of 126 model-environment settings.

EXG: Self-Evolving Agents with Experience Graphs

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

EXG is an experience graph framework for self-evolving LLM agents that supports online real-time growth and offline reuse to enhance solution quality and efficiency on code generation and reasoning benchmarks.

Learning, Fast and Slow: Towards LLMs That Adapt Continually

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Fast-Slow Training uses context optimization as fast weights alongside parameter updates as slow weights to achieve up to 3x better sample efficiency, higher performance, and less catastrophic forgetting than standard RL in continual LLM learning.

RewardHarness: Self-Evolving Agentic Post-Training

cs.AI · 2026-05-09 · unverdicted · novelty 7.0

RewardHarness self-evolves a tool-and-skill library from 100 preference examples to reach 47.4% accuracy on image-edit evaluation, beating GPT-5, and yields stronger RL-tuned models.

Springdrift: An Auditable Persistent Runtime for LLM Agents with Case-Based Memory, Normative Safety, and Ambient Self-Perception

cs.AI · 2026-04-06 · unverdicted · novelty 7.0

Springdrift provides an auditable persistent runtime for long-lived LLM agents with case-based memory, normative safety gating, and ambient self-perception, shown in a 23-day single-instance deployment where the agent self-diagnosed bugs and maintained cross-channel context.

Meta-Harness: End-to-End Optimization of Model Harnesses

cs.AI · 2026-03-30 · unverdicted · novelty 7.0

Meta-Harness discovers improved harness code for LLMs via agentic search over prior execution traces, yielding 7.7-point gains on text classification with 4x fewer tokens and 4.7-point gains on math reasoning across held-out models.

EvoDiagram: Agentic Editable Diagram Creation via Design Expertise Evolution

cs.HC · 2026-02-20 · unverdicted · novelty 7.0

EvoDiagram uses a coordinated multi-agent system and design knowledge evolution to generate editable diagrams via canvas schema, with a new CanvasBench benchmark showing strong performance over baselines.

Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning

cs.CV · 2026-01-16 · conditional · novelty 7.0

VIGA introduces a training-free interleaved multimodal reasoning loop that improves vision-as-inverse-graphics accuracy over one-shot baselines on BlenderGym, SlideBench, and new BlenderBench.

LEDGER: Scaling Agentic Document Editing with Dependency-aware Graph Retrieval

cs.IR · 2026-06-19 · unverdicted · novelty 6.0

LEDGER improves consistency in agentic document editing from 56% to 76% across six models by using a dependency graph for targeted context retrieval while cutting token usage.

Recursive Self-Evolving Agents via Held-Out Selection

cs.AI · 2026-06-17 · unverdicted · novelty 6.0

RSEA adds a strict held-out keep-better gate to recursive self-evolution of agent artifacts, yielding monotone-safe gains or parity with the base ReAct agent on ALFWorld, GAIA, τ-bench, and WebShop.

Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

cs.CL · 2026-06-10 · unverdicted · novelty 6.0

Arbor combines a coordinator, executors, and a hypothesis tree to enable cumulative autonomous research, outperforming Codex and Claude Code by over 2.5x on six real tasks and reaching 86.36% Any Medal on MLE-Bench Lite.

PACE: Two-Timescale Self-Evolution for Small Language Model Agents

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

PACE coordinates low-risk prompt evolution with validated higher-risk control-logic updates to improve frozen SLM agents on benchmarks without model retraining.

Towards Direct Evaluation of Harness Optimizers via Priority Ranking

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

Priority ranking offers a low-cost direct evaluation for harness optimizers that correlates with their real multi-step optimization performance, supported by the Shor dataset of 182 scenarios.

PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents

cs.AI · 2026-05-19 · unverdicted · novelty 6.0

PEEK maintains a constant-sized context map via a programmable cache policy to give LLM agents persistent orientation knowledge about recurring external contexts, yielding 6-34% gains and lower cost than prior prompt-learning methods.

PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures

cs.CL · 2026-05-15 · unverdicted · novelty 6.0

PQR framework generates diverse realistic queries to elicit QA agent failures, uncovering 23-78% more unhelpful responses than prior methods in e-commerce agent tests.

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

REALISTA generates semantically coherent adversarial prompts via latent-space optimization over input-dependent editing directions, achieving stronger hallucination elicitation than prior realistic attacks on open-source and reasoning LLMs.

Learning with Rare Success but Rich Feedback via Reflection-Enhanced Self-Distillation

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

RESD turns failure trajectories into token-level supervision via retrospective reflections and a persistent global playbook, enabling faster improvement than standard self-distillation or GRPO with only one rollout per prompt.

Shepherd: Enabling Programmable Meta-Agents via Reversible Agentic Execution Traces

cs.AI · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

Shepherd provides a reversible execution trace substrate for LLM agents that enables meta-agents to inspect and transform runs, yielding reported gains on coding and terminal benchmarks via supervision, counterfactual repair, and RL credit assignment.

SkillEvolver: Skill Learning as a Meta-Skill

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

A meta-skill authors and refines prose-and-code skills for agents by learning from post-deployment failures with an overfit audit, achieving 56.8% accuracy on SkillsBench tasks versus 43.6% for human-curated skills.

Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning

cs.AI · 2026-05-10 · unverdicted · novelty 6.0 · 2 refs

MarsTSC is a VLM agentic system with generator, reflector, and modifier roles that iteratively refines a knowledge bank to improve few-shot multimodal time series classification and produce human-readable explanations.

FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

FlashEvolve accelerates LLM agent self-evolution via asynchronous stage orchestration and inspectable language-space staleness handling, reporting 3.5-4.9x proposal throughput gains over synchronous baselines on GEPA workloads.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Joint Optimization of Trajectory Control, Resource Allocation, and Task Offloading for Multi-UAV-Assisted IoV cs.NI · 2026-05-06 · unverdicted · none · ref 59 · internal anchor
A joint optimization approach using SOCP for UAV trajectories, DRL-LLM for resource scheduling, and LP for offloading achieves higher task success rates and system efficiency than multi-agent RL baselines in simulated dense urban IoV environments.

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer