Governed Evolution of Agent Runtimes through Executable Operational Cognition

Mariano Garralda-Barrio

arXiv: 2605.27328 · v1 · pith:GYZTV2TYnew · submitted 2026-05-26 · 💻 cs.SE · cs.AI · cs.MA

Pith reviewed 2026-06-29 15:47 UTC · model grok-4.3

classification 💻 cs.SE · cs.AI · cs.MA

keywords agent runtimes · runtime evolution · governed adaptation · persistent operational memory · HarnessMutation · executable operational cognition · multi-agent systems · lifecycle management

The pith

A framework formalizes agent-generated artifacts as persistent runtime capabilities that evolve under explicit governance constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a framework that models the evolution of agent runtimes in multi-agent systems as a bounded and observable process. It formalizes agent-generated artifacts as persistent runtime capabilities that become part of the operational substrate instead of transient outputs. This perspective supports the introduction of HarnessMutation, a mechanism for lifecycle-aware adaptation that enforces validation, traceability, evaluation, and rollback. The approach aims to keep changes in agent systems explicit, auditable, and constrained rather than unrestricted. A sympathetic reader would care because it offers a way to retain adaptability in evolving agents without creating untraceable modifications.

Core claim

Agent-generated artifacts are formalized as persistent runtime capabilities that progressively become part of the operational substrate rather than transient intermediate outputs. HarnessMutation is introduced as a governed mechanism for lifecycle-aware runtime adaptation that operates under explicit validation, traceability, evaluation, and rollback constraints. Rather than treating runtime adaptation as unrestricted self-modification, the framework models evolution as a bounded and observable process over persistent operational memory and shows how these ideas can be operationalized over modern agent runtimes, providing a conceptual foundation for adaptive infrastructures whose evolution remains explicit, auditable, and constrained.

What carries the argument

HarnessMutation, the governed mechanism for lifecycle-aware runtime adaptation that enforces validation, traceability, evaluation, and rollback constraints on changes to persistent operational memory.
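
The four constraints named above can be read as stages of a gate that every proposed change must pass. The following is a minimal sketch of that reading, not the paper's implementation: the names `Runtime`, `Mutation`, and `govern`, the idea of a capability as a plain callable, and the dictionary-based trace are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: a "capability" is modelled as any callable artifact.
Capability = Callable[[int], int]


@dataclass
class Mutation:
    """A proposed change to one named capability (hypothetical structure)."""
    name: str
    candidate: Capability
    author: str


@dataclass
class Runtime:
    """Stand-in for persistent operational memory plus an append-only trace."""
    capabilities: Dict[str, Capability] = field(default_factory=dict)
    trace: List[dict] = field(default_factory=list)

    def record(self, mutation: Mutation, stage: str, ok: bool) -> None:
        # Traceability: every stage outcome is logged, including failures.
        self.trace.append({"capability": mutation.name,
                           "author": mutation.author,
                           "stage": stage, "ok": ok})


def govern(runtime: Runtime, mutation: Mutation,
           validate: Callable[[Capability], bool],
           evaluate: Callable[[Runtime], bool]) -> bool:
    """Apply a mutation only if it passes validation and evaluation."""
    # Validation: check the candidate in isolation, before touching the runtime.
    ok = validate(mutation.candidate)
    runtime.record(mutation, "validation", ok)
    if not ok:
        return False

    # Remember the prior state so the change can be undone.
    had_previous = mutation.name in runtime.capabilities
    previous = runtime.capabilities.get(mutation.name)
    runtime.capabilities[mutation.name] = mutation.candidate

    # Evaluation: check the runtime as a whole with the candidate installed.
    ok = evaluate(runtime)
    runtime.record(mutation, "evaluation", ok)
    if ok:
        return True

    # Rollback: restore the exact prior state and log that it happened.
    if had_previous:
        runtime.capabilities[mutation.name] = previous
    else:
        del runtime.capabilities[mutation.name]
    runtime.record(mutation, "rollback", True)
    return False


# Usage: one mutation that passes validation but fails evaluation, then one that passes both.
runtime = Runtime(capabilities={"double": lambda x: 2 * x})
validate = lambda fn: fn(0) == 0
evaluate = lambda rt: rt.capabilities["double"](3) == 6

accepted_bad = govern(runtime, Mutation("double", lambda x: x * x, "agent-7"), validate, evaluate)
accepted_good = govern(runtime, Mutation("double", lambda x: x + x, "agent-7"), validate, evaluate)
stages = [entry["stage"] for entry in runtime.trace]
```

In this sketch the rejected mutation still leaves three trace entries (validation, evaluation, rollback), which is one way to keep failed adaptations as observable as successful ones.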

If this is right

Runtime adaptation is treated as a bounded process over persistent operational memory instead of unrestricted self-modification.
Changes to agent systems remain explicit, traceable, and subject to rollback through the defined constraints.
The ideas can be applied to modern agent runtimes and governance-oriented orchestration systems.
Adaptive infrastructures gain a foundation where evolution stays auditable and constrained.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This framing could support safer deployment of self-modifying agents by making rollback and audit trails standard features.
The approach might extend to non-agent software systems where code changes need similar governance.
A practical test could measure whether adding HarnessMutation increases traceability without reducing adaptation speed in an existing runtime.
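
The practical test in the last point could be scored from a log of mutation events. The sketch below assumes a hypothetical `MutationEvent` record and two hypothetical metrics, `trace_coverage` and `mean_adaptation_latency`; the timestamps are invented purely to exercise the functions and are not measurements from the paper or from any runtime.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class MutationEvent:
    """One proposed runtime change (hypothetical log schema)."""
    mutation_id: str
    proposed_at: float            # seconds since the start of the run
    applied_at: Optional[float]   # None if the mutation was rejected
    has_audit_record: bool


def trace_coverage(events: List[MutationEvent]) -> Optional[float]:
    """Fraction of applied mutations that left an audit record."""
    applied = [e for e in events if e.applied_at is not None]
    if not applied:
        return None
    return sum(1 for e in applied if e.has_audit_record) / len(applied)


def mean_adaptation_latency(events: List[MutationEvent]) -> Optional[float]:
    """Mean time from proposal to application, over applied mutations only."""
    latencies = [e.applied_at - e.proposed_at
                 for e in events if e.applied_at is not None]
    return mean(latencies) if latencies else None


# Invented illustration data: the same three proposals, without and with a governance gate.
baseline = [
    MutationEvent("m1", 0.0, 1.0, False),
    MutationEvent("m2", 2.0, 3.0, True),
    MutationEvent("m3", 4.0, None, False),
]
governed = [
    MutationEvent("m1", 0.0, 1.5, True),
    MutationEvent("m2", 2.0, 3.5, True),
    MutationEvent("m3", 4.0, None, True),
]

for label, log in (("baseline", baseline), ("governed", governed)):
    print(label, trace_coverage(log), mean_adaptation_latency(log))
```

Comparing the two metrics side by side makes the trade-off explicit: the claim under test is that coverage rises while latency stays within an acceptable bound, and where that bound lies is a choice left to the experimenter.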

Load-bearing premise

Agent-generated artifacts can be formalized as persistent runtime capabilities that become part of the operational substrate, and governance mechanisms like HarnessMutation can be added to modern agent runtimes without losing adaptability.

What would settle it

An attempt to implement HarnessMutation in a current agent runtime that either violates one of the required constraints or eliminates the system's capacity for further adaptation would falsify the claim that the framework can be operationalized while preserving governance and adaptability.

Figures

Figures reproduced from arXiv: 2605.27328 by Mariano Garralda-Barrio.

**Figure 1.** From agent-initiated artifacts to executable operational cognition. Local artifacts become persistent...

**Figure 2.** Conceptual knowledge-grounded runtime architecture. Governance-aware layers coordinate spe...

**Figure 3.** Governed runtime evolution loop. Agent-generated artifacts move through evaluation, governance...

**Figure 4.** Prototype architecture over modern agent runtimes. The governed runtime kernel operates...

Original abstract

Recent advances in agentic systems increasingly treat code as an executable operational substrate rather than as a disposable output artifact. Prior work such as Code as Agent Harness frames validated agent-generated artifacts as runtime entities that can be created, executed, revised, persisted, and reused within long-running cognitive loops. However, the governance, lifecycle management, and operational evolution of such artifacts remain under-specified. This paper proposes a framework for governed runtime evolution in multi-agent systems through executable operational cognition. We formalize agent-generated artifacts as persistent runtime capabilities that progressively become part of the operational substrate rather than transient intermediate outputs. Building on this perspective, we introduce HarnessMutation as a governed mechanism for lifecycle-aware runtime adaptation operating under explicit validation, traceability, evaluation, and rollback constraints. Rather than treating runtime adaptation as unrestricted self-modification, the proposed framework models evolution as a bounded and observable process over persistent operational memory. It further shows how these ideas can be operationalized over modern agent runtimes and governance-oriented orchestration systems, providing a conceptual foundation for adaptive infrastructures whose evolution remains explicit, auditable, and constrained.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a conceptual extension of prior agent-harness work that names a governance gap but stays at the level of definitions without evidence or operational detail.

The main takeaway is that the paper takes the Code as Agent Harness idea and adds explicit governance, validation, and rollback to control how agent-generated code evolves into persistent runtime capabilities. It frames this through HarnessMutation as a bounded process over operational memory rather than unrestricted self-modification.

What it does reasonably is identify that lifecycle management has been left vague in these systems and sketch a way to make adaptation auditable and constrained. The emphasis on traceability and rollback constraints aligns with practical needs in long-running multi-agent setups.

The limitations are straightforward. The entire contribution is definitional; there are no worked examples, no integration sketches with actual runtimes, no discussion of failure modes, and no indication of how the proposed mechanisms would be implemented or tested. The claim that these ideas can be operationalized over modern systems is stated but not supported by any argument or reference to concrete platforms.

Because the paper introduces new terms like HarnessMutation without showing they solve the problems they target or avoid new ones, it is hard to judge whether the framework holds together beyond the abstract. The absence of any empirical or formal grounding means the central claims rest on the coherence of the modeling rather than demonstrated results.

This is for readers already working in the agent runtime subfield who want a high-level vocabulary for governance discussions. Someone looking for actionable design patterns or validated mechanisms will not find them here.

I would send it to peer review in a venue that accepts conceptual papers on agent systems, because the topic is relevant and the direction is sensible, but the authors would need to add substantial substance before it influences actual infrastructure work.

Referee Report

1 major / 0 minor

Summary. The paper proposes a conceptual framework for governed runtime evolution in multi-agent systems. It formalizes agent-generated artifacts as persistent runtime capabilities that become part of the operational substrate, introduces HarnessMutation as a lifecycle-aware adaptation mechanism operating under explicit constraints (validation, traceability, evaluation, rollback), and models evolution as a bounded, observable process over persistent operational memory to support auditable and constrained adaptive infrastructures.

Significance. As a perspective-shifting conceptual contribution in cs.SE, the framework could help reframe runtime adaptation in agentic systems from disposable outputs to governed, persistent capabilities if the operationalization claims hold. It explicitly positions itself as providing a foundation rather than empirical results or proofs, which aligns with its scope but limits immediate applicability.

major comments (1)

[Abstract] The central claim that the ideas 'can be operationalized over modern agent runtimes and governance-oriented orchestration systems' is presented without any concrete examples, pseudocode, architecture diagrams, or case studies, making the operationalization assertion load-bearing but unsupported in the manuscript.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive feedback. We address the single major comment below and commit to revisions that strengthen the manuscript while preserving its conceptual scope.

Referee: [Abstract] The central claim that the ideas 'can be operationalized over modern agent runtimes and governance-oriented orchestration systems' is presented without any concrete examples, pseudocode, architecture diagrams, or case studies, making the operationalization assertion load-bearing but unsupported in the manuscript.

Authors: We acknowledge the validity of this observation. The manuscript is positioned as a conceptual framework rather than an empirical or implementation paper, and the abstract's phrasing that the ideas 'can be operationalized' is not backed by explicit artifacts. In revision we will add a new section containing (1) a high-level architecture diagram showing integration points with governance-oriented orchestration systems, (2) pseudocode for the core HarnessMutation lifecycle (validation, traceability, evaluation, rollback), and (3) a brief illustrative walkthrough using a representative modern agent runtime. These additions will be framed as demonstrations of compatibility rather than full implementations, thereby supporting the claim without altering the paper's foundational character.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

The paper is a purely conceptual proposal that formalizes agent artifacts as persistent runtime capabilities and introduces HarnessMutation as a governed adaptation mechanism. It contains no equations, derivations, fitted parameters, or load-bearing self-citations that reduce any claim to its own inputs by construction. The central framing of evolution as a bounded process over operational memory is presented as the paper's definitional contribution rather than a deduction from prior results, rendering the argument self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on domain assumptions about treating generated artifacts as persistent entities and the feasibility of bounded governance; no free parameters or invented entities with independent evidence are introduced beyond the named mechanism.

axioms (2)

domain assumption: Agent-generated artifacts can be treated as persistent runtime capabilities that progressively become part of the operational substrate rather than transient outputs.
Foundational modeling choice stated in the abstract for the entire framework.
domain assumption: Runtime adaptation can be modeled as a bounded and observable process under explicit validation, traceability, evaluation, and rollback constraints.
Core premise enabling the governance claims without loss of agentic benefits.

invented entities (1)

HarnessMutation (no independent evidence)
purpose: Governed mechanism for lifecycle-aware runtime adaptation operating under explicit constraints.
New named construct introduced to operationalize the framework; no independent evidence provided.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

From Task-Guided Conversational Graphs to Goal-Oriented Dialogue Runtimes
cs.SE · 2026-06 · unverdicted · novelty 4.0

Proposes GODR, a framework-neutral runtime pattern treating goals and their lifecycle as first-class objects for complex, interruptible multi-domain dialogues.

Reference graph

Works this paper leans on

26 extracted references · 18 canonical work pages · cited by 1 Pith paper · 10 internal anchors

[1] Wenhu Chen, Xueguang Ma, Xinyi Wang, and William W. Cohen. Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks, 2022. URL https://arxiv.org/abs/2211.12588

[2] React: Synergizing reasoning and acting in language models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. In International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=WE_vluYUL-X

[3] Pal: Program-aided language models
Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. Pal: Program-aided language models. In International Conference on Machine Learning, pages 10764–10799. PMLR, 2023. URL https://proceedings.mlr.press/v202/gao23f.html

[4] Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R. Narasimhan. Swe-bench: Can language models resolve real-world github issues? In International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=VTF8yNQM66

[5] OpenHands: An Open Platform for AI Software Developers as Generalist Agents
Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, and Graham Neubig. Openhands: An open platform for ai soft...
arXiv · 2025

[6] Voyager: An Open-Ended Embodied Agent with Large Language Models
Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. Voyager: An open-ended embodied agent with large language models, 2023. URL https://arxiv.org/abs/2305.16291

[7] GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Lakshya A. Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J. Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, and Omar Khattab. Gepa: Reflective prompt evolution can outperform reinforcement learning, 2025. URL https://a...

[8] Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
Qizheng Zhang, Changran Hu, Shubhangi Upasani, Boyuan Ma, Fenglu Hong, Vamsidhar Kamanuru, Jay Rainton, Chen Wu, Mengmeng Ji, Hanchen Li, Urmish Thakker, James Zou, and Kunle Olukotun. Agentic context engineering: Evolving contexts for self-improving language models, 2025. URL https://arxiv.org/abs/2510.04618

[9–10] Xinghua Lou, Miguel Lázaro-Gredilla, Antoine Dedieu, Carter Wendelken, Wolfgang Lehrach, and Kevin P. Murphy. Autoharness: Improving llm agents by automatically synthesizing a code harness, URL https://arxiv.org/abs/2603.03329

[11] Meta-Harness: End-to-End Optimization of Model Harnesses
Yoonho Lee, Roshen Nair, Qizheng Zhang, Kangwook Lee, Omar Khattab, and Chelsea Finn. Meta-harness: End-to-end optimization of model harnesses, 2026. URL https://arxiv.org/abs/2603.28052

[12] SkillOpt: Executive Strategy for Self-Evolving Agent Skills
Yifan Yang, Ziyang Gong, Weiquan Huang, Qihao Yang, Ziwei Zhou, Zisu Huang, Yan Li, Xuemei Gao, Qi Dai, Bei Liu, Kai Qiu, Yuqing Yang, Dongdong Chen, Xue Yang, and Chong Luo. Skillopt: Executive strategy for self-evolving agent skills, 2026. URL https://arxiv.org/abs/2605.23904

[13] Code as Agent Harness
Xuying Ning, Katherine Tieu, Dongqi Fu, Tianxin Wei, Zihao Li, Yuanchen Bei, Jiaru Zou, Mengting Ai, Zhining Liu, Ting-Wei Li, Lingjie Chen, Yanjun Zhao, Ke Yang, Bingxuan Li, Cheng Qian, Gaotang Li, Xiao Lin, Zhichen Zeng, Ruizhong Qiu, Sirui Chen, Yifan Sun, Xiyuan Yang, Ruida Wang, Rui Pan, Chenyuan Yang, Dylan Zhang, Liri Fang, Zikun Cui, Yang Cao, Pa...
arXiv · 2026

[14] Chain of code: Reasoning with a language model-augmented code emulator
Chengshu Li, Jacky Liang, Andy Zeng, Xinyun Chen, Karol Hausman, Dorsa Sadigh, Sergey Levine, Li Fei-Fei, Fei Xia, and Brian Ichter. Chain of code: Reasoning with a language model-augmented code emulator, 2023. URL https://arxiv.org/abs/2312.04474

[15] Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, Alexander Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang...
arXiv · 2022

[16] URL https://doi.org/10.1109/ICRA48891.2023.10161447
Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, and Andy Zeng. Code as policies: Language model programs for embodied control. In IEEE International Conference on Robotics and Automation, pages 9493–9500. IEEE, 2023. doi:10.1109/ICRA48891.2023.10160591. URL https://arxiv.org/abs/2209.07753

[17–18] Langgraph documentation
LangChain. Langgraph documentation. https://docs.langchain.com/oss/python/langgraph/overview, 2026. Accessed: 2026-05-25

[19] Deep agents overview
LangChain. Deep agents overview. https://docs.langchain.com/oss/python/deepagents/overview, 2026. Accessed: 2026-05-25

[20–21] Jeffrey O. Kephart and David M. Chess. The vision of autonomic computing. Computer, 36(1):41–50, 2003. doi:10.1109/MC.2003.1160055

[22] Betty H. C. Cheng, Rogério de Lemos, Holger Giese, Paola Inverardi, Jeff Magee, Jesper Andersson, Basil Becker, Nelly Bencomo, Yuriy Brun, Bojan Cukic, Ron Desmarais, Schahram Dustdar, Anthony Finkelstein, Alessandra Gorla, Vincenzo Grassi, Sam Malek, Raffaela Mirandola, Hausi Muller, Sooyong Park, Mary Shaw, Matthias Tichy, Massimo Tivoli, Danny Weyns, a...
doi:10.1007/978-3-642-02161-9_1 · 2009

[23] Codetree: Agent-guided tree search for code generation with large language models
Jierui Li, Hung Le, Yingbo Zhou, Caiming Xiong, Silvio Savarese, and Doyen Sahoo. Codetree: Agent-guided tree search for code generation with large language models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3711–3726, 2025. doi:10.18653/v...
doi:10.18653/v1/2025.naacl-long.191 · 2025

[24] AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation
Dong Huang, Jie M. Zhang, Michael Luck, Qingwen Bu, Yuhao Qing, and Heming Cui. Agentcoder: Multi-agent-based code generation with iterative testing and optimisation, 2023. URL https://arxiv.org/abs/2312.13010

[25] Mapcoder: Multi-agent code generation for competitive problem solving
Md Ashraful Islam, Mohammed Eunus Ali, and Md Rizwan Parvez. Mapcoder: Multi-agent code generation for competitive problem solving. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, pages 4912–4944, 2024. doi:10.18653/v1/2024.acl-long.269. URL https://aclanthology.org/2024.acl-long.269/

[26] Ui-voyager: A self-evolving gui agent learning via failed experience
Zichuan Lin, Feiyu Liu, Yijun Yang, Jiafei Lyu, Yiming Gao, Yicheng Liu, Zhicong Lu, Yangbin Yu, Mingyu Yang, Junyou Li, Deheng Ye, and Jie Jiang. Ui-voyager: A self-evolving gui agent learning via failed experience, 2026. URL https://arxiv.org/abs/2603.24533
