Autogenesis: A Self-Evolving Agent Protocol

Bo An; Cankun Guo; Haibin Wen; Mengdi Wang; Ming Yin; Wentao Zhang; Yingcheng Wu; Zhe Zhao

arxiv: 2604.15034 · v4 · pith:KGPPXP7Nnew · submitted 2026-04-16 · 💻 cs.AI

Wentao Zhang, Zhe Zhao, Haibin Wen, Yingcheng Wu, Cankun Guo, Ming Yin, Bo An, Mengdi Wang

Pith reviewed 2026-05-21 00:09 UTC · model grok-4.3

classification 💻 cs.AI

keywords: self-evolving agents · agent protocols · LLM-based agents · resource management · multi-agent systems · closed-loop evolution

The pith

A self-evolution protocol lets agents treat their own prompts, tools, and memory as versioned resources that can be proposed, assessed, and updated in a closed loop.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a protocol designed to fix shortcomings in current LLM agent systems, where cross-entity management and safe updates are not well specified, leading to hard-to-maintain monolithic setups. By separating the resources being evolved from the process of evolving them, the protocol provides structured interfaces for registration, versioning, and improvement. An implementation of this approach in a multi-agent system is tested on benchmarks involving extended planning and diverse tool use, where it outperforms established methods. This suggests that explicit resource management combined with a feedback-driven evolution loop can make agent systems more adaptable and reliable over time.

Core claim

Existing agent protocols under-specify lifecycle management, version tracking, and safe evolution interfaces, which leads to brittle compositions. The Autogenesis Protocol addresses this by decoupling what evolves from how evolution occurs. Its Resource Substrate Protocol Layer registers prompts, agents, tools, environments, and memory as resources with explicit state, lifecycle, and versioned interfaces. Its Self Evolution Protocol Layer defines a closed-loop operator for proposing, assessing, and committing improvements with auditable lineage and rollback. The Autogenesis System built on this dynamically manages and refines these resources during task execution.
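The idea of registering components as versioned resources can be illustrated with a minimal sketch. It is not taken from the paper or its repository: the names `Registry`, `Resource`, `ResourceVersion`, `commit`, and `rollback` are hypothetical, and the record layout is an assumption chosen only to show explicit state, lineage, and a serializable registration record.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ResourceVersion:
    version: int            # monotonically increasing version number
    content: str            # e.g. prompt text or a tool description
    parent: Optional[int]   # version this one was derived from (lineage)


@dataclass
class Resource:
    name: str
    kind: str               # e.g. "prompt", "tool", "memory" (hypothetical labels)
    versions: list = field(default_factory=list)
    active: int = 0         # version number currently in use

    def current(self) -> ResourceVersion:
        return self.versions[self.active]


class Registry:
    """Hypothetical in-memory registry of versioned resources."""

    def __init__(self) -> None:
        self._resources = {}

    def register(self, name: str, kind: str, content: str) -> Resource:
        if name in self._resources:
            raise ValueError(f"resource {name!r} already registered")
        res = Resource(name, kind, [ResourceVersion(0, content, None)])
        self._resources[name] = res
        return res

    def commit(self, name: str, content: str) -> int:
        """Append a new version derived from the active one and activate it."""
        res = self._resources[name]
        new = ResourceVersion(len(res.versions), content, res.active)
        res.versions.append(new)
        res.active = new.version
        return new.version

    def rollback(self, name: str) -> int:
        """Reactivate the parent of the active version; history is kept."""
        res = self._resources[name]
        parent = res.current().parent
        if parent is None:
            raise ValueError("already at the initial version")
        res.active = parent
        return parent

    def get(self, name: str) -> str:
        return self._resources[name].current().content

    def dump(self, name: str) -> str:
        """Serialize the full record, including every stored version."""
        return json.dumps(asdict(self._resources[name]))
```

In this sketch a rollback only moves the `active` pointer, so the rejected version stays in the history and the lineage remains auditable.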

What carries the argument

The Autogenesis Protocol, which uses a Resource Substrate Protocol Layer to model all agent components as versioned resources and a Self Evolution Protocol Layer to enable closed-loop proposing, assessing, and committing of changes.
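A closed propose, assess, commit loop can be sketched in a few lines. This is an illustration under stated assumptions, not the SEPL interface itself: the function name `evolve`, its callback signatures, and the greedy "accept only if the score improves" rule are all hypothetical choices made for the example.

```python
from typing import Callable, List, Tuple


def evolve(
    initial: str,
    propose: Callable[[str], str],
    assess: Callable[[str], float],
    rounds: int,
) -> Tuple[str, List[Tuple[str, float, bool]]]:
    """Greedy closed loop: propose a candidate, assess it, commit if better.

    Returns the final committed value and an audit log of
    (candidate, score, accepted) tuples, one per round.
    """
    current = initial
    best = assess(initial)
    log: List[Tuple[str, float, bool]] = []
    for _ in range(rounds):
        candidate = propose(current)   # propose a change to the current value
        score = assess(candidate)      # assess the candidate
        accepted = score > best        # commit only on strict improvement
        log.append((candidate, score, accepted))
        if accepted:
            current, best = candidate, score
    return current, log


# Toy usage: the "resource" is a string, and the score prefers length 5.
final, audit = evolve(
    "abc",
    propose=lambda s: s + "!",
    assess=lambda s: -abs(len(s) - 5),
    rounds=4,
)
```

Because a rejected candidate never replaces `current`, the loop cannot regress on the assessed score, and the log records every proposal whether or not it was committed.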

Load-bearing premise

That the closed-loop interface for proposing, assessing, and committing improvements can operate without introducing instability or requiring extensive human oversight.

What would settle it

A set of long-horizon planning benchmarks where the self-evolving system performs worse than fixed baselines or requires frequent manual rollbacks to maintain stability.

Figures

Figures reproduced from arXiv: 2604.15034 by Bo An, Cankun Guo, Haibin Wen, Mengdi Wang, Ming Yin, Wentao Zhang, Yingcheng Wu, Zhe Zhao.

**Figure 1.** The Autogenesis architecture.

**Figure 2.** Performance comparison of evolving and vanilla agents within-inference.

Original abstract

Recent advances in LLM-based agent systems have shown promise in tackling complex, long-horizon tasks. However, existing agent protocols (e.g., A2A and MCP) under-specify cross-entity lifecycle and context management, version tracking, and evolution-safe update interfaces, which encourages monolithic compositions and brittle glue code. We introduce Autogenesis Protocol (AGP), a self-evolution protocol that decouples what evolves from how evolution occurs. Its Resource Substrate Protocol Layer (RSPL) models prompts, agents, tools, environments, and memory as protocol-registered resources with explicit state, lifecycle, and versioned interfaces. Its Self Evolution Protocol Layer (SEPL) specifies a closed-loop operator interface for proposing, assessing, and committing improvements with auditable lineage and rollback. Building on AGP, we present Autogenesis System (AGS), a self-evolving multi-agent system that dynamically instantiates, retrieves, and refines protocol-registered resources during execution. We evaluate AGS on multiple challenging benchmarks that require long-horizon planning and tool use across heterogeneous resources. The results demonstrate consistent improvements over strong baselines, supporting the effectiveness of agent resource management and closed-loop self-evolution. The code is available at https://github.com/DVampire/Autogenesis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Autogenesis Protocol adds explicit layers for resource management and closed-loop evolution in agents, but the evaluation details are too thin to confirm the gains.

The main thing to know about this paper is that it introduces the Autogenesis Protocol to handle self-evolution in LLM agents through a clear separation of resource handling and evolution logic. The Resource Substrate Protocol Layer registers prompts, agents, tools, and similar components with state and version information. The Self Evolution Protocol Layer then manages a closed loop for suggesting, evaluating, and applying changes, including rollbacks for safety. The authors implement this in the Autogenesis System and report better results than baselines on long-horizon planning benchmarks. The decoupling and the explicit interfaces for lifecycle and auditing are the novel parts not fully covered by earlier protocols, and the paper does a decent job of laying out how modular, trackable resources avoid brittle code.

Where it falls short is in the evidence. The abstract claims consistent improvements but gives no concrete metrics, p-values, or details on the baselines used. Without those, it is difficult to see how much the self-evolution contributes versus other design choices. The description also does not address the potential for the evolution loop to become unstable or require manual fixes.

This kind of work is relevant for people developing adaptive multi-agent setups that need to run over extended periods. A reader interested in practical agent architectures could pick up ideas from the protocol design. I think it should go to peer review so the community can check the implementation details and push for stronger validation of the claims.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the Autogenesis Protocol (AGP) comprising a Resource Substrate Protocol Layer (RSPL) that registers prompts, agents, tools, environments, and memory as versioned resources with explicit lifecycle interfaces, and a Self Evolution Protocol Layer (SEPL) that defines a closed-loop operator for proposing, assessing, and committing improvements with auditable lineage and rollback. It describes the Autogenesis System (AGS) built on AGP and claims that evaluations on benchmarks requiring long-horizon planning and tool use across heterogeneous resources show consistent improvements over strong baselines, thereby supporting the value of protocol-registered resource management and closed-loop self-evolution. Code is released at the provided GitHub link.

Significance. If the empirical results hold under detailed scrutiny, the decoupling of resource modeling from evolution mechanics could reduce monolithic agent compositions and improve maintainability in LLM-based systems. The explicit support for version tracking and rollback is a constructive contribution. The public code release is a clear strength that enables direct verification of the protocol implementation and reported gains.

major comments (2)

[Abstract] The central claim that 'the results demonstrate consistent improvements over strong baselines' is presented without any quantitative metrics, benchmark identifiers, baseline implementation details, statistical tests, or controls for post-hoc selection. This directly bears on the effectiveness argument for agent resource management and closed-loop self-evolution.
[Self Evolution Protocol Layer (SEPL)] The closed-loop operator interface for proposing, assessing, and committing improvements is specified at a high level but provides no concrete mechanisms (e.g., proposal bounds, automatic rollback triggers, or divergence detection) to ensure stability without external intervention. This is load-bearing for the claim that observed gains arise from autonomous protocol operation rather than implicit human curation.

minor comments (2)

The abstract references existing protocols (A2A and MCP) but does not cite specific prior work on agent lifecycle management; adding targeted references would clarify the novelty positioning.
Notation for resource states and versioned interfaces in RSPL could be formalized with a small table or diagram to improve readability of the protocol specification.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below, indicating revisions where the manuscript will be updated to strengthen clarity and support for our claims.

Referee: [Abstract] The central claim that 'the results demonstrate consistent improvements over strong baselines' is presented without any quantitative metrics, benchmark identifiers, baseline implementation details, statistical tests, or controls for post-hoc selection. This directly bears on the effectiveness argument for agent resource management and closed-loop self-evolution.

Authors: We agree that the abstract would be strengthened by including specific quantitative metrics, benchmark names, baseline details, and references to statistical controls. The main text already reports these elements from our evaluations on long-horizon planning and tool-use benchmarks, including performance gains and multi-run statistics. In the revised manuscript we will update the abstract to incorporate key quantitative results and benchmark identifiers while retaining the high-level summary.
Revision: yes

Referee: [Self Evolution Protocol Layer (SEPL)] The closed-loop operator interface for proposing, assessing, and committing improvements is specified at a high level but provides no concrete mechanisms (e.g., proposal bounds, automatic rollback triggers, or divergence detection) to ensure stability without external intervention. This is load-bearing for the claim that observed gains arise from autonomous protocol operation rather than implicit human curation.

Authors: The SEPL is presented as a protocol interface to support generality across implementations, with the AGS providing a concrete realization that includes auditable lineage and rollback. We acknowledge that explicit mechanisms for stability would better demonstrate autonomous operation. In the revision we will add concrete details drawn from the AGS implementation, such as proposal bounds, performance-based rollback triggers, and divergence detection, to clarify how stability is maintained without external intervention.
Revision: yes
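To make the stability mechanisms named in this exchange concrete, here is a hedged sketch of a proposal bound and a performance-based rollback trigger. Nothing here is drawn from the AGS code: the helper `within_bound`, the class `RollbackMonitor`, and the thresholds `max_change`, `tolerance`, and `window` are hypothetical, and text similarity via `difflib` is one assumed way to measure how large a proposed change is.

```python
import difflib


def within_bound(old: str, new: str, max_change: float = 0.3) -> bool:
    """Proposal bound: reject edits that change too much of the text at once.

    The change size is 1 minus the difflib similarity ratio (an assumption).
    """
    similarity = difflib.SequenceMatcher(None, old, new).ratio()
    return (1.0 - similarity) <= max_change


class RollbackMonitor:
    """Signal a rollback when recent scores fall below a tolerated baseline."""

    def __init__(self, baseline: float, tolerance: float = 0.05, window: int = 3):
        self.baseline = baseline    # score of the last trusted version
        self.tolerance = tolerance  # allowed drop before triggering
        self.window = window        # number of recent scores to average
        self.scores = []

    def observe(self, score: float) -> bool:
        """Record a score; return True if a rollback should be triggered."""
        self.scores.append(score)
        recent = self.scores[-self.window:]
        if len(recent) < self.window:
            return False            # not enough evidence yet
        mean = sum(recent) / len(recent)
        return mean < self.baseline - self.tolerance
```

Averaging over a window rather than reacting to a single score is a design choice that trades reaction speed for robustness to noisy evaluations; a real system would need to tune both thresholds empirically.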

Circularity Check

0 steps flagged

No significant circularity in protocol definition or evaluation

The paper defines the Autogenesis Protocol (AGP) with its Resource Substrate Protocol Layer (RSPL) and Self Evolution Protocol Layer (SEPL) as an explicit specification for resource lifecycle, versioning, and closed-loop propose-assess-commit operations. It then describes the Autogenesis System (AGS) built on this protocol and reports benchmark results showing improvements over baselines. No equations, parameter fittings presented as predictions, self-citations that are load-bearing, or self-definitional reductions appear in the provided text. The central claims rest on the independent protocol design and external benchmark comparisons rather than reducing to tautological inputs by construction. This is a standard descriptive and empirical contribution in agent systems research.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that structured resource registration and closed-loop evolution will produce measurable gains; no explicit free parameters or invented physical entities are described.

axioms (1)

Domain assumption: Existing agent protocols under-specify lifecycle and version tracking, leading to brittle systems.
Source: stated in the motivation for introducing AGP.

invented entities (1)

Autogenesis Protocol (AGP) with RSPL and SEPL layers (no independent evidence)
Purpose: to decouple what evolves from how evolution occurs in agent systems.
Newly defined protocol layers introduced to solve the stated limitations.

pith-pipeline@v0.9.0 · 5761 in / 1170 out tokens · 47003 ms · 2026-05-21T00:09:04.376456+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Its Self Evolution Protocol Layer (SEPL) specifies a closed loop operator interface for proposing, assessing, and committing improvements with auditable lineage and rollback."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We introduce AUTOGENESIS PROTOCOL (AGP), a self evolution protocol that decouples what evolves from how evolution occurs."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution
cs.CL 2026-05 unverdicted novelty 5.0

SkillsVote is a governance system for agent skills that profiles corpora, recommends via search, and gates updates on successful reusable outcomes, yielding benchmark gains without model changes.

