CODE-SHARP: Continuous Open-ended Discovery and Evolution of Skills as Hierarchical Reward Programs

Antoine Cully; Pierluigi Vito Amadori; Richard Bornemann


CODE-SHARP lets foundation models generate hierarchical Python reward programs so agents can discover and master skills from scratch without any human-designed rewards.

Reviewed by Pith at T0; open to challenge. T0 means a machine referee read the full paper against a public rubric.


T0 review · grok-4.3

2026-05-22 10:31 UTC pith:RUA3FFJK

load-bearing objection CODE-SHARP uses FMs to generate hierarchical Python reward programs for autonomous RL, delivering large gains on Craftax tasks, but the autonomy of the discovery process lacks key validation details.

arxiv 2602.10085 v3 pith:RUA3FFJK submitted 2026-02-10 cs.AI


Richard Bornemann, Pierluigi Vito Amadori, Antoine Cully

classification cs.AI

keywords skill discovery · hierarchical rewards · foundation models · reinforcement learning · autonomous learning · open-ended evolution · long-horizon tasks

verification ladder T0 review · T1 audit · T2 compute · T3 formal · T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CODE-SHARP as a way to let foundation models continuously create and refine an archive of Python programs called SHARPs. Each SHARP defines a local success condition for a skill and lists prerequisites that point to earlier SHARPs, so that at training time the system routes the agent through a chain of rewards and the agent only has to learn the new piece of behavior. This produces fully autonomous reinforcement learning in environments such as Craftax and XLand, where the resulting agents reach much higher performance than earlier methods and are the first to complete advanced actions like mining diamonds. The approach matters because it removes the need for hand-crafted reward functions or task curricula when building agents that keep expanding their own capabilities.

Core claim

CODE-SHARP leverages FMs to open-endedly grow and evolve an archive of Python programs encoding skills to train a generalist agent policy entirely from scratch via reinforcement learning, directly from source code. These programs, termed Skills as Hierarchical Reward Programs (SHARPs), each encode a local success condition and a set of prerequisites delegated to previously discovered SHARPs. At runtime, SHARPs dynamically route the agent through their prerequisite chain based on the current state, rewarding each completion along the way, requiring the agent to learn only the marginal behaviour each new SHARP introduces, enabling efficient learning of long-horizon skills without any pre-defined rewards.

What carries the argument

SHARPs, Python programs that pair a local success condition with a prerequisite list pointing to earlier programs, which at runtime dynamically route the agent and deliver incremental rewards so only the newest skill segment must be learned.
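
The routing idea described above can be made concrete with a small sketch. This is an illustration only, not the paper's implementation (whose prompts call for JAX, JIT-compatible classes): the `Sharp` class, the dictionary-based state, the skill names, and the `reward` helper are all hypothetical, and the reward of one per completion is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]  # hypothetical state: inventory item -> count
Condition = Callable[[State], bool]


@dataclass
class Sharp:
    """Illustrative SHARP: a local success test plus gated prerequisites."""
    name: str
    success: Condition
    # Each pair is (precondition, earlier SHARP able to satisfy it).
    prerequisites: List[Tuple[Condition, "Sharp"]] = field(default_factory=list)

    def route(self, state: State) -> "Sharp":
        """Return the skill the agent should pursue in `state`."""
        for condition, prerequisite in self.prerequisites:
            if not condition(state):
                # Delegate to the earlier skill, which may delegate further.
                return prerequisite.route(state)
        return self  # every precondition holds: only the new behaviour is left


def reward(target: Sharp, state: State, next_state: State) -> float:
    """Assumed reward: 1.0 when the currently routed skill is newly completed."""
    active = target.route(state)
    return 1.0 if active.success(next_state) and not active.success(state) else 0.0


# Hypothetical three-step chain: wood -> wooden pickaxe -> stone.
collect_wood = Sharp("collect_wood", lambda s: s.get("wood", 0) >= 1)
wood_pickaxe = Sharp(
    "wood_pickaxe",
    lambda s: s.get("wood_pickaxe", 0) >= 1,
    [(lambda s: s.get("wood", 0) >= 1, collect_wood)],
)
mine_stone = Sharp(
    "mine_stone",
    lambda s: s.get("stone", 0) >= 1,
    [(lambda s: s.get("wood_pickaxe", 0) >= 1, wood_pickaxe)],
)

print(mine_stone.route({}).name)                   # collect_wood
print(mine_stone.route({"wood": 1}).name)          # wood_pickaxe
print(mine_stone.route({"wood_pickaxe": 1}).name)  # mine_stone
```

With an empty inventory the agent is routed to the earliest unmet prerequisite; once the pickaxe exists, only the new stone-mining behaviour remains to be learned, which is the "marginal behaviour" the review refers to.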

Load-bearing premise

The foundation model can keep producing valid, non-redundant SHARP programs whose prerequisite chains create useful incremental learning signals without any human curation or filtering.

What would settle it

An experiment in which the foundation model is replaced by one that produces mostly invalid or duplicate SHARPs and the resulting agent shows no improvement over standard RL baselines on diamond-mining success rate in Craftax.


If this is right

On Craftax-Classic the trained agents reach six times the median performance of prior methods.
On XLand the same agents reach 2.6 times the median performance of prior methods.
The agents become the only ones able to craft iron tools and mine diamonds in the tested environments.
Scaling to Craftax-Extended produces a generalist policy over more than 90 discovered SHARPs that solves long-horizon tasks zero-shot at the level of agents given ground-truth rewards.
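
Headline figures such as "six times the median performance" are ratios of medians over repeated runs. As a rough sketch of how such a ratio and its spread could be computed (the function names are hypothetical, and the scores in the usage lines are placeholders invented for illustration, not results from the paper):

```python
import statistics
from typing import Sequence, Tuple


def median_and_iqr(scores: Sequence[float]) -> Tuple[float, Tuple[float, float]]:
    """Median plus (Q1, Q3) of per-seed scores; needs at least two values."""
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q2, (q1, q3)


def median_ratio(method_scores: Sequence[float],
                 baseline_scores: Sequence[float]) -> float:
    """Ratio of the two medians, i.e. the 'N times the median' figure."""
    return statistics.median(method_scores) / statistics.median(baseline_scores)


# Placeholder per-seed scores, invented purely for illustration.
method = [30, 45, 60, 40, 50]
baseline = [10, 15, 20, 12, 18]
print(median_ratio(method, baseline))  # 3.0
print(median_and_iqr(method))          # (45.0, (40.0, 50.0))
```

Reporting the interquartile range next to the ratio shows whether the gain is stable across seeds or driven by a few favourable runs.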

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same generated program archive could be reused across new environments to avoid re-engineering rewards from scratch.
If the number of discovered SHARPs grows with task difficulty, the method could support continual expansion of agent capabilities without external task lists.
Physical robots might receive the same hierarchical reward chains to acquire sequences of manipulation skills with minimal human reward design.
Periodic pruning of low-utility SHARPs could keep the archive manageable as the number of programs increases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Desk Editor's Note

CODE-SHARP uses FMs to generate hierarchical Python reward programs for autonomous RL, delivering large gains on Craftax tasks, but the autonomy of the discovery process lacks key validation details.


The main point to take away is that CODE-SHARP uses foundation models to continuously generate and evolve hierarchical Python reward programs, called SHARPs, to train a generalist RL policy from scratch in environments like Craftax. This leads to agents that beat previous methods by significant margins and achieve capabilities like crafting iron tools and mining diamonds autonomously.

The approach has some real strengths. Encoding skills as executable programs with prerequisite chains allows the system to route the agent dynamically and reward incremental progress. This seems to enable efficient learning of complex behaviors without any pre-defined rewards. The results on Craftax-Classic, XLand, and the extended version with over 90 SHARPs for zero-shot long-horizon solving are the standout empirical contributions. If the numbers hold up, this points to a scalable path for open-ended discovery.

That said, the autonomy claim needs closer inspection. The generation of valid, non-redundant SHARPs by the FM is central, yet details on acceptance rates, how invalid or redundant programs are handled, and any post-generation checks are not prominent in the summary. If there's filtering involved, it could affect how 'fully autonomous' the process truly is. The performance gains are large, but without info on the number of runs or statistical tests, it's difficult to assess their robustness right away.

This paper is relevant for researchers working on open-ended reinforcement learning and methods that combine foundation models with RL for skill discovery. A reader focused on automatic curriculum or reward design would get value from the specific mechanism and the reported outcomes. It deserves a serious referee because the core idea is novel and the results are ambitious enough to merit detailed feedback on the experiments and implementation. I recommend sending it out for peer review.

Referee Report

2 major / 1 minor

Summary. The paper introduces CODE-SHARP, a framework that uses foundation models to continuously and autonomously discover and evolve an archive of Skills as Hierarchical Reward Programs (SHARPs). Each SHARP is a Python program encoding a local success condition together with a chain of prerequisite SHARPs; at runtime the agent is routed through the chain and receives incremental rewards only for the marginal behavior introduced by the new program. The method is evaluated on Craftax-Classic, XLand, and Craftax-Extended, where it reports 6× and 2.6× median performance gains over prior work, the first successful iron-tool and diamond-mining agents, and zero-shot long-horizon solving with >90 discovered SHARPs that matches ground-truth-reward performance.

Significance. If the autonomy and validity claims are substantiated, the work would constitute a meaningful step toward open-ended, reward-free skill acquisition in reinforcement learning. The hierarchical program representation and dynamic routing mechanism offer a concrete route to scaling generalist agents on long-horizon tasks without hand-crafted reward functions.

major comments (2)

[Abstract] The headline performance claims (6× median on Craftax-Classic, 2.6× on XLand, unique iron-tool and diamond-mining capability) are presented without any report of the number of independent runs, statistical significance tests, variance across seeds, or the precise baseline implementations and hyper-parameters used for comparison. These details are required to assess whether the reported gains are robust.
[Abstract and paragraph on SHARP generation and runtime routing] The central claim that training occurs 'fully autonomously' and 'without any pre-defined rewards' rests on the assumption that the foundation model produces executable, non-redundant SHARPs whose prerequisite chains yield useful incremental signals. The manuscript provides no acceptance rate, cycle-detection procedure, redundancy-pruning method, or verification that invalid programs are never inserted into the archive; without these quantities it is impossible to rule out that performance derives from an implicitly curated subset rather than raw open-ended discovery.
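
To make the quantities requested in the second comment concrete, here is a minimal sketch of an archive admission gate that logs why each candidate was accepted or rejected. Everything in it is an assumption for illustration: the `admit` and `acceptance_rate` helpers, the archive layout, and the use of identical syntax trees as a crude stand-in for redundancy detection. The paper's actual procedure is not described on this page.

```python
import ast
from collections import Counter
from typing import Dict, List


def admit(archive: Dict[str, dict], name: str, source: str,
          prerequisites: List[str]) -> str:
    """Try to add a candidate program; return 'accepted' or a rejection reason."""
    try:
        tree = ast.parse(source)  # syntax check only, nothing is executed
    except SyntaxError:
        return "syntax_error"
    if any(p not in archive for p in prerequisites):
        return "unknown_prerequisite"
    # Crude redundancy test: identical syntax trees (whitespace and comments ignored).
    fingerprint = ast.dump(tree)
    if any(entry["fingerprint"] == fingerprint for entry in archive.values()):
        return "duplicate"
    archive[name] = {"fingerprint": fingerprint, "prerequisites": list(prerequisites)}
    return "accepted"


def acceptance_rate(outcomes: List[str]) -> float:
    """Fraction of candidates that entered the archive."""
    return Counter(outcomes)["accepted"] / len(outcomes) if outcomes else 0.0


archive: Dict[str, dict] = {}
outcomes = [
    admit(archive, "collect_wood",
          "def success(state):\n    return state['wood'] >= 1\n", []),
    admit(archive, "collect_wood_copy",
          "def success(state):  # same logic\n    return state['wood'] >= 1\n", []),
    admit(archive, "broken", "def success(state) return True", []),
    admit(archive, "mine_stone",
          "def success(state):\n    return state['stone'] >= 1\n", ["wood_pickaxe"]),
]
print(outcomes)
print(acceptance_rate(outcomes))  # 0.25
```

A per-reason tally of this kind is what would let a reader judge how much filtering sits between raw FM output and the archive the agent is trained on.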

minor comments (1)

[Abstract] The acronym SHARP is used in the abstract before its expansion; a parenthetical definition on first use would improve immediate readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We provide point-by-point responses to the major comments and indicate the revisions we plan to incorporate in the updated manuscript.


Referee: [Abstract] The headline performance claims (6× median on Craftax-Classic, 2.6× on XLand, unique iron-tool and diamond-mining capability) are presented without any report of the number of independent runs, statistical significance tests, variance across seeds, or the precise baseline implementations and hyper-parameters used for comparison. These details are required to assess whether the reported gains are robust.

Authors: We agree with this observation. The current abstract highlights key results but omits important statistical details. In the revised version, we will update the abstract to include the number of independent runs (e.g., 5 seeds), report median and interquartile ranges, and mention that statistical significance was assessed using appropriate tests. We will also add a table or section detailing baseline implementations and hyperparameters to ensure reproducibility and robustness assessment. revision: yes
Referee: [Abstract and paragraph on SHARP generation and runtime routing] The central claim that training occurs 'fully autonomously' and 'without any pre-defined rewards' rests on the assumption that the foundation model produces executable, non-redundant SHARPs whose prerequisite chains yield useful incremental signals. The manuscript provides no acceptance rate, cycle-detection procedure, redundancy-pruning method, or verification that invalid programs are never inserted into the archive; without these quantities it is impossible to rule out that performance derives from an implicitly curated subset rather than raw open-ended discovery.

Authors: We appreciate the referee highlighting the need for more details on the autonomy mechanisms. While the framework operates without human intervention after initialization, we recognize that explicit descriptions of filtering processes are necessary. In the revision, we will add a new subsection under Methods describing the SHARP validation pipeline, including acceptance rates observed during experiments, cycle detection via topological sorting on the prerequisite graph, redundancy pruning based on program equivalence checks, and runtime verification that only valid, executable programs are added to the archive. This will substantiate that the performance gains stem from the open-ended discovery process. revision: yes
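
The simulated rebuttal mentions cycle detection via topological sorting on the prerequisite graph. One way this could look, sketched with Python's standard `graphlib` module (the `training_order` helper and the skill names are hypothetical, and nothing here is confirmed as the authors' code):

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Iterable, List


def training_order(prerequisites: Dict[str, Iterable[str]]) -> List[str]:
    """Return skills with every prerequisite before its dependants.

    `prerequisites` maps each skill to the skills it depends on.
    Raises ValueError if the prerequisite graph contains a cycle.
    """
    try:
        return list(TopologicalSorter(prerequisites).static_order())
    except CycleError as err:
        # graphlib exposes the offending cycle as the second element of args.
        raise ValueError(f"prerequisite cycle: {err.args[1]}") from err


print(training_order({
    "mine_stone": ["wood_pickaxe"],
    "wood_pickaxe": ["collect_wood"],
    "collect_wood": [],
}))  # ['collect_wood', 'wood_pickaxe', 'mine_stone']
```

A candidate whose prerequisites would close a loop can then be rejected before it enters the archive, since a cyclic chain would have no skill the agent could start from.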

Circularity Check

0 steps flagged

No significant circularity: results validated on external benchmarks


The paper's central claims rest on empirical performance metrics obtained from independent game environments (Craftax-Classic, XLand, Craftax-Extended) and direct comparisons to previously published baselines. No equations or derivations reduce the reported performance gains (6x median, diamond-mining capability, zero-shot long-horizon solving) to fitted parameters or self-referential definitions. SHARP generation and routing are described as autonomous processes whose outputs are evaluated externally rather than being tautologically equivalent to the inputs by construction. The method is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the untested assumption that foundation models can produce a growing set of useful, non-circular SHARP programs whose hierarchical structure yields effective incremental rewards; no free parameters or invented physical entities are stated in the abstract.

axioms (1)

domain assumption Foundation models can generate executable Python programs that correctly encode local success conditions and prerequisite chains for novel skills
Invoked throughout the description of autonomous SHARP discovery and runtime routing.

invented entities (1)

SHARP (Skill as Hierarchical Reward Program) no independent evidence
purpose: To represent each discovered skill as a Python program that supplies a local reward and delegates prerequisites to earlier programs
Core new representation introduced to enable incremental learning without predefined global rewards.

pith-pipeline@v0.9.0 · 5815 in / 1359 out tokens · 44492 ms · 2026-05-22T10:31:49.411745+00:00 · methodology


Original abstract

A core quality of general intelligence is the ability to open-endedly expand and evolve its set of mastered skills autonomously. While recent Foundation Model (FM) driven approaches have shown promising results towards this goal, they typically rely on significant human-in-the-loop engineering, limiting their transferability to novel environments. To address this, we introduce Continuous Open-ended Discovery and Evolution of Skills as Hierarchical Reward Programs (CODE-SHARP), a framework that leverages FMs to open-endedly grow and evolve an archive of Python programs encoding skills to train a generalist agent policy entirely from scratch via reinforcement learning, directly from source code. These programs, termed Skills as Hierarchical Reward Programs (SHARPs), each encode a local success condition and a set of prerequisites delegated to previously discovered SHARPs. At runtime, SHARPs dynamically route the agent through their prerequisite chain based on the current state, rewarding each completion along the way, requiring the agent to learn only the marginal behaviour each new SHARP introduces, enabling efficient learning of long-horizon skills without any pre-defined rewards. On Craftax-Classic and XLand, agents trained fully autonomously by CODE-SHARP outperform previous works by 6x and 2.6x in median performance and are the only agents capable of crafting iron tools and mining diamonds. Scaled to Craftax-Extended, CODE-SHARP trains a generalist agent on over 90 discovered SHARPs, enabling the agent to solve challenging long-horizon tasks zero-shot, matching agents trained on ground-truth rewards.

Figures

Figures reproduced from arXiv: 2602.10085 by Antoine Cully, Pierluigi Vito Amadori, Richard Bornemann.

**Figure 1.** CODE-SHARP consists of two FM-driven iterative processes to discover novel SHARP skills and refine SHARP skills already present in the skill archive. CODE-SHARP utilises a pipeline of FM-based skill proposal generator, implementor, and judge to first generate and filter novel SHARP skills before environment evaluation. Skill refinement is based on the FM-based skill mutation generator and implementor. Skil…

**Figure 2.** Pseudo-Code version of the SHARP skill defining a skill to craft a stone pickaxe. Skill Proposal Generator: The proposal generator produces a set of n skill candidates formatted as pseudo-code. Each proposal specifies a high-level description, a binary success condition ϕ, and a dictionary mapping environment conditions to prerequisite SHARP skills in the existing archive. The skill proposal generato…

**Figure 3.** Interconnected archive of discovered SHARP skills. CODE-SHARP continuously builds on existing SHARP skills in the archive to define novel, meaningful skills in line with the natural curriculum of Craftax. Initial skill discovery focuses on the Overworld before progressing to the Dungeon then the Mines and finally the Sewers. … for a total of 2e9 environment steps. The agent architecture is a JAX (Bradbury et…

**Figure 4.** (a) Average score achieved on each benchmark task. CODE-SHARP outperforms the zero-shot ReAct LLM agent, the agent pretrained on environment rewards, and the task experts. (b) Evolution of agent capabilities over the course of open-ended skill discovery. The policy planner utilises increasingly complex SHARP skills to define policies-in-code throughout training, resulting in large performance gains relati…

**Figure 5.** Figure 5 shows the evolution of the absolute score achieved by the goal-conditioned agent guided by the policies-in-code as the skill archive evolves. We observe large increases in performance for the Dungeon and Crafting benchmarks which are focused on the first two levels of Craftax. Performance on the Navigation and Mines benchmarks, which are focused on the later levels of Craftax, continue to increase steadily…

**Figure 6.** Figure 6 further illustrates the impact of these components across individual benchmark tasks. We observe that opportunistic sampling is critical for mastering complex, long-horizon tasks. While all ablations contribute to the agent’s success, the data suggest that opportunistic sampling, by dynamically shifting the training distribution toward the frontier of the agent’s capabilities, provides the largest singular…

**Figure 7.** Evolution of average SHARP skill complexity present in the skill archive


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Dynamic Agent Skills: A Lifecycle Survey and Taxonomy of Evolving Skill Libraries
cs.AI 2026-07 conditional novelty 6.0

Dynamic agent skill libraries are lifecycle-managed evolving stores whose admission, verification, maintenance, and retrieval choices determine whether reuse helps or hurts.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

Code as Policies: Language Model Programs for Embodied Control

URL https://openreview.net/forum?id=jRjvcqtdtA. Liang, J., Huang, W., Xia, F., Xu, P., Hausman, K., Ichter, B., Florence, P., and Zeng, A. Code as policies: Language model programs for embodied control. arXiv preprint arXiv:2209.07753, 2022. Liang, W., Wang, S., Wang, H.-J., Bastani, O., Jayaraman, D., and Ma, Y. J. Eurekaverse: Environment curriculum ...

[2]

Ensure the proposed skill is novel

Propose a Single Skill: Your proposal must be for exactly one novel and diverse skill that fits within the provided category. Ensure the proposed skill is novel. Simple repetitions of existing skills, e.g. MineThreeWood when MineWood is present, are not acceptable. Novelty implies a functional difference, not just a parametric one.

[3]

Consider the logical order of skill acquisition to maximize the agent’s potential success

Build Upon Existing Skills: The proposed skill must expand the agent’s current repertoire by building on existing skills. Consider the logical order of skill acquisition to maximize the agent’s potential success. The prerequisite skills used to define the proposed skill must be present in the archive.

[4]

Continuously build on already learned skills to form a curriculum of increasingly complex skills

Form a Curriculum: Ensure that you start out with simple skills. Continuously build on already learned skills to form a curriculum of increasingly complex skills. The curriculum should be as easy as possible for the agent to follow.

[5]

Skill Exploration: The ultimate goal is for the agent to possess as many diverse and meaningful skills as possible. For this it is crucial to explore as many levels as possible in the environment but also thoroughly explore each level for any meaningful skills proposals not yet present in the agent’s skill repertoire.

[6]

For crafting tools the success condition must be given as absolute values, i.e

Define Success Condition: Clearly define a condition that indicates the skill has been successfully completed. For crafting tools the success condition must be given as absolute values, i.e. for wood pickaxe it must be cur.inventory.pickaxe >= 1. For navigation skills, ensure the success condition includes the level the agent should be on to fulfill th...

[7]

No implementation for a condition needed

State Starting Conditions: Carefully review the provided environment code to identify any necessary starting conditions. State these conditions using the provided template. If no specific conditions are needed, you must write "No implementation for a condition needed".

[8]

Carefully analyse the environment and the logic of the skill to decide the order in which you place the conditions

Condition Order: The agent will complete each condition in the order that is specified in the skill definition. Carefully analyse the environment and the logic of the skill to decide the order in which you place the conditions

[9]

If a necessary prerequisite skill does not exist, you must disregard your initial idea and instead propose the missing prerequisite skill

Link Prerequisites: If a starting condition is required, you must name a prerequisite skill from the agent’s repertoire that can fulfill it. If a necessary prerequisite skill does not exist, you must disregard your initial idea and instead propose the missing prerequisite skill.

[10]

Assign Reward: Assign a reward of one to the skill.

[11]

Skill Name

Proposal History: Carefully go over previously failed skill proposals, analyse why they might not have achieved a high enough performance to be accepted and use them as potential inspiration for new proposals. Never directly repropose one of the failed skills. You can propose a skill with the same objective but it must use a different structure so as to a...

[12]

If it’s fundamentally flawed and cannot be fixed, reject it and explain why

Validate the Proposal: First, check the proposal for correctness. If it’s fundamentally flawed and cannot be fixed, reject it and explain why. * Success Condition: - Ensure it’s logical and unambiguous. - Correct minor logical errors. - For crafting tools the success condition must be given as absolute values, i.e. for wood pickax...

[13]

No implementation for a condition needed

Implement the JAX Class: If the proposal is valid (or you’ve corrected it), implement the class. * Indexing: - Use the proposed index. If it’s already taken, increment the index by one until it is unique. - Refer to prerequisite skills by their correct index. * Code Requirements: - Your entire implementation MUST be JAX and JIT-c...

[14]

If no skill passes your judgement you are allowed to reject them all

**Selected Skill:** At most two skills can be selected. If no skill passes your judgement you are allowed to reject them all

[15]

Filter out all skills which are would not be classified as novel

**Filtering:** Start by comparing each skill proposal against the example skills. Filter out all skills which are would not be classified as novel. Then, make your final decision based on the set of novel skills

[16]

**Justification:** Provide a concise explanation for your choice, explicitly referencing how the selected skill excels in **Curriculum Coherence**, **Strategic Value**, and/or **Skill Diversity** compared to the other candidates

[17]

**Output:** Provide the class of the skill which you choose as the optimal next skill to add to agents repertoire in the format provided to you

[18]

Collect Wood

**Criteria:** The most important criteria your selected skill must possess is feasibility. Always pick a skill that presents a logical incremental improvement over extremely difficult skills for which the agent does not possess all prerequisite skills. ENVIRONMENT CODE: $environment_description$ SKILL FUNCTION TEMPLATE ...

[19]

Ensure your mutations follow the exact heuristic specification given to you and is sensible

[20]

If you add a new new precondition function, ensure that a relevant skill to satisfy it is present in the agents skill archive

[21]

Directly follow the output template given to you to define your mutation proposal

[22]

All precondition functions should be clearly marked in the mutation preconditions

[23]

You should not directly reimplement one of the previous failed mutations

Carefully analyse the previous failed mutation attempts, if available, to intelligently propose a next mutation. You should not directly reimplement one of the previous failed mutations

[24]

Skill Name

Under no circumstances should you mutate the success condition of the parent skill. ENVIRONMENT DESCRIPTION $environment_description SKILL ARCHIVE $skill_repertoire PARENT SKILL $sampled_parent_skill $sampled_parent_skill_code PREVIOUSLY FAILED PROPOSED ...

[25]

Skill Structure: * **Class Name:** `BenchmarkSolver` * **Index:** $next_skill_index * **Template:** Use the provided JAX class structure

[26]

Mine Stone

Logic & Strategy Requirements: You must define the `cond_fns` (preconditions) and `prereq_fns` (actions) lists. Construct them using the following logic: ### A. Strategic Planning Before strictly following the provided milestones, you must analyze the task requirements to ensure agent survival and efficiency. * **Preparati...

[27]

Collect Wood

Code Requirements: * **JAX Compatibility:** The implementation must be pure JAX and JIT-compatible. * **Operators:** ALWAYS use `jnp.logical_and`, `jnp.logical_or`, etc., instead of Python native operators. * **Imports:** Do not add import statements; assume the environment is pre-loaded. CRAFTAX E...

[1] [1]

Code as Policies: Language Model Programs for Embodied Control

URL https://openreview.net/forum? id=jRjvcqtdtA. Liang, J., Huang, W., Xia, F., Xu, P., Hausman, K., Ichter, B., Florence, P., and Zeng, A. Code as policies: Language model programs for embodied control.arXiv preprint arXiv:2209.07753, 2022. Liang, W., Wang, S., Wang, H.-J., Bastani, O., Jayaraman, D., and Ma, Y . J. Eurekaverse: Environment curricu- lum ...

work page internal anchor Pith review Pith/arXiv arXiv 2022

[2] [2]

Ensure the proposed skill is novel

Propose a Single Skill: Your proposal must be for exactly one novel and diverse skill that fits within the provided category. Ensure the proposed skill is novel. Simple repetitions of existing skills, e.g. MineThreeWood when MineWood is present, are not acceptable. Novelty implies a functional difference, not just a parametric one. \\

work page

[3] [3]

Consider the logical order of skill acquisition to maximize the agent’s potential success

Build Upon Existing Skills: The proposed skill must expand the agent’s current repertoire by building on existing skills. Consider the logical order of skill acquisition to maximize the agent’s potential success. The prerequisite skills used to define the proposed skill must be present in the archive. \\

work page

[4] [4]

Continuously build on already learned skills to form a curriculum of increasingly complex skills

Form a Curriculum: Ensure that you start out with simple skills. Continuously build on already learned skills to form a curriculum of increasingly complex skills. The curriculum should be as easy as possible for the agent to follow. \\ 25 CODE-SHARP: Continuous Open-ended Discovery and Evolution of Skills as Hierarchical Reward Programs

work page

[5] [5]

Skill Exploration: The ultimate goal is for the agent to possess as many diverse and meaningful skills as possible. For this it is crucial to explore as many levels as possible in the environment but also thoroughly explore each level for any meaningful skills proposals not yet present in the agent’s skill repertoire. \\

work page

[6] [6]

For crafting tools the success condition must be given as absolute values, i.e

Define Success Condition: Clearly define a condition that indicates the skill has been successfully completed. For crafting tools the success condition must be given as absolute values, i.e. for wood pickaxe it must be cur.inventory.pickaxe $ >= 1$. For navigation skills, ensure the success condition includes the level the agent should be on to fulfill th...

work page

[7] [7]

No implementation for a condition needed

State Starting Conditions: Carefully review the provided environment code to identify any necessary starting conditions. State these conditions using the provided template. If no specific conditions are needed, you must write "No implementation for a condition needed". \\

work page

[8] [8]

Carefully analyse the environment and the logic of the skill to decide the order in which you place the conditions

Condition Order: The agent will complete each condition in the order that is specified in the skill definition. Carefully analyse the environment and the logic of the skill to decide the order in which you place the conditions

work page

[9] [9]

If a necessary prerequisite skill does not exist, you must disregard your initial idea and instead propose the missing prerequisite skill

Link Prerequisites: If a starting condition is required, you must name a prerequisite skill from the agent’s repertoire that can fulfill it. If a necessary prerequisite skill does not exist, you must disregard your initial idea and instead propose the missing prerequisite skill. \\

work page

[10] [10]

Assign Reward: Assign a reward of one to the skill. \\

work page

[11] [11]

Skill Name

Proposal History: Carefully go over previously failed skill proposals, analyse why they might not have achieved a high enough performance to be accepted and use them as potential inspiration for new proposals. Never directly repropose one of the failed skills. You can propose a skill with the same objective but it must use a different structure so as to a...

[12]

Validate the Proposal
-------------------------
First, check the proposal for correctness. If it’s fundamentally flawed and cannot be fixed, reject it and explain why.
* Success Condition:
  - Ensure it’s logical and unambiguous.
  - Correct minor logical errors.
  - For crafting tools the success condition must be given as absolute values, i.e. for wood pickax...

[13]

No implementation for a condition needed

Implement the JAX Class
--------------------------
If the proposal is valid (or you’ve corrected it), implement the class.
* Indexing:
  - Use the proposed index. If it’s already taken, increment the index by one until it is unique.
  - Refer to prerequisite skills by their correct index.
* Code Requirements:
  - Your entire implementation MUST be JAX and JIT-c...
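The indexing rule above ("increment the index by one until it is unique") amounts to a short loop. The following is a plain-Python sketch of that rule; the function name `resolve_index` and the representation of taken indices as a set are assumptions made for illustration.

```python
from typing import Set


def resolve_index(proposed: int, taken: Set[int]) -> int:
    # Start from the proposed index and step forward until a free slot is found.
    index = proposed
    while index in taken:
        index += 1
    return index
```

For example, with indices 1 through 4 already taken, a proposal of 3 resolves to 5, while a proposal of 7 is kept unchanged.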

[14]

**Selected Skill:** At most two skills can be selected. If no skill passes your judgement you are allowed to reject them all.

[15]

**Filtering:** Start by comparing each skill proposal against the example skills. Filter out all skills which would not be classified as novel. Then, make your final decision based on the set of novel skills.

[16]

**Justification:** Provide a concise explanation for your choice, explicitly referencing how the selected skill excels in **Curriculum Coherence**, **Strategic Value**, and/or **Skill Diversity** compared to the other candidates.

[17]

**Output:** Provide the class of the skill which you choose as the optimal next skill to add to the agent’s repertoire in the format provided to you.

[18]

Collect Wood

**Criteria:** The most important criterion your selected skill must possess is feasibility. Always pick a skill that presents a logical incremental improvement over extremely difficult skills for which the agent does not possess all prerequisite skills.
ENVIRONMENT CODE
----------------
$environment_description$
SKILL FUNCTION TEMPLATE
--------------------...

[19]

Ensure your mutation follows the exact heuristic specification given to you and is sensible

[20]

If you add a new precondition function, ensure that a relevant skill to satisfy it is present in the agent’s skill archive

[21]

Directly follow the output template given to you to define your mutation proposal

[22]

All precondition functions should be clearly marked in the mutation preconditions

[23]

Carefully analyse the previous failed mutation attempts, if available, to intelligently propose a next mutation. You should not directly reimplement one of the previous failed mutations

[24]

Skill Name

Under no circumstances should you mutate the success condition of the parent skill.
ENVIRONMENT DESCRIPTION
$environment_description
SKILL ARCHIVE
$skill_repertoire
PARENT SKILL
$sampled_parent_skill
$sampled_parent_skill_code
PREVIOUSLY FAILED PROPOSED ...

[25]

Skill Structure
------------------
* **Class Name:** `BenchmarkSolver`
* **Index:** $next_skill_index
* **Template:** Use the provided JAX class structure.

[26]

Mine Stone

Logic & Strategy Requirements
--------------------------------
You must define the `cond_fns` (preconditions) and `prereq_fns` (actions) lists. Construct them using the following logic:
### A. Strategic Planning
Before strictly following the provided milestones, you must analyze the task requirements to ensure agent survival and efficiency.
* ** Preparati...

[27]

Collect Wood

Code Requirements
--------------------
* **JAX Compatibility:** The implementation must be pure JAX and JIT-compatible.
* **Operators:** ALWAYS use `jnp.logical_and`, `jnp.logical_or`, etc., instead of Python native operators.
* **Imports:** Do not add import statements; assume the environment is pre-loaded.
---------------------------------
CRAFTAX E...
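The operator requirement in the code requirements above follows from how `jax.jit` traces functions: Python's `and` and `or` call `bool()` on their operands, which fails for traced arrays. The sketch below contrasts the two styles. The function names and the `wood` and `stone` arguments are hypothetical and do not come from the paper's skill template.

```python
import jax
import jax.numpy as jnp


@jax.jit
def both_available(wood, stone):
    # Element-wise AND on traced values; this is JIT-compatible.
    return jnp.logical_and(wood >= 1, stone >= 1)


@jax.jit
def both_available_native(wood, stone):
    # Python's `and` needs a concrete bool, so tracing this function raises an error.
    return wood >= 1 and stone >= 1
```

Calling `both_available` returns a boolean JAX array, while calling `both_available_native` raises at trace time, which is why a prompt targeting JIT-compiled skill classes would forbid native operators.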
