Self-Programmed Execution for Language-Model Agents

Luke J. O'Connor

arxiv: 2605.06898 · v1 · submitted 2026-05-07 · 💻 cs.AI

Pith reviewed 2026-05-11 01:04 UTC · model grok-4.3

classification 💻 cs.AI

keywords self-programmed execution · language model agents · agentic machines · Spell language · self-modification · orchestration policy · side-effect management · Lisp-based execution

The pith

Language models can act as agents by generating and executing their own orchestrator programs rather than following any fixed policy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces self-programmed execution (SPE), an architecture in which the model's own completion serves as the program that controls state transitions between turns. The harness still evaluates that program, but it no longer imposes orchestration rules fixed in advance. The paper formalizes the idea with agentic machines, in which an SPE state is one from which a model completion can load any state of an embedded copy of the machine, and makes it practical with Spell, a Lisp variant that lets programs edit themselves while structuring side effects so that re-evaluation does not repeat prior actions. Experiments with frontier models that were not trained for SPE or Spell show they can already carry out demanding agent tasks under this regime. A reader would care because the result implies agents need not be locked into designer-specified control flows and could instead develop their own strategies for managing their execution.

Core claim

The paper establishes that a language model can operate as an agent without any fixed orchestration policy. It formalizes this via agentic machines in which an SPE state is one from which a model completion can load any state of an embedded copy of the machine. The practical realization uses the Spell language, in which programs edit and re-evaluate themselves and effectful expressions such as model invocations are arranged so that re-evaluation after editing does not replay side effects. Experiments confirm that existing frontier models, without any training for SPE or Spell, can already succeed at challenging agentic tasks under this setup.
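The replay-avoidance idea can be illustrated with a minimal Python sketch. It is not Spell and does not reproduce the paper's mechanism: the step kinds (`pure`, `effect`, `value`), the `evaluate` function, and `fake_model_call` are hypothetical names introduced here. The assumption is simply that once an effectful step has run, the program is rewritten to hold the recorded result, so evaluating the edited program again reads that result instead of repeating the action.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical program representation: a list of (name, kind, payload) steps.
#   "pure"   -> payload is a function of the environment, safe to re-run
#   "effect" -> payload is a zero-argument callable with a side effect
#   "value"  -> payload is the recorded result of an effect that already ran
Step = Tuple[str, str, Any]


def evaluate(program: List[Step]) -> Tuple[List[Step], Dict[str, Any]]:
    """Evaluate a program once and return (rewritten program, environment).

    Every effect step that runs is rewritten into a value step, so a later
    evaluation of the returned program does not repeat the side effect.
    """
    env: Dict[str, Any] = {}
    rewritten: List[Step] = []
    for name, kind, payload in program:
        if kind == "pure":
            env[name] = payload(env)
            rewritten.append((name, kind, payload))
        elif kind == "effect":
            result = payload()  # the side effect happens exactly here
            env[name] = result
            rewritten.append((name, "value", result))
        elif kind == "value":
            env[name] = payload  # recorded result, no side effect
            rewritten.append((name, kind, payload))
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return rewritten, env


calls: List[str] = []  # records every real side effect, for demonstration


def fake_model_call() -> str:
    """Stand-in for an effectful model invocation (an assumption)."""
    calls.append("model")
    return "draft answer"


program: List[Step] = [
    ("reply", "effect", fake_model_call),
    ("length", "pure", lambda env: len(env["reply"])),
]

program, env = evaluate(program)

# "Self-edit": append a new pure step, then re-evaluate the whole program.
program.append(("shout", "pure", lambda env: env["reply"].upper()))
program, env = evaluate(program)

print(len(calls), env["shout"])  # the model stand-in ran only once
```

The design choice worth noting is that the record of completed effects lives in the program itself, which is one way the same data could serve as both context and executable code.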

What carries the argument

Self-programmed execution (SPE) realized through agentic machines and the Spell Lisp-based language, in which a model completion becomes the orchestrator program that can edit and re-evaluate itself without replaying side effects.

If this is right

Existing models can already complete complex agentic tasks without any pre-specified turn-to-turn orchestration.
No external harness needs to impose a fixed orchestration policy once the model outputs its own executable program.
Training models specifically for self-programmed execution could allow them to discover and refine their own orchestration strategies.
The architecture separates the model’s generative role from any rigid control structure, enabling fully model-driven state management.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Agents built this way could dynamically change their own reasoning loops mid-task without external intervention.
The approach may scale to systems that maintain long-running self-modifying control programs across many interactions.
It opens the possibility of measuring and comparing different self-orchestration patterns that models learn when trained under SPE.
Integration with other self-referential mechanisms could let agents optimize their own resource use or error recovery.

Load-bearing premise

The same data can function simultaneously as model context and executable program while preventing unintended replay of side effects during self-modification and re-evaluation.

What would settle it

Running the provided Spell programs with frontier models on the reported agentic tasks and observing either repeated unintended side effects on re-evaluation or consistent failure to complete the tasks without a fixed external policy.

Figures

Figures reproduced from arXiv: 2605.06898 by Luke J. O'Connor.

**Figure 1.** Three agent architectures. Colored boxes distinguish program logic that is implemented in the harness (blue) vs. written by the model (yellow). In all cases, these programs are executed by a harness runtime which is external to the model. (a) In ReAct [Yao et al., 2023], the model selects from a prescribed action space. An orchestrator program runs the agent loop, maintaining state (e.g., conversation hist…

**Figure 2.** Accuracy and fatal Spell-error rate by model. Each model was run on a set of 32 Terminal-Bench 1.1 and SWE-bench Lite tasks. A task was counted as a fatal error if its final turn produced an unrecovered Spell/runtime error. GPT-5.4 and Opus 4.6 were configured with medium reasoning effort, GLM-5.1 and Qwen3.6 Plus with high effort, and Kimi-K2.6 with default effort.

**Figure 3.** Comparison with Codex CLI on coding benchmarks. Left: Terminal-Bench 1.1. Right: SWE-bench Lite. Each point is one full benchmark run with GPT-5.4 at low, medium, or high reasoning effort. For numerical results, see Appendix C.5.

[Plot text following this caption in the source: axis "Resolved tasks (%)"; benchmarks Terminal-Bench 1.1 (n=80), SWE-bench Lite (n=300), LongBench v2 (n=200), AppWorld dev (n=57); per-bar dollar-cost labels, truncated.]

**Figure 5.** Mean cached input, uncached input, and output tokens per task in medium-effort …

Original abstract

At the heart of existing language model agents is a fixed orchestrator program responsible for the state transition between consecutive turns. This paper introduces self-programmed execution (SPE), an agent architecture in which the model completion is itself the orchestrator program, and the harness evaluates this program but does not impose its own orchestration policy. I formalize this idea using agentic machines: an SPE state is one from which a model completion can load any state of an embedded copy of the machine, meaning that it is subject to no fixed turn-to-turn orchestration policy. Realizing SPE in practice is nontrivial because the same data is both model context and executable program. I therefore introduce Spell, a Lisp-based language in which programs can edit and re-evaluate themselves, and effectful expressions like model invocations are structured such that re-evaluating an edited program does not replay its side effects. Experiments with existing models, not trained for SPE or Spell, show that frontier models can operate in this regime and accomplish challenging agentic tasks. These results demonstrate how an LM can act as an agent without any fixed orchestration policy, and they raise the question of what self-orchestration strategies might be learned by a model trained for self-programmed execution. Code is available at https://github.com/lukejoconnor/spell .
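The contrast drawn in the abstract, between a harness that owns the turn-to-turn transition and a harness that only evaluates the model's completion, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's harness: `fixed_orchestrator_step`, `spe_step`, the dictionary-shaped state, and the scripted stand-in model are all hypothetical, and Python's `exec` is used only as a convenient evaluator.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
Model = Callable[[str], str]  # prompt text -> completion text


def fixed_orchestrator_step(state: State, model: Model) -> State:
    """Conventional agent turn: the harness owns the transition rule."""
    completion = model(str(state["history"]))
    # Fixed policy chosen by the harness designer: always append the output.
    return {
        "history": state["history"] + [completion],
        "done": completion == "STOP",
    }


def spe_step(state: State, model: Model) -> State:
    """SPE-style turn: the completion is the program that builds the next state."""
    completion = model(str(state))
    namespace: Dict[str, Any] = {"state": dict(state)}
    # The harness only evaluates the completion; it does not decide how the
    # state evolves. Running model output through exec is unsafe outside a
    # sandbox; it is used here purely for illustration.
    exec(completion, {"__builtins__": {}}, namespace)
    return namespace["next_state"]


def scripted_model(prompt: str) -> str:
    """Hypothetical stand-in for a model: it chooses to keep only the most
    recent history item and then stop."""
    return 'next_state = {"history": state["history"][-1:], "done": True}'


state: State = {"history": ["a", "b", "c"], "done": False}

fixed = fixed_orchestrator_step(state, lambda prompt: "STOP")
spe = spe_step(state, scripted_model)

print(fixed)
print(spe)
```

In the first function the append-only history is a decision baked into the harness. In the second, the same decision (here, truncating the history) is made by the completion, which is the sense in which no fixed orchestration policy is imposed.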

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces self-programmed execution (SPE) for LM agents, in which the model completion itself acts as the orchestrator program evaluated by a harness with no fixed turn-to-turn policy. It defines agentic machines such that an SPE state allows a model completion to load any state of an embedded machine copy. To realize this, the paper presents Spell, a Lisp-based language supporting self-editing and re-evaluation where effectful operations (e.g., model calls) are wrapped to avoid replay on edits. Experiments with frontier models that were not trained for SPE or Spell are reported to show successful performance on challenging agentic tasks, with code released.

Significance. If the central claims hold, this architecture could enable more adaptive LM agents free of hardcoded orchestration loops, opening questions about learned self-orchestration strategies. The release of code and the parameter-free formalization of SPE states are strengths that support reproducibility and further exploration.

major comments (2)

[Spell language section] § on Spell and effectful expressions: the claim that wrapping prevents replay of side effects after arbitrary self-edits relies on the Lisp evaluator and specific form structure, but the paper does not demonstrate robustness against model-generated code that might re-bind or quote effectful forms (e.g., via (let ((f (lambda () (model-call)))) ...)). This is load-bearing for the 'no fixed orchestration' definition of SPE states.
[Experiments] Experiments section: the abstract asserts success on 'challenging agentic tasks' with frontier models, but without reported task definitions, quantitative metrics, baselines, or failure modes, it is difficult to assess whether the results support the claim that models operate in the SPE regime rather than via prompt engineering that avoids edge cases.

minor comments (2)

The GitHub link is provided; confirm it includes the exact Spell evaluator and prompt templates used in the reported runs.
[Formalization] Notation for agentic machine states could be clarified with a small diagram or pseudocode example of a state transition.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, indicating planned revisions where appropriate.

Point-by-point responses

Referee: [Spell language section] § on Spell and effectful expressions: the claim that wrapping prevents replay of side effects after arbitrary self-edits relies on the Lisp evaluator and specific form structure, but the paper does not demonstrate robustness against model-generated code that might re-bind or quote effectful forms (e.g., via (let ((f (lambda () (model-call)))) ...)). This is load-bearing for the 'no fixed orchestration' definition of SPE states.

Authors: We agree that robustness to arbitrary model-generated Lisp forms is central to the SPE definition. The current manuscript describes the wrapping mechanism for effectful expressions and relies on the evaluator's treatment of these forms to prevent replay. In the revised version we will add a dedicated subsection with concrete examples and a short argument showing that common constructs (let, lambda, quote, and similar) cannot bypass the wrapper, because the harness maintains separate evaluation state that is not captured by re-binding or quoting within the model completion. This will strengthen the formal claim without altering the core architecture.
Revision: yes

Referee: [Experiments] Experiments section: the abstract asserts success on 'challenging agentic tasks' with frontier models, but without reported task definitions, quantitative metrics, baselines, or failure modes, it is difficult to assess whether the results support the claim that models operate in the SPE regime rather than via prompt engineering that avoids edge cases.

Authors: The experiments are presented as qualitative demonstrations that frontier models not trained for SPE or Spell can execute in the SPE regime on non-trivial agentic tasks. We acknowledge that the current text provides limited quantitative detail. In the revision we will expand the experiments section to include explicit task definitions, the success criteria applied, comparison against standard fixed-orchestrator baselines where feasible, and a summary of observed failure modes. These additions will make it easier to evaluate whether the models are genuinely operating without a fixed turn-to-turn policy.
Revision: yes
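The first response argues that re-binding or quoting cannot bypass the effect wrapper because the harness keeps separate evaluation state. One way such a separation could work is sketched below as a toy evaluator in Python. It is an assumption-laden illustration, not the Spell evaluator: `Sym`, `UnboundSymbol`, `evaluate`, and the two environments are hypothetical, programs are written as nested Python lists rather than parsed text, and the toy proves nothing about Spell itself. The assumed rule is that effect functions are bound only in a second, harness-controlled environment, so a closure created in the first environment cannot reach them.

```python
from typing import Any, Callable, Dict, List


class Sym(str):
    """A symbol; plain Python strings are treated as string literals."""


class UnboundSymbol(Exception):
    """Raised when a symbol has no binding in the current environment."""


def evaluate(expr: Any, env: Dict[str, Any]) -> Any:
    if isinstance(expr, Sym):
        if expr not in env:
            raise UnboundSymbol(str(expr))
        return env[expr]
    if not isinstance(expr, list):
        return expr  # literal value
    head, *args = expr
    if head == Sym("quote"):
        return args[0]  # return the form unevaluated
    if head == Sym("lambda"):
        body = args[1]  # (lambda () body): closes over the defining env
        return lambda: evaluate(body, env)
    if head == Sym("let"):
        (name, value_expr), = args[0]  # (let ((name value)) body)
        inner = dict(env)
        inner[name] = evaluate(value_expr, env)
        return evaluate(args[1], inner)
    fn: Callable[..., Any] = evaluate(head, env)
    return fn(*[evaluate(arg, env) for arg in args])


calls: List[str] = []


def model_call() -> str:
    """Stand-in for an effectful model invocation (an assumption)."""
    calls.append("model")
    return "completion"


S = Sym
pure_env: Dict[str, Any] = {}                           # no effect bindings
effect_env: Dict[str, Any] = {"model-call": model_call}  # harness-only stage

# The referee's pattern: (let ((f (lambda () (model-call)))) (f))
smuggle = [S("let"), [[S("f"), [S("lambda"), [], [S("model-call")]]]], [S("f")]]
try:
    evaluate(smuggle, pure_env)
    blocked = False
except UnboundSymbol:
    blocked = True

# A quoted form is inert in the pure stage; the harness then evaluates it once
# in the effect-enabled environment.
trailing = evaluate([S("quote"), [S("model-call")]], pure_env)
result = evaluate(trailing, effect_env)

print(blocked, calls, result)
```

In this toy, wrapping the call in a lambda does not help the program reach the effect, because the closure resolves `model-call` in the environment where it was created. Whether the real evaluator offers the same guarantee for every construct is exactly what the referee asks the authors to demonstrate.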

Circularity Check

0 steps flagged

No circularity: SPE and Spell are independent definitions with experimental support.

Full rationale

The paper introduces SPE as a new agent architecture defined via agentic machines and Spell as a Lisp variant for self-editing without side-effect replay. These are presented as architectural proposals rather than derivations from fitted parameters or prior results. The central claim rests on experiments with unmodified frontier models, not on any self-citation load-bearing step, uniqueness theorem, or renaming of known patterns. No equations reduce by construction to inputs, and the formalization is self-contained without smuggling ansatzes or calling fitted quantities predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

The central claim depends on the novel SPE architecture and Spell language, plus the assumption that existing models can handle self-modifying code without side-effect replay.

axioms (1)

Domain assumption: Language models can generate and safely execute self-modifying programs where data serves as both context and code without unintended side-effect replay.
Required for the Spell design and SPE state definition to function as described.

invented entities (3)

Self-programmed execution (SPE) · no independent evidence
Purpose: Agent architecture in which the model completion itself serves as the orchestrator program with no fixed turn-to-turn policy.
Newly introduced formalization using agentic machines.

Spell · no independent evidence
Purpose: Lisp-based language enabling programs to edit and re-evaluate themselves while structuring effectful expressions to avoid replaying side effects.
Newly proposed implementation language.

Agentic machines · no independent evidence
Purpose: Formal model where an SPE state allows a model completion to load any state of an embedded machine copy.
New formal concept for defining states without fixed orchestration.

pith-pipeline@v0.9.0 · 5522 in / 1253 out tokens · 45723 ms · 2026-05-11T01:04:49.881129+00:00



[1] [1]

Anthropic Applied AI Team

Cited as Anthropic 2025b. Anthropic Applied AI Team. Effective context engineering for AI agents.https:// www.anthropic.com/engineering/effective-context-engineering-for-ai-agents,

work page

[2] [2]

Wenhu Chen, Xueguang Ma, Xinyi Wang, and William W

Engineering blog, published September 29, 2025. Wenhu Chen, Xueguang Ma, Xinyi Wang, and William W. Cohen. Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks,

work page 2025

[3] [3]

Own your context window

URLhttps://arxiv.org/abs/2211.12588. Matthias Felleisen and Daniel P. Friedman. Control operators, the SECD-machine, and theλ-calculus. In Martin Wirsing, editor,Formal Description of Programming Concepts III: Proceedings of the Third IFIP WG 2.2 Working Conference, pages 193–219. North- Holland, 1986. Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei...

work page doi:10.1145/3386321 1986

[4] [4]

doi: 10.1007/978-3-540-68677-4_7

Springer, 2007. doi: 10.1007/978-3-540-68677-4_7. Roberto Segala and Nancy A. Lynch. Probabilistic simulations for probabilistic processes. Nordic Journal of Computing, 2(2):250–273, 1995. Brian Cantwell Smith. Reflection and semantics in LISP. InProceedings of the 11th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL), pages 23–3...

work page doi:10.1007/978-3-540-68677-4_7 2007

[5] [5]

Agentfold: Long-horizon web agents with proactive context management.arXiv preprint arXiv:2510.24699, 2025

URLhttps://arxiv.org/abs/2510.24699. Xunjian Yin, Xinyi Wang, Liangming Pan, Xiaojun Wan, and William Yang Wang. Gödel agent: A self-referential agent framework for recursive self-improvement, 2024. URL https://arxiv.org/abs/2410.04444. ACL 2025 version adds Li Lin as coauthor. EricZelikman, ElianaLorch, LesterMackey, andAdamTaumanKalai. Self-taughtoptimi...

work page arXiv 2024

[6] [6]

Thusenc(s ′

=e(s ′ 2), then the two CEK states have the same environment and hence the same retained value for z. Thusenc(s ′

work page

[7] [7]

= enc(s′ 2), and sinceencis injective,s ′ 1 =s ′

work page

[8] [8]

self-programmed

Thereforeeis an embedding ofX ′ intoX CEK. Corollary A.21(Universal seed).The seed statex 0 from the proof of Theorem A.17 completion-generates every agentic machine over(P, C)that is realizable in the underlying CEKevaluator. Inparticular, understandardfiniteencodings, itcompletion-generatesevery agentic machine whose prompt function and harness procedur...

work page 1952

[9]

It enables self-reference via the outer quine.

[10]

cancelled

It ensures that only the trailing expression, which is the last expression of the do block, can have externally visible effects. The outer eval performs a second evaluation on the value returned by the do block, namely its trailing expression. If this expression is quoted, then eval evaluates it, allowing it to trigger side effects such as LLM calls. This pattern...

[11]

A new node can be created, either awake or asleep.

[12]

For a node a which is awake at time t, any number of new edges (a, b) can be created (b ≠ a); at time t+1, a will be asleep and b will be awake. If b goes from asleep to awake, then any edge (b, c) is deleted.

[13]

division failed

A node b can go from awake to asleep, and this deletes any edge (a, b). If the out-degree of a becomes zero, then a becomes awake at time t+1. Deadlock occurs when every node is asleep and E is nonempty. A non-deadlocked state never gives rise to a deadlocked state. In particular, transformation (2) never generates a directed cycle. Clojure provides synchronizatio...
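The awake/asleep rules above can be sketched as a small state machine. This is an illustrative Python model only, not the paper's implementation: the class name `WaitGraph` and the methods `add_node`, `ask`, `finish`, and `is_deadlocked` are hypothetical, an edge (a, b) is read as "a is waiting on b", and the sketch assumes an asking node names at least one target.

```python
class WaitGraph:
    """Toy model of the awake/asleep node rules (names are hypothetical)."""

    def __init__(self):
        self.awake = {}     # node name -> True if awake, False if asleep
        self.edges = set()  # (a, b) means a is waiting on b

    def add_node(self, name, awake=True):
        # Rule 1: a new node can be created, either awake or asleep.
        self.awake[name] = awake

    def ask(self, a, targets):
        # Rule 2: an awake node a creates edges (a, b) with b != a;
        # afterwards a is asleep and every b is awake.
        if not self.awake[a]:
            raise ValueError(f"{a} is asleep and cannot create edges")
        if not targets or a in targets:
            raise ValueError("targets must be nonempty and exclude the asker")
        for b in targets:
            if not self.awake[b]:
                # b goes from asleep to awake: delete every edge (b, c).
                self.edges = {(x, y) for (x, y) in self.edges if x != b}
            self.awake[b] = True
            self.edges.add((a, b))
        self.awake[a] = False

    def finish(self, b):
        # Rule 3: an awake node b goes to sleep, deleting every edge (a, b);
        # any a whose out-degree drops to zero wakes up.
        if not self.awake[b]:
            raise ValueError(f"{b} is already asleep")
        self.awake[b] = False
        waiters = {x for (x, y) in self.edges if y == b}
        self.edges = {(x, y) for (x, y) in self.edges if y != b}
        for a in waiters:
            if not any(x == a for (x, _) in self.edges):
                self.awake[a] = True

    def is_deadlocked(self):
        # Deadlock: every node is asleep while some edge remains.
        return bool(self.edges) and not any(self.awake.values())
```

In this model, a parent that asks a child falls asleep, and it wakes again once the child finishes and the parent's last outgoing edge is removed.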

[14]

(do ...) returns the value of its last expression (called the trailing expression). *Normally this value is a quote.*

[15]

(eval ...) evaluates this quote. Effect functions (those with global side effects) can only be evaluated by eval and otherwise throw "unbound symbol":
(quine completion (eval (do (!llm-self "No")))) ; unbound symbol exception
(quine completion (eval (do '(!llm-self "Yes")))) ; quote is unwrapped by eval

[16]

Your task

(quine completion ...) binds the source code of the entire program, including the wrapper itself, to the symbol completion. This allows you to extend your CoT (see below). This wrapper allows you to extend your CoT by self-prompting with your completion while ensuring that effectful function calls are not re-evaluated. If you see this prefix: (quine compl...
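The do / eval / quote behavior described above can be illustrated with a toy two-phase evaluator. This is a minimal Python sketch under stated assumptions, not the Spell interpreter: S-expressions are modeled as nested tuples, symbols as Python strings, and string literals via a hypothetical `Lit` wrapper; the names `evaluate`, `UnboundSymbol`, and `allow_effects` are invented for illustration, and the `quine` wrapper is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    """Literal value (hypothetical wrapper so strings are not read as symbols)."""
    value: object


class UnboundSymbol(Exception):
    pass


def evaluate(expr, env, effects, allow_effects=False):
    """Evaluate a tuple-encoded expression.

    Effect functions live in `effects` and are visible only when
    allow_effects is True, i.e. during the second evaluation done by eval.
    """
    if isinstance(expr, Lit):
        return expr.value
    if isinstance(expr, str):  # symbol lookup
        if expr in env:
            return env[expr]
        if allow_effects and expr in effects:
            return effects[expr]
        raise UnboundSymbol(expr)
    if not isinstance(expr, (list, tuple)):
        return expr  # plain data evaluates to itself
    head, *args = expr
    if head == "quote":
        return args[0]  # return the expression unevaluated
    if head == "do":
        result = None
        for sub in args:
            result = evaluate(sub, env, effects, allow_effects)
        return result  # value of the trailing expression
    if head == "eval":
        # First pass: effect functions are unbound.
        value = evaluate(args[0], env, effects, allow_effects=False)
        # Second pass: the returned (quoted) form runs with effect bindings.
        return evaluate(value, env, effects, allow_effects=True)
    fn = evaluate(head, env, effects, allow_effects)
    return fn(*[evaluate(a, env, effects, allow_effects) for a in args])
```

With an effect table such as `{"!llm-self": some_function}`, the form `("eval", ("do", ("!llm-self", Lit("No"))))` raises `UnboundSymbol`, while `("eval", ("do", ("quote", ("!llm-self", Lit("Yes")))))` calls the effect function once, during the second evaluation.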

[17]

calling effect builtins outside the trailing expression: !llm-self, !ask-await, leaf-llm, eval, and describe-fn are effect functions; they must appear in the quoted trailing expression or inside !call-now / !peek / !print

[18]

confusing def with let: def binds in the environment (visible to later expressions); let creates local scope

[19]

forgetting quote on the trailing expression: the last expression must be quoted so the outer eval can run it with effect bindings

[20]

str vs cat vs pr-str: str joins arguments as strings; cat is an alias; pr-str serializes as Spell-readable data (vectors, maps, etc.)

[21]

which python3 && python3 --version && python3 -m pytest --version && which rg

using read-string on untrusted input: read-string parses Spell code; only use it on data you control

reminders namespace guide

REMINDER: This text belongs to the prefix of a Spell program that you are tasked with completing. Your entire response is code; embed all natural language within string literals. Follow the instructions on how to write correct Spel...

[22]

calling io/* outside the quoted trailing expression

[23]

forgetting !call-now when you need the result: '(io/read-file "x") evaluates but the result is lost

[24]

using io/sh for everything – use io/str-replace to patch files, io/read-file to read them, io/grep to search them

[25]

grep-then-read in two turns when one grep with :context N would suffice – prefer `(io/grep pat path {:context 20})` for "find + see context"

In examples, | marks cursor position in a completion.

Recommended usage pattern: Patch a file with io/str-replace. Use io/str-replace when you know the exact text to change. It avoids shell escaping issues entirely. ....

[26]

Read the file to see current contents.
...|'(!call-now code (io/read-lines "main.py"))

[27]

Next turn: code is bound. Identify the line range, replace it.
...(def code ["def greet():" " print('hello')" ...])
|(think "Line 2 needs updating.")
'(io/replace-lines "main.py" 2 3 " print('goodbye')")

Recommended usage pattern: Explore multiple files and persist relevant snippets

[28]

Peek full file with one-turn lifetime.
...|'(!peek-now file-lines (io/read-lines "main.py"))

[29]

Next turn: file-lines is available. Persist relevant snippets and peek another file.
...(def file-lines ["... many lines ..."])
(rethink 2 "After persisting what you need, rethink 2 to drop the prior !peek-now call and binding.")
|(persist fn-defn (subvec file-lines 99 111))
'(!peek-now test-lines (io/read-lines "test_main.py"))

[30]

Next turn: fn-defn stays in context. The prior !peek-now call and file-lines were dropped by rethink 2, and test-lines is now available.
... (persist fn-defn ["def target_fn(...):" " ..."])
'(!peek-now test-lines (io/read-lines "test_main.py"))
(def test-lines ["... many lines ..."])
(rethink 2 "After persisting what you need, rethink 2 to drop the prior ...

[31]

Read the file.
...|'(!call-now code (io/read-file "big-module.py"))

[32]

Next turn: file was too large and got truncated. Rethink to discard it, then grep for what you need.
...(def code "1: import os\n2: import sys\n...\n... [truncated, 58302 chars total]")
|(rethink "File too large to scan inline. Grep for the target instead.")
'(!call-now matches (io/grep "def handle_request" "big-module.py"))

io-read namespace guide

IO-READ...

[33]

agents/send and passing turn when expecting a reply: this ends the conversation; instead use agents/!ask

[34]

agents/reply and passing turn: same problem; use agents/!reply-ask if you need the conversation to continue

[35]

agents/!ask followed by additional expressions: these do not evaluate; instead put them first

[36]

hallucinating handles: use (agents/parent-handle), :user, :main, or look up (!print (globals/get :roles)) (if globals/ available)

[37]

calling agents/* outside the quoted trailing expression (for example: (def h (agents/current-handle))); effect calls must run in trailing expression code

[38]

agents/send argument order: it is (agents/send target message), consistent with (agents/!ask target message)

[39]

agents/reply needs two arguments: a received msg-N and a reply value. Wrong: '(agents/reply "hello"). Right: '(agents/reply msg-0 "hello")

[40]

spawned children often need send, not reply. If nobody messaged you yet this turn, you do not have a msg-N to reply to.

In examples, | marks cursor position in a completion. It is doc-only; do not type it into code. Multi-part example:

[41]

Main: spawn a summarizer, keep working, then block with !ask.
;; turn 1: start child + continue your own CoT
...|'(do (agents/spawn "You are a summarizer. Read long-file.txt and send me a summary." :summarizer) (!extend))
;; next turn:
... |(think "...") (think "Ok, I’ll wait for summarizer now") '(agents/!ask :summarizer) ;; main blocks until child responds

[42]

Summarizer child: use send to return result.
...(quine prompt "You are a summarizer. Read long-file.txt and send me a summary.")
|'(!call-now file-contents (io/read-lines "long-file.txt"))
;; next turn
...(def file-contents "...")
|(def summary "...")
'(agents/send (agents/parent-handle) summary) ;; child turn ends after send

[43]

Main: use !reply-ask to clarify and keep the conversation open.
...'(agents/!ask :summarizer)
(def msg-0 {:from :summarizer :body {...}})
(think "I have a question about the summary.")
|'(agents/!reply-ask msg-0 "What is the...") ;; child awakens; main blocks for child’s response

globals namespace guide

GLOBALS – Shared state visible to all agents. (globals...

[44]

Bind to a local with !call-now:
'(!call-now roles (globals/get :roles)) ;; next turn: roles is available as a local binding

[45]

Print directly for quick inspection:
'(!print (globals/get :roles))

Default special keys:
:roles {} – Agent registry for handle lookup. Convention: {:main "Orchestrator" :spawn-1 "Worker for CLI" :spawn-2 "Worker for unit testing"}
:tasks [] – shared task queue. Convention: [{:id 1 :desc "read file"} {:id 2 :desc "summarize"}]
These defaults are conventions...

[46]

calling globals/* outside the quoted trailing expression: (globals/get :roles) does nothing at eval time; must be quoted

[47]

forgetting !call-now: '(globals/get :roles) returns the value; use '(!call-now roles (globals/get :roles)) if you want to see it

[48]

hallucinating handles: instead, look them up in roles/ (also see agents/parent-handle and agents/current-handle)

Multi-part example – worker pool with a shared task queue: | marks cursor position and is doc-only; do not type it into code

[49]

Main: populate the queue and spawn workers.
...|'(do (globals/set :results [])
(globals/set :tasks [{:id 1 :desc "summarize A"} {:id 2 :desc "summarize B"}])
(agents/spawn "You are a worker. Pop tasks from globals :tasks and process them." :w1)
(agents/spawn "You are a worker. Pop tasks from globals :tasks and process them." :w2)
(globals/wait-until (...

[50]

Worker w1: claim a task atomically.
...|'(!call-now task (globals/pop :tasks)) ;; next turn: task is {:id 1 :desc "summarize A"} (or nil if queue empty)

[51]

Worker w1: post result back.
...(def task {:id 1 :desc "summarize A"})
|(def summary "A is about...")
'(globals/update :results (fn [r] (conj (or r []) {:id 1 :summary summary})))

blocking namespace guide

BLOCKING – Future-only blocking primitives.
(blocking/await fut) – await a Spell future token (future-only)
(blocking/await-all [f1 f2 ...]) – await multipl...
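The worker-pool example above (populate :tasks, claim with an atomic pop, post results through update) follows a common shared-queue pattern. The Python sketch below is an analogy only, assuming threads in place of Spell agents and a lock in place of whatever mechanism makes globals/pop atomic; the names `SharedStore`, `worker`, and `run_pool` are hypothetical, and popping from the front of the list is an assumption based on the example's first claimed task.

```python
import threading


class SharedStore:
    """Toy stand-in for shared globals (hypothetical; not the Spell API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def pop(self, key):
        # Atomically remove and return the first queued item, or None if empty.
        with self._lock:
            items = self._data.get(key) or []
            if not items:
                return None
            self._data[key] = items[1:]
            return items[0]

    def update(self, key, fn):
        # Atomically replace the value at key with fn(old value).
        with self._lock:
            self._data[key] = fn(self._data.get(key))
            return self._data[key]


def worker(store, name):
    # Claim tasks until the queue is empty, posting one result per task.
    while True:
        task = store.pop("tasks")
        if task is None:
            return
        result = {"id": task["id"], "summary": f"{name} handled {task['desc']}"}
        store.update("results", lambda r, res=result: (r or []) + [res])


def run_pool():
    store = SharedStore()
    store.set("results", [])
    store.set("tasks", [{"id": 1, "desc": "summarize A"},
                        {"id": 2, "desc": "summarize B"}])
    threads = [threading.Thread(target=worker, args=(store, n))
               for n in ("w1", "w2")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.get("results")
```

Because both the pop and the update run under one lock, no task is claimed twice and no result is lost, whichever worker runs first.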

[52]

calling check-result outside the trailing expression: must be quoted like all effect calls

[53]

using team without an io-capable agent profile: workers and verifier need io/ and agents/; blocking/ is future-only and !ask-await is a builtin

In examples, | marks cursor position in a completion. It is doc-only; do not type it into code. Example - verify then correct:

[54]

Compute an answer and check it.
...(def answer 42)
|'(!call-now verdict (patterns/check-result "What is 6 * 9?" answer))

[55]

Next turn: handle the verdict.
...(def verdict {:wrong "6 * 9 = 54, not 42"})
|(def answer 54)
'(!call-now verdict (patterns/check-result "What is 6 * 9?" answer))

web namespace guide

WEB – Search and fetch web content.
(web/search query) – search web and return [{:title :url :snippet} ...]
(web/fetch url) – fetch URL and return markdown/text
(web/config) – in...

[56]

Search and peek the results.
...|'(!peek-now results (web/search "clojure transducers"))

[57]

Next turn: results is available. Pick the best URL and fetch it.
...(def results {:ok [{:title "Transducers - Clojure" :url "https://clojure.org/reference/transducers" :snippet "..."} ...]})
(rethink 2 "After persisting what you need, rethink 2 to drop the prior !peek-now call and binding.")
|(persist best-url (get (first (:ok results)) :url))
’(!peek-n...

[58]

For each agent, create an eval function and install it within agent-specific inside functions

[59]

Construct an initial program from a user prompt

[60]

For the root inside function of the main agent, run (box :main init-program root-inside-fn)

[61]

<benchmark prompt>

All subsequent execution occurs inside of this function call; for example, the initial program usually makes a self-call, which triggers the creation of a new box.

C Benchmarking methods and results

C.1 Shared evaluation configuration

C.1.1 Compared Agents

Spell agent. The Spell agent was configured with the tool-call transport agent profile config/agents/i...