Self-Programmed Execution for Language-Model Agents
Pith reviewed 2026-05-11 01:04 UTC · model grok-4.3
The pith
Language models can act as agents by generating and executing their own orchestrator programs rather than following any fixed policy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that a language model can operate as an agent without any fixed orchestration policy. It formalizes this via agentic machines, in which a self-programmed execution (SPE) state is one from which a model completion can load any state of an embedded copy of the machine. The practical realization uses Spell, a Lisp-based language in which programs edit and re-evaluate themselves, and in which effectful expressions such as model invocations are structured so that re-evaluating an edited program does not replay its side effects. Experiments are reported to show that existing frontier models, with no training for SPE or Spell, can already complete challenging agentic tasks under this setup.
What carries the argument
Self-programmed execution (SPE), realized through agentic machines and the Lisp-based Spell language, in which a model completion becomes the orchestrator program and can edit and re-evaluate itself without replaying side effects.
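To make the idea concrete, here is a minimal Python sketch of a harness that only evaluates whatever program it is handed and lets that program choose its successor. It is an illustration under stated assumptions, not the paper's Spell harness: `run_spe`, `stub_model`, `SEED`, and the `next_program` / `result` variable conventions are all hypothetical names, and the "model" is a deterministic stub rather than a real LM.

```python
from typing import Callable, List, Optional, Tuple

Program = str  # the same text is both model context and executable program


def run_spe(program: Program,
            model: Callable[[str], Program],
            max_steps: int = 10) -> Tuple[Optional[object], List[Program]]:
    """Evaluate programs until one of them stops naming a successor.

    The harness decides nothing about what to prompt or when to stop
    (apart from a safety budget); the running program does that itself.
    """
    trace: List[Program] = []
    for _ in range(max_steps):
        trace.append(program)
        # The program can see its own source and can call the model.
        env = {"model": model, "source": program}
        exec(program, env)  # hypothetical convention: program may set next_program
        successor = env.get("next_program")
        if successor is None:
            return env.get("result"), trace
        program = successor
    raise RuntimeError("step budget exhausted")


def stub_model(prompt: str) -> Program:
    """Hypothetical stand-in for an LM: extend the program once, then finish."""
    if "draft" not in prompt:
        return "notes = 'draft'\nnext_program = model(source)"
    return "result = 'done'"


SEED: Program = "next_program = model(source)"

result, trace = run_spe(SEED, stub_model)
print(result, len(trace))  # prints: done 3
```

The only loop in the harness is "evaluate, then load whatever the program produced". Whether to call the model again, what to pass as the prompt, and when to stop are all expressed in the program text, which is the property the paper attributes to SPE. Using `exec` on model output is unsafe outside a sandbox and is used here only to keep the sketch short.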
If this is right
- Existing models can already complete complex agentic tasks without any pre-specified turn-to-turn orchestration.
- No external harness needs to impose a fixed orchestration policy once the model outputs its own executable program.
- Training models specifically for self-programmed execution could allow them to discover and refine their own orchestration strategies.
- The architecture separates the model’s generative role from any rigid control structure, enabling fully model-driven state management.
Where Pith is reading between the lines
- Agents built this way could dynamically change their own reasoning loops mid-task without external intervention.
- The approach may scale to systems that maintain long-running self-modifying control programs across many interactions.
- It opens the possibility of measuring and comparing different self-orchestration patterns that models learn when trained under SPE.
- Integration with other self-referential mechanisms could let agents optimize their own resource use or error recovery.
Load-bearing premise
The same data can function simultaneously as model context and executable program while preventing unintended replay of side effects during self-modification and re-evaluation.
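One way to picture this premise is a program whose effectful call may only sit in the trailing position and is rewritten into a plain binding of its result once it has run. The Python sketch below illustrates that reading. It is an assumption-laden toy, not the Spell evaluator, and `Def`, `Effect`, `evaluate`, and `fake_model_call` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple, Union


@dataclass(frozen=True)
class Def:
    """Pure form: binds a name to an already-known value."""
    name: str
    value: Any


@dataclass(frozen=True)
class Effect:
    """Effectful form: runs fn once and binds its result to name."""
    name: str
    fn: Callable[[], Any]


Form = Union[Def, Effect]


def evaluate(program: List[Form]) -> Tuple[Dict[str, Any], List[Form]]:
    """Re-evaluate all bindings, run at most the trailing effect.

    Returns the environment and the successor program, in which the
    trailing effect (if any) has been replaced by a Def of its result.
    """
    if not program:
        return {}, []
    env: Dict[str, Any] = {}
    for form in program[:-1]:
        if isinstance(form, Effect):
            # Assumed rule: effects are only legal in trailing position.
            raise NameError(f"effect {form.name!r} outside trailing position")
        env[form.name] = form.value
    last = program[-1]
    if isinstance(last, Effect):
        value = last.fn()  # the single place a side effect can happen
        env[last.name] = value
        return env, program[:-1] + [Def(last.name, value)]
    env[last.name] = last.value
    return env, program


calls: List[int] = []


def fake_model_call() -> str:
    """Hypothetical effect: counts how many times it was invoked."""
    calls.append(1)
    return f"reply-{len(calls)}"


program: List[Form] = [Def("x", 1), Effect("y", fake_model_call)]
env1, program2 = evaluate(program)    # runs the effect once
env2, program3 = evaluate(program2)   # re-evaluation replays only pure Defs
```

After the second call, `calls` still has a single entry and `env2["y"]` equals `env1["y"]`, because the re-evaluated program contains the recorded result rather than the call. Whether Spell achieves the same guarantee against arbitrary self-edits is exactly what the premise leaves open.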
What would settle it
Running the provided Spell programs with frontier models on the reported agentic tasks and observing either repeated unintended side effects on re-evaluation or consistent failure to complete the tasks without a fixed external policy.
Original abstract
At the heart of existing language model agents is a fixed orchestrator program responsible for the state transition between consecutive turns. This paper introduces self-programmed execution (SPE), an agent architecture in which the model completion is itself the orchestrator program, and the harness evaluates this program but does not impose its own orchestration policy. I formalize this idea using agentic machines: an SPE state is one from which a model completion can load any state of an embedded copy of the machine, meaning that it is subject to no fixed turn-to-turn orchestration policy. Realizing SPE in practice is nontrivial because the same data is both model context and executable program. I therefore introduce Spell, a Lisp-based language in which programs can edit and re-evaluate themselves, and effectful expressions like model invocations are structured such that re-evaluating an edited program does not replay its side effects. Experiments with existing models, not trained for SPE or Spell, show that frontier models can operate in this regime and accomplish challenging agentic tasks. These results demonstrate how an LM can act as an agent without any fixed orchestration policy, and they raise the question of what self-orchestration strategies might be learned by a model trained for self-programmed execution. Code is available at https://github.com/lukejoconnor/spell .
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces self-programmed execution (SPE) for LM agents, in which the model completion itself acts as the orchestrator program evaluated by a harness with no fixed turn-to-turn policy. It defines agentic machines such that an SPE state allows a model completion to load any state of an embedded machine copy. To realize this, the paper presents Spell, a Lisp-based language supporting self-editing and re-evaluation where effectful operations (e.g., model calls) are wrapped to avoid replay on edits. Experiments with frontier models that were not trained for SPE or Spell are reported to show successful performance on challenging agentic tasks, and the code is released.
Significance. If the central claims hold, this architecture could enable more adaptive LM agents free of hardcoded orchestration loops, opening questions about learned self-orchestration strategies. The release of code and the parameter-free formalization of SPE states are strengths that support reproducibility and further exploration.
major comments (2)
- [Spell language section] § on Spell and effectful expressions: the claim that wrapping prevents replay of side effects after arbitrary self-edits relies on the Lisp evaluator and specific form structure, but the paper does not demonstrate robustness against model-generated code that might re-bind or quote effectful forms (e.g., via (let ((f (lambda () (model-call)))) ...)). This is load-bearing for the 'no fixed orchestration' definition of SPE states.
- [Experiments] Experiments section: the abstract asserts success on 'challenging agentic tasks' with frontier models, but without reported task definitions, quantitative metrics, baselines, or failure modes, it is difficult to assess whether the results support the claim that models operate in the SPE regime rather than via prompt engineering that avoids edge cases.
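The first major comment can be turned into a concrete probe. The Python sketch below is a toy lexically scoped evaluator over nested lists, written only to show what such a test might look like. It is not the Spell evaluator, and `ev`, `UnboundSymbol`, `PURE_ENV`, `EFFECT_ENV`, and the `model-call` symbol are hypothetical. It assumes that effect symbols are simply unbound outside an effect-enabled environment.

```python
from typing import Any, Dict, List


class UnboundSymbol(Exception):
    """Raised when a symbol has no binding in the current environment."""


def ev(x: Any, env: Dict[str, Any]) -> Any:
    """Evaluate a tiny Lisp subset: symbols, quote, lambda, let, application.

    Strings are treated as symbols, so this toy has no string literals.
    """
    if isinstance(x, str):
        if x not in env:
            raise UnboundSymbol(x)
        return env[x]
    if not isinstance(x, list):
        return x  # numbers and other literals evaluate to themselves
    head = x[0]
    if head == "quote":   # ["quote", form] returns form unevaluated
        return x[1]
    if head == "lambda":  # ["lambda", [params], body], closes over env
        params, body = x[1], x[2]
        return lambda *args: ev(body, {**env, **dict(zip(params, args))})
    if head == "let":     # ["let", [[name, expr], ...], body]
        local = dict(env)
        for name, expr in x[1]:
            local[name] = ev(expr, local)
        return ev(x[2], local)
    fn = ev(head, env)
    return fn(*[ev(arg, env) for arg in x[1:]])


calls: List[str] = []


def model_call() -> str:
    """Hypothetical effect that records each invocation."""
    calls.append("call")
    return "completion"


PURE_ENV: Dict[str, Any] = {"+": lambda a, b: a + b}
EFFECT_ENV: Dict[str, Any] = {**PURE_ENV, "model-call": model_call}

# The referee's example: (let ((f (lambda () (model-call)))) (f))
PROBE = ["let", [["f", ["lambda", [], ["model-call"]]]], ["f"]]

blocked = None
try:
    ev(PROBE, PURE_ENV)  # the effect is hidden behind a lambda
except UnboundSymbol as exc:
    blocked = str(exc)   # the symbol that could not be resolved

quoted = ev(["quote", PROBE], PURE_ENV)  # quoting is pure
outcome = ev(quoted, EFFECT_ENV)         # effect runs exactly once here
```

In this toy, wrapping the effect in a lambda does not smuggle it past the pure phase, because the closure resolves `model-call` in the environment where it was created. A dynamically scoped evaluator, or one that lets programs capture the effect-enabled environment, could behave differently, which is why the referee asks for an explicit argument about the real evaluator.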
minor comments (2)
- The GitHub link is provided; confirm it includes the exact Spell evaluator and prompt templates used in the reported runs.
- [Formalization] Notation for agentic machine states could be clarified with a small diagram or pseudocode example of a state transition.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. We address each major comment below, indicating planned revisions where appropriate.
- Referee: [Spell language section] § on Spell and effectful expressions: the claim that wrapping prevents replay of side effects after arbitrary self-edits relies on the Lisp evaluator and specific form structure, but the paper does not demonstrate robustness against model-generated code that might re-bind or quote effectful forms (e.g., via (let ((f (lambda () (model-call)))) ...)). This is load-bearing for the 'no fixed orchestration' definition of SPE states.
Authors: We agree that robustness to arbitrary model-generated Lisp forms is central to the SPE definition. The current manuscript describes the wrapping mechanism for effectful expressions and relies on the evaluator's treatment of these forms to prevent replay. In the revised version we will add a dedicated subsection with concrete examples and a short argument showing that common constructs (let, lambda, quote, and similar) cannot bypass the wrapper, because the harness maintains separate evaluation state that is not captured by re-binding or quoting within the model completion. This will strengthen the formal claim without altering the core architecture. revision: yes
- Referee: [Experiments] Experiments section: the abstract asserts success on 'challenging agentic tasks' with frontier models, but without reported task definitions, quantitative metrics, baselines, or failure modes, it is difficult to assess whether the results support the claim that models operate in the SPE regime rather than via prompt engineering that avoids edge cases.
Authors: The experiments are presented as qualitative demonstrations that frontier models not trained for SPE or Spell can successfully execute in the SPE regime on non-trivial agentic tasks. We acknowledge that the current text provides limited quantitative detail. In the revision we will expand the experiments section to include explicit task definitions, the success criteria applied, comparison against standard fixed-orchestrator baselines where feasible, and a summary of observed failure modes. These additions will make it easier to evaluate whether the models are genuinely operating without a fixed turn-to-turn policy. revision: yes
Circularity Check
No circularity: SPE and Spell are independent definitions with experimental support.
The paper introduces SPE as a new agent architecture defined via agentic machines and Spell as a Lisp variant for self-editing without side-effect replay. These are presented as architectural proposals rather than derivations from fitted parameters or prior results. The central claim rests on experiments with unmodified frontier models, not on any self-citation load-bearing step, uniqueness theorem, or renaming of known patterns. No equations reduce by construction to inputs, and the formalization is self-contained without smuggling ansatzes or calling fitted quantities predictions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Language models can generate and safely execute self-modifying programs where data serves as both context and code without unintended side-effect replay.
invented entities (3)
- Self-programmed execution (SPE): no independent evidence
- Spell: no independent evidence
- Agentic machines: no independent evidence