The Token Not Taken: Sampling, State, and the Stochasticity of AI Agents
Pith reviewed 2026-06-27 16:55 UTC · model grok-4.3
The pith
Separating token-sampling variability from extrinsic sources clarifies why agentic AI systems produce different outputs even under deterministic execution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Agentic AI systems embed a foundation model inside an orchestration loop that plans, calls tools, observes results, and updates state. Token generation supplies one explicit intrinsic source of variability because the model produces next-token scores, converts them to probabilities, and a decoder may sample using a pseudo-random number generator. Small sampled differences can propagate upward into different tool calls, code paths, search queries, or agent state. Other sources remain extrinsic to token sampling and include changing environments, live data, serving infrastructure, batch effects, and numerical details. Partitioning these layers clarifies the meaning of stochasticity, the condit
What carries the argument
The partition of variability sources into intrinsic token sampling inside the foundation-model orchestration loop and extrinsic factors such as environment and infrastructure.
If this is right
- A small difference in a sampled token can change the entire subsequent plan or tool call sequence.
- Deterministic execution of the model does not guarantee identical agent behavior once extrinsic layers are present.
- Variability is reproducible only when all layers, intrinsic and extrinsic, are matched across runs.
- Conflating the layers leads to incorrect diagnoses of whether observed differences are model-intrinsic or deployment-driven.
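The second point, that deterministic decoding need not give identical behavior, can be illustrated with a toy case of the "numerical details" the paper lists. The sketch below is an assumption-laden illustration, not something taken from the paper: the token names, the score values, and the idea of a score built from three partial contributions are all hypothetical. It relies only on the fact that floating-point addition is not associative, so the order in which contributions are accumulated can flip a greedy (non-sampled) choice.

```python
def accumulate(parts):
    """Left-to-right floating-point sum; the result depends on the order."""
    total = 0.0
    for p in parts:
        total += p
    return total

def greedy_token(scores):
    """Deterministic decoding: highest score wins, first entry wins ties."""
    best = None
    for token, score in scores.items():
        if best is None or score > scores[best]:
            best = token
    return best

# Hypothetical partial contributions to one token's score.
PARTS = [0.1, 0.2, 0.3]

def decide(parts):
    # "stop" has a fixed score; "retry" depends on accumulation order.
    scores = {"stop": 0.6, "retry": accumulate(parts)}
    return greedy_token(scores)

forward = decide(PARTS)                    # 0.6000000000000001 > 0.6 -> "retry"
backward = decide(list(reversed(PARTS)))   # 0.6 ties with 0.6 -> "stop"
```

No random number generator is involved, yet the two runs choose different tokens. In a real serving stack the analogous effect would come from kernel or batch-dependent reduction order rather than from reversing a list.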
Where Pith is reading between the lines
- Designers could instrument each layer separately to isolate which source is responsible for observed output changes.
- The same partition might apply to non-agentic systems that still use sampling inside larger loops.
- Developers seeking reproducibility could focus first on locking extrinsic factors before attempting to control sampling.
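The first suggestion, instrumenting each layer separately, could look roughly like the sketch below. It is a minimal illustration under stated assumptions: `StepRecord`, its three fields, and the layer labels are hypothetical names introduced here, not part of the paper or of any agent framework. The idea is to log what the model saw, what it emitted, and what the environment returned at every step, then report the first place two runs disagree.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StepRecord:
    model_input: str   # prompt plus agent state shown to the model
    model_output: str  # text or tool call the model emitted
    observation: str   # what the tool or environment returned

def first_divergence(run_a, run_b):
    """Return (step index, layer label) for the first mismatch, or None."""
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        if a.model_input != b.model_input:
            return i, "input"        # state already differed before the model ran
        if a.model_output != b.model_output:
            return i, "model"        # same input, different output
        if a.observation != b.observation:
            return i, "environment"  # same call, different result
    if len(run_a) != len(run_b):
        return min(len(run_a), len(run_b)), "length"
    return None
```

A "model" label here only says the model's output differed for the same input; it does not by itself distinguish token sampling from numerical or serving effects, which would need the seed and decoding settings to be logged as well.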
Load-bearing premise
The sources of variability can be cleanly partitioned into intrinsic token generation and extrinsic categories, with too little interaction between them to undermine the separation.
What would settle it
A controlled run in which every extrinsic factor is held fixed yet agent outputs still diverge solely because of token sampling, or conversely a demonstration that extrinsic factors always interact with token choices in ways that prevent clean separation.
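The first kind of controlled run can be sketched with a toy agent loop. Everything in this sketch is hypothetical and chosen only for illustration: the tool names, the frozen observations, and the weighting rule that stands in for a model. The environment is a fixed dictionary, so the only thing that varies between runs is the seed of the pseudo-random number generator.

```python
import random

# Frozen, hypothetical environment: each tool always returns the same observation.
FROZEN_TOOLS = {
    "search": "3 results",
    "read_file": "42 lines",
    "run_tests": "1 failure",
}

def run_agent(seed, steps=4):
    """Toy orchestration loop: sample a tool, observe, update state, repeat."""
    rng = random.Random(seed)  # the only source of randomness in this sketch
    trace = []
    names = list(FROZEN_TOOLS)
    for _ in range(steps):
        used = {tool for tool, _ in trace}
        # Stand-in for the model: prefer tools not used yet, so an early
        # sampled choice changes the distribution at every later step.
        weights = [0.5 if name in used else 2.0 for name in names]
        tool = rng.choices(names, weights=weights, k=1)[0]
        trace.append((tool, FROZEN_TOOLS[tool]))
    return trace

same_a = run_agent(seed=3)
same_b = run_agent(seed=3)
distinct_traces = {tuple(run_agent(seed=s)) for s in range(50)}
```

With the environment frozen, matching seeds reproduce the trace, and any difference between traces from different seeds is attributable to sampling within this toy. A real test would also have to pin the serving and numerical layers, which this sketch does not model.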
Original abstract
Agentic AI systems can behave differently across runs: the same request may produce a different plan, a different tool call, a different code edit, or a different final answer. Such variability arises from several layers that are often conflated. At the core of many current agents is a foundation model, a large pretrained model adaptable to many downstream tasks, embedded in an orchestration loop that plans, calls tools, observes results, and updates state. One explicit intrinsic source of variability in such systems is token generation: the model computes scores over possible next tokens, the scores are converted into probabilities, and a decoder may sample tokens using a pseudo-random number generator. A small sampled token difference can then cascade downstream into a different tool call, code path, search query, or agent state. Other sources of variability are extrinsic to token sampling, including changing environments, live data, serving infrastructure, batch effects, and numerical details. By separating these layers, this tutorial clarifies what it means to call agentic AI systems stochastic, when such variability can be reproduced under matched conditions, and why deterministic execution need not imply identical behavior in deployed settings.
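The token-generation step described in the abstract (scores, then probabilities, then a sampled token) can be sketched as follows. This is a minimal illustration, not the paper's code: the vocabulary, the score values, the temperature parameter, and the function names are assumptions introduced here, and a softmax is used as the usual way to turn scores into probabilities.

```python
import math
import random

def softmax(scores, temperature=1.0):
    """Convert raw next-token scores into probabilities."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(vocab, scores, rng, temperature=1.0):
    """Draw one token from the softmax distribution using the given PRNG."""
    probs = softmax(scores, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical vocabulary and scores, chosen only for illustration.
VOCAB = ["search", "read_file", "run_tests", "answer"]
SCORES = [2.0, 1.8, 0.5, 0.1]

def draw_sequence(seed, n=5):
    """Sample n tokens; the PRNG seed fully determines the result."""
    rng = random.Random(seed)
    return [sample_token(VOCAB, SCORES, rng) for _ in range(n)]
```

Calling `draw_sequence` twice with the same seed returns the same tokens, while different seeds can return different ones even though the scores never change. That is the intrinsic layer in isolation: the variability lives entirely in the PRNG state.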
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a tutorial that introduces a layered taxonomy of variability sources in agentic AI systems built around foundation models. It distinguishes intrinsic variability arising from token sampling (scores to probabilities to decoder sampling via PRNG) from extrinsic sources including environments, live data, serving infrastructure, batch effects, and numerical details. The central claim is that this separation clarifies the meaning of stochasticity, the conditions for reproducing variability under matched conditions, and why deterministic execution need not produce identical deployed behavior.
Significance. If the taxonomy is adopted, the work supplies a definitional framework that could reduce conflation of distinct variability sources when discussing reproducibility in AI agents. Its value lies in conceptual clarification rather than new empirical results, formal derivations, or falsifiable predictions; the contribution is therefore primarily taxonomic and pedagogical.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the manuscript and their recommendation to accept. The referee's summary correctly identifies the paper as a tutorial providing a layered taxonomy that separates intrinsic token-sampling stochasticity from extrinsic sources of variability in agentic AI systems.
Circularity Check
No significant circularity identified
The manuscript is a purely descriptive tutorial that introduces a conceptual taxonomy separating intrinsic token-sampling variability from extrinsic sources in agentic AI systems. It contains no equations, derivations, quantitative predictions, fitted parameters, or load-bearing self-citations. The central claim is definitional (clarifying terminology via layered separation) rather than a result that reduces to its own inputs by construction. No steps match any of the enumerated circularity patterns.