The Token Not Taken: Sampling, State, and the Stochasticity of AI Agents

Muhammad Zia Hydari; Raja Iqbal

arxiv: 2606.08998 · v3 · pith:LNCIIG4Anew · submitted 2026-06-08 · 💻 cs.AI · cs.CY · econ.GN · q-fin.EC

Pith reviewed 2026-06-27 16:55 UTC · model grok-4.3

classification 💻 cs.AI · cs.CY · econ.GN · q-fin.EC

keywords: agentic AI, stochasticity, token sampling, variability, reproducibility, foundation models, orchestration loop

The pith

Separating token-sampling variability from extrinsic sources clarifies why agentic AI systems produce different outputs even under deterministic execution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that variability in agentic AI systems stems from distinct layers that are frequently conflated. One intrinsic source is token generation: scores become probabilities and a decoder samples via a pseudo-random number generator, so small differences can propagate into altered plans, tool calls, or state. Extrinsic sources include changing environments, live data, serving infrastructure, and numerical details. By partitioning these layers, the tutorial shows what stochasticity means in practice and when variability can be reproduced under matched conditions. A sympathetic reader would care because this partition explains why identical prompts can yield divergent behavior without any change to the model itself.

Core claim

Agentic AI systems embed a foundation model inside an orchestration loop that plans, calls tools, observes results, and updates state. Token generation supplies one explicit intrinsic source of variability because the model produces next-token scores, converts them to probabilities, and a decoder may sample using a pseudo-random number generator. Small sampled differences can propagate upward into different tool calls, code paths, search queries, or agent state. Other sources remain extrinsic to token sampling and include changing environments, live data, serving infrastructure, batch effects, and numerical details. Partitioning these layers clarifies the meaning of stochasticity, the conditions for reproducing variability under matched conditions, and why deterministic execution need not produce identical deployed behavior.

What carries the argument

The partition of variability sources into intrinsic token sampling inside the foundation-model orchestration loop and extrinsic factors such as environment and infrastructure.

If this is right

A small difference in a sampled token can change the entire subsequent plan or tool call sequence.
Deterministic execution of the model does not guarantee identical agent behavior once extrinsic layers are present.
Variability is reproducible only when all layers, intrinsic and extrinsic, are matched across runs.
Conflating the layers leads to incorrect diagnoses of whether observed differences are model-intrinsic or deployment-driven.
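
The second consequence above can be made concrete with a toy agent. The sketch below is illustrative only: `toy_model`, `run_agent`, and the price feed are hypothetical names that do not come from the paper. Decoding is greedy, so the intrinsic layer is fully deterministic, yet the chosen action still changes when the live input changes.

```python
def greedy_token(scores):
    """Deterministic decoding: always pick the highest-scoring token."""
    return max(scores, key=scores.get)


def toy_model(observation):
    """Hypothetical stand-in for a model: scores depend only on the observation."""
    if observation["price"] > 100:
        return {"sell": 2.0, "hold": 1.0}
    return {"sell": 0.5, "hold": 1.5}


def run_agent(read_price):
    """One agent step: observe live data, then decode greedily."""
    observation = {"price": read_price()}         # extrinsic layer: live data
    return greedy_token(toy_model(observation))   # intrinsic layer: no sampling


monday = run_agent(lambda: 95)     # hypothetical price on one day
tuesday = run_agent(lambda: 105)   # same request, different live data

print(monday, tuesday)
```

Nothing in the model or the decoder differs between the two calls; only the observed value does. Replaying the same observation reproduces the same action, which is the matched-conditions case.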

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Designers could instrument each layer separately to isolate which source is responsible for observed output changes.
The same partition might apply to non-agentic systems that still use sampling inside larger loops.
Developers seeking reproducibility could focus first on locking extrinsic factors before attempting to control sampling.
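
One way to read the instrumentation suggestion above is as a per-step trace that records each layer separately. The following is a minimal sketch under that assumption; `StepRecord`, `digest`, and `first_divergence` are hypothetical names, and the layer labels are a simplification chosen for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StepRecord:
    prompt_digest: str   # hash of what the model saw at this step
    model_output: str    # what the decoder emitted (intrinsic layer)
    result_digest: str   # hash of what the tool returned (extrinsic layer)


def digest(text):
    """Short, stable fingerprint of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def first_divergence(run_a, run_b):
    """Return (step index, layer) for the first mismatch, or None if traces match."""
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        if a.prompt_digest != b.prompt_digest:
            return i, "input"
        if a.model_output != b.model_output:
            return i, "decoding"      # same input, different model output
        if a.result_digest != b.result_digest:
            return i, "environment"   # same call, different tool result
    if len(run_a) != len(run_b):
        return min(len(run_a), len(run_b)), "length"
    return None


run_a = [StepRecord(digest("query"), "search", digest("page v1"))]
run_b = [StepRecord(digest("query"), "search", digest("page v2"))]
print(first_divergence(run_a, run_b))
```

A "decoding" mismatch here only says the model output differed for identical input. On its own it cannot tell sampling apart from numerical or serving effects, so the seed and decoder settings would need to be logged as well.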

Load-bearing premise

The sources of variability can be cleanly partitioned into intrinsic token generation and extrinsic categories, with too little interaction between them to undermine the separation.

What would settle it

A controlled run in which every extrinsic factor is held fixed yet agent outputs still diverge solely because of token sampling or, conversely, a demonstration that extrinsic factors always interact with token choices in ways that prevent clean separation.
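
The first half of that test can be sketched as a small harness. This is a toy under stated assumptions, not the paper's experiment: the tools are frozen deterministic stubs, the policy `sample_action` is a hypothetical fixed distribution, and the seed is the only thing allowed to vary.

```python
import random

# Frozen, deterministic stand-ins for the extrinsic layer (hypothetical tools).
TOOLS = {
    "search": lambda state: state + ["searched"],
    "compute": lambda state: state + ["computed"],
}


def sample_action(rng):
    """Hypothetical policy: a fixed distribution over two tool names."""
    return rng.choices(["search", "compute"], weights=[0.6, 0.4])[0]


def run_episode(seed, steps=5):
    """Run the loop with a given seed; everything else is held fixed."""
    rng = random.Random(seed)   # the only source of variation between runs
    state = []
    for _ in range(steps):
        action = sample_action(rng)
        state = TOOLS[action](state)
    return tuple(state)


same_seed = {run_episode(7) for _ in range(10)}   # matched conditions
many_seeds = {run_episode(s) for s in range(50)}  # only the seed differs

print(len(same_seed), len(many_seeds))
```

If repeated runs with one seed collapse to a single trajectory while different seeds spread across several, the divergence in this toy is attributable to sampling alone. A real system would also need the serving stack and numerics pinned before drawing the same conclusion.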

Figures

Figures reproduced from arXiv: 2606.08998 by Muhammad Zia Hydari, Raja Iqbal.

**Figure 2.** Agentic stochasticity. The agent repeatedly turns model outputs into actions and observations. A…

**Figure 3.** Sampling as a lookup on the unit interval. The probabilities…

**Figure 4.** Temperature reshapes the same logits before sampling. All three panels use the identical raw scores;…

**Figure 5.** The token-generation pipeline. The model’s forward pass computes logits and probabilities.…

**Figure 6.** How a single token flips a trajectory. Two runs share an identical token prefix, then sampling…
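
The ideas named in the captions for Figures 3 to 5 can be sketched in a few lines. The code below is a minimal illustration, not the paper's implementation: the logits are made-up values and the function names are hypothetical. It applies a temperature to the logits, converts them to probabilities with a softmax, and then selects a token by locating one uniform draw within the cumulative probabilities on the unit interval.

```python
import bisect
import itertools
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_by_lookup(probs, u):
    """Pick the token whose cumulative-probability interval contains u in [0, 1)."""
    cumulative = list(itertools.accumulate(probs))
    index = bisect.bisect_right(cumulative, u)
    return min(index, len(probs) - 1)  # guard against rounding at the top end


logits = [2.0, 1.0, 0.1]    # hypothetical scores for three tokens
rng = random.Random(1234)   # the seed fixes the sequence of draws
u = rng.random()            # one uniform draw on the unit interval

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], sample_by_lookup(probs, u))
```

The same draw `u` is reused across temperatures on purpose: the raw scores and the random number stay fixed, so any change in the selected index comes only from how temperature moves the interval boundaries.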

Original abstract

Agentic AI systems can behave differently across runs: the same request may produce a different plan, a different tool call, a different code edit, or a different final answer. Such variability arises from several layers that are often conflated. At the core of many current agents is a foundation model, a large pretrained model adaptable to many downstream tasks, embedded in an orchestration loop that plans, calls tools, observes results, and updates state. One explicit intrinsic source of variability in such systems is token generation: the model computes scores over possible next tokens, the scores are converted into probabilities, and a decoder may sample tokens using a pseudo-random number generator. A small sampled token difference can then cascade downstream into a different tool call, code path, search query, or agent state. Other sources of variability are extrinsic to token sampling, including changing environments, live data, serving infrastructure, batch effects, and numerical details. By separating these layers, this tutorial clarifies what it means to call agentic AI systems stochastic, when such variability can be reproduced under matched conditions, and why deterministic execution need not imply identical behavior in deployed settings.
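
The abstract lists "numerical details" among the extrinsic sources without elaborating. One commonly cited mechanism, offered here as an assumption rather than as the authors' example, is that floating-point addition is not associative, so the order in which a sum is reduced can change the result in the last bits. The sketch below shows such a difference turning a tie between two scores into a clear winner under greedy selection; `rival` and `greedy_index` are hypothetical.

```python
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # one reduction order
right_to_left = a + (b + c)   # same terms, different grouping


def greedy_index(scores):
    """Index of the highest score; ties go to the earliest index."""
    return max(range(len(scores)), key=scores.__getitem__)


rival = 0.6  # hypothetical score of a competing token

print(left_to_right == right_to_left)
print(greedy_index([rival, left_to_right]), greedy_index([rival, right_to_left]))
```

No random number generator is involved here, which is why this kind of variation sits outside token sampling even though it can change which token is chosen.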

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

A tutorial that cleanly separates token sampling from extrinsic variability in agents but introduces no new results or tests.

This paper lays out a simple distinction between variability that comes from sampling the next token in a foundation model and variability that comes from the environment, infrastructure, or live data. The walkthrough of how probabilities turn into sampled tokens and then into different tool calls or plans is clear and matches standard autoregressive generation.

It does a reasonable job pointing out that even a deterministic run can produce different outputs across deployments because of those external layers. That framing is consistent and might reduce some loose talk about stochastic agents.

Nothing in the work is new. No derivation, experiment, or edge-case analysis is added beyond what is already in the literature on language model decoding and agent loops. The separation is presented as definitional, so there is no way to check whether the layers stay distinct once a sampled token changes which external call gets made.

The main soft spot is that the utility of the taxonomy is left unexamined. Readers who already think about reproducibility in deployed agents will not learn much they did not already separate in practice.

This is the sort of note that could help a team writing internal design docs or debugging agent runs. It does not have the substance or novelty to justify sending out for peer review.

Referee Report

0 major / 0 minor

Summary. The manuscript is a tutorial that introduces a layered taxonomy of variability sources in agentic AI systems built around foundation models. It distinguishes intrinsic variability arising from token sampling (scores to probabilities to decoder sampling via PRNG) from extrinsic sources including environments, live data, serving infrastructure, batch effects, and numerical details. The central claim is that this separation clarifies the meaning of stochasticity, the conditions for reproducing variability under matched conditions, and why deterministic execution need not produce identical deployed behavior.

Significance. If the taxonomy is adopted, the work supplies a definitional framework that could reduce conflation of distinct variability sources when discussing reproducibility in AI agents. Its value lies in conceptual clarification rather than new empirical results, formal derivations, or falsifiable predictions; the contribution is therefore primarily taxonomic and pedagogical.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and their recommendation to accept. The referee's summary correctly identifies the paper as a tutorial providing a layered taxonomy that separates intrinsic token-sampling stochasticity from extrinsic sources of variability in agentic AI systems.

Circularity Check

0 steps flagged

No significant circularity identified

The manuscript is a purely descriptive tutorial that introduces a conceptual taxonomy separating intrinsic token-sampling variability from extrinsic sources in agentic AI systems. It contains no equations, derivations, quantitative predictions, fitted parameters, or load-bearing self-citations. The central claim is definitional (clarifying terminology via layered separation) rather than a result that reduces to its own inputs by construction. No steps match any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The submission is a conceptual tutorial; no free parameters, axioms, or invented entities are introduced or required by the central claim.

pith-pipeline@v0.9.1-grok · 5736 in / 1000 out tokens · 33808 ms · 2026-06-27T16:55:15.354335+00:00 · methodology
