arxiv: 2606.08998 · v3 · pith:LNCIIG4Anew · submitted 2026-06-08 · 💻 cs.AI · cs.CY · econ.GN · q-fin.EC

The Token Not Taken: Sampling, State, and the Stochasticity of AI Agents

Pith reviewed 2026-07-03 23:57 UTC · model grok-4.3

classification 💻 cs.AI · cs.CY · econ.GN · q-fin.EC
keywords agentic AI · stochasticity · token sampling · reproducibility · foundation models · orchestration loops · extrinsic variability

The pith

Separating token sampling from extrinsic sources clarifies why agentic AI systems vary across runs even under deterministic execution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that variability in agentic AI systems arises from multiple distinct layers, with token generation sampling as one explicit intrinsic source that can cascade into different plans or actions. It distinguishes this from extrinsic sources such as changing environments, live data, and serving infrastructure. A sympathetic reader would care because the separation explains when variability is reproducible under matched conditions and why fixing execution parameters does not guarantee identical deployed behavior.

Core claim

Agentic AI systems can behave differently across runs because the same request may produce a different plan, tool call, code edit, or final answer. Such variability arises from several layers that are often conflated. At the core is a foundation model embedded in an orchestration loop that plans, calls tools, observes results, and updates state. One explicit intrinsic source is token generation, where scores over next tokens are converted to probabilities and a decoder samples using a pseudo-random number generator, allowing small differences to cascade downstream. Other sources are extrinsic, including changing environments, live data, serving infrastructure, batch effects, and numerical details.
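
As a minimal sketch of the sampling step described above, the following Python uses only the standard library to turn scores into probabilities and draw tokens from a seeded pseudo-random number generator. The four-token vocabulary and the scores are made up for illustration and do not come from the paper; the function names are likewise assumptions. The `temperature` parameter is included because the paper's figure captions mention it.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw next-token scores into probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(probs, rng):
    """Pick a token index by locating one uniform draw in the cumulative distribution."""
    u = rng.random()  # uniform draw in [0, 1)
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point rounding


# Hypothetical toy vocabulary and scores, for illustration only.
vocab = ["search", "read_file", "run_tests", "answer"]
logits = [2.0, 1.5, 0.5, 0.1]
probs = softmax(logits)


def draw_sequence(seed, n=5):
    """Draw n tokens from the same distribution, starting from a known generator state."""
    rng = random.Random(seed)
    return [vocab[sample_token(probs, rng)] for _ in range(n)]


run_a = draw_sequence(seed=0)
run_b = draw_sequence(seed=0)
print(run_a == run_b)  # same seed, same generator state, same draws
```

The two runs match because both start from the same generator state. A different seed may or may not change the sequence, which is the sense in which sampling variability is reproducible under matched conditions.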

What carries the argument

The partition of variability sources into intrinsic token-sampling effects (from the model's probability distribution and pseudo-random decoder) versus extrinsic effects (environment, data, infrastructure) within the foundation-model orchestration loop.

If this is right

  • Variability from token sampling can be reproduced by matching the pseudo-random number generator state.
  • Deterministic execution of the orchestration loop can still yield non-identical agent behavior when extrinsic factors differ.
  • Controlling only the sampling layer leaves other sources of non-reproducibility unaddressed.
  • Understanding the layers allows targeted interventions to reduce or isolate specific forms of variability.
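
The first two points can be illustrated with a toy orchestration loop. This is a sketch under stated assumptions, not the paper's method: `choose_action` is a hypothetical stand-in for the model plus decoder, the action names are invented, and the environment is a pre-scripted list of observations, so the agent's actions do not feed back into what it observes next.

```python
import random


def choose_action(observation, rng):
    """Hypothetical stand-in for model + decoder: sample an action given the observation."""
    if observation == "tests_failing":
        options, weights = ["edit_code", "rerun_tests"], [0.7, 0.3]
    else:
        options, weights = ["answer", "search_docs"], [0.6, 0.4]
    return rng.choices(options, weights=weights, k=1)[0]


def run_agent(seed, environment):
    """Toy orchestration loop: observe, sample an action, record it."""
    rng = random.Random(seed)  # the sampling layer is fully determined by the seed
    actions = []
    for observation in environment:  # the extrinsic layer is supplied from outside
        actions.append(choose_action(observation, rng))
    return actions


# Two scripted environments that differ only at the second step (assumed data).
env_a = ["tests_failing", "tests_failing", "tests_passing", "tests_passing"]
env_b = ["tests_failing", "tests_passing", "tests_passing", "tests_passing"]

same_seed_same_env = run_agent(7, env_a) == run_agent(7, env_a)
same_seed_diff_env = run_agent(7, env_a) == run_agent(7, env_b)
print(same_seed_same_env)  # True: matched seed and matched observations
print(same_seed_diff_env)  # False: the second observation differs, so the action set differs
```

Fixing the seed pins down the sampling layer only. The second comparison fails even though the generator state is identical, because the observations supplied to the loop are not.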

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Testing frameworks could record and replay both the random seed and the extrinsic context to isolate which layer drives a given difference.
  • Safety evaluations of agents may need separate protocols for sampling-induced versus environment-induced divergences.
  • Deployment pipelines might log the full state of the orchestration loop to diagnose whether observed changes trace to sampling or to external inputs.
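
A record-and-replay harness of the kind suggested above might look like the following sketch. Everything here is hypothetical: `RunRecord`, `policy`, `execute`, and `attribute_divergence` are illustrative names, the policy is a stand-in for a sampled model decision, and the harness assumes recorded observations can be replayed exactly, which real deployments may not allow.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """What a run would need to log: the seed, the observations seen, the actions taken."""
    seed: int
    observations: list
    actions: list = field(default_factory=list)


def policy(observation, rng):
    """Hypothetical stand-in for a sampled model decision."""
    return rng.choice([f"{observation}:plan_a", f"{observation}:plan_b"])


def execute(seed, observations):
    """Run the toy loop against recorded observations with a given seed."""
    rng = random.Random(seed)
    record = RunRecord(seed=seed, observations=list(observations))
    for obs in observations:
        record.actions.append(policy(obs, rng))
    return record


def attribute_divergence(rec_x, rec_y):
    """Swap one layer at a time between two recorded runs to see which one changes X."""
    if rec_x.actions == rec_y.actions:
        return "no divergence"
    # Keep X's observations, use Y's seed: does the sampling layer alone change X?
    swap_seed = execute(rec_y.seed, rec_x.observations)
    # Keep X's seed, use Y's observations: does the extrinsic layer alone change X?
    swap_context = execute(rec_x.seed, rec_y.observations)
    sampling_matters = swap_seed.actions != rec_x.actions
    context_matters = swap_context.actions != rec_x.actions
    if sampling_matters and context_matters:
        return "both"
    if sampling_matters:
        return "sampling"
    if context_matters:
        return "extrinsic context"
    return "unexplained"


observations = ["o1", "o2", "o3", "o4"]
baseline = execute(1, observations)
changed_world = execute(1, ["x1"] + observations[1:])
print(attribute_divergence(baseline, changed_world))  # extrinsic context
```

The design choice is to change one layer per replay, so a difference in the replayed actions can be traced to that layer. When the agent's actions alter what it observes next, a scripted replay like this no longer separates the layers cleanly.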

Load-bearing premise

The sources of variability can be cleanly partitioned into intrinsic token-sampling effects and extrinsic effects, with little interaction between the two that would blur the distinction in real deployments.

What would settle it

Running an agentic system multiple times with identical inputs, fixed random seeds, and controlled environments yet still observing divergent tool calls or state updates would indicate that the clean partition does not hold.

Figures

Figures reproduced from arXiv: 2606.08998 by Muhammad Zia Hydari, Raja Iqbal.

Figure 1: Anatomy of an agentic AI system. A foundation model and decoder sit inside an orchestration …
Figure 2: Agentic stochasticity. The agent repeatedly turns model outputs into actions and observations. A …
Figure 3: Sampling as a lookup on the unit interval. The probabilities …
Figure 3: From context tokens to a selected next token. The tokens already in the model context are used to …
Figure 4: Temperature reshapes the same logits before sampling. All three panels use the identical raw scores; …
Figure 4: Sampling as a lookup on the unit interval. The probabilities …
Figure 5: The token-generation pipeline. The model’s forward pass computes logits and probabilities. …
Figure 5: Temperature reshapes the toy next-token distribution. Lower temperature concentrates probability …
Figure 6: How a single token flips a trajectory. Two runs share an identical token prefix, then sampling …
Figure 6: The token-generation pipeline. The model’s forward pass computes logits and probabilities. …
Figure 7: How a single token flips a trajectory. Two runs share an identical token prefix, then sampling …
Original abstract

Agentic AI systems can behave differently across runs: the same request may produce a different plan, a different tool call, a different code edit, or a different final answer. Such variability arises from several layers that are often conflated. At the core of many current agents is a foundation model, a large pretrained model adaptable to many downstream tasks, embedded in an orchestration loop that plans, calls tools, observes results, and updates state. One explicit intrinsic source of variability in such systems is token generation: the model computes scores over possible next tokens, the scores are converted into probabilities, and a decoder may sample tokens using a pseudo-random number generator. A small sampled token difference can then cascade downstream into a different tool call, code path, search query, or agent state. Other sources of variability are extrinsic to token sampling, including changing environments, live data, serving infrastructure, batch effects, and numerical details. By separating these layers, this tutorial clarifies what it means to call agentic AI systems stochastic, when such variability can be reproduced under matched conditions, and why deterministic execution need not imply identical behavior in deployed settings.
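
The token-generation step in the abstract can be written compactly. The notation below is an assumption introduced here for illustration (the page reports no equations from the paper): $z_i$ is the score for token $i$, $V$ is the vocabulary size, $T > 0$ is the temperature mentioned in the figure captions, and $u$ is a single pseudo-random draw.

```latex
% Scores to probabilities (softmax with temperature T; T = 1 leaves the scores unscaled)
p_i = \frac{\exp(z_i / T)}{\sum_{j=1}^{V} \exp(z_j / T)}, \qquad i = 1, \dots, V

% Sampling as a lookup on the unit interval: the selected token k is the first
% token whose cumulative probability exceeds the uniform draw u
u \sim \mathrm{Uniform}[0, 1), \qquad
k = \min \Bigl\{ m \in \{1, \dots, V\} : \sum_{i=1}^{m} p_i > u \Bigr\}
```

Under this reading, the probabilities $p_i$ are a deterministic function of the scores, and the only random input is $u$. Matching the generator state that produces $u$ is what makes the sampled token reproducible, provided the scores themselves are computed identically.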

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript is a tutorial that distinguishes intrinsic variability in agentic AI systems—arising from token sampling in foundation models during autoregressive decoding—from extrinsic sources such as changing environments, serving infrastructure, batch effects, and numerical details. It claims that cleanly separating these layers clarifies what it means to describe such systems as stochastic, identifies when variability can be reproduced under matched conditions, and explains why deterministic execution need not produce identical behavior in deployed settings.

Significance. If the separation can be maintained, the tutorial offers a clear conceptual framework for reasoning about reproducibility and stochasticity in agent loops, which could aid experiment design and deployment analysis. The work is purely descriptive with no free parameters, axioms, or invented entities, consistent with a tutorial format, but this also means it provides no empirical tests or formal models to anchor its distinctions.

major comments (1)
  1. [Abstract] The central claim that intrinsic sampling effects and extrinsic factors can be partitioned with only limited interaction is load-bearing for the reproducibility distinction, yet the text supplies no independence condition, formal model of the agent loop, or worked example showing that a sampled token's downstream effect on state, tool calls, or queries does not materially alter the extrinsic inputs observed next.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed report and the opportunity to clarify the scope of our tutorial. The central concern is addressed point-by-point below.

  1. Referee: [Abstract] The central claim that intrinsic sampling effects and extrinsic factors can be partitioned with only limited interaction is load-bearing for the reproducibility distinction, yet the text supplies no independence condition, formal model of the agent loop, or worked example showing that a sampled token's downstream effect on state, tool calls, or queries does not materially alter the extrinsic inputs observed next.

    Authors: The manuscript is explicitly a conceptual tutorial and does not assert a formal statistical independence condition or supply a mathematical model of the agent loop; the separation is presented as a descriptive distinction between sources of variability rather than a claim of zero interaction. We acknowledge that the current text contains no worked example demonstrating traceability of downstream effects. To address this, we will revise the abstract to emphasize the conceptual nature of the distinction and add a short illustrative scenario in the main text showing how a single sampled token can produce a different tool call and subsequent observation while other extrinsic factors remain matched. This addition will not introduce formal axioms or empirical tests, consistent with the tutorial format.

    Revision: yes

Circularity Check

0 steps flagged

No circularity; purely descriptive conceptual separation

The manuscript is a tutorial that partitions variability sources into intrinsic token-sampling effects and extrinsic factors, then explains implications for reproducibility. No equations, fitted parameters, self-citations, or derivation steps appear in the provided text. The central claim is a clarification of terminology and conditions rather than a result obtained by reducing prior inputs to themselves. The separation is asserted as an organizing lens, not derived from any self-referential construction or ansatz. This is the expected non-finding for a non-mathematical descriptive paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities; the work is a conceptual clarification without mathematical derivations or new postulates.

pith-pipeline@v0.9.1-grok · 5736 in / 916 out tokens · 13224 ms · 2026-07-03T23:57:35.483911+00:00 · methodology

