The Illusion of Stochasticity in LLMs

Larisa Markeeva; Michalis Titsias; Petar Veličković; Razvan Pascanu; Soham De; Xiangming Gu

arxiv: 2604.06543 · v1 · submitted 2026-04-08 · 💻 cs.CL · cs.LG

Pith reviewed 2026-05-10 18:48 UTC · model grok-4.3

keywords: large language models · stochastic sampling · agentic systems · probability estimation · distribution sampling · LLM limitations

The pith

LLMs cannot map their internal probability estimates to accurate stochastic outputs when sampling from distributions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models are increasingly used as agents that must sample from distributions inferred from data, yet they fail to align their generated tokens with their own estimated probabilities. Experiments across model families, sizes, and prompting styles show that models can convert externally provided random seeds into target distributions, but direct sampling remains unreliable. This gap creates an illusion of stochastic capability because agentic systems depend on consistent randomness for exploration and decision-making, unlike traditional RL agents that use separate sampling mechanisms. If the claim holds, LLMs cannot serve as standalone stochastic agents without external support for distribution sampling.

Core claim

The paper establishes that LLMs suffer from a fundamental mismatch: while they can use provided random seeds to produce outputs matching target distributions, their direct sampling from specific distributions does not reflect their internal probability estimates, exposing a core limitation in emulating stochastic processes required for agentic behavior.
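
To make the seed-assisted condition concrete, here is a minimal Python sketch of what converting a provided uniform random number into a target distribution can look like. It is not the paper's code: the function names `uniform_to_digit` and `uniform_to_gaussian` are hypothetical, and the use of equal-width binning and the Box-Muller transform is an assumption made for illustration.

```python
import math


def uniform_to_digit(u: float, n_outcomes: int = 10) -> int:
    """Map u in [0, 1) to an integer in {0, ..., n_outcomes - 1}.

    Splits [0, 1) into n_outcomes equal sub-intervals and returns the
    index of the sub-interval containing u (hypothetical helper).
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    return math.floor(u * n_outcomes)


def uniform_to_gaussian(u1: float, u2: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Box-Muller transform: two uniform numbers -> one Gaussian sample.

    u1 must lie in (0, 1] so that log(u1) is defined.
    """
    if not 0.0 < u1 <= 1.0:
        raise ValueError("u1 must lie in (0, 1]")
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return mu + sigma * r * math.cos(theta)


if __name__ == "__main__":
    print(uniform_to_digit(0.8921795677048454))  # u falls in [0.8, 0.9)
    print(uniform_to_gaussian(0.547, 0.807))
```

Both functions are deterministic given their inputs, which is the point of the seed-assisted setting: all randomness enters through `u`, `u1`, and `u2`, and the remaining work is arithmetic.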

What carries the argument

The mismatch between LLMs' internal probability estimates over tokens and the empirical distributions of their generated stochastic outputs during direct sampling tasks.

If this is right

Agentic LLM systems will need external sampling modules to achieve reliable stochastic behavior.
Even frontier models exhibit the sampling failure across multiple distributions and prompting methods.
Tasks requiring sampling from inferred data distributions will show systematic deviations from intended behavior.
LLMs cannot replace standard RL agents in environments that depend on independent random sampling.
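
The first implication above suggests a division of labour that can be sketched as follows: the LLM only reports a probability for each action, and an ordinary pseudo-random generator draws the sample. This is an illustrative sketch, not the paper's implementation; `sample_action`, the action names, and the probabilities are hypothetical, and the call that would obtain the policy from an LLM is deliberately left out.

```python
import random
from typing import Dict, Optional


def sample_action(policy: Dict[str, float], rng: Optional[random.Random] = None) -> str:
    """Draw one action from a model-reported policy using an external RNG.

    `policy` maps action names to probabilities (assumed to come from the
    LLM); the randomness comes from `rng`, not from token generation.
    """
    rng = rng or random.Random()
    total = sum(policy.values())
    if total <= 0 or any(p < 0 for p in policy.values()):
        raise ValueError("policy must contain non-negative weights with a positive sum")
    actions = list(policy)
    weights = [policy[a] / total for a in actions]  # renormalise defensively
    return rng.choices(actions, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical uniform policy over five actions.
    policy = {"left": 0.2, "right": 0.2, "up": 0.2, "down": 0.2, "fire": 0.2}
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    draws = [sample_action(policy, rng) for _ in range(1000)]
    print({a: draws.count(a) for a in policy})
```

The design choice here is that the model never has to act stochastically itself; it only has to state the policy, which is the part the review says models handle better.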

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Training regimes that explicitly include stochastic sampling objectives might reduce the mismatch without changing the base architecture.
This limitation could extend to related capabilities like uncertainty estimation or simulation-based reasoning in LLMs.
Hybrid systems combining LLMs with dedicated random number generators may become standard for agent deployments.

Load-bearing premise

The mismatch between internal probability estimates and generated outputs is a fundamental architectural or training limitation rather than a fixable artifact of prompting or insufficient exposure to stochastic tasks.

What would settle it

A test where samples generated by an LLM from a known distribution, without any external random seeds, produce an empirical frequency distribution that closely matches the model's own reported token probabilities for that task.
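
One way to operationalise such a test is a goodness-of-fit check on the model's samples. The sketch below computes Pearson's chi-square statistic against a target distribution; the function name, the example counts, and the choice of this particular statistic are assumptions, not the paper's protocol.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def chi_square_statistic(samples: Sequence[Hashable], target: Dict[Hashable, float]) -> float:
    """Pearson chi-square statistic of `samples` against `target` probabilities."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    unexpected = set(counts) - set(target)
    if unexpected:
        raise ValueError(f"samples outside the target support: {unexpected}")
    stat = 0.0
    for outcome, p in target.items():
        expected = n * p  # expected count under the target distribution
        stat += (counts.get(outcome, 0) - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    uniform_digits = {d: 0.1 for d in range(10)}
    # Hypothetical, heavily biased sample: the model almost always answers 7.
    biased = [7] * 900 + [3] * 100
    # For 10 outcomes (9 degrees of freedom) the 5% critical value is about 16.92.
    print(chi_square_statistic(biased, uniform_digits) > 16.92)
```

A statistic far above the critical value means the samples are very unlikely to have come from the target distribution; a value below it does not prove a match, it only fails to reject one.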

Figures

Figures reproduced from arXiv: 2604.06543 by Larisa Markeeva, Michalis Titsias, Petar Veličković, Razvan Pascanu, Soham De, Xiangming Gu.

**Figure 1.** LLMs are heavily biased towards “C” rather than sampling uniformly when being prompted to generate a multi-choice question. In this paper, we propose that one potential reason such a knowing-doing gap may arise is from a fundamental shortcoming: even if the model knows the correct policy, acting stochastically according to this policy is non-trivial for LLMs. This is because the model has to implicitly sa…

**Figure 2.** Empirical distribution estimates, for various target distributions, using 1024 in…

**Figure 3.** Empirical distribution estimates, for different orders of random set in the prompt: (Left) {left, right, up, down, fire}; (Right) {up, down, left, right, fire}. Positional bias. While the observed biases can sometimes be intuitive, e.g., the preference to numbers such as 42 or 7, which probably correlates with high presence of these numbers in the training data, this is not always the case. For example, w…

**Figure 4.** Empirical distribution estimates, using (…

**Figure 5.** From left to right, we showcase the auto-correlation of the sample sequence, against different time lags, for four sampling approaches when sampling from a uniform distribution between 0 and 9 with Gemini-2.5-Pro: independent sampling, sequential sampling with-all-history, sequential sampling with-last-history, batched sampling. 3.3 Effects of chain-of-thoughts Previous works (e.g., Chen et al., 2024; Sui …

**Figure 6.** Empirical transition counts between pairs of states when sampling from a uniform distribution between 0 and 9 with Gemini-2.5-Pro. (Left) sequential sampling with-all-history (Right) sequential sampling with-last-history. Temporal bias. With the introduction of dependence on the history, we can also investigate the temporal biases when sampling from LLMs. Specifically, we estimate auto-correlation from …

**Figure 7.** Empirical distribution estimates, for a uniform discrete distribution between 0 and…

**Figure 8.** Truncated responses when prompting LLMs (Qwen3-32B) to use a random number…

**Figure 9.** LLMs can reliably convert a uniform distribution in [0, 1] to various distributions.

**Figure 10.** Example responses of (Left) Gemini-2.5-Pro and (Right) Gemini-3.0-Pro when prompting to generate a multi-choice question. B.2 Sampling from distributions with OLMO-3 family In addition to Gemini (Comanici et al., 2025; Google, 2025) and Qwen3 (Yang et al., 2025) families shown in the main paper, we also evaluate the model behaviors of OLMO-3 (Olmo et al., 2025), including OLMO-3-7B-Think 1 and OLMO-3-32B-…

**Figure 11.** Empirical distribution estimates, for various distributions, using 1024 indepen…

**Figure 12.** Example of thinking trace of Gemini-2.5-Flash when sampling from a uniform…

**Figure 13.** Example of thinking trace of Gemini-2.5-Flash when sampling from a uniform…

**Figure 14.** Example of thinking trace of Gemini-2.5-Pro when sampling from a uniform…

**Figure 15.** Example of thinking trace of Gemini-2.5-Pro when sampling from a Gaussian…

**Figure 16.** Example of response of Qwen3-8B when sampling from a uniform discrete…

**Figure 17.** Example of response of Qwen3-14B when sampling from a uniform discrete…

**Figure 18.** Example of response of Qwen3-32B when sampling from a uniform continuous…

**Figure 19.** Example of response of Qwen3-30B-A3B when sampling from a Gaussian…

**Figure 20.** Examples of response of (Left) Gemini-2.5-Pro and (Right) Qwen3-8B without chain-of-thought reasoning. [Plot residue: axis ticks and bar counts omitted; recoverable panel titles: "Uniform discrete 1 to 100", "Uniform discrete 0 to 9".] …

**Figure 21.** Empirical distribution estimates, using (…

**Figure 22.** From left to right, we compare the empirical distribution estimates, for four sampling approaches when sampling from a uniform discrete distribution between 0 and 9 with Gemini-2.5-Pro: independent sampling, sequential sampling with all history, sequential sampling with last history, batched sampling. [Plot residue: axis ticks omitted; recoverable labels: "Independent sampling", "theory".] …

**Figure 23.** From left to right, we compare four sampling approaches by adopting Gemini-2.5-Pro to sample from a Gaussian distribution with mean 0 and standard deviation 1: independent sampling, sequential sampling with all history, sequential sampling with last history, batched sampling. While for Qwen3 family, we typically find that they even struggle with a uniform discrete distribution between 0 and 9, as shown in…

**Figure 24.** Empirical distribution estimates and transition counts of Qwen3-32B when sequentially sampling from a uniform discrete distribution between 0 and 9: (Left) with all history; (Right) with last history.

**Figure 25.** Example of thinking trace of Gemini-2.5-Pro when using sequential sampling…

**Figure 26.** Example of thinking trace of Gemini-2.5-Pro when using sequential sampling…

**Figure 27.** Example of thinking trace of Gemini-2.5-Pro when using sequential sampling…

**Figure 28.** Example of thinking trace of Gemini-2.5-Pro when using sequential sampling…

**Figure 29.** Empirical distribution estimates for the number of generated numbers when…

**Figure 30.** Empirical distribution estimates of first and second sample across independent…

**Figure 31.** Example of thinking trace of Gemini-2.5-Pro when batched sampling from a…

**Figure 32.** Example of thinking trace of Gemini-2.5-Pro when batched sampling from a…

**Figure 33.** Example of thinking trace of Qwen3-32B when batched sampling from a uniform…

**Figure 34.** Example code generation and execution results of (…

**Figure 35.** Example code generation and execution results of (…

**Figure 36.** Prompt design for using LLMs to sample from a uniform discrete distribution…

**Figure 37.** Prompt design for using LLMs to sample from a uniform continuous distribution…

**Figure 38.** Prompt design for using LLMs to sample from a Gaussian distribution with a…

**Figure 39.** From left to right, we visualize the failures of LLMs simulating PRNG algorithms for Gaussian distributions: Gemini-2.5-Flash, Gemini-2.5-Pro, Qwen3-8B, Qwen3-32B.

**Figure 40.** Truncated responses when prompting (Left) Gemini-2.5-Pro and (Right) Qwen3-8B to simulate a PRNG algorithm for uniform discrete distribution between 0 and 9. The thinking traces and some special tokens are also truncated as they are too verbose.

**Figure 41.** Truncated responses when prompting (Left) Gemini-2.5-Pro and (Right) Qwen3-14B to simulate a PRNG algorithm for uniform continuous distribution [0, 1]. The thinking traces and some special tokens are also truncated as they are too verbose.

**Figure 42.** Truncated responses when prompting (Left) Gemini-2.5-Pro and (Right) Qwen3-32B to simulate a PRNG algorithm for Gaussian distribution N(0, 1). The thinking traces and some special tokens are also truncated as they are too verbose. We use bold red to mark the first mistake the LLM made. Typically LLMs struggle with large number multiplications.

**Figure 43.** Responses when prompting (Left) Gemini-2.5-Pro and (Right) Qwen3-30B-A3B to use a random number in [0, 1] to sample from a uniform discrete distribution between 0 and 9.

**Figure 44.** Responses when prompting (Left) Gemini-2.5-Pro and (Right) Qwen3-14B to use a random number in [0, 1] to sample from a Gaussian distribution with mean 0 and standard deviation 1.

**Figure 45.** Responses when prompting (Left) Gemini-3.0-Pro to sample from a Gaussian Mixture Model (GMM); (Right) Qwen3-32B to sample from a non-uniform discrete distribution on directions using a random number in [0, 1] as input.
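
The two diagnostics named in the captions of Figures 5 and 6, auto-correlation at different lags and transition counts between consecutive samples, can be sketched in a few lines of Python. The helper names and the toy sequence are hypothetical, and the estimator shown is a common biased sample auto-correlation, which may differ from the one the authors used.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def autocorrelation(xs: Sequence[float], lag: int) -> float:
    """Sample auto-correlation of `xs` at the given positive lag."""
    n = len(xs)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(xs)")
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        raise ValueError("auto-correlation is undefined for a constant sequence")
    cov = sum((xs[t] - mean) * (xs[t + lag] - mean) for t in range(n - lag))
    return cov / var


def transition_counts(xs: Sequence[int]) -> Dict[Tuple[int, int], int]:
    """Count how often state a is immediately followed by state b."""
    return dict(Counter(zip(xs, xs[1:])))


if __name__ == "__main__":
    # Toy sequence that strictly alternates, i.e. the opposite of independent draws.
    alternating = [0, 1] * 50
    print(autocorrelation(alternating, lag=1))  # strongly negative
    print(transition_counts(alternating))
```

For truly independent draws the auto-correlation at every positive lag should hover near zero and the transition counts should be roughly proportional to the product of the marginal frequencies.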

Original abstract

In this work, we demonstrate that reliable stochastic sampling is a fundamental yet unfulfilled requirement for Large Language Models (LLMs) operating as agents. Agentic systems are frequently required to sample from distributions, often inferred from observed data, a process which needs to be emulated by the LLM. This leads to a distinct failure point: while standard RL agents rely on external sampling mechanisms, LLMs fail to map their internal probability estimates to their stochastic outputs. Through rigorous empirical analysis across multiple model families, model sizes, prompting styles, and distributions, we demonstrate the extent of this failure. Crucially, we show that while powerful frontier models can convert provided random seeds to target distributions, their ability to sample directly from specific distributions is fundamentally flawed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LLMs can map seeds to distribution samples but fail at direct sampling, which flags a practical limit for agents.

The paper's central finding is that large language models cannot reliably sample from specified probability distributions on their own. They do better when handed a random seed and asked to map it to the distribution. This observation comes from tests on several model families and sizes. The authors vary prompts and distributions to show that direct sampling fails while seed-based sampling succeeds. This points to a disconnect between the model's internal probability estimates and its generated outputs. For agent systems that need to simulate uncertainty or explore options, this could be a real constraint since they often can't rely on external random number generators.

The strength here is the targeted comparison. The paper does not just say LLMs are bad at randomness; it shows they can handle it under one condition but not another. That makes the claim more precise. The empirical scope across models adds weight, assuming the controls are in place.

One area that needs checking is how the authors define and measure the failure. If the direct sampling prompts are not optimized, the gap might shrink with better techniques. The paper treats the failure as fundamental, but evidence for that would require showing that it persists after reasonable attempts to fix it. Also, the abstract mentions rigorous analysis, so the full paper should have the statistics and sample details to back the extent of the failure.

Readers focused on building reliable LLM agents or studying model capabilities in stochastic settings would get value from this. It is not a theoretical breakthrough but a practical flag. I think it should go to peer review. The experiments are feasible to replicate or extend, and the issue is worth documenting if the results hold.

Referee Report

2 major / 0 minor

Summary. The paper claims that LLMs cannot reliably perform direct stochastic sampling from target probability distributions (despite internal probability estimates), while they succeed at converting externally provided random seeds into samples from those distributions. This distinction is presented as a fundamental limitation for agentic use of LLMs, supported by empirical tests across model families, sizes, prompting styles, and distributions.

Significance. If substantiated, the result would highlight a practical barrier to using LLMs for tasks requiring unbiased sampling from inferred distributions, reinforcing the need for external randomness sources in agent architectures rather than relying on the model's token-generation process.

major comments (2)

[Abstract] The assertion of 'rigorous empirical analysis across multiple model families, model sizes, prompting styles, and distributions' supplies no sample sizes, specific distributions tested, evaluation metrics for mismatch between internal probabilities and outputs, or statistical controls, making it impossible to assess whether the data support the claim that direct sampling is 'fundamentally flawed'.
[Abstract] The central distinction between seed-assisted and direct sampling is load-bearing, yet the manuscript does not describe controls that isolate architectural limits from prompting artifacts (e.g., whether chain-of-thought or distribution-specific few-shot examples were systematically varied).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight opportunities to improve the clarity and transparency of our empirical claims. We address each major comment point by point below, indicating revisions where we agree the manuscript can be strengthened.

Referee: [Abstract] The assertion of 'rigorous empirical analysis across multiple model families, model sizes, prompting styles, and distributions' supplies no sample sizes, specific distributions tested, evaluation metrics for mismatch between internal probabilities and outputs, or statistical controls, making it impossible to assess whether the data support the claim that direct sampling is 'fundamentally flawed'.

Authors: We agree that the abstract, being concise by nature, does not enumerate these details. The full manuscript (Sections 3 and 4) specifies the experimental scale, the concrete distributions evaluated, the quantitative mismatch metrics (e.g., total variation and KL divergence between target and realized output distributions), and the statistical procedures employed (repeated trials with variance reporting). We have revised the abstract to include a brief summary of sample sizes, distributions, metrics, and controls so that the strength of the evidence can be assessed from the abstract alone. revision: yes
Referee: [Abstract] The central distinction between seed-assisted and direct sampling is load-bearing, yet the manuscript does not describe controls that isolate architectural limits from prompting artifacts (e.g., whether chain-of-thought or distribution-specific few-shot examples were systematically varied).

Authors: The manuscript states that results hold across prompting styles and includes those variations in the experimental design. To make the isolation of architectural versus prompting effects fully explicit, we have added a dedicated paragraph in the methods section that systematically enumerates the prompting conditions tested (direct, chain-of-thought, and distribution-specific few-shot) and reports that the performance gap between direct and seed-assisted sampling remains consistent across all variants. This addition clarifies that the observed limitation is not reducible to prompting artifacts. revision: yes
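
The two mismatch metrics named in the first response, total variation and KL divergence between target and realized output distributions, have standard definitions that can be sketched as below. The function names, the direction of the KL divergence, and the example numbers are assumptions; the rebuttal does not specify them.

```python
import math
from typing import Dict, Hashable


def total_variation(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """Total variation distance: half the L1 distance between p and q."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def kl_divergence(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """KL(p || q) in nats; infinite if q assigns zero mass where p does not."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue  # terms with p(x) = 0 contribute nothing
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total


if __name__ == "__main__":
    target = {d: 0.1 for d in range(10)}
    realized = {7: 0.9, 3: 0.1}  # hypothetical empirical frequencies
    print(total_variation(target, realized))
    print(kl_divergence(realized, target))
```

Note that KL(target || realized) is infinite whenever the model never produces some outcome the target allows, so the direction matters when reporting results for collapsed output distributions; total variation stays bounded in [0, 1] in either case.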

Circularity Check

0 steps flagged

No significant circularity

The paper advances an empirical claim based on experiments across model families, sizes, prompting styles, and distributions: LLMs can map provided random seeds to target distributions but fail at direct sampling from those distributions. No equations, derivations, fitted parameters, or self-referential constructions appear in the abstract or description. The analysis relies on external benchmarks (observed outputs vs. internal probabilities) rather than reducing any result to its own inputs by construction, making the work self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim is supported by empirical observations rather than theoretical derivations. No free parameters, axioms, or invented entities are invoked or mentioned in the abstract.

pith-pipeline@v0.9.0 · 5440 in / 1061 out tokens · 37513 ms · 2026-05-10T18:48:32.799979+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: Crucially, we show that while powerful frontier models can convert provided random seeds to target distributions, their ability to sample directly from specific distributions is fundamentally flawed.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: LLMs fail to map their internal probability estimates to their stochastic outputs.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

[1] A standard fair coin (one Head, one Tail)

[2] A two-headed coin (Heads on both sides)

[3] A coin is selected at random from the bag and flipped
A two-tailed coin (Tails on both sides). A coin is selected at random from the bag and flipped. The coin lands showing **Heads**. What is the probability that the other side of the coin is also Heads? A) 1/3 B) 1/2 C) 2/3 D) 5/6 *** **Correct Answer:** C *** **Explanation:** ... Figure 10: Example responses of (Left) Gemini-2.5-Pro and (Right) Gemini-3.0-...

[4] There’s no bias; the choice is random enough for this operation
It’s perfectly valid given the constraints. There’s no bias; the choice is random enough for this operation. Finally, I’ll format the output within the LaTeX command `\\boxed{}`. Model response: 7 Figure 12: Example of thinking trace of Gemini-2.5-Flash when sampling from a uniform discrete distribution between 0 and 9. B.4 Effects of chain-of-thoughts Cur...

[5] random.randint()
Square it: 1234^2 = 1522756. 3. Extract the middle four digits: **2275**. 4. Take this modulo 26: 2275 mod 26 = 13. 5. Map 13 to the corresponding letter: **m**. Final Answer: m Figure 17: Example of response of Qwen3-14B when sampling from a uniform discrete distribution on letters. Qwen3-32B Sample a number from a uniform distribution within the continuous ...

[6] First, we instantiate the class with the given seed
Initialization. First, we instantiate the class with the given seed

[7] Step 1: Get the raw large number by calling `next_raw()`
Calling `randint(0, 9)` ... Step 1: Get the raw large number by calling `next_raw()` ... `(a * state + c) % m` ... `1103515245 * 1000 = 1103515245000` ... `1103515245000 + 12345 = 1103515257345` ... `1103515257345 / 2147483648` gives a quotient of `513` and a remainder ... `self.state = 1856145921` ... Step 2: Determine the size of the range ... Step 3:...

[8] **Initialize the PRNG with seed 1035**

[9] state = (a·state + c) mod m
**Compute the raw number via `next_raw()`** ... state = (a·state + c) mod m ... First compute: a·state = 1103515245 · 1035 = 1142138278575 ... Then add c: 1142138278575 + 12345 = 1142138290920 ... Now compute modulo m: 1142138290920 mod 2147483648 = 1824473832

[10] Computes: scaled = 1824473832 mod 10 = 2, Returns: result = 0 + 2 = 2
**Generate a number in the range [0, 9]** ... Computes: scaled = 1824473832 mod 10 = 2, Returns: result = 0 + 2 = 2 ... **Final Result** ... 2 Figure 40: Truncated responses when prompting (Left) Gemini-2.5-Pro and (Right) Qwen3-8B to simulate a PRNG algorithm for uniform discrete distribution between 0 and 9. The thinking traces and some special tokens are...

[11] Initialize the state and constants

[12] `(a * state + c) % m`
Calculate the next state: ... `(a * state + c) % m` ... Step 2a: `a * state` ... `1103515245 * 1846 = 2037089142270` ... Step 2b: `a * state + c` ... `2037089142270 + 12345 = 2037089154615` ... Step 2c: `(a * state + c) % m` ... The remainder is `2037089154615 - 2035814498304 = 1274656311`

[13] `1274656311 / 2147483648 ≈ 0.5935603845`
Normalize the state to get the final number: ... `1274656311 / 2147483648 ≈ 0.5935603845` ... approximately 0.59356 ... 0.5935603845. Qwen3-14B ... Sample a number from a uniform distribution within the continuous range [0, 1] ... **random seed 1817** ... [...thinking...] To generate a random number from a uniform distribution in the continuous range...

[14] `R` = `sqrt(3.149)` ≈ **1.7746**
**Calculate R:** ... `R` = `sqrt(3.149)` ≈ **1.7746** ... 2. **Calculate theta:** ... `theta` ≈ **2.5626** (in radians) ... 3. **Calculate the final value `z0`:** ... `z0` ≈ **-1.4832** ... `mu + Z * sigma`, which is `0 + z0 * 1`, the final result is `z0`. −1.4832 Qwen3-32B ... Sample a number from a Gaussian distribution with mean 0 and standard deviati...

[15] Convert to float: $u_2 = 1732512723 / 2147483648 \approx 0.807$
Resulting state: state2 = 1732512723 ... Convert to float: $u_2 = 1732512723 / 2147483648 \approx 0.807$ ... Step 3: Apply the Box-Muller Transform ... 1. Compute: $R = \sqrt{-2 \ln(u_1)} = \sqrt{-2 \ln(0.547)} \approx \sqrt{-2 \cdot (-0.603)} = \sqrt{1.206} \approx 1.098$ ... 2. Compute: $\theta = 2\pi \cdot u_2 = 2\pi \cdot 0.807 \approx 5.073$ radians ... 3. Compute: $Z = R \cdot \cos(\theta) \approx 1.098 \cdot \cos(5.073)$ ... 4. Compute $\cos(5.073)$, 5.073 rad...

[16] Given `u = 0.8921795677048454`:
and take the floor of the result. Given `u = 0.8921795677048454`:

[17] Multiply by the number of outcomes (10): `0.8921795677048454 * 10 = 8.921795677048454`

[18] The given number 0.8921795677048454 falls into the interval [0.8, 0.9), which corresponds to the integer 8
Take the floor of the result: `floor(8.921795677048454) = 8` This method works by dividing the interval [0, 1] into 10 equal sub-intervals: [0, 0.1), [0.1, 0.2), ..., [0.9, 1.0). The given number 0.8921795677048454 falls into the interval [0.8, 0.9), which corresponds to the integer 8. The sampled value is 8. 8 Qwen3-30B-A3B Here is a random number samp...
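
The arithmetic in the PRNG-simulation excerpts above can be checked mechanically. The Python sketch below assumes the generator is a linear congruential generator with the constants that appear in the excerpts (a = 1103515245, c = 12345, m = 2147483648); the class and method names are hypothetical reconstructions, since the code given to the models is not reproduced on this page.

```python
class LCG:
    """Linear congruential generator: state = (a * state + c) mod m."""

    A = 1103515245
    C = 12345
    M = 2147483648  # 2**31

    def __init__(self, seed: int) -> None:
        self.state = seed

    def next_raw(self) -> int:
        """Advance the state once and return it."""
        self.state = (self.A * self.state + self.C) % self.M
        return self.state

    def randint(self, low: int, high: int) -> int:
        """Integer in [low, high] via modulo reduction (slightly biased in general)."""
        return low + self.next_raw() % (high - low + 1)

    def random(self) -> float:
        """Float in [0, 1) obtained by dividing the raw state by m."""
        return self.next_raw() / self.M


if __name__ == "__main__":
    print(LCG(1000).next_raw())      # 1856145921, matching the state-1000 excerpt
    print(LCG(1035).randint(0, 9))   # 2, matching the seed-1035 excerpt
    print(LCG(1846).random())        # about 0.59356, as in the state-1846 excerpt
```

Python integers have arbitrary precision, so the twelve- and thirteen-digit intermediate products that the figure captions describe as a typical stumbling block for LLMs are computed exactly here.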
