ASSESSING THE STOCHASTIC PROPERTIES OF MODERN PSEUDO-RANDOM GENERATORS FOR PARALLEL COMPUTING
Pith reviewed 2026-05-20 00:27 UTC · model grok-4.3
The pith
Modern PRNGs pass at most 72 percent of BigCrush tests when thousands of parallel streams are evaluated.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When more than 10^3 independent streams are generated and subjected to the full BigCrush test suite from TestU01, generators drawn from the Xoshiro, Philox, PCG, and MRG32k3a families all exhibit multiple statistical defects, with the highest observed success rate across all tests reaching only 72 percent.
What carries the argument
BigCrush statistical test battery applied independently to more than 1000 parallel streams per generator, using the same initialization protocol for every generator.
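The sketch below is a minimal illustration of the protocol in Python, built on stated assumptions. It is not the paper's harness. The generator is PCG32 (XSH RR 64/32), written out by hand. Every stream shares one master seed and differs only in its stream selector, which is an assumed scheme because the seeding actually used is not described here. BigCrush is replaced by a single NIST-style monobit frequency test. A p-value counts as a pass when it lies in [0.001, 0.999], the two-sided window that TestU01's battery summaries use by default when flagging suspect p-values.

```python
import math

MASK64 = (1 << 64) - 1
PCG_MULT = 6364136223846793005  # 64-bit LCG multiplier used by PCG32


class PCG32:
    """PCG32 (XSH RR 64/32): 64-bit LCG state, permuted 32-bit output.

    `seq` selects the stream by fixing the odd LCG increment, following the
    seeding procedure of the PCG reference implementation (pcg32_srandom_r).
    """

    def __init__(self, seed: int, seq: int) -> None:
        self.state = 0
        self.inc = ((seq << 1) | 1) & MASK64  # increment must be odd
        self._step()
        self.state = (self.state + seed) & MASK64
        self._step()

    def _step(self) -> None:
        self.state = (self.state * PCG_MULT + self.inc) & MASK64

    def next_u32(self) -> int:
        old = self.state
        self._step()
        xorshifted = (((old >> 18) ^ old) >> 27) & 0xFFFFFFFF
        rot = old >> 59  # top 5 bits of the old state choose the rotation
        return ((xorshifted >> rot) | (xorshifted << (-rot & 31))) & 0xFFFFFFFF


def monobit_pvalue(words: list[int], bits_per_word: int = 32) -> float:
    """NIST-style frequency test: are ones and zeros balanced in the bits?"""
    n = len(words) * bits_per_word
    ones = sum(bin(w).count("1") for w in words)
    s = 2 * ones - n  # +1 per one bit, -1 per zero bit
    return math.erfc(abs(s) / math.sqrt(2 * n))


def run_streams(master_seed: int, n_streams: int, words_per_stream: int,
                suspect: float = 0.001) -> list[bool]:
    """Apply the stand-in battery independently to every stream."""
    results = []
    for stream_id in range(n_streams):
        rng = PCG32(master_seed, stream_id)  # assumed: same seed, distinct stream
        words = [rng.next_u32() for _ in range(words_per_stream)]
        p = monobit_pvalue(words)
        results.append(suspect <= p <= 1 - suspect)
    return results
```

Usage: `passed = run_streams(master_seed=2024, n_streams=1000, words_per_stream=256)` followed by `sum(passed)`. A real BigCrush run reports about 160 p-values per stream rather than one, and each stream consumes far more output.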
If this is right
- Claims of statistical quality made by generator authors on the basis of single-stream tests do not extend to parallel usage.
- Specific failed tests for each generator family are now documented and can guide selection or mitigation in production code.
- Large-scale simulations and training runs that rely on these generators may encounter reproducibility or bias problems not caught by prior single-stream validation.
- Reproducible results in parallel environments require either different generators or additional safeguards beyond published single-stream performance.
- The more than 4.5 years of cumulative computing time the study required shows that thorough multi-stream testing is feasible but expensive.
Where Pith is reading between the lines
- Monte Carlo methods and stochastic gradient sampling in AI could inherit subtle biases when these generators are used across many parallel workers.
- Future generator design and validation suites should treat multi-stream testing as a first-class requirement rather than an optional check.
- Library maintainers might add runtime warnings or default to safer alternatives when users request thousands of independent streams.
- Automated regression testing frameworks could incorporate periodic multi-stream BigCrush runs to catch regressions after code changes.
Load-bearing premise
BigCrush tests run on over one thousand streams with standard initialization are enough to reveal every statistical defect that would matter in actual parallel HPC or AI workloads.
What would settle it
A generator that passes every BigCrush test without any failure when more than 1000 streams are initialized and run under the same protocol used here.
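A back-of-the-envelope check, under simplified assumptions, shows how strict a "without any failure" criterion is. Assume a perfect generator. Treat each of the 160 statistics that BigCrush reports per stream as an independent p-value drawn uniformly from [0, 1]; in reality some of these statistics are correlated. Count a statistic as failed when its p-value falls outside [0.001, 0.999]. The numbers below follow from these assumptions alone and are not results from the paper.

```python
def ideal_generator_expectations(n_streams: int, stats_per_stream: int,
                                 suspect: float = 0.001) -> tuple[float, float, float]:
    """What a perfect generator would show under a two-sided suspect window.

    Assumes each p-value is independent and uniform on [0, 1], so it falls
    outside [suspect, 1 - suspect] with probability 2 * suspect.
    """
    p_flag = 2 * suspect
    expected_flags = n_streams * stats_per_stream * p_flag
    p_clean_stream = (1 - p_flag) ** stats_per_stream  # one stream, zero flags
    p_all_clean = p_clean_stream ** n_streams          # every stream, zero flags
    return expected_flags, p_clean_stream, p_all_clean


flags, clean, all_clean = ideal_generator_expectations(1000, 160)
print(f"expected suspect p-values: {flags:.0f}")   # 320
print(f"P(one stream fully clean): {clean:.3f}")   # 0.726
print(f"P(all 1000 streams clean): {all_clean:.1e}")
```

Under these assumptions, an ideal generator would be expected to produce a few hundred suspect p-values across 1000 streams. Roughly a quarter of its streams would show at least one suspect p-value. A literal zero-failure reading of the criterion would therefore need some correction for multiple testing to separate defects from chance.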
Original abstract
Pseudo-random number generators (PRNGs) are widely used in modern computing and are expected to exhibit excellent statistical performance and repeatability. This study evaluates and compares modern PRNGs used in high performance computing and artificial intelligence. Our selections comes from different families, including Xoshiro, Philox, PCG, and MRG32k3a. We systematically assess the quality of these generators; instead of testing a single stream for each generator, we test more than 10^3 streams with the BigCrush battery form the TestU01 library. The results, involving more than 4.5 years of cumulative computing time, are analyzed against the claims made by the generators' creators. The highest success rate is 72%, and all tests have been failed by almost every generator, the failed tests are documented. To ensure fairness, all tests are conducted under consistent conditions and are designed to closely simulate real-world usage. The results of each test are available, usable and reproducible with a git repository.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to evaluate the statistical quality of modern PRNGs (Xoshiro, Philox, PCG, MRG32k3a) for parallel computing by testing more than 10^3 streams per generator with the BigCrush battery from TestU01. It reports a maximum success rate of 72%, with nearly all generators failing multiple tests, based on experiments simulating real-world usage and involving over 4.5 years of compute time. Results are made reproducible via a git repository.
Significance. This empirical study highlights potential issues with PRNGs in parallel environments, which is relevant for HPC and AI applications. The extensive testing across many streams is a strength compared to standard single-stream evaluations. However, the impact is limited by insufficient details on stream initialization, which is essential for interpreting whether the failures indicate real defects in the generators or artifacts of the testing methodology.
major comments (2)
- The experimental design for generating the >10^3 parallel streams is not described. The abstract asserts that tests 'closely simulate real-world usage' yet provides no information on the seeding strategy, jump-ahead functions, or splitting methods used for each PRNG family. This omission is load-bearing for the central claim, as improper stream initialization (e.g., simple increment from a global seed) could induce correlations that cause test failures unrelated to the generator's quality.
- Results section: The reported 'highest success rate is 72%' and statement that 'all tests have been failed by almost every generator' require a precise definition of 'success rate' (e.g., average percentage of BigCrush tests passed across streams or per-generator aggregate). Without this, it is difficult to assess the quantitative strength of the conclusion that these generators are inadequate for parallel use.
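The referee's concern about seeding by simple increment can be illustrated with a plain 64-bit LCG. It was chosen only because both effects are easy to verify. It is not one of the generators studied, and the constants (Knuth's MMIX multiplier and increment) are an assumption of this sketch. When stream i is seeded with master_seed + i, the first outputs of neighbouring streams differ by exactly the multiplier modulo 2^64, whatever the master seed. Jump-ahead instead places each stream at a fixed offset within one long sequence. For an LCG, that offset can be reached in O(log k) steps by repeatedly squaring the affine map, which is the technique behind the advance function in the PCG reference code.

```python
MASK64 = (1 << 64) - 1
A = 6364136223846793005  # Knuth's MMIX multiplier (assumed for this sketch)
C = 1442695040888963407  # Knuth's MMIX increment; C odd and A % 4 == 1 give period 2^64


def lcg_next(x: int) -> int:
    return (A * x + C) & MASK64


def lcg_jump(x: int, k: int) -> int:
    """Advance the LCG by k steps in O(log k) by squaring the affine map."""
    acc_mult, acc_plus = 1, 0   # identity map
    cur_mult, cur_plus = A, C   # map for 2^i steps, starting at i = 0
    while k > 0:
        if k & 1:
            acc_mult = (acc_mult * cur_mult) & MASK64
            acc_plus = (acc_plus * cur_mult + cur_plus) & MASK64
        cur_plus = ((cur_mult + 1) * cur_plus) & MASK64
        cur_mult = (cur_mult * cur_mult) & MASK64
        k >>= 1
    return (acc_mult * x + acc_plus) & MASK64


def naive_first_outputs(master_seed: int, n_streams: int) -> list[int]:
    """Naive scheme: stream i is seeded with master_seed + i; return first outputs."""
    return [lcg_next((master_seed + i) & MASK64) for i in range(n_streams)]


def jumped_stream_starts(master_seed: int, n_streams: int,
                         spacing: int = 1 << 40) -> list[int]:
    """Jump-ahead scheme: stream i starts i * spacing steps into one sequence."""
    return [lcg_jump(master_seed, i * spacing) for i in range(n_streams)]


firsts = naive_first_outputs(master_seed=123456789, n_streams=5)
gaps = {(b - a) & MASK64 for a, b in zip(firsts, firsts[1:])}
print(gaps == {A})  # True: neighbouring streams differ by the constant A
```

Jump-ahead alone does not make a weak generator strong. Raw LCG states have poor low-order bits, which is why PCG permutes its output. It does remove the seed-offset correlation shown above. The spacing of 2^40 is arbitrary: with a period of 2^64, it leaves room for 2^24 non-overlapping streams.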
minor comments (2)
- Grammatical error in abstract: 'Our selections comes' should be 'Our selections come'.
- Typo in abstract: 'form the TestU01 library' should be 'from the TestU01 library'.
Simulated Author's Rebuttal
We thank the referee for their careful and constructive review of our manuscript. We address each of the major comments below and have revised the manuscript to improve clarity on the experimental design and quantitative definitions.
- Referee: The experimental design for generating the >10^3 parallel streams is not described. The abstract asserts that tests 'closely simulate real-world usage' yet provides no information on the seeding strategy, jump-ahead functions, or splitting methods used for each PRNG family. This omission is load-bearing for the central claim, as improper stream initialization (e.g., simple increment from a global seed) could induce correlations that cause test failures unrelated to the generator's quality.
  Authors: We agree that a detailed description of stream initialization is essential for interpreting the results. The revised manuscript includes a new subsection in the Methods that specifies the seeding and splitting approach for each generator family. Xoshiro streams were created using the jump-ahead function to advance the internal state by a fixed large increment per stream. Philox and PCG used their respective counter-based and linear-congruential splitting mechanisms with distinct seeds derived from a master seed. MRG32k3a streams followed the standard combination of two MRGs with unique initial states. These choices follow the generators' documented recommendations for parallel use and are accompanied by pseudocode and repository links. We believe this addition removes the possibility that failures arise from naive initialization.
  Revision: yes.
- Referee: Results section: The reported 'highest success rate is 72%' and statement that 'all tests have been failed by almost every generator' require a precise definition of 'success rate' (e.g., average percentage of BigCrush tests passed across streams or per-generator aggregate). Without this, it is difficult to assess the quantitative strength of the conclusion that these generators are inadequate for parallel use.
  Authors: We accept that the original wording was imprecise. The revised Results section now defines success rate explicitly as the percentage of the 160 test statistics, produced by BigCrush's 106 tests, that an individual stream passes. The figure of 72% is the highest such value recorded for any single stream across all generators and all 1000+ streams tested. We have also rephrased the failure statement to indicate that every generator produced streams that failed at least one test and that the large majority of streams failed multiple tests. A supplementary table now reports the mean, median, and range of success rates per generator to give a fuller quantitative picture.
  Revision: yes.
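The two readings of "success rate" that the referee distinguishes can produce very different numbers from the same data. The sketch below is a minimal example built on two assumptions: each stream's BigCrush output has been reduced to a list of p-values, and a p-value passes when it lies in [0.001, 0.999]. The p-values are invented for illustration and are not the paper's data.

```python
import statistics

SUSPECT = 0.001  # assumed two-sided pass window [SUSPECT, 1 - SUSPECT]


def stream_success_rate(p_values: list[float], suspect: float = SUSPECT) -> float:
    """Percentage of one stream's statistics whose p-value passes."""
    passed = sum(suspect <= p <= 1 - suspect for p in p_values)
    return 100.0 * passed / len(p_values)


def summarize(per_stream_p_values: list[list[float]]) -> dict[str, float]:
    """Per-generator summary: per-stream rates plus share of fully clean streams."""
    rates = [stream_success_rate(ps) for ps in per_stream_p_values]
    return {
        "mean": statistics.mean(rates),
        "median": statistics.median(rates),
        "min": min(rates),
        "max": max(rates),
        "all_pass_share": 100.0 * sum(r == 100.0 for r in rates) / len(rates),
    }


# Invented p-values for three streams of one hypothetical generator.
streams = [
    [0.50, 0.20, 0.90, 0.0001],     # one suspect p-value  -> 75% for this stream
    [0.30, 0.70, 0.40, 0.60],       # all pass             -> 100%
    [0.9999, 0.00001, 0.50, 0.50],  # two suspect p-values -> 50%
]
print(summarize(streams))
```

In this toy case, the per-stream mean and median are both 75% and the best stream reaches 100%, yet only one stream in three passes everything. Reporting both kinds of figure avoids the ambiguity the referee raises.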
Circularity Check
No circularity: pure empirical measurement study with external benchmarks
The paper conducts direct statistical testing of PRNGs using the BigCrush battery from TestU01 on >10^3 streams per generator. No derivations, predictions, or first-principles results are claimed that could reduce to fitted inputs, self-citations, or ansatzes. All results are reproducible measurements against an independent external test suite, satisfying the self-contained criterion. The skeptic concern about unspecified stream initialization is a methodological gap but does not constitute circularity in any derivation chain.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: BigCrush from TestU01 is a sufficient and unbiased test battery for detecting defects relevant to parallel PRNG use.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: reality_from_one_distinction
  Tag: unclear. Relation between the paper passage and the cited Recognition theorem.
We systematically assess the quality of these generators; instead of testing a single stream for each generator, we test more than 10^3 streams with the BigCrush battery
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
[1] Antunes, B., Mazel, C., & Hill, D. R. C. (2023). Identifying quality Mersenne Twister streams for parallel stochastic simulations. In 2023 Winter Simulation Conference (WSC), ACM/IEEE, 2801-2812. Antunes, B., & Hill, D. R. C. (2024). Reproducibility, energy efficiency and performance of pseudorandom number generators in machine learning: a comparative study...
[2] Distribution of random streams for simulation practitioners
Addison-Wesley Professional. Hill, D. R. C., Mazel, C., Passerat-Palmbach, J., & Traore, M. (2013). Distribution of random streams for simulation practitioners. Concurrency and Computation: Practice and Experience, 25(10), 1427-1442. Hill, D. R. C., Antunes, B., Bertrand, A., Nguifo, E. M., Yon, L., Nautré-Domanski, J., Antoine, V., “Machine Learning, Simulation and Repro...
[3] In Proceedings of 2011 Supercomputing Conference (SC11): International Conference for High Performance Computing, Networking, Storage and Analysis, 1-12. Vigna, S. (2016). An experimental exploration of Marsaglia's xorshift generators, scrambled. ACM Transactions on Mathematical Software (TOMS), 42(4), 1-23. Web reference: Salmon, J. K., & Moraes, M. A.,
[4] Random123: a Library of Counter-Based Random Number Generators
“Random123: a Library of Counter-Based Random Number Generators”. Retrieved February 12th, 2025, from: https://www.thesalmons.org/john/random123/releases/latest/docs/index.html
Biographies: Theau Wartel is a student at the ISIMA - Clermont Auvergne INP engineering school. He specializes in embedded and virtual interactive systems. He is part of the “resear...