Fail Faster: Staging and Fast Randomness for High-Performance PBT
Pith reviewed 2026-05-22 22:08 UTC · model grok-4.3
The pith
Multi-stage programming and optimized randomness make property-based testing find bugs up to 13 times faster.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By applying multi-stage transformations to eliminate combinator overhead and by swapping in a faster randomness source, the authors show that the same generators can produce test cases much more quickly while still generating exactly the same distributions of inputs.
What carries the argument
Allegro, a multi-stage programming technique that rewrites high-level generator combinators into efficient low-level code.
If this is right
- PBT libraries can execute far more tests in the same wall-clock time.
- Users obtain the same test distributions without rewriting their generators.
- Randomness source quality becomes a first-class performance concern in PBT.
- The technique applies to existing combinator libraries in OCaml and Scala 3.
Where Pith is reading between the lines
- Similar staging could be applied to generator libraries in other languages that support multi-stage compilation.
- Faster generators might make property-based testing practical for larger state spaces that are currently too slow to explore.
- The controlled experiment isolates randomness as an independent variable that future PBT work can optimize separately.
Load-bearing premise
The multi-stage transformations and randomness replacement exactly preserve the semantics of the original generators.
What would settle it
Execute the original and staged generators on the same sequence of random seeds and observe whether the produced test cases or the bugs discovered differ.
Figures
read the original abstract
Property-based testing (PBT) relies on generators for random test cases, often constructed using embedded domain specific languages, which provide expressive combinators for building and composing generators. The effectiveness of PBT depends critically on the speed of these generators. However, careful measurements show that the generator performance of widely used PBT libraries falls well short of what is possible, due principally to (1) the abstraction overhead of their combinator-heavy style and (2) suboptimal sources of randomness. We characterize, quantify, and address these bottlenecks. To eliminate abstraction overheads, we propose a technique based on multi-stage programming, dubbed Allegro. We apply this technique to leading generator libraries in OCaml and Scala 3, significantly improving performance. To quantify the performance impact of the randomness source, we carry out a controlled experiment, replacing the randomness in the OCaml PBT library with an optimized version. Both interventions exactly preserve the semantics of generators, enabling precise, pointwise comparisons. Together, these improvements find bugs up to $13\times$ faster.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Allegro, a multi-stage programming technique applied to PBT generator libraries in OCaml and Scala 3 to eliminate combinator abstraction overhead, and reports a controlled experiment replacing the randomness source in the OCaml library with an optimized version. Both changes are stated to exactly preserve generator semantics, enabling pointwise comparisons, and together they are claimed to find bugs up to 13× faster.
Significance. If the semantics-preservation claims and experimental controls hold, the work quantifies and mitigates two concrete bottlenecks (abstraction overhead and randomness quality) that limit PBT effectiveness. The staging approach offers a reusable method for specializing generator DSLs, and the randomness replacement provides a measurable baseline; both could inform generator design in other languages and libraries.
major comments (2)
- [Abstract] Abstract and §1: the central performance and bug-finding claims rest on the assertion that Allegro transformations and the randomness replacement 'exactly preserve the semantics of generators' (distribution and per-seed observable behavior). No machine-checked equivalence, bisimulation argument, or even informal proof sketch is referenced; if staging alters thunk order, laziness, or bit consumption, or if the new RNG changes the effective distribution, the 13× speedup could reflect different test cases rather than equivalent faster generation. This precondition must be substantiated with concrete evidence before the pointwise comparisons are credible.
- [Abstract] The controlled experiment replacing the randomness source is described only at abstract level; the manuscript must supply the measurement methodology, data-exclusion rules, statistical analysis, and seed-handling protocol (as flagged by the low soundness score) to allow reproduction and to confirm that timing differences are not confounded by distribution changes.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The comments highlight important points regarding the substantiation of our semantics-preservation claims and the level of detail provided for the controlled experiment. We respond to each major comment below and will revise the manuscript to address them.
read point-by-point responses
-
Referee: [Abstract] Abstract and §1: the central performance and bug-finding claims rest on the assertion that Allegro transformations and the randomness replacement 'exactly preserve the semantics of generators' (distribution and per-seed observable behavior). No machine-checked equivalence, bisimulation argument, or even informal proof sketch is referenced; if staging alters thunk order, laziness, or bit consumption, or if the new RNG changes the effective distribution, the 13× speedup could reflect different test cases rather than equivalent faster generation. This precondition must be substantiated with concrete evidence before the pointwise comparisons are credible.
Authors: We agree that an explicit justification strengthens the claims. Allegro is constructed so that the staged generators consume exactly the same sequence of random bits and produce identical observable outputs for any given seed, because staging only removes indirection in the combinator implementations without reordering effects or altering laziness. The RNG replacement is a drop-in substitute that yields identical bit streams for the same seed. In the revision we will add a dedicated subsection with an informal argument (based on the structure of the staged code and the RNG interface) together with a small set of seed-by-seed equality checks on generated values. This will make the precondition for the pointwise comparisons explicit. revision: yes
-
Referee: [Abstract] The controlled experiment replacing the randomness source is described only at abstract level; the manuscript must supply the measurement methodology, data-exclusion rules, statistical analysis, and seed-handling protocol (as flagged by the low soundness score) to allow reproduction and to confirm that timing differences are not confounded by distribution changes.
Authors: We accept that the current description is insufficient for reproduction. The revised manuscript will contain a new subsection that details the timing harness, the precise rules used to exclude outlier runs, the statistical tests applied to the speed-up figures, and the seed-initialization and state-management protocol. These additions will confirm that the measured differences are attributable to generator speed rather than any change in the induced distribution. revision: yes
Circularity Check
No circularity: empirical performance claims rest on external measurements
full rationale
The paper's core claims concern measured speedups from multi-stage transformations (Allegro) and randomness replacement in PBT generators. These are presented as experimental interventions that preserve semantics, with results quantified via timing and bug-finding comparisons. No equations, fitted parameters, or derivations appear in the abstract or description; the 13× claim is tied directly to controlled experiments rather than any self-referential definition or self-citation chain. The semantics-preservation precondition is stated as an assumption enabling the comparisons but is not derived from the paper's own inputs. This is a standard empirical paper structure with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Multi-stage programming transformations preserve the observable behavior of the original generator combinators.
Reference graph
Works this paper leans on
-
[1]
[n. d.]. Macros. https://dotty.epfl.ch/docs/reference/metaprogramming/macros.html. Accessed: 2025-03-17
work page 2025
-
[2]
[n. d.]. ScalaCheck. https://scalacheck.org/. Accessed: 2025-03-24
work page 2025
-
[3]
[n. d.]. A small noncryptographic pseudorandom number generator. https://burtleburtle.net/bob/rand/smallprng.html. Accessed: 2025-03-24
work page 2025
-
[4]
Supun Abeysinghe and Tiark Rompf. 2023. Rhyme: A Data-Centric Expressive Query Language for Nested Data Structures. In Practical Aspects of Declarative Languages , Martin Gebser and Ilya Sergey (Eds.). Springer Nature Switzerland, Cham, 64–81
work page 2023
-
[5]
Stefan Ackermann, Vojin Jovanovic, Tiark Rompf, and Martin Odersky. 2012. Jet: An Embedded DSL for High Performance Big Data Processing. https://infoscience.epfl.ch/handle/20.500.14299/85985
work page 2012
-
[6]
Alan Bawden. 1999. Quasiquotation in Lisp. In Partial Evaluation and Semantic-Based Program Manipulation . 4–12. citeseer.ist.psu.edu/bawden99quasiquotation.html
work page 1999
- [7]
-
[8]
Anders Bondorf. 1992. Improving binding times without explicit CPS-conversion. In Proceedings of the 1992 ACM Conference on LISP and Functional Programming (San Francisco, California, USA) (LFP ’92). Association for Computing Machinery, New York, NY, USA, 1–10. https://doi.org/10.1145/141471.141483
-
[9]
Rudy Braquehais, Michael Walker, José Manuel Calderón Trilla, and Colin Runciman. 2017. A simple incremental development of a property-based testing tool (Functional Pearl). (2017)
work page 2017
-
[10]
Bytheway, Cameron. 2025. Bolero. https://camshaft.github.io/bolero/. Accessed: 2025-03-24
work page 2025
-
[11]
Jacques Carette and Oleg Kiselyov. 2005. Multi-stage Programming with Functors and Monads: Eliminating Abstraction Overhead from Generic Code. In Generative Programming and Component Engineering , Robert Glück and Michael Lowry (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 256–274
work page 2005
-
[12]
Koen Claessen, Jonas Duregård, and Michal H. Palka. 2015. Generating Constrained Random Data with Uniform Distribution. J. Funct. Program. 25 (2015). https://doi.org/10.1017/S0956796815000143
-
[14]
Koen Claessen and John Hughes. 2000. QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs. In Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming (ICFP ’00), Montreal, Canada, September 18-21, 2000 , Martin Odersky and Philip Wadler (Eds.). ACM, Montreal, Canada, 268–279. https: //doi.org/10.1145/351240.351266
-
[15]
Rowan Davies. 2017. A Temporal Logic Approach to Binding-Time Analysis. J. ACM 64, 1, Article 1 (March 2017), 45 pages. https://doi.org/10.1145/3011069
-
[16]
Rowan Davies and Frank Pfenning. 2001. A modal analysis of staged computation. J. ACM 48, 3 (May 2001), 555–604. https://doi.org/10.1145/382780.382785
-
[17]
Eisenberg, Stephen Dolan, and Leo White
Richard A. Eisenberg, Stephen Dolan, and Leo White. 2022. Unboxed types for OCaml. https://www.youtube.com/ watch?v=Vevld4cXSYk
work page 2022
- [18]
-
[19]
FsCheck Team. 2025. FsCheck: Random Testing for .NET. https://fscheck.github.io/FsCheck/. Accessed: 2025-03-24
work page 2025
-
[20]
Andrew Gill, John Launchbury, and Simon L. Peyton Jones. 1993. A short cut to deforestation. In Proceedings of the Conference on Functional Programming Languages and Computer Architecture (Copenhagen, Denmark) (FPCA ’93). Association for Computing Machinery, New York, NY, USA, 223–232. https://doi.org/10.1145/165180.165214
-
[21]
Cutler, Daniel Dickstein, Benjamin C
Harrison Goldstein, Joseph W. Cutler, Daniel Dickstein, Benjamin C. Pierce, and Andrew Head. 2024. Property- Based Testing in Practice. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (Lisbon, Portugal) (ICSE ’24). Association for Computing Machinery, New York, NY, USA, Article 187, 13 pages. https://doi.org/10.1145/35...
-
[22]
Harrison Goldstein and Benjamin C. Pierce. 2022. Parsing Randomness. Proceedings of the ACM on Programming Languages 6, OOPSLA2 (Oct. 2022), 128:89–128:113. https://doi.org/10.1145/3563291
-
[23]
John H. Holland. 1992. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence. MIT Press, Cambridge, MA, USA
work page 1992
-
[24]
Hypothesis Team. 2025. Hypothesis. https://hypothesis.works/. Accessed: 2025-03-24
work page 2025
-
[25]
Intel Corporation. 2025. Intel ® Processor Trace. https://edc.intel.com/content/www/us/en/design/products/platforms/ processor-and-core-i3-n-series-datasheet-volume-1-of-2/001/intel-processor-trace/. Accessed: 2025-03-24
work page 2025
-
[26]
Neil D Jones, Carsten K Gomard, and Peter Sestoft. 1993. Partial evaluation and automatic program generation . Peter Sestoft. , Vol. 1, No. 1, Article . Publication date: March 2025. 24 Richey and Cutler et al
work page 1993
-
[27]
Manohar Jonnalagedda, Thierry Coppey, Sandro Stucki, Tiark Rompf, and Martin Odersky. 2014. Staged parser combinators for efficient data processing. In Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications (Portland, Oregon, USA)(OOPSLA ’14). Association for Computing Machinery, New York, ...
-
[28]
Oleg Kiselyov. 2014. The Design and Implementation of BER MetaOCaml. In Functional and Logic Programming , Michael Codish and Eijiro Sumii (Eds.). Springer International Publishing, Cham, 86–102
work page 2014
- [29]
-
[30]
Oleg Kiselyov, Aggelos Biboudis, Nick Palladinos, and Yannis Smaragdakis. 2017. Stream fusion, to completeness. SIGPLAN Not. 52, 1 (Jan. 2017), 285–299. https://doi.org/10.1145/3093333.3009880
- [31]
-
[32]
András Kovács. 2024. Closure-Free Functional Programming in a Two-Level Type Theory. Proc. ACM Program. Lang. 8, ICFP, Article 259 (Aug. 2024), 34 pages. https://doi.org/10.1145/3674648
-
[33]
Shriram Krishnamurthi and Matthias Felleisen. 2001. Linguistic reuse. Ph. D. Dissertation. AAI3021152
work page 2001
-
[34]
Krishnaswami and Jeremy Yallop
Neelakantan R. Krishnaswami and Jeremy Yallop. 2019. A typed, algebraic approach to parsing. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (Phoenix, AZ, USA) (PLDI 2019). Association for Computing Machinery, New York, NY, USA, 379–393. https://doi.org/10.1145/3314221.3314625
-
[35]
Robert Krook, Nicholas Smallbone, Bo Joel Svensson, and Koen Claessen. 2024. QuickerCheck: Implementing and Evaluating a Parallel Run-Time for QuickCheck. InProceedings of the 35th Symposium on Implementation and Application of Functional Languages (IFL ’23). Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/ 10.1145/3652561.3652570
-
[36]
Leonidas Lampropoulos, Diane Gallois-Wong, Catalin Hritcu, John Hughes, Benjamin C. Pierce, and Li-yao Xia. 2017. Beginner’s Luck: A Language for Property-Based Generators. Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, Paris, France, January 18-20, 2017 (2017), 114–129
work page 2017
-
[37]
Leonidas Lampropoulos, Michael Hicks, and Benjamin C. Pierce. 2019. Coverage Guided, Property Based Testing. PACMPL 3, OOPSLA (2019), 181:1–181:29. https://doi.org/10.1145/3360607
-
[38]
Leonidas Lampropoulos, Zoe Paraskevopoulou, and Benjamin C. Pierce. 2017. Generating Good Generators for Inductive Relations. Proceedings of the ACM on Programming Languages 2, POPL (2017), 1–30
work page 2017
-
[39]
Daniel Lemire. 2023. testingRNG. https://github.com/lemire/testingRNG
work page 2023
-
[40]
Andreas Löscher and Konstantinos Sagonas. 2017. Targeted Property-Based Testing. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2017). Association for Computing Machinery, New York, NY, USA, 46–56. https://doi.org/10.1145/3092703.3092711
-
[41]
William M McKeeman. 1998. Differential testing for software. Digital Technical Journal 10, 1 (1998), 100–107
work page 1998
-
[42]
Microsoft. 2025. Code Quotations — F# | Microsoft Learn. https://learn.microsoft.com/en-us/dotnet/fsharp/language- reference/code-quotations. Accessed: 2025-03-24
work page 2025
-
[43]
Yaron Minsky and Anil Madhavapeddy. 2022. Real World OCaml: Functional Programming for the Masses . " O’Reilly Media, Inc. "
work page 2022
-
[44]
Eugenio Moggi. 1991. Notions of computation and monads. Information and computation 93, 1 (1991), 55–92
work page 1991
-
[45]
Anders Møller and Oskar Haarklou Veileborg. 2020. Eliminating abstraction overhead of Java stream pipelines using ahead-of-time program optimization. Proc. ACM Program. Lang. 4, OOPSLA, Article 168 (Nov. 2020), 29 pages. https://doi.org/10.1145/3428236
-
[46]
Aleksandar Nanevski, Frank Pfenning, and Brigitte Pientka. 2008. Contextual modal type theory. ACM Trans. Comput. Logic 9, 3, Article 23 (June 2008), 49 pages. https://doi.org/10.1145/1352582.1352591
-
[47]
Flemming Nielson and Hanne Riis Nielson. 1992. Two-level functional languages. Cambridge university press
work page 1992
-
[48]
openjdk. 2023. Java Microbenchmarking Harness. https://github.com/openjdk/jmh
work page 2023
-
[49]
Lionel Parreaux, Amir Shaikhha, and Christoph E. Koch. 2017. Quoted staged rewriting: a practical approach to library- defined optimizations. In Proceedings of the 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (Vancouver, BC, Canada) (GPCE 2017). Association for Computing Machinery, New York, NY, USA, 131–14...
-
[50]
Lionel Emile Vincent Parreaux. 2020. Type-Safe Metaprogramming and Compilation Techniques For Designing Efficient Systems in High-Level Languages . Ph. D. Dissertation. EPFL, Lausanne. https://doi.org/10.5075/epfl-thesis-10285
-
[51]
W. H. Payne, J. R. Rabung, and T. P. Bogyo. 1969. Coding the Lehmer pseudo-random number generator. Commun. ACM 12, 2 (Feb. 1969), 85–86. https://doi.org/10.1145/362848.362860
-
[52]
Proptest Contributors. 2025. The Proptest Book. https://altsysrq.github.io/proptest-book/. Accessed: 2025-03-24
work page 2025
-
[53]
Tiark Rompf and Martin Odersky. 2012. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs. Commun. ACM 55, 6 (June 2012), 121–130. https://doi.org/10.1145/2184319.2184345 , Vol. 1, No. 1, Article . Publication date: March 2025. Fail Faster 25
-
[54]
Colin Runciman, Matthew Naylor, and Fredrik Lindblad. 2008. Smallcheck and Lazy Smallcheck: Automatic Exhaustive Testing for Small Values. ACM SIGPLAN Notices 44, 2 (Sept. 2008), 37–48. https://doi.org/10.1145/1543134.1411292
-
[55]
Eric L. Seidel, Niki Vazou, and Ranjit Jhala. 2015. Type Targeted Testing. In Programming Languages and Systems (Lecture Notes in Computer Science) , Jan Vitek (Ed.). Springer, Berlin, Heidelberg, 812–836. https://doi.org/10.1007/978- 3-662-46669-8_33
-
[56]
Tim Sheard, Zine el-abidine Benaissa, and Emir Pasalic. 1999. DSL Implementation Using Staging and Monads. In 2nd Conference on Domain-Specific Languages (DSL 99) . USENIX Association, Austin, TX. https://www.usenix.org/ conference/dsl-99/dsl-implementation-using-staging-and-monads
work page 1999
-
[57]
Tim Sheard and Simon Peyton Jones. 2002. Template meta-programming for Haskell. In Proceedings of the 2002 ACM SIGPLAN Workshop on Haskell (Pittsburgh, Pennsylvania) (Haskell ’02). Association for Computing Machinery, New York, NY, USA, 1–16. https://doi.org/10.1145/581690.581691
-
[58]
Pierce, and Leonidas Lampropoulos
Jessica Shi, Alperen Keles, Harrison Goldstein, Benjamin C. Pierce, and Leonidas Lampropoulos. 2023. Etna: An Evaluation Platform for Property-Based Testing (Experience Report). Proc. ACM Program. Lang. 7, ICFP, Article 218 (Aug. 2023), 17 pages. https://doi.org/10.1145/3607860
-
[59]
Steele, Doug Lea, and Christine H
Guy L. Steele, Doug Lea, and Christine H. Flood. 2014. Fast splittable pseudorandom number generators. InProceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications (Portland, Oregon, USA) (OOPSLA ’14). Association for Computing Machinery, New York, NY, USA, 453–472. https://doi.org/10. 1145/2660193.2660195
-
[60]
Dominic Steinhöfel and Andreas Zeller. 2022. Input Invariants. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022) . Association for Computing Machinery, New York, NY, USA, 583–594. https://doi.org/10.1145/3540250.3549139
-
[61]
Jane Street. 2021. Base Quickcheck. https://github.com/janestreet/base_quickcheck
work page 2021
-
[62]
Jane Street. 2023. Core bench. https://github.com/janestreet/core_bench
work page 2023
-
[63]
Walid Taha and Tim Sheard. 2000. MetaML and multi-stage programming with explicit annotations. Theoretical computer science 248, 1-2 (2000), 211–242
work page 2000
-
[64]
Sam Tobin-Hochstadt, Vincent St-Amour, Ryan Culpepper, Matthew Flatt, and Matthias Felleisen. 2011. Languages as libraries. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (San Jose, California, USA) (PLDI ’11). Association for Computing Machinery, New York, NY, USA, 132–141. https: //doi.org/10.1145/199...
-
[65]
Laurence Tratt. 2008. Domain specific language implementation via compile-time meta-programming. ACM Trans. Program. Lang. Syst. 30, 6, Article 31 (Oct. 2008), 40 pages. https://doi.org/10.1145/1391956.1391958
-
[66]
Janis Voigtländer. 2008. Asymptotic Improvement of Computations over Free Monads. In Mathematics of Program Construction, Philippe Audebaud and Christine Paulin-Mohring (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 388–403
work page 2008
-
[67]
John Von Neumann. 1951. Various Techniques Used in Connection with Random Digits. Appl. Math Ser 12, 36-38 (1951), 3
work page 1951
-
[68]
Edwin Westbrook, Mathias Ricken, Jun Inoue, Yilong Yao, Tamer Abdelatif, and Walid Taha. 2010. Mint: Java multi-stage programming using weak separability. In Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation (Toronto, Ontario, Canada) (PLDI ’10). Association for Computing Machinery, New York, NY, USA, 400–411...
-
[69]
Jamie Willis, Nicolas Wu, and Matthew Pickering. 2020. Staged selective parser combinators. Proc. ACM Program. Lang. 4, ICFP, Article 120 (Aug. 2020), 30 pages. https://doi.org/10.1145/3409002
-
[70]
Ningning Xie, Leo White, Olivier Nicole, and Jeremy Yallop. 2023. MacoCaml: Staging Composable and Compilable Macros. Proc. ACM Program. Lang. 7, ICFP, Article 209 (Aug. 2023), 45 pages. https://doi.org/10.1145/3607851
-
[71]
Jeremy Yallop, Ningning Xie, and Neel Krishnaswami. 2023. flap: A Deterministic Parser with Fused Lexing. Proc. ACM Program. Lang. 7, PLDI, Article 155 (June 2023), 24 pages. https://doi.org/10.1145/3591269
-
[72]
Wang Yi. 2021. wyhash. https://github.com/wangyi-fudan/wyhash. Received –; revised –; accepted – , Vol. 1, No. 1, Article . Publication date: March 2025
work page 2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.