pith. sign in

arxiv: 2604.04345 · v1 · submitted 2026-04-06 · 💻 cs.PL

Trace-Guided Synthesis of Effectful Test Generators

Pith reviewed 2026-05-10 20:17 UTC · model grok-4.3

classification 💻 cs.PL
keywords test generationunderapproximate typeseffectful programsproperty-based testingsymbolic tracesprogram synthesisblack-box testing
0
0 comments X

The pith

Underapproximate type specifications guide the synthesis of test generators that probe effectful black-box systems while preserving latent dependencies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that symbolic traces derived from underapproximate type specifications can direct the automatic creation of test generators for programs with effects. These traces make explicit the data and control dependencies that test sequences must respect to reach interesting behaviors. If this holds, property-based testing frameworks could rely less on random or handwritten generators and instead produce sequences that systematically explore effectful code. A sympathetic reader would care because many real systems involve state and side effects that are hard to test effectively without such guidance.

Core claim

Underapproximate type specifications can be equipped with symbolic traces that capture the underapproximate behaviors of effectful operations. These traces expose latent data and control dependencies that must be preserved when generating test sequences. The resulting generators, when plugged into frameworks like QCheck or model checkers like P, produce tests that are more effective than default strategies and competitive with handwritten solutions across diverse applications.

What carries the argument

Symbolic traces that encode underapproximate behaviors of effectful operations and enforce the latent data and control dependencies required for valid test sequences.

If this is right

  • Generators synthesized this way integrate directly into property-based testing tools and improve their ability to reach dependent behaviors.
  • The same generators perform competitively against manually crafted solutions on a range of applications.
  • Underapproximate specifications become a practical source of guidance for automated test generation rather than solely for verification.
  • Test sequences produced respect constraints that random or unguided generators routinely violate.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The technique could be adapted to generate inputs for other testing or analysis tools that must respect ordering or state dependencies.
  • Developers might start from partial type annotations and iteratively refine them to improve the quality of synthesized tests without writing generators by hand.
  • This style of trace-guided synthesis offers a bridge between type-based reasoning and dynamic testing of stateful or effectful components.

Load-bearing premise

The symbolic traces extracted from underapproximate type specifications accurately identify and enforce the dependencies needed to generate effective test sequences for arbitrary effectful systems.

What would settle it

Running Clouseau-generated generators on an effectful black-box system where they achieve lower bug detection or coverage rates than either default random testing or existing handwritten generators.

Figures

Figures reproduced from arXiv: 2604.04345 by Ankush Desai, Benjamin Delaware, Suresh Jagannathan, Zhe Zhou.

Figure 1
Figure 1. Figure 1: Clouseau workflow [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Syntax of 𝝀 𝐶 expressions We formalize our approach using a core language, 𝝀 𝐶 , for expressing trace generators– nondeter￾ministic programs that probe the behaviors of a black-box system under test via a set of (effectful) operators that the system provides. This calculus is equipped with a type system that characterizes the set of traces that a program must be able to produce, if the system under test pr… view at source ↗
Figure 3
Figure 3. Figure 3: Selected reduction rules 𝝀 𝐶 is equipped with a small-step operational semantics similar to other calculi for trace￾based type systems [45]; selected reduction rules are shown in [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: 𝝀 𝐶 types constrains every value a pure expression may evaluate to; the type qualifier {𝜈:𝑏 | 𝜙} 𝑢 constrains a subset of the values it must evaluate to. For example, the constant value 1 can be assigned the types {𝜈:int } 𝑜 and {𝜈:int } 𝑜 – the value 1 satisfies the constraint in both overapproximate qualifiers – but, 1 cannot be assigned the type, {𝜈:int | 𝜈 = 1 ∨ 𝜈 = 2} 𝑢 , since it cannot evaluate to b… view at source ↗
Figure 5
Figure 5. Figure 5: Selected typing rules. new ghost event pushI with an incremented index (i.e., ⟨pushI 𝑛+1 𝑥⟩). The signature for pop also records the number of pop operations in the same way (i.e., ⟨popI 𝑚+1 𝑥⟩); additionally, it requires the last popped value (𝑥) to be equal to the (𝑛−𝑚) 𝑡ℎ pushed value, reflecting the “last-in-first-out” property of the stack. 3.2 Typing Rules 𝝀 𝐶 is equipped with a pair of typing judgme… view at source ↗
Figure 6
Figure 6. Figure 6: Clouseau’s high-level test generator synthesis algorithm [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: STLC terms in a serialized form. We explicitly define the syntax of the serialized STLC terms in [PITH_FULL_IMAGE:figures/full_fig_p026_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: uHat specifications of STLC operations. Handler Semantics 𝛼 ⊨ op(𝑐) ⇓ 𝑐 op(𝑐) ⇓ 𝑐 Operational Semantics 𝛼 ⊨ 𝑒 𝛼 ↩−→𝑒 𝛼 ⊨ op(𝑐 ) ⇓ 𝑐 ′ StEfOp 𝛼 ⊨ op 𝑐 [op(𝑐 𝑐′ ) ] ↩−−−−−−→𝑐 ′ op 𝑐 ⇓ 𝑐 ′ StOp 𝛼 ⊨ op 𝑐 [ ] ↩−→𝑐 ′ 𝜙 ⇓ ⊤ StAssume 𝛼 ⊨ assume 𝜙 [ ] ↩−→ ( ) 𝜙 ⇓ ⊤ StAssert 𝛼 ⊨ assert 𝜙 [ ] ↩−→ ( ) 𝜙 ⇓ ⊥ StAssert 𝛼 ⊨ assert 𝜙 [ ] ↩−→raise Error 𝛼 ⊨ 𝑒1 𝛼 ′ ↩−→𝑒 ′ 1 StLetE1 𝛼 ⊨ let 𝑦 = 𝑒1 in 𝑒2 𝛼 ′ ↩−→let 𝑦 = 𝑒 ′ 1 i… view at source ↗
Figure 9
Figure 9. Figure 9: Full Operational Semantics open abstraction at some depth in the prefix trace. The history regex of the var operation precisely characterizes this constraint: the abstraction at depth 𝑑 must be open (• ∗ ⟨abs 𝑡⟩⟨depth 𝑑⟩ (• \ ⟨endAbs⟩)∗ ), and the current depth must be 𝑑 + 𝑖. C Operational Semantics The small-step operational semantics of our core language are shown in [PITH_FULL_IMAGE:figures/full_fig_p0… view at source ↗
Figure 10
Figure 10. Figure 10: Basic Typing Rules and Qualifier Typing Rules [PITH_FULL_IMAGE:figures/full_fig_p028_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Full set of well-formedness typing rules. [PITH_FULL_IMAGE:figures/full_fig_p029_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Full set of subtyping rules. Auxiliary Typing Γ ⊢WF 𝜏 Γ ⊢ 𝐴 ⊆ 𝐴 Γ ⊢ 𝜏 <: 𝜏 Typing Γ ⊢ op : 𝑡 Γ ⊢ 𝑣 : 𝑡 Γ ⊢ 𝑒 : 𝜏 Γ ⊢ 𝑣 : 𝑡 TRet Γ ⊢ 𝑣 : [𝐻]𝑡 [𝜖] Δ(op) = 𝑡 TOpCtx Γ ⊢ op : 𝑡 Γ ⊢ op : 𝑥𝑖 :𝑡𝑖  [𝐻]𝑡 [⟨op 𝑥𝑖⟩·𝐹 ] ∀𝑖. Γ ⊢ 𝑣𝑖 : 𝑡𝑖 TEffOp Γ ⊢ op 𝑣𝑖 : [𝐻]𝑡 [⟨op 𝑣𝑖⟩·𝐹 ] Γ ⊢ 𝑒1 : 𝜏1 Γ ⊢ 𝑒2 : 𝜏2 TChoice Γ ⊢ 𝑒1 ⊕ 𝑒2 : 𝜏1 ⊓𝜏2 Γ ⊢ 𝑒1 : [𝐻] 𝑥:𝑡𝑥 [𝐹1] Γ, 𝑥:𝑡𝑥 ⊢ 𝑒2 : [𝐻 ·𝐹1]𝑡 [𝐹2] TLet Γ ⊢ let 𝑥 = 𝑒1 in 𝑒2 : [𝐻]𝑡 [𝐹1 ·𝐹2]… view at source ↗
Figure 13
Figure 13. Figure 13: Full set of typing rules. F Type Denotation , Vol. 1, No. 1, Article . Publication date: April 2026 [PITH_FULL_IMAGE:figures/full_fig_p030_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Type denotations in 𝝀 𝐶 G Auxiliary Functions for Synthesis This section describes three auxiliary functions used for generator synthesis. The first of these, Norm, converts a SRE formula into a set of abstract traces. The second, TermDerive, generates test generators from a set of abstract traces. The third, termTrace, generates a straightline program from a single refined abstract trace. Normalization. … view at source ↗
read the original abstract

Several recently proposed program logics have incorporated notions of underapproximation into their design, enabling them to reason about reachability rather than safety. In this paper, we explore how similar ideas can be integrated into an expressive type and effect system. We use the resulting underapproximate type specifications to guide the synthesis of test generators that probe the behavior of effectful black-box systems. A key novelty of our type language is its ability to capture underapproximate behaviors of effectful operations using symbolic traces that expose latent data and control dependencies, constraints that must be preserved by the test sequences the generator outputs. We implement this approach in a tool called Clouseau, and evaluate it on a diverse range of applications by integrating Clouseau's synthesized generators into property-based testing frameworks like QCheck and model-checking tools like P. In both settings, the generators synthesized by Clouseau are significantly more effective than the default testing strategy, and are competitive with state-of-the-art, handwritten solutions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes integrating underapproximation into an expressive type and effect system, using the resulting specifications and their symbolic traces to synthesize test generators for effectful black-box systems. These traces are intended to expose latent data and control dependencies that must be preserved in generated test sequences. The approach is implemented in the Clouseau tool and evaluated by integrating the synthesized generators into QCheck for property-based testing and the P model checker; the abstract claims the generators are significantly more effective than default strategies and competitive with state-of-the-art handwritten solutions across diverse applications.

Significance. If the empirical claims hold after detailed reporting, the work offers a novel bridge between underapproximate type systems and automated test generation for effectful programs, potentially reducing manual effort in writing generators while preserving key behavioral constraints. The explicit use of symbolic traces to enforce dependencies is a technical strength that could influence future work on type-guided synthesis and testing.

major comments (2)
  1. [Abstract and §5] Abstract and §5 (Evaluation): the claim that Clouseau generators are 'significantly more effective' and 'competitive with state-of-the-art handwritten solutions' is asserted without any reported metrics, baselines, statistical tests, number of trials, or methodology details in the abstract; the evaluation section must supply these to substantiate the central effectiveness claim.
  2. [§3 and §4] §3 (Underapproximate Type System) and §4 (Synthesis): the soundness of the approach rests on the assumption that symbolic traces extracted from underapproximate specifications accurately capture all relevant data and control dependencies for arbitrary black-box effectful systems; the manuscript provides no experiment or argument addressing how trace completeness affects generator quality or what happens when type specifications are deliberately coarsened.
minor comments (2)
  1. [§2] Notation for symbolic traces and effect operations should be introduced with a small running example in §2 to improve readability before the formal definitions.
  2. [§5] The integration points with QCheck and P (how generators are invoked and oracles are checked) are described at a high level; a short code snippet or pseudocode would clarify the usage model.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments on our submission. We have carefully considered each major comment and provide point-by-point responses below. We plan to make revisions to strengthen the manuscript as outlined.

read point-by-point responses
  1. Referee: [Abstract and §5] Abstract and §5 (Evaluation): the claim that Clouseau generators are 'significantly more effective' and 'competitive with state-of-the-art handwritten solutions' is asserted without any reported metrics, baselines, statistical tests, number of trials, or methodology details in the abstract; the evaluation section must supply these to substantiate the central effectiveness claim.

    Authors: We agree with the referee that the abstract would benefit from including concrete metrics to support the effectiveness claims. Section 5 of the manuscript describes the evaluation methodology, including the use of QCheck and P, comparisons against default generators and handwritten solutions, and the number of trials performed. To address this comment, we will revise the abstract to incorporate key quantitative results, such as coverage or bug-finding improvements, along with mentions of statistical tests where applicable. This will make the central claims more substantiated while keeping the abstract concise. revision: yes

  2. Referee: [§3 and §4] §3 (Underapproximate Type System) and §4 (Synthesis): the soundness of the approach rests on the assumption that symbolic traces extracted from underapproximate specifications accurately capture all relevant data and control dependencies for arbitrary black-box effectful systems; the manuscript provides no experiment or argument addressing how trace completeness affects generator quality or what happens when type specifications are deliberately coarsened.

    Authors: The soundness of our approach is formally established by the type system in Section 3, which ensures that the symbolic traces underapproximate the possible behaviors, and the synthesis in Section 4 produces generators that respect these traces. However, we acknowledge the lack of an explicit discussion or experiment on the effects of trace completeness or coarsened specifications. We will add an argument in Section 3 explaining that coarsening the underapproximate types leads to sound but potentially less precise generators (i.e., they may miss some dependencies but never violate the specified ones). Additionally, we will include a brief sensitivity analysis in the evaluation section using one benchmark with intentionally coarsened types to illustrate the impact on generator effectiveness. This revision will directly address the referee's concern. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical synthesis technique evaluated on applications

full rationale

The paper describes a synthesis technique (Clouseau) that uses underapproximate type specifications to extract symbolic traces for guiding effectful test generators. The contribution is implemented and evaluated empirically by integrating the generators into QCheck and P, with comparisons to default strategies and handwritten solutions. No mathematical derivation chain, parameter fitting, self-definitional equations, or load-bearing self-citations are present in the abstract or described approach. The central claims rest on experimental effectiveness rather than any reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied systems paper on synthesis and testing; the abstract identifies no free parameters, mathematical axioms, or new postulated entities.

pith-pipeline@v0.9.0 · 5483 in / 1025 out tokens · 42903 ms · 2026-05-10T20:17:28.500913+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages

  1. [1]

    MonkeyDB: Effectively Testing Correctness Under Weak Isolation Levels.Proc

    Ranadeep Biswas, Diptanshu Kakwani, Jyothi Vedurada, Constantin Enea, and Akash Lal. MonkeyDB: Effectively Testing Correctness Under Weak Isolation Levels.Proc. ACM Program. Lang., 5(OOPSLA), October 2021. doi: 10.1145/ 3485546. URL https://doi.org/10.1145/3485546

  2. [2]

    Programming with Angelic Nondeterminism

    Rastislav Bodik, Satish Chandra, Joel Galenson, Doug Kimelman, Nicholas Tung, Shaon Barman, and Casey Rodarmor. Programming with Angelic Nondeterminism. InProceedings of the 37th Annual ACM Symposium on Principles of Programming Languages, POPL ’10, page 339–352, New York, NY, USA, 2010. Association for Computing Machinery. ISBN 9781605584799

  3. [3]

    Infer: An Automatic Program Verifier for Memory Safety of C Programs

    Cristiano Calcagno and Dino Distefano. Infer: An Automatic Program Verifier for Memory Safety of C Programs. In Proceedings of the Third International Conference on NASA Formal Methods, NFM’11, page 459–465, Berlin, Heidelberg,

  4. [4]

    ISBN 9783642203978

    Springer-Verlag. ISBN 9783642203978

  5. [5]

    Systematic Testing for Detecting Concurrency Errors in Erlang Programs

    Maria Christakis, Alkis Gotovos, and Konstantinos Sagonas. Systematic Testing for Detecting Concurrency Errors in Erlang Programs. In2013 IEEE Sixth International Conference on Software Testing, Verification and Validation, pages 154–163, 2013. doi: 10.1109/ICST.2013.50

  6. [6]

    URLhttps://doi

    Koen Claessen and John Hughes. QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs. In Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming, ICFP ’00, page 268–279, New York, NY, USA, 2000. Association for Computing Machinery. ISBN 1581132026. doi: 10.1145/351240.351266. URL https://doi.org/10.1145/3512...

  7. [7]

    Finding Race Conditions in Erlang with QuickCheck and PULSE

    Koen Claessen, Michal Palka, Nicholas Smallbone, John Hughes, Hans Svensson, Thomas Arts, and Ulf Wiger. Finding Race Conditions in Erlang with QuickCheck and PULSE. InProceedings of the 14th ACM SIGPLAN International Conference on Functional Programming, page 149–160. Association for Computing Machinery, 2009

  8. [8]

    A Tool for Checking ANSI-C Programs

    Edmund Clarke, Daniel Kroening, and Flavio Lerda. A Tool for Checking ANSI-C Programs . In Kurt Jensen and Andreas Podelski, editors,Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2004), volume 2988 ofLecture Notes in Computer Science, pages 168–176. Springer, 2004. ISBN 3-540-21299-X

  9. [9]

    Minimization of Symbolic Automata.SIGPLAN Not., 49(1):541–553, Jan 2014

    Loris D’Antoni and Margus Veanes. Minimization of Symbolic Automata.SIGPLAN Not., 49(1):541–553, Jan 2014. ISSN 0362-1340. doi: 10.1145/2578855.2535849. URL https://doi.org/10.1145/2578855.2535849

  10. [10]

    N.G de Bruijn. Lambda Calculus Notation with Nameless Dummies, A Tool For Automatic Formula Manipulation, With Application to the Church-Rosser Theorem.Indagationes Mathematicae (Proceedings), 75(5):381–392, 1972. ISSN 1385-7258. doi: https://doi.org/10.1016/1385-7258(72)90034-0. URL https://www.sciencedirect.com/science/article/pii/ 1385725872900340

  11. [11]

    Giuseppe De Giacomo and Moshe Y. Vardi. Linear Temporal Logic and Linear Dynamic Logic on Finite Traces. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, IJCAI ’13, page 854–860. AAAI Press, 2013. ISBN 9781577356332

  12. [12]

    Z3: An Efficient SMT Solver

    Leonardo de Moura and Nikolaj Bjørner. Z3: An Efficient SMT Solver. In C. R. Ramakrishnan and Jakob Rehof, editors, Tools and Algorithms for the Construction and Analysis of Systems, pages 337–340, Berlin, Heidelberg, 2008. Springer Berlin Heidelberg. ISBN 978-3-540-78800-3. doi: 10.1007/978-3-540-78800-3_24

  13. [13]

    Subquadra tic-Time Algorithms for Abelian Stringology Problems

    Ankush Desai and Shaz Qadeer. P: Modular and Safe Asynchronous Programming. In Shuvendu K. Lahiri and Giles Reger, editors,Runtime Verification - 17th International Conference, RV 2017, Seattle, W A, USA, September 13-16, 2017, Proceedings, volume 10548 ofLecture Notes in Computer Science, pages 3–7. Springer, 2017. doi: 10.1007/978-3-319- 67531-2\_1. URL...

  14. [14]

    Jackson, Shaz Qadeer, Sriram K

    Ankush Desai, Vivek Gupta, Ethan K. Jackson, Shaz Qadeer, Sriram K. Rajamani, and Damien Zufferey. P: Safe Asynchronous Event-Driven Programming. In Hans-Juergen Boehm and Cormac Flanagan, editors,ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’13, Seattle, W A, USA, June 16-19, 2013, pages 321–332. ACM, 2013. doi: 10.1145/...

  15. [15]

    Ankush Desai, Amar Phanishayee, Shaz Qadeer, and Sanjit A. Seshia. Compositional Programming and Testing of Dynamic Distributed Systems.Proc. ACM Program. Lang., 2(OOPSLA), October 2018. doi: 10.1145/3276529. URL https://doi.org/10.1145/3276529

  16. [16]

    OLTP- Bench: An extensible testbed for benchmarking relational databases

    Djellel Eddine Difallah, Andrew Pavlo, Carlo Curino, and Philippe Cudre-Mauroux. OLTP-Bench: an Extensible Testbed for Benchmarking Relational Databases.Proc. VLDB Endow., 7(4):277–288, December 2013. ISSN 2150-8097. doi: 10.14778/2732240.2732246. URL https://doi.org/10.14778/2732240.2732246

  17. [17]

    Dino Distefano, Manuel Fähndrich, Francesco Logozzo, and Peter W. O’Hearn. Scaling Static Analyses at Facebook. Commun. ACM, 62(8):62–70, July 2019. ISSN 0001-0782

  18. [18]

    The Power of Symbolic Automata and Transducers

    Loris D’Antoni and Margus Veanes. The Power of Symbolic Automata and Transducers. InComputer Aided Verification, pages 47–67. Springer, 2017

  19. [19]

    Generating Well-Typed Terms That Are Not "Useless"

    Justin Frank, Benjamin Quiring, and Leonidas Lampropoulos. Generating Well-Typed Terms That Are Not "Useless". Proc. ACM Program. Lang., 8(POPL):2318–2339, 2024. doi: 10.1145/3632919. URL https://doi.org/10.1145/3632919. , Vol. 1, No. 1, Article . Publication date: April 2026. 24 Zhe Zhou, Ankush Desai, Benjamin Delaware, and Suresh Jagannathan

  20. [20]

    Cutler, Daniel Dickstein, Benjamin C

    Harrison Goldstein, Joseph W. Cutler, Daniel Dickstein, Benjamin C. Pierce, and Andrew Head. Property-Based Testing in Practice. InProceedings of the 46th IEEE/ACM International Conference on Software Engineering, ICSE 2024, Lisbon, Portugal, April 14-20, 2024, pages 187:1–187:13. ACM, 2024. doi: 10.1145/3597503.3639581. URL https: //doi.org/10.1145/35975...

  21. [21]

    Pierce, Antal Spector-Zabusky, Dimitrios Vytiniotis, Arthur Azevedo de Amorim, and Leonidas Lampropoulos

    Catalin Hritcu, John Hughes, Benjamin C. Pierce, Antal Spector-Zabusky, Dimitrios Vytiniotis, Arthur Azevedo de Amorim, and Leonidas Lampropoulos. Testing Noninterference, Quickly. InProceedings of the 18th ACM SIGPLAN International Conference on Functional Programming, ICFP ’13, page 455–468, New York, NY, USA, 2013. Association for Computing Machinery. ...

  22. [22]

    A Falsification View of Success Typing

    Robert Jakob and Peter Thiemann. A Falsification View of Success Typing. In Klaus Havelund, Gerard Holzmann, and Rajeev Joshi, editors,NASA Formal Methods, pages 234–247, Cham, 2015. Springer International Publishing. ISBN 978-3-319-17524-9

  23. [23]

    Refinement Types: A Tutorial.Found

    Ranjit Jhala and Niki Vazou. Refinement Types: A Tutorial.Found. Trends Program. Lang., 6(3-4):159–317, 2021. doi: 10.1561/2500000032. URL https://doi.org/10.1561/2500000032

  24. [24]

    Local Temporal Reasoning

    Eric Koskinen and Tachio Terauchi. Local Temporal Reasoning. InProceedings of the Joint Meeting of the Twenty-Third EACSL Annual Conference on Computer Science Logic (CSL) and the Twenty-Ninth Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), CSL-LICS ’14, New York, NY, USA, 2014. Association for Computing Machinery. ISBN 9781450328869. doi: ...

  25. [25]

    We’ve Got You Covered: Type-Guided Repair of Incomplete Input Generators.Proc

    Patrick LaFontaine, Zhe Zhou, Ashish Mishra, Suresh Jagannathan, and Benjamin Delaware. We’ve Got You Covered: Type-Guided Repair of Incomplete Input Generators.Proc. ACM Program. Lang., 9(OOPSLA2), October 2025

  26. [26]

    Pierce, and Li-yao Xia

    Leonidas Lampropoulos, Diane Gallois-Wong, Cătălin Hriţcu, John Hughes, Benjamin C. Pierce, and Li-yao Xia. Beginner’s luck: A Language for Property-Based Generators. InProceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL ’17, page 114–129, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 97814503466...

  27. [27]

    Quang Loc Le, Azalea Raad, Jules Villard, Josh Berdine, Derek Dreyer, and Peter W. O’Hearn. Finding Real Bugs in Big Programs with Incorrectness Logic.Proc. ACM Program. Lang., 6(OOPSLA):1–27, 2022. doi: 10.1145/3527325. URL https://doi.org/10.1145/3527325

  28. [28]

    A Complementary Approach to Incorrectness Typing

    Celia Mengyue Li, Sophie Pull, and Steven Ramsay. A Complementary Approach to Incorrectness Typing. InACM Symposium on Principles of Programming Languages (POPL), 2026

  29. [29]

    Property-Based Testing of OCaml 5’s Runtime System: Fun and Segfaults with Interpreters and State Transition Functions

    Jan Midtgaard. Property-Based Testing of OCaml 5’s Runtime System: Fun and Segfaults with Interpreters and State Transition Functions. InOlivierFest’25, page 180–193, New York, NY, USA, 2025. Association for Computing Machinery. ISBN 9798400721502. doi: 10.1145/3759427.3760378. URL https://doi.org/10.1145/3759427.3760378

  30. [30]

    In: PLDI

    Anders Miltner, Saswat Padhi, Todd D. Millstein, and David Walker. Data-driven Inference of Representation Invariants. InProceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI, pages 1–15. ACM, 2020. doi: 10.1145/3385412.3385967. URL https://doi.org/10.1145/3385412.3385967

  31. [31]

    Federico Mora, Ankush Desai, Elizabeth Polgreen, and Sanjit A. Seshia. Message Chains for Distributed System Verification.Proc. ACM Program. Lang., 7(OOPSLA2), October 2023. doi: 10.1145/3622876. URL https://doi.org/10.1145/ 3622876

  32. [32]

    Moss and Tamara von Glehn

    Yoji Nanjo, Hiroshi Unno, Eric Koskinen, and Tachio Terauchi. A Fixpoint Logic and Dependent Effects for Temporal Property Verification. InProceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS ’18, page 759–768, New York, NY, USA, 2018. Association for Computing Machinery. ISBN 9781450355834. doi: 10.1145/3209108.3209204. UR...

  33. [33]

    Guaranteed bounds for posterior inference in universal probabilistic programming , booktitle =

    Liam O’Connor and Oskar Wickström. Quickstrom: property-based acceptance testing with ltl specifications. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2022, page 1025–1038, New York, NY, USA, 2022. Association for Computing Machinery. ISBN 9781450392655. doi: 10.1145/3519939.352372...

  34. [34]

    Peter W. O’Hearn. Incorrectness Logic.Proc. ACM Program. Lang., 4(POPL), 2019. doi: 10.1145/3371078. URL https://doi.org/10.1145/3371078

  35. [35]

    Pałka, Koen Claessen, Alejandro Russo, and John Hughes

    Michał H. Pałka, Koen Claessen, Alejandro Russo, and John Hughes. Testing an Optimising Compiler by Generating Random Lambda Terms. InProceedings of the 6th International Workshop on Automation of Software Test, AST ’11, page 91–97, New York, NY, USA, 2011. Association for Computing Machinery. ISBN 9781450305921. doi: 10.1145/1982595.1982615. URL https://...

  36. [36]

    QuickCheck Inspired Property-Based Testing for Ocaml

    QCheck. QuickCheck Inspired Property-Based Testing for Ocaml. https://c-cube.github.io/qcheck/, 2024. URL https://c-cube.github.io/qcheck/

  37. [37]

    O’Hearn, and Jules Villard

    Azalea Raad, Josh Berdine, Hoang-Hai Dang, Derek Dreyer, Peter O’Hearn, and Jules Villard. Local Reasoning About the Presence of Bugs: Incorrectness Separation Logic. InComputer Aided Verification: 32nd International Conference, CA V 2020, Los Angeles, CA, USA, July 21–24, 2020, Proceedings, Part II, page 225–252, Berlin, Heidelberg, 2020. Springer-Verlag...

  38. [38]

    A General Approach to Under-Approximate Reasoning About Concurrent Programs

    Azalea Raad, Julien Vanegue, Josh Berdine, and Peter O’Hearn. A General Approach to Under-Approximate Reasoning About Concurrent Programs. In Guillermo A. Pérez and Jean-François Raskin, editors,34th International Conference on Concurrency Theory (CONCUR 2023), volume 279 ofLeibniz International Proceedings in Informatics (LIPIcs), pages 25:1–25:17, Dagst...

  39. [39]

    Ill-Typed Programs Don’t Evaluate.Proc

    Steven Ramsay and Charlie Walpole. Ill-Typed Programs Don’t Evaluate.Proc. ACM Program. Lang., 8(POPL), January 2024

  40. [40]

    F. P. Ramsey.On a Problem of Formal Logic, pages 1–24. Birkhäuser Boston, Boston, MA, 1987. ISBN 978-0-8176-4842-8. doi: 10.1007/978-0-8176-4842-8_1. URL https://doi.org/10.1007/978-0-8176-4842-8_1

  41. [41]

    Temporal Verification with Answer-Effect Modification: Dependent Temporal Type- and-Effect System with Delimited Continuations.Proc

    Taro Sekiyama and Hiroshi Unno. Temporal Verification with Answer-Effect Modification: Dependent Temporal Type- and-Effect System with Delimited Continuations.Proc. ACM Program. Lang., 7(POPL), jan 2023. doi: 10.1145/3571264. URL https://doi.org/10.1145/3571264

  42. [42]

    History Effects and Verification

    Christian Skalka and Scott Smith. History Effects and Verification. In Wei-Ngan Chin, editor,Programming Languages and Systems, pages 107–128, Berlin, Heidelberg, 2004. Springer Berlin Heidelberg. ISBN 978-3-540-30477-7

  43. [43]

    Applications of Symbolic Finite Automata

    Margus Veanes. Applications of Symbolic Finite Automata. In Stavros Konstantinidis, editor,Implementation and Application of Automata, pages 16–23, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg. ISBN 978-3-642-39274-0

  44. [44]

    Refinement Type Refutations.Proc

    Robin Webbers, Klaus von Gleissenthall, and Ranjit Jhala. Refinement Type Refutations.Proc. ACM Program. Lang., 8 (OOPSLA2), October 2024

  45. [45]

    Covering All the Bases: Type-Based Verification of Test Input Generators.Proc

    Zhe Zhou, Ashish Mishra, Benjamin Delaware, and Suresh Jagannathan. Covering All the Bases: Type-Based Verification of Test Input Generators.Proc. ACM Program. Lang., 7(PLDI), June 2023. doi: 10.1145/3591271. URL https://doi.org/10. 1145/3591271

  46. [46]

    A HAT Trick: Automatically Verifying Representation Invariants using Symbolic Finite Automata.Proc

    Zhe Zhou, Qianchuan Ye, Benjamin Delaware, and Suresh Jagannathan. A HAT Trick: Automatically Verifying Representation Invariants using Symbolic Finite Automata.Proc. ACM Program. Lang., 8(PLDI):1387–1411, 2024. doi: 10.1145/3656433. URL https://doi.org/10.1145/3656433

  47. [47]

    Absofty∗term

    Noam Zilberstein, Derek Dreyer, and Alexandra Silva. Outcome Logic: A Unifying Foundation for Correctness and Incorrectness Reasoning.Proc. ACM Program. Lang., 7(OOPSLA1), April 2023. , Vol. 1, No. 1, Article . Publication date: April 2026. 26 Zhe Zhou, Ankush Desai, Benjamin Delaware, and Suresh Jagannathan A Outlines of Supplemental Materials The supple...