Probabilistic Programs of Thought
Pith reviewed 2026-05-10 05:43 UTC · model grok-4.3
The pith
A single LLM generation can be converted into a probabilistic program that compactly represents exponentially many deterministic programs, enabling cheap additional sampling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given a program generated by an LLM together with its next-token probabilities, the authors construct a probabilistic program that compactly represents exponentially many deterministic programs. Because inference in this probabilistic program is far cheaper than additional LLM generations, new program samples can be obtained with little CPU overhead and no further GPU cost. When instantiated on standard benchmarks, the method yields higher performance than standard sampling while using fewer model calls.
What carries the argument
The probabilistic program assembled from one LLM generation and its next-token probabilities, which encodes a distribution over valid programs to enable low-cost sampling and inference.
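As a sketch of the kind of object this implies (the paper's exact construction is not shown here), suppose we record, at each position of one generation, the model's leading next-token alternatives and their probabilities; resampling positions then yields a combinatorial number of programs at CPU cost. Note the caveat built into this toy version: resampling positions independently ignores how the LLM conditions each token on its prefix, which is precisely the fidelity question the referee report raises.

```python
import itertools
import random

# Hypothetical per-position record from ONE LLM generation: the model's
# leading next-token alternatives and their probabilities at that position.
# (Illustrative only -- not the paper's actual data structure.)
generation = [
    {"x": 0.6, "y": 0.4},            # position 1 alternatives
    {" = ": 0.9, " += ": 0.1},       # position 2 alternatives
    {"1": 0.5, "2": 0.3, "0": 0.2},  # position 3 alternatives
]

def sample_program(positions, rng=random):
    """Resample every position from its stored distribution (CPU only).
    Caveat: independent resampling is an approximation -- the true LLM
    distribution conditions each token on the resampled prefix."""
    return "".join(
        rng.choices(list(p), weights=list(p.values()), k=1)[0]
        for p in positions
    )

# This compact record covers 2 * 2 * 3 = 12 distinct deterministic programs.
num_programs = len(list(itertools.product(*generation)))
```

Even this naive version makes the cost asymmetry visible: one GPU generation stores the record, after which every extra sample is a handful of CPU multinomial draws.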
If this is right
- Users can draw additional program samples at CPU cost only after one LLM call.
- Performance on code generation and mathematical reasoning tasks improves with fewer total generations from the model.
- The framework applies equally to code understanding tasks that require reasoning over programs.
- Sampling strategies that previously scaled with the number of generations become cheaper at large sample counts.
Where Pith is reading between the lines
- The same construction could be applied to other structured generation tasks such as planning or symbolic reasoning where LLMs output sequences with token-level probabilities.
- If the probabilistic program can be compiled into even faster inference routines, the CPU overhead could be reduced further for real-time applications.
- Combining this method with existing prompting techniques like chain-of-thought might multiply the effective diversity of reasoning paths at fixed compute.
Load-bearing premise
The next-token probabilities produced by a single LLM generation can be assembled into a probabilistic program whose distribution over valid programs matches what would be obtained by repeated independent generations.
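One way to see why this premise could hold: if the compact object is a token tree whose branch weights are the model's next-token probabilities, the probability of any complete path factorizes by the chain rule, p(t1) p(t2 | t1) ..., matching the LLM's conditional distribution over the programs the tree covers. A minimal sketch, with an invented tree and probabilities:

```python
import random

# Toy weighted token tree: token -> (branch probability, subtree).
# Branch probabilities at each node sum to 1; leaves are empty dicts.
tree = {
    "a": (0.7, {"b": (0.5, {}), "c": (0.5, {})}),
    "d": (0.3, {"e": (1.0, {})}),
}

def sample_path(node, rng=random):
    """Walk the tree, drawing each branch with its weight. The chance of
    emitting a full path is the product of its branch weights, i.e. the
    chain-rule factorization of the sequence probability."""
    path = []
    while node:
        tokens = list(node)
        weights = [node[t][0] for t in tokens]
        tok = rng.choices(tokens, weights=weights, k=1)[0]
        path.append(tok)
        node = node[tok][1]
    return tuple(path)
```

Here the path ("a", "b") has probability 0.7 * 0.5 = 0.35, and the three complete paths sum to 1, so the tree defines a proper distribution — but only over the prefixes it actually stores, which is what the premise asks to match against repeated generations.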
What would settle it
Run many independent LLM generations on the same prompt to obtain an empirical distribution of valid programs, then compare it to the distribution obtained by sampling from the constructed probabilistic program; large divergence in the sets of valid programs or downstream task accuracy would falsify the claim.
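A concrete version of this test could look as follows (a sketch: the divergence measure, sample sizes, and program labels are choices made here, not the paper's). Draw programs both ways, form empirical distributions, and compare them with total variation distance:

```python
from collections import Counter

def empirical(samples):
    """Empirical distribution over programs (any hashable representation)."""
    n = len(samples)
    return {prog: c / n for prog, c in Counter(samples).items()}

def total_variation(p, q):
    """TV distance between two finite distributions: 0 = identical, 1 = disjoint."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)

# Toy stand-ins for the two sampling routes being compared.
llm_samples = ["prog_a"] * 60 + ["prog_b"] * 40  # repeated LLM generations
pp_samples = ["prog_a"] * 55 + ["prog_b"] * 45   # probabilistic-program samples
gap = total_variation(empirical(llm_samples), empirical(pp_samples))
```

A small gap (together with matching downstream task accuracy) would support the premise; a large one would falsify it.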
Original abstract
LLMs are widely used for code generation and mathematical reasoning tasks where they are required to generate structured output. They either need to reason about code, generate code for a given specification, or reason using programs of thought. The typical approach to code generation is to prompt the model and generate samples until an appropriate program is obtained. Within this process, sampling $n$ programs from the language model requires $n$ GPU compute-intensive generations which becomes prohibitively expensive for larger values of $n$. In this work, we address this limitation by exposing the LLM's distribution within the generated programs themselves. We propose a novel test-time framework we dub probabilistic programs of thought to obtain more samples from the model with fewer LLM generations. Given a program generated by a model and the associated next-token probabilities, we build a probabilistic program that compactly represents exponentially many deterministic programs. Since performing probabilistic reasoning in this probabilistic program is much cheaper, our approach allows sampling new programs without any additional GPU compute and little CPU overhead. We instantiate our approach on benchmarks for code generation, code understanding and mathematical reasoning and report improvements in performance with fewer generations from the LLM.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a test-time framework called 'probabilistic programs of thought' for LLMs in code generation and mathematical reasoning. From a single model-generated program and its associated next-token probabilities, the authors construct a probabilistic program that compactly encodes exponentially many deterministic programs. Probabilistic reasoning over this structure is performed at low CPU cost to sample additional programs, avoiding the need for multiple GPU-intensive LLM generations. The paper claims this yields performance improvements on benchmarks for code generation, code understanding, and mathematical reasoning.
Significance. If the probabilistic construction faithfully approximates the LLM distribution and the claimed gains are reproducible, the method could meaningfully reduce the inference cost of multi-sample techniques for structured output tasks, enabling larger effective sample sizes without proportional compute scaling.
major comments (3)
- Abstract: The claim of 'improvements in performance with fewer generations from the LLM' is unsupported by any quantitative results: no benchmark scores, baselines, error bars, or statistical analysis appear anywhere in the manuscript. This leaves the central empirical assertion without visible evidence.
- Framework description: The assembly of next-token probabilities from one generation into a probabilistic program is presented at a high level without a formal definition or algorithm. It is unclear how the resulting distribution marginalizes over programs reachable only via different prefixes, raising the risk that samples are drawn from a narrower conditional rather than the model's full distribution over valid programs.
- Method and experiments: No details are supplied on the specific probabilistic inference procedure (e.g., exact sampling or marginalization algorithm), the overhead measurements, the benchmarks, the number of samples drawn, or any ablation showing that the gains exceed those from standard repeated sampling. This prevents verification of the 'exponentially many' and 'little CPU overhead' claims.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment below and outline the revisions planned for the manuscript.
Point-by-point responses
Referee: Abstract: The claim of 'improvements in performance with fewer generations from the LLM' is unsupported by any quantitative results: no benchmark scores, baselines, error bars, or statistical analysis appear anywhere in the manuscript. This leaves the central empirical assertion without visible evidence.
Authors: We agree that the abstract claim requires explicit supporting evidence. The manuscript states that the approach was instantiated on benchmarks and reports improvements, but we acknowledge these results were not presented with the necessary quantitative detail, baselines, error bars, or statistical analysis. In the revision we will add a dedicated experimental results section containing benchmark scores, comparisons to repeated-sampling baselines, error bars from multiple runs, and significance tests to substantiate the performance claims. revision: yes
Referee: Framework description: The assembly of next-token probabilities from one generation into a probabilistic program is presented at a high level without a formal definition or algorithm. It is unclear how the resulting distribution marginalizes over programs reachable only via different prefixes, raising the risk that samples are drawn from a narrower conditional rather than the model's full distribution over valid programs.
Authors: We thank the referee for identifying the lack of formality. The current presentation is intentionally conceptual; we will revise the framework section to include a precise mathematical definition of the probabilistic program and a pseudocode algorithm that assembles the next-token probabilities. Regarding marginalization: the construction encodes a tree whose branches are weighted by the model's token probabilities at every position, enabling sampling of programs that diverge at any prefix. We will add a derivation showing that the marginal distribution over complete programs equals the LLM's conditional distribution given the prompt, confirming that samples are drawn from the full distribution rather than a narrower conditional. revision: yes
Referee: Method and experiments: No details are supplied on the specific probabilistic inference procedure (e.g., exact sampling or marginalization algorithm), the overhead measurements, the benchmarks, the number of samples drawn, or any ablation showing that the gains exceed those from standard repeated sampling. This prevents verification of the 'exponentially many' and 'little CPU overhead' claims.
Authors: We agree that these implementation and experimental details are required for verification. We will expand the method and experiments sections to specify the inference procedure (exact enumeration over the compact structure for the reported scales, with Monte Carlo for larger cases), report CPU overhead measurements (typically orders of magnitude below a single LLM generation), list the exact benchmarks and number of samples drawn per instance, and include ablations against standard repeated LLM sampling. These additions will directly support the claims of exponentially many programs and negligible CPU cost. revision: yes
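For intuition on the exact-enumeration route mentioned in the response above (a sketch over an invented structure, not the paper's algorithm): when branching is bounded, every complete program and its probability can be listed directly from the compact tree, giving exact marginals with no Monte Carlo error.

```python
def enumerate_programs(node, prefix=(), prob=1.0):
    """Yield every complete program in a weighted token tree together with
    its exact probability (product of branch weights along the path)."""
    if not node:  # leaf: a complete program
        yield prefix, prob
        return
    for token, (p, subtree) in node.items():
        yield from enumerate_programs(subtree, prefix + (token,), prob * p)

# Toy tree: token -> (branch probability, subtree); leaves are empty dicts.
tree = {
    "a": (0.7, {"b": (0.5, {}), "c": (0.5, {})}),
    "d": (0.3, {"e": (1.0, {})}),
}
programs = dict(enumerate_programs(tree))
```

Because the probabilities of complete programs sum to 1, best-of-n style selection can rank programs exactly instead of sampling them, at least at the scales where enumeration stays feasible.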
Circularity Check
No significant circularity; the method extends LLM sampling without reducing to fitted inputs or self-referential definitions.
Full rationale
The paper's core proposal constructs a probabilistic program from one LLM generation's next-token probabilities to represent exponentially many deterministic programs, enabling cheaper sampling. This step is presented as a novel assembly technique rather than a redefinition of the input distribution or a fitted parameter renamed as a prediction. No load-bearing self-citations, uniqueness theorems, or ansatzes from prior author work are invoked; the derivation chain remains self-contained against external benchmarks for code generation and reasoning tasks. The central claim depends on the empirical validity of the assembly approximating the LLM distribution, which is an assumption open to falsification rather than a circular reduction by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: LLM next-token probabilities can be used to construct a compact probabilistic program representing the distribution over valid programs.
invented entities (1)
- probabilistic program of thought (no independent evidence)