RefineStat: Efficient Exploration for Probabilistic Program Synthesis
Pith reviewed 2026-05-18 20:25 UTC · model grok-4.3
The pith
RefineStat lets smaller language models generate statistically reliable probabilistic programs by enforcing semantic constraints and resampling failed components.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RefineStat is a language model-driven framework that enforces semantic constraints ensuring synthesized programs contain valid distributions and well-formed parameters, and then applies diagnostic-aware refinement by resampling prior or likelihood components whenever reliability checks fail, yielding programs that are both syntactically sound and statistically reliable on probabilistic-programming code-generation tasks.
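A minimal sketch of what such a semantic check could look like, assuming each sampling statement has already been parsed into a distribution name and literal numeric parameters. The constraint table, the parameter names, and `check_site` are hypothetical illustrations, not RefineStat's actual rules or API.

```python
import math

# Hypothetical constraint table: distribution name -> {parameter: validity predicate}.
# Illustrative assumption only; the paper's real constraint set is not given here.
CONSTRAINTS = {
    "Normal": {"mu": lambda v: True, "sigma": lambda v: v > 0},
    "Beta": {"alpha": lambda v: v > 0, "beta": lambda v: v > 0},
    "Bernoulli": {"p": lambda v: 0.0 <= v <= 1.0},
}


def check_site(dist_name, params):
    """Return a list of violations for one sampling statement (empty if valid)."""
    rules = CONSTRAINTS.get(dist_name)
    if rules is None:
        return [f"unknown distribution: {dist_name}"]
    problems = []
    # Every required parameter must be present.
    for name in rules:
        if name not in params:
            problems.append(f"{dist_name}: missing parameter {name}")
    # Every supplied parameter must be known, finite, and inside its domain.
    for name, value in params.items():
        if name not in rules:
            problems.append(f"{dist_name}: unexpected parameter {name}")
        elif not math.isfinite(value) or not rules[name](value):
            problems.append(f"{dist_name}: invalid {name}={value}")
    return problems


print(check_site("Normal", {"mu": 0.0, "sigma": -1.0}))
```

A real checker would also need to handle parameters that are expressions rather than literals; this sketch only covers the literal case.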
What carries the argument
Diagnostic-aware refinement, which resamples prior or likelihood components in response to reliability check failures to correct semantic errors while preserving the rest of the program structure.
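The refinement idea can be sketched as a small loop, under the assumption that a program is represented as a dictionary with separate "prior" and "likelihood" entries. The function `refine`, the `diagnose` and `resample` callbacks, and the toy diagnostic are all hypothetical stand-ins; in the real system the resampler would be a language model and the diagnostics would come from running inference.

```python
def refine(program, diagnose, resample, max_rounds=5):
    """Resample only the failing component until checks pass or the budget runs out.

    diagnose(program) returns None when all checks pass, otherwise the name of
    the component to blame ("prior" or "likelihood").
    resample(program, component) returns a replacement for that component.
    Returns (program, passed).
    """
    for _ in range(max_rounds):
        failing = diagnose(program)
        if failing is None:
            return program, True
        # Keep the rest of the program; swap out only the blamed component.
        program = {**program, failing: resample(program, failing)}
    return program, diagnose(program) is None


def toy_diagnose(program):
    """Toy stand-in for a reliability check: flag a scale prior that allows negatives."""
    if program["prior"].startswith("sigma ~ Normal"):
        return "prior"
    return None


# Deterministic stand-in for a language-model resampler (assumption for the demo).
candidates = iter(["sigma ~ Normal(0, 10)", "sigma ~ HalfNormal(1)"])
start = {"prior": "sigma ~ Normal(0, 100)", "likelihood": "y ~ Normal(mu, sigma)"}
final, passed = refine(start, toy_diagnose, lambda prog, comp: next(candidates))
print(final, passed)
```

The loop leaves the likelihood untouched in this demo because the diagnostic only ever blames the prior, which mirrors the claim that refinement preserves the rest of the program structure.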
If this is right
- Smaller language models become viable for probabilistic program synthesis tasks that previously required larger models.
- Generated programs require fewer manual corrections to achieve statistical soundness.
- The same refinement loop can be applied across varied probabilistic modeling benchmarks.
- Semantic constraint enforcement plus targeted resampling reduces the incidence of flawed inference constructs.
Where Pith is reading between the lines
- The approach could be adapted to other domains that combine code generation with domain-specific validity checks, such as scientific simulation scripts.
- Iterative resampling guided by diagnostics may lower the overall compute cost of using language models for constrained synthesis problems.
- If the refinement proves stable, it opens the possibility of fully automated pipelines for building probabilistic models from natural-language descriptions.
Load-bearing premise
The diagnostic-aware refinement step will consistently produce valid and unbiased probabilistic programs without introducing fresh semantic errors or requiring heavy human intervention.
What would settle it
A test set of RefineStat outputs in which a large fraction of programs still fail statistical validity checks or produce biased posterior inferences on held-out data would show the refinement does not achieve reliable programs.
Original abstract
Probabilistic programming offers a powerful framework for modeling uncertainty, yet statistical model discovery in this domain entails navigating an immense search space under strict domain-specific constraints. When small language models are tasked with generating probabilistic programs, they frequently produce outputs that suffer from both syntactic and semantic errors, such as flawed inference constructs. Motivated by probabilistic programmers' domain expertise and debugging strategies, we introduce RefineStat, a language model-driven framework that enforces semantic constraints ensuring synthesized programs contain valid distributions and well-formed parameters, and then applies diagnostic-aware refinement by resampling prior or likelihood components whenever reliability checks fail. We evaluate RefineStat on multiple probabilistic-programming code-generation tasks using smaller language models (SLMs) and find that it produces programs that are both syntactically sound and statistically reliable, often matching or surpassing those from closed-source large language models (e.g., OpenAI o3).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces RefineStat, a language model-driven framework for synthesizing probabilistic programs. It enforces semantic constraints to ensure valid distributions and parameters, followed by diagnostic-aware refinement that resamples prior or likelihood components when reliability checks fail. The evaluation on probabilistic-programming code-generation tasks using smaller language models claims to produce syntactically sound and statistically reliable programs that often match or surpass those from larger models such as OpenAI o3.
Significance. Should the refinement procedure be shown to preserve statistical properties without introducing bias, this work could advance the field by making probabilistic program synthesis more practical with smaller, open models, reducing dependence on proprietary large language models. The integration of diagnostic checks inspired by probabilistic programming practices is a notable strength if empirically validated.
Major comments (2)
- [Methods (refinement procedure)] The diagnostic-aware refinement step, which resamples prior or likelihood components whenever reliability checks fail, is described without a derivation or test demonstrating that it preserves the target posterior distribution and avoids introducing bias. This is load-bearing for the claim of 'statistically reliable' programs, as repeated resampling could shift the effective distribution if the checks are heuristic.
- [Evaluation section] The abstract and evaluation report positive results on multiple tasks but omit specific metrics, baseline comparisons, error analysis, or details on how statistical reliability was measured. This weakens the support for the central claim that RefineStat matches or surpasses closed-source LLMs.
Minor comments (2)
- The abstract would be strengthened by including at least one key quantitative result or comparison to support the evaluation claims.
- [Introduction] Clarify the exact definition of 'reliability checks' early in the paper to aid reader understanding.
Simulated Author's Rebuttal
We thank the referee for their constructive comments. We address each major comment below and have revised the manuscript to strengthen the presentation of the refinement procedure and evaluation results.
Point-by-point responses
- Referee: [Methods (refinement procedure)] The diagnostic-aware refinement step, which resamples prior or likelihood components whenever reliability checks fail, is described without a derivation or test demonstrating that it preserves the target posterior distribution and avoids introducing bias. This is load-bearing for the claim of 'statistically reliable' programs, as repeated resampling could shift the effective distribution if the checks are heuristic.
  Authors: We agree that a formal justification strengthens the statistical reliability claims. In the revised manuscript we have added a subsection deriving that the resampling step, conditioned on standard diagnostic failures, preserves the target posterior by rejecting only invalid samples and redrawing from the model's prior predictive distribution without systematic bias. We also include a controlled empirical test on a conjugate model comparing posterior moments and credible intervals before and after refinement, showing deviations within Monte Carlo error.
  Revision: yes
- Referee: [Evaluation section] The abstract and evaluation report positive results on multiple tasks but omit specific metrics, baseline comparisons, error analysis, or details on how statistical reliability was measured. This weakens the support for the central claim that RefineStat matches or surpasses closed-source LLMs.
  Authors: We accept that greater specificity improves the evaluation. The revised Evaluation section now reports concrete metrics including syntax validity rates, statistical reliability via posterior predictive checks and Gelman-Rubin statistics, quantitative comparisons against both unrefined small models and closed-source baselines such as o3, and a categorized error analysis of syntactic, semantic, and statistical failure modes.
  Revision: yes
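For readers unfamiliar with the Gelman-Rubin statistic named in the response above, here is a minimal sketch of the classic (non-split) form, which compares between-chain and within-chain variance for a single scalar parameter. The function name `gelman_rubin` is a hypothetical choice, and nothing here is taken from the paper's evaluation code; current practice often prefers rank-normalized split variants.

```python
import math
import statistics


def gelman_rubin(chains):
    """Classic potential scale reduction factor for one scalar parameter.

    chains: list of equal-length lists of draws, one list per MCMC chain.
    Values near 1 suggest the chains agree; larger values suggest they do not.
    """
    m = len(chains)
    n = len(chains[0]) if m else 0
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain sample variance.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    if within == 0:
        raise ValueError("within-chain variance is zero")
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    # Pooled variance estimate, then the ratio against W.
    var_hat = (n - 1) / n * within + between / n
    return math.sqrt(var_hat / within)


# Two chains stuck in different regions give a value far above 1.
print(gelman_rubin([[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]]))
```

Any cutoff used to flag non-convergence is a separate choice that this sketch leaves to the caller.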
Circularity Check
No circularity: empirical engineering framework with no derivations
Full rationale
The paper describes RefineStat as a practical, language-model-driven framework for probabilistic program synthesis that enforces semantic constraints and applies diagnostic-aware resampling on reliability failures. No equations, derivations, predictions, or first-principles results are present in the abstract or described method. The contribution is evaluated empirically on code-generation tasks, with no load-bearing steps that reduce by construction to fitted inputs, self-citations, or renamed ansatzes. The central claims rest on experimental outcomes rather than any self-referential mathematical chain, rendering the work self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Small language models frequently produce syntactic and semantic errors in probabilistic programs that can be corrected by external constraint enforcement and targeted resampling.
invented entities (1)
- RefineStat framework: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Paper passage: diagnostic-aware refinement by resampling prior or likelihood components whenever reliability checks fail
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, absolute_floor_iff_bare_distinguishability
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Paper passage: enforces semantic constraints ensuring synthesized programs contain valid distributions and well-formed parameters
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.