
arXiv: 1912.11554 · v1 · pith:52FPSCX5new · submitted 2019-12-24 · stat.ML · cs.AI · cs.LG · cs.PL

Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro

Pith reviewed 2026-05-15 15:58 UTC · model grok-4.3

classification stat.ML · cs.AI · cs.LG · cs.PL
keywords NumPyro · Pyro · NUTS · JAX · effect handlers · JIT compilation · probabilistic programming · MCMC

The pith

NumPyro composes Pyro effect handlers with JAX to deliver a fully JIT-compiled iterative NUTS sampler.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents NumPyro as a lightweight NumPy backend for the Pyro probabilistic programming language that reuses the same modeling interface, language primitives, and effect handling abstractions. It establishes that these effect handlers can be composed directly with JAX program transformations to enable hardware acceleration, automatic differentiation, and vectorization. The central demonstration is an iterative formulation of the No-U-Turn Sampler that supports end-to-end JIT compilation. This yields substantially faster inference than prior alternatives across both small and large dataset regimes. A reader would care because the work shows how to retain flexible probabilistic modeling while gaining the performance benefits of a functional numeric backend.

Core claim

NumPyro shows that Pyro's effect handlers compose with JAX's functional transformations to preserve the original modeling API while adding hardware acceleration and automatic differentiation. In particular it supplies an iterative formulation of the No-U-Turn Sampler that can be compiled end-to-end with JAX's JIT, producing faster runtimes than existing implementations in both the small-data and large-data regimes.

What carries the argument

Effect handlers that extend Pyro's modeling abstractions to JAX's functional transformations for acceleration and compilation.

If this is right

  • Probabilistic models written in the Pyro interface can run with full JIT compilation and hardware acceleration.
  • The same modeling code benefits from vectorization and automatic differentiation supplied by JAX.
  • Inference scales to both small and large datasets without separate code paths.
  • Effect-handler composition becomes a reusable pattern for adding new backends to probabilistic programming languages.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same handler-composition technique could be applied to accelerate other MCMC or variational methods inside JAX.
  • Models could be automatically ported between CPU, GPU, and TPU execution without rewriting inference logic.
  • New modeling primitives that exploit JAX's functional purity might become feasible once the handler layer is stable.

Load-bearing premise

Pyro effect handlers compose cleanly with JAX transformations without introducing correctness problems or reducing modeling expressiveness.

What would settle it

A direct runtime comparison on standard benchmark models showing that the NumPyro NUTS implementation is not faster than existing Pyro or Stan alternatives in either the small-dataset or large-dataset regime.
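
A comparison of this kind needs repeated timings rather than a single number. The sketch below is one simple way to collect them using only the Python standard library. The `benchmark` helper and `toy_workload` are hypothetical names introduced here, and the workload is a stand-in for an inference run, not a real sampler.

```python
import statistics
import time


def benchmark(fn, n_runs=5, warmup=1):
    """Time `fn()` over several runs; return (mean, stdev) in seconds."""
    if n_runs < 2:
        raise ValueError("need at least two runs to estimate a spread")
    for _ in range(warmup):
        fn()  # warm-up run absorbs one-off costs such as compilation
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)


def toy_workload():
    # Hypothetical stand-in for one inference run.
    return sum(i * i for i in range(10_000))


mean_s, std_s = benchmark(toy_workload, n_runs=5)
print(f"{mean_s * 1e3:.3f} ms +/- {std_s * 1e3:.3f} ms over 5 runs")
```

Two choices matter when adapting this to a JIT-compiled sampler: whether compilation time is counted (here the warm-up run excludes it), and whether the timed call actually waits for the result. JAX dispatches work asynchronously, so a timed function would need to block on its output, for example with `block_until_ready`, before the clock stops.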

Original abstract

NumPyro is a lightweight library that provides an alternate NumPy backend to the Pyro probabilistic programming language with the same modeling interface, language primitives and effect handling abstractions. Effect handlers allow Pyro's modeling API to be extended to NumPyro despite its being built atop a fundamentally different JAX-based functional backend. In this work, we demonstrate the power of composing Pyro's effect handlers with the program transformations that enable hardware acceleration, automatic differentiation, and vectorization in JAX. In particular, NumPyro provides an iterative formulation of the No-U-Turn Sampler (NUTS) that can be end-to-end JIT compiled, yielding an implementation that is much faster than existing alternatives in both the small and large dataset regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces NumPyro as a lightweight NumPy-based backend for the Pyro probabilistic programming language that preserves the same modeling interface and effect-handling abstractions. It demonstrates that Pyro effect handlers compose with JAX's functional transformations (JIT, autodiff, vectorization) to support an iterative formulation of the No-U-Turn Sampler (NUTS) that is end-to-end JIT-compilable, yielding substantially faster inference than existing alternatives across both small- and large-dataset regimes.

Significance. If the performance and correctness claims hold, the work is significant for showing how effect-handler composition can bridge imperative PPL APIs with functional autodiff frameworks, enabling scalable, hardware-accelerated sampling without loss of modeling expressiveness. The engineering result directly addresses practical bottlenecks in Bayesian inference for machine-learning models.

major comments (2)
  1. [Abstract and §4 (results)] The central claim that the JIT-compiled iterative NUTS is 'much faster than existing alternatives in both the small and large dataset regimes' is load-bearing yet supported only by high-level statements. Specific wall-clock timings, hardware specifications, baseline implementations (e.g., Pyro, Stan, TensorFlow Probability), and dataset sizes should be reported with error bars over multiple runs to allow verification.
  2. [§3 (NUTS formulation)] The iterative NUTS algorithm is presented as end-to-end JIT-compatible, but the manuscript does not explicitly address potential non-differentiable control flow or side-effect leakage when the effect handlers are transformed. A short proof sketch or counter-example check would strengthen the correctness argument.
minor comments (2)
  1. [§4] Add a table or figure in §4 that directly tabulates speedup factors versus the closest competing samplers for the reported models.
  2. [Introduction] Clarify in the introduction whether the modeling API is byte-for-byte identical to Pyro or admits any syntactic differences.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and positive recommendation for minor revision. We address the major comments point-by-point below, agreeing to incorporate additional details and clarifications in the revised manuscript.

Point-by-point responses
  1. Referee: [Abstract and §4 (results)] The central claim that the JIT-compiled iterative NUTS is 'much faster than existing alternatives in both the small and large dataset regimes' is load-bearing yet supported only by high-level statements. Specific wall-clock timings, hardware specifications, baseline implementations (e.g., Pyro, Stan, TensorFlow Probability), and dataset sizes should be reported with error bars over multiple runs to allow verification.

    Authors: We agree that the performance claims would benefit from more detailed empirical support. In the revised manuscript, we will expand §4 to include specific wall-clock timings, hardware specifications (such as the CPU and GPU models used), the exact baseline implementations (Pyro, Stan, TensorFlow Probability), dataset sizes, and results reported as means with standard deviations over multiple independent runs. This will provide the necessary quantitative evidence for the 'much faster' claim in both small and large dataset regimes. revision: yes

  2. Referee: [§3 (NUTS formulation)] The iterative NUTS algorithm is presented as end-to-end JIT-compatible, but the manuscript does not explicitly address potential non-differentiable control flow or side-effect leakage when the effect handlers are transformed. A short proof sketch or counter-example check would strengthen the correctness argument.

    Authors: We thank the referee for highlighting this point on correctness. The effect handlers in NumPyro are implemented to be fully compatible with JAX's functional transformations, ensuring no side-effect leakage and that control flow remains traceable. In the revision, we will add a short paragraph in §3 providing a sketch of why the iterative NUTS formulation avoids non-differentiable operations and side effects, referencing the pure functional nature of the handlers and JAX's tracing mechanism. If space permits, we can include a brief counter-example check or note on the absence of such issues in our implementation. revision: yes
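
The traceability point in the second exchange turns on replacing recursive tree building with a loop over leaf indices and a fixed-size store of earlier leaves. The sketch below shows one way such bookkeeping can work. It is an assumption-laden illustration in plain Python, not the paper's algorithm as written: the helper names are hypothetical, stored leaf indices stand in for the sampler state a real implementation would keep, and no U-turn test is evaluated. Even-numbered leaves are written to a checkpoint slot, and each odd-numbered leaf is paired with the stored first leaf of every balanced subtree that it completes.

```python
def trailing_ones(n):
    """Number of consecutive 1 bits at the low end of n."""
    count = 0
    while n & 1:
        n >>= 1
        count += 1
    return count


def checkpoint_range(n):
    """Return (idx_min, idx_max) into the checkpoint array for leaf n."""
    idx_max = bin(n >> 1).count("1")  # set bits of n, ignoring the last bit
    idx_min = idx_max - trailing_ones(n) + 1
    return idx_min, idx_max


def simulate(depth):
    """Visit 2**depth leaves in order using only `depth` checkpoint slots.

    Returns, for each odd leaf, the stored leaves it would be compared with.
    """
    checkpoints = [None] * max(depth, 1)
    comparisons = {}
    for n in range(2 ** depth):
        idx_min, idx_max = checkpoint_range(n)
        if n % 2 == 0:
            checkpoints[idx_max] = n  # even leaf: store it
        else:
            # odd leaf: closes subtrees of size 2, 4, ..., 2**trailing_ones(n)
            comparisons[n] = checkpoints[idx_min:idx_max + 1]
    return comparisons


def subtree_starts(n):
    """First leaf of each balanced subtree whose last leaf is n (largest first)."""
    return [n - 2 ** k + 1 for k in range(trailing_ones(n), 0, -1)]


pairs = simulate(3)
print(pairs[7])  # leaf 7 closes the subtrees starting at leaves 0, 4 and 6
```

The loop body does a bounded amount of work on a fixed-size array, which is the shape that structured control flow such as `jax.lax.while_loop` expects. Whether a particular implementation is free of side effects under tracing still has to be checked against that implementation.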

Circularity Check

0 steps flagged

No significant circularity in implementation description

Full rationale

This is an implementation paper presenting NumPyro as a JAX-based backend for Pyro's modeling interface. The central claim concerns the engineering outcome of composing effect handlers with JAX transformations to enable an end-to-end JIT-compilable iterative NUTS sampler, with reported performance gains. No mathematical derivations, parameter fits, or self-referential equations appear in the provided text that reduce to their own inputs by construction. The work is self-contained as a software design and benchmarking description, with no load-bearing steps that match the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that effect handlers from Pyro can be lifted to a JAX functional backend without semantic change.

axioms (1)
  • domain assumption: Pyro effect handlers can be composed with JAX program transformations while preserving modeling semantics
    This is the central premise enabling the entire NumPyro design as stated in the abstract.



Forward citations

Cited by 49 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Archimedean Copula Inference via Taylor-Mode AD

    cs.LG 2026-05 unverdicted novelty 7.0

    acopula enables polynomial-time exact inference for arbitrary nested Archimedean copulas with censoring via Taylor-mode AD on user-defined generators.

  2. ASSEMBLAGE-DEEPHISTORY: A Cross-Build Binary Dataset with Temporal Coverage

    cs.CR 2026-05 unverdicted novelty 7.0

    A new queryable binary dataset combining cross-build diversity, temporal history, and CVE labels with linked metadata for vulnerability research.

  3. Theoretical guidelines for annealed Langevin dynamics in compositional simulation-based inference

    stat.ML 2026-05 unverdicted novelty 7.0

    Derives Wasserstein bounds and explicit hyperparameter tuning rules for annealed Langevin dynamics in compositional score-based SBI, proving Linhart et al. (2026) allows larger steps and fewer total steps than Geffner...

  4. Mixed neural posterior estimation for simulators with discrete and continuous parameters

    cs.LG 2026-05 unverdicted novelty 7.0

    Extends NPE to mixed discrete-continuous parameter spaces via a factorized inference network combining an autoregressive classifier and generative model, trained jointly to yield accurate calibrated posteriors.

  5. Variational predictive resampling

    stat.ME 2026-05 conditional novelty 7.0

    Variational predictive resampling iteratively imputes data from a variational predictive to produce posterior samples that converge to the exact Bayesian posterior in Gaussian models where mean-field VI retains a gap.

  6. Variational predictive resampling

    stat.ME 2026-05 unverdicted novelty 7.0

    Variational predictive resampling uses sequential imputation from variational predictives to generate samples whose distribution converges to the exact Bayesian posterior in Gaussian models and improves dependence cap...

  7. Bayesian Doppler Imaging: Simultaneous Inference of Surface Maps and Geometric Parameters

    astro-ph.EP 2026-05 conditional novelty 7.0

    A fully Bayesian pixel-based Doppler imaging framework uses Gaussian Process priors and Hamiltonian Monte Carlo to simultaneously infer surface maps and geometric parameters from spectral data.

  8. ADELIA: Automatic Differentiation for Efficient Laplace Inference Approximations

    cs.DC 2026-05 conditional novelty 7.0

    ADELIA is the first AD-enabled INLA system that computes exact hyperparameter gradients via a structure-exploiting multi-GPU backward pass, delivering 4.2-7.9x per-gradient speedups and 5-8x better energy efficiency t...

  9. Archival Multiband Gravitational-Wave Signals from Massive Black Hole Binary Mergers

    astro-ph.HE 2026-04 unverdicted novelty 7.0

    Massive black hole binary mergers produce orphaned low-frequency signals in PTA pulsar terms that can be stacked for archival multiband gravitational-wave detection.

  10. High-dimensional inference for the $\gamma$-ray sky with differentiable programming

    astro-ph.HE 2026-04 unverdicted novelty 7.0

    A differentiable forward model and likelihood enable probabilistic inference over many spatial morphologies for the Galactic Center gamma-ray Excess using variational methods on GPUs.

  11. Dynamic sparse graphs with overlapping communities

    stat.ME 2025-12 unverdicted novelty 7.0

    Bayesian nonparametric model for dynamic sparse networks with overlapping communities via completely random measures and latent Markov processes.

  12. People readily follow personal advice from AI but it does not improve their well-being

    cs.HC 2025-11 conditional novelty 7.0

    Large longitudinal RCT finds high rates of following AI personal advice but no sustained well-being gains versus a hobbies control condition.

  13. AMIGO: a Data-Driven Calibration of the JWST Interferometer

    astro-ph.IM 2025-10 unverdicted novelty 7.0

    AMIGO is an end-to-end differentiable forward model of JWST AMI that corrects detector systematics to recover high-precision astrometry and detect close high-contrast companions.

  14. Using Symbolic Regression to Emulate the Radial Fourier Transform of the Sérsic profile for Fast, Accurate and Differentiable Galaxy Profile Fitting

    astro-ph.IM 2025-08 conditional novelty 7.0

    Symbolic regression yields an emulator for the radial Fourier transform of the Sérsic profile that enables 2.5 times faster galaxy profile fitting with minimal accuracy loss.

  15. Neural Stochastic Differential Equations on Compact State Spaces: Theory, Methods, and Application to Suicide Risk Modeling

    stat.ML 2025-08 unverdicted novelty 7.0

    The authors derive drift and diffusion constraints plus a parameterization that forces neural SDE solutions to remain inside compact polyhedral domains, yielding better forecasts on real EMA suicide-risk datasets than...

  16. Differentiable Fuzzy Cosmic-Web for Field Level Inference

    astro-ph.CO 2025-06 unverdicted novelty 7.0

    Introduces HICOBIAN, a differentiable fuzzy hierarchical cosmic-web bias model using sigmoid gradients for smooth region transitions, enabling accurate Bayesian field-level reconstruction of primordial density fields ...

  17. A Strongly Parametrized Mass Ratio Model for the Stable Mass Transfer Channel: a Case Study of the $10 \, \rm{M}_{\odot}$ Peak

    astro-ph.HE 2026-05 unverdicted novelty 6.0

    A parametrized analytical model for BBH mass ratios from the stable mass transfer channel is derived and applied to the 10 solar-mass peak in GWTC-4, favoring little mass-ratio reversal.

  18. AI4BayesCode: From Natural Language Descriptions to Validated Modular Stateful Bayesian Samplers

    stat.CO 2026-05 unverdicted novelty 6.0

    AI4BayesCode generates validated modular stateful MCMC samplers from natural language Bayesian model descriptions via LLM translation, modular blocks, and recursive stateful composition.

  19. Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space

    cs.CL 2026-05 unverdicted novelty 6.0

    LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via inte...

  20. A hierarchical Bayesian pipeline for soliton-plus-NFW inference on SPARC rotation curves: diagnostics and prior-boundary behaviour

    astro-ph.CO 2026-05 conditional novelty 6.0

    A hierarchical Bayesian pipeline applied to 106 SPARC galaxies yields posteriors that reach prior boundaries for soliton parameters, indicating no detectable interior population-level soliton within the Schive-normali...

  21. What You Don't Know Won't Hurt You: Self-Consistent Hierarchical Inference with Unknown Follow-up Selection Strategies

    astro-ph.IM 2026-05 unverdicted novelty 6.0

    Hierarchical Bayesian inference allows accurate recovery of intrinsic astrophysical source populations even when follow-up selection is unknown and correlated with parameters of interest.

  22. Bayesian Modeling and Prediction of Generalized Contact Matrices

    stat.ME 2026-05 unverdicted novelty 6.0

    A Bayesian model for multi-feature contact matrices that uses tensor structures and contingency table theory to satisfy structural constraints and impute missing contact features, validated on simulations and US/Germa...

  23. Towards E-Value Based Stopping Rules for Bayesian Deep Ensembles

    cs.LG 2026-04 unverdicted novelty 6.0

    E-value sequential tests enable early stopping of MCMC sampling in Bayesian deep ensembles, often needing only a fraction of the full budget while improving over standard deep ensembles.

  24. A unified harmonic framework for dark siren cosmology

    astro-ph.CO 2026-03 unverdicted novelty 6.0

    The GW-galaxy cross-correlation method, unified with spectral sirens in a harmonic framework, can measure H0 to 1% and Omega_m to 5% precision with 2 years of data from next-generation detectors like Einstein Telescop...

  25. Stochastic gravitational-wave background search using data from five pulsar timing arrays

    astro-ph.CO 2025-12 conditional novelty 6.0

    Combined five-PTA dataset yields posterior on SGWB power-law amplitude and index consistent with nonzero signal but below 5-sigma significance, with reconstructed angular correlations matching the Hellings-Downs prediction.

  26. OzDES Reverberation Mapping of Active Galactic Nuclei: Final Data Release, Black-Hole Mass Results, & Scaling Relations

    astro-ph.GA 2025-12 unverdicted novelty 6.0

    OzDES final release delivers 62 new reverberation-mapped black hole masses and tighter lag-luminosity relations for Hβ, MgII, and CIV in high-redshift AGN after correcting for survey-length selection effects.

  27. Photon counting readout for detection and inference of gravitational waves from neutron star merger remnants

    gr-qc 2025-11 conditional novelty 6.0

    Photon counting readout detects weak postmerger gravitational wave signals at a rate of about 1 in 100 for SNR 0.2 and yields a twofold improvement in neutron star radius measurement after 20,000 events.

  28. Image reconstruction with the JWST Interferometer

    astro-ph.IM 2025-10 unverdicted novelty 6.0

    Dorito enables diffraction-limited image reconstruction from JWST AMI observations by deconvolving images or Fourier observables using maximum entropy and total variation regularization.

  29. Conversational AI increases political knowledge as effectively as self-directed internet search

    cs.HC 2025-09 conditional novelty 6.0

    Conversational AI matches self-directed internet search in increasing belief in true political information and decreasing belief in misinformation.

  30. RefineStat: Efficient Exploration for Probabilistic Program Synthesis

    cs.LG 2025-09 unverdicted novelty 6.0

    RefineStat improves small language model performance on probabilistic program synthesis by adding semantic constraint enforcement and diagnostic-aware refinement, producing syntactically and statistically reliable cod...

  31. Scalable Spatiotemporal Inference with Biased Scan Attention Transformer Neural Processes

    cs.LG 2025-06 unverdicted novelty 6.0

    BSA-TNP is a new neural process model with KRBlocks and biased scan attention that claims to match top accuracy while scaling inference to over 1M points in under a minute on a single GPU and supporting translation in...

  32. A search for periodic AGN variability in $\textit{Gaia}$ Data Release 3

    astro-ph.HE 2025-05 accept novelty 6.0

    Systematic search of 377k Gaia DR3 AGN light curves finds no reliable periodic SMBHB candidates after red-noise modeling and empirical false-alarm testing; all survivors lie in the few-cycle regime.

  33. A "Black Hole Star" Reveals the Remarkable Gas-Enshrouded Hearts of the Little Red Dots

    astro-ph.GA 2025-03 unverdicted novelty 6.0

    A source 660 million years after the Big Bang is interpreted as a black hole star with a dust-free dense gas atmosphere, implying Little Red Dots have black hole masses overestimated by orders of magnitude.

  34. Towards Understanding Sycophancy in Language Models

    cs.CL 2023-10 conditional novelty 6.0

    Sycophancy is prevalent in state-of-the-art AI assistants and is likely driven in part by human preferences that favor agreement over truthfulness.

  35. Corrected Integrated Laplace Approximation for Bayesian Inference in Latent Gaussian Models

    stat.ML 2026-05 unverdicted novelty 5.0

    An importance sampling correction is added to integrated Laplace approximation so that the approximate posterior for latent Gaussian models converges to the true posterior as the number of samples grows.

  36. Gravitational-wave constraints on $H_0$ are robust to (putative) redshift evolution in the binary black hole mass spectrum at current sensitivity

    astro-ph.CO 2026-05 conditional novelty 5.0

    Spectral-siren H0 constraints from GWTC-4.0 binary black holes remain robust when the mass spectrum is permitted to evolve with redshift at current detector sensitivity.

  37. A Uniform Determination of the Bulk Metallicities and Alpha Enrichments of Confirmed Exoplanet Systems with TRES

    astro-ph.EP 2026-05 conditional novelty 5.0

    A uniform spectroscopic catalog of 625 exoplanet hosts shows subsolar-metallicity giant-planet hosts are alpha-enhanced relative to both iron-rich hosts and typical metal-poor field stars.

  38. Plato's view on supermassive black hole binaries: Exploring the faint limit of ESA's Plato space mission

    astro-ph.GA 2026-05 unverdicted novelty 5.0

    Simulations show Plato can recover relativistic photometric signatures of supermassive black hole binaries in bright quasars (G≤18) via Bayesian inference on mock light curves.

  39. Mitigating effects of telescope jitter through differentiable forward-modeling

    astro-ph.IM 2026-05 unverdicted novelty 5.0

    Differentiable optical simulation models telescope jitter blurring and shows that two-dimensional jitter models avoid systematic bias in binary separation measurements for the TOLIMAN exoplanet mission.

  40. Fast Bayesian equipment condition monitoring via simulation based inference: applications to heat exchanger health

    cs.LG 2026-04 unverdicted novelty 5.0

    Amortized neural posterior estimation via simulation-based inference delivers 82x faster inference than MCMC for heat exchanger fouling and leakage diagnosis while maintaining comparable accuracy on synthetic data.

  41. Neural posterior estimation for scalable and accurate inverse parameter inference in Li-ion batteries

    physics.data-an 2026-04 unverdicted novelty 5.0

    NPE delivers millisecond-scale parameter inference for Li-ion batteries that matches or exceeds Bayesian calibration accuracy while adding local sensitivity interpretability, though with higher voltage prediction errors.

  42. Discovery of a compact hierarchical triple main-sequence star system while searching for binary stars with compact objects

    astro-ph.SR 2026-01 accept novelty 5.0

    A new compact hierarchical triple main-sequence star system G1010 was discovered through combined low- and high-SNR spectroscopy, Gaia DR3 data, and TESS light curve analysis, showing an inner eclipsing binary rather ...

  43. The DESI DR1 Peculiar Velocity Survey: growth rate measurements from the maximum likelihood fields method

    astro-ph.CO 2025-12 accept novelty 5.0

    DESI DR1 peculiar velocity data yields fσ8(z_eff=0.07) = 0.450 ± 0.055, consistent with Planck ΛCDM and GR growth index γ = 0.58 ± 0.11.

  44. Symbolic Emulators for Cosmology: Accelerating Cosmological Analyses Without Sacrificing Precision

    astro-ph.CO 2025-10 unverdicted novelty 5.0

    Symbolic emulators approximate key Lambda CDM functions to 0.001-0.05% accuracy across relevant redshifts and Omega_m values, enabling faster 3x2pt inference with consistent results.

  45. GW250114: testing Hawking's area law and the Kerr nature of black holes

    gr-qc 2025-09 accept novelty 5.0

    GW250114 data confirm the remnant black hole ringdown frequencies lie within 30% of Kerr predictions and that the final horizon area is larger than the sum of the progenitors' areas to high credibility.

  46. LITMUS: Bayesian Lag Recovery in Reverberation Mapping with Fast Differentiable Models

    astro-ph.GA 2025-05 unverdicted novelty 5.0

    LITMUS introduces a differentiable Bayesian lag recovery framework that outperforms JAVELIN on OzDES-like mock data by reducing false positives from seasonal aliasing.

  47. Temporal Point Process Modeling of Aggressive Behavior Onset in Psychiatric Inpatient Youths with Autism

    stat.AP 2025-03 unverdicted novelty 5.0

    Applies self-exciting temporal point processes to model clustered aggression onsets in inpatient autistic youth and reports better fit than Poisson baselines.

  48. Understanding Deterioration Random Effects for Causal Discovery in Infrastructure Management

    stat.AP 2026-05 unverdicted novelty 4.0

    Bayesian random effects estimation followed by stratified DirectLiNGAM causal discovery on 112 pumps shows 400x larger causal effects from operational features in slower-deteriorating equipment compared to faster-dete...

  49. Uncertainty-aware classification and triage of structural heart disease using electrocardiography and echocardiography metrics

    q-bio.QM 2026-05 unverdicted novelty 3.0

    Bayesian neural networks match or exceed frequentist performance on SHD classification from the EchoNext dataset while providing more robust uncertainty estimates for clinical triage.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · cited by 48 Pith papers · 3 internal anchors

  1. [1]

    Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro

    and Edward2 [6] based on TensorFlow, and PyMC3 [7] based on Theano. NumPyro is a package for probabilistic programming built atop JAX [8, 9], which is a high-level tracing library for program transformations (e.g. automatic differentiation, vectorization and JIT compilation) of Python and NumPy functions. Thus NumPyro enables users to write probabilistic ...

  2. [2]

    An introduction to probabilistic programming, 2021

    Jan-Willem van de Meent, Brooks Paige, Hongseok Yang, and Frank Wood. An introduction to probabilistic programming. arXiv preprint arXiv:1809.10756, 2018

  3. [3]

    Pyro: Deep universal probabilistic programming

    Eli Bingham, Jonathan P. Chen, Martin Jankowiak, Fritz Obermeyer, Neeraj Pradhan, Theofanis Karaletsos, Rohit Singh, Paul Szerlip, Paul Horsfall, and Noah D. Goodman. Pyro: Deep universal probabilistic programming. Journal of Machine Learning Research, 20(28):1–6, 2019. URL http://jmlr.org/papers/v20/18-403.html

  4. [4]

    Learning disentangled representations with semi-supervised deep generative models

    Siddharth Narayanaswamy, T Brooks Paige, Jan-Willem Van de Meent, Alban Desmaison, Noah Goodman, Pushmeet Kohli, Frank Wood, and Philip Torr. Learning disentangled representations with semi-supervised deep generative models. In Advances in Neural Information Processing Systems , pages 5925–5935, 2017

  5. [5]

    Automatic differentiation in pytorch

    Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. 2017

  6. [6]

    TensorFlow Distributions

    Joshua V Dillon, Ian Langmore, Dustin Tran, Eugene Brevdo, Srinivas Vasudevan, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, and Rif A Saurous. Tensorflow distributions. arXiv preprint arXiv:1711.10604, 2017

  7. [7]

    Simple, distributed, and accelerated probabilistic programming

    Dustin Tran, Matthew W Hoffman, Dave Moore, Christopher Suter, Srinivas Vasudevan, and Alexey Radul. Simple, distributed, and accelerated probabilistic programming. In Advances in Neural Information Processing Systems, pages 7598–7609, 2018

  8. [8]

    Probabilistic programming in Python using PyMC3

    John Salvatier, Thomas V. Wiecki, and Christopher Fonnesbeck. Probabilistic programming in Python using PyMC3. PeerJ Computer Science, 2:e55, Apr 2016. doi: 10.7717/peerj-cs.55. URL https://doi.org/10.7717/peerj-cs.55

  9. [9]

    JAX: composable transformations of Python+NumPy programs, 2018

    James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, and Skye Wanderman-Milne. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/google/jax

  10. [10]

    Compiling machine learning programs via high-level tracing

    Roy Frostig, Matthew Johnson, and Chris Leary. Compiling machine learning programs via high-level tracing. 2018. URL http://www.sysml.cc/doc/2018/146.pdf

  11. [11]

    https://www.tensorflow.org/xla/

    XLA: Optimizing Compiler for Machine Learning. https://www.tensorflow.org/xla/

  12. [12]

    Effect Handling for Composable Program Transformations in Edward2

    Dave Moore and Maria I. Gorinova. Effect handling for composable program transformations in edward2. CoRR, abs/1811.06150, 2018. URL http://arxiv.org/abs/1811.06150

  13. [13]

    Handlers of algebraic effects

    Gordon Plotkin and Matija Pretnar. Handlers of algebraic effects. In Giuseppe Castagna, editor, Programming Languages and Systems, pages 80–94, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg. ISBN 978-3-642-00590-9

  14. [14]

    JAX PRNG Design

    The JAX Team. JAX PRNG Design. https://github.com/google/jax/blob/master/design_notes/prng.md, 2019

  15. [15]

    The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo

    Matthew D. Hoffman and Andrew Gelman. The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15:1593–1623, 2014. URL http://jmlr.org/papers/v15/hoffman14a.html

  16. [16]

    Hybrid monte carlo

    Simon Duane, Anthony D Kennedy, Brian J Pendleton, and Duncan Roweth. Hybrid monte carlo. Physics letters B, 195(2):216–222, 1987

  17. [17]

    MCMC Using Hamiltonian Dynamics

    Radford Neal. MCMC Using Hamiltonian Dynamics. CRC Press, May 2011. doi: 10.1201/b10905-6. URL http://dx.doi.org/10.1201/b10905-6

  18. [18]

    Stochastic variational inference

    Matthew D Hoffman, David M Blei, Chong Wang, and John Paisley. Stochastic variational inference. The Journal of Machine Learning Research, 14(1):1303–1347, 2013

  19. [19]

    Stan: A probabilistic programming language

    Bob Carpenter, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. Stan: A probabilistic programming language. Journal of statistical software, 76(1), 2017

  20. [20]

    Arnold, Dougal J

    Allen Riddell, Ari Hartikainen, Daniel Lee, riddell stan, Marco Inacio, Daniel Chen, Kenneth C. Arnold, Dougal J. Sutherland, Aki Vehtari, Shinya SUZUKI, Takahiro Kubo, Todd Small, Tobias Erhardt, Stephen Hoover, Stephan Hoyer, Richard C Gerkin, Joerg Rings, Jackie, J. J. Ramsey, Aaron Darling, seantalts, Skipper Seabold, Max Shron, Liam Brannigan, Kyle F...

  21. [21]

    UCI machine learning repository, 2017

    Dheeru Dua and Casey Graff. UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml

  22. [22]

    The kernel interaction trick: Fast Bayesian discovery of pairwise interactions in high dimensions

    Raj Agrawal, Brian Trippe, Jonathan Huggins, and Tamara Broderick. The kernel interaction trick: Fast Bayesian discovery of pairwise interactions in high dimensions. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning , volume 97 of Proceedings of Machine Learning Research, pages 14...

  23. [23]

    URL http://proceedings.mlr.press/v97/agrawal19a.html

    PMLR. URL http://proceedings.mlr.press/v97/agrawal19a.html

  24. [24]

    Stan Modeling Language User’s Guide and Reference Manual, Version 2.18.0

    Stan Development Team. Stan Modeling Language User’s Guide and Reference Manual, Version 2.18.0

  25. [25]

    URL http://mc-stan.org

    URL http://mc-stan.org. [Figure 3: A graphical representation of how binary trees are constructed in ITERATIVE BUILD TREE. The orange node is the leaf generated at the current step. Blue nodes are the leaves stored in memory for the purpose of checking the U-Turn condition. White nodes are past ...]