The Polynomial Stein Discrepancy for Assessing Moment Convergence

Christopher Drovandi; Leah F South; Matthew Sutton; Narayan Srinivasan

arXiv: 2412.05135 · v2 · submitted 2024-12-06 · stat.ML · cs.LG · stat.CO

Pith reviewed 2026-05-23 07:39 UTC · model grok-4.3

keywords: Stein discrepancy · goodness-of-fit test · moment matching · Bayesian sampling · sample quality · scalable diagnostics · polynomial kernel · hyperparameter tuning

The pith

The polynomial Stein discrepancy detects mismatches in the first r moments of samples from a Gaussian target and supports faster hyperparameter tuning for biased Bayesian samplers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops the polynomial Stein discrepancy as a linear-cost alternative to the kernel Stein discrepancy for checking whether samples approximate a target posterior. It proves that the associated test identifies differences in the first r moments when the target is Gaussian, though the test does not guarantee full distributional convergence. Experiments demonstrate higher power than competing methods at lower computational cost, and the measure improves the efficiency of selecting hyperparameters for algorithms such as stochastic gradient Langevin dynamics. A sympathetic reader would care because many modern sampling methods are asymptotically biased, making classical diagnostics like effective sample size unreliable and full kernel methods too expensive for routine use.
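
The link between polynomial test functions and moments can be illustrated with a short derivation. This is a one-dimensional sketch under stated assumptions (a Gaussian target and the Langevin Stein operator), not a reproduction of the paper's proof; the symbols $m_j$ and $Q$ are introduced here for illustration.

```latex
% Assumptions: one dimension, target p = N(\mu, \sigma^2), Langevin Stein operator.
\begin{align*}
\mathcal{L} f(x) &= f''(x) + \frac{d}{dx}\log p(x)\, f'(x),
  \qquad \frac{d}{dx}\log p(x) = -\frac{x - \mu}{\sigma^2}, \\
% Apply to f(x) = x^k and average over a sample distribution Q with m_j = E_Q[X^j].
\mathbb{E}_Q\!\left[\mathcal{L} x^k\right]
  &= k(k-1)\, m_{k-2} - \frac{k}{\sigma^2}\left(m_k - \mu\, m_{k-1}\right), \\
% Requiring this to vanish for k = 1, ..., r gives the recursion
m_k &= \mu\, m_{k-1} + (k-1)\,\sigma^2\, m_{k-2}, \qquad m_0 = 1 .
\end{align*}
```

The last line is the recursion satisfied by the moments of $N(\mu, \sigma^2)$, so in this simplified setting the $r$ averaged quantities all vanish exactly when the first $r$ moments of $Q$ agree with the Gaussian ones. Moments above order $r$ are left unconstrained, which is consistent with the test not being fully convergence-determining.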

Core claim

The polynomial Stein discrepancy supplies a goodness-of-fit test that detects differences in the first r moments between a collection of samples and a Gaussian target distribution while operating at linear cost in the number of samples; the same measure also yields more efficient hyperparameter selection for asymptotically biased sampling algorithms than existing discrepancy-based competitors.

What carries the argument

The polynomial Stein discrepancy, constructed via polynomial kernels to measure agreement on moments up to a chosen order r.
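
One way to picture a moment-targeting Stein statistic is sketched below for a one-dimensional Gaussian target. It is an illustrative assumption rather than the paper's definition: it applies the Langevin Stein operator `L f = f'' + (log p)' f'` to the monomials `x, ..., x**r`, averages each result over the samples, and takes the Euclidean norm of the averages. The function names are hypothetical.

```python
import numpy as np

def stein_monomial_features(x, mu, sigma, r):
    """Langevin Stein operator applied to x**k, k = 1..r, for a N(mu, sigma^2) target.

    Illustrative assumption, not the paper's exact construction.
    """
    x = np.asarray(x, dtype=float)
    score = -(x - mu) / sigma**2              # d/dx log p(x) for the Gaussian target
    columns = []
    for k in range(1, r + 1):
        first = k * x ** (k - 1)              # f'(x) for f(x) = x**k
        # f''(x); identically zero for k = 1
        second = k * (k - 1) * x ** (k - 2) if k >= 2 else np.zeros_like(x)
        columns.append(second + score * first)
    return np.stack(columns, axis=1)          # shape (N, r)

def moment_stein_statistic(x, mu=0.0, sigma=1.0, r=3):
    """Norm of the sample-averaged Stein features; a single pass over the N samples."""
    features = stein_monomial_features(x, mu, sigma, r)
    return float(np.linalg.norm(features.mean(axis=0)))

rng = np.random.default_rng(0)
on_target = rng.normal(0.0, 1.0, size=20_000)   # samples drawn from the target
shifted = rng.normal(0.5, 1.0, size=20_000)     # samples with the wrong mean

stat_on_target = moment_stein_statistic(on_target)
stat_shifted = moment_stein_statistic(shifted)
print(stat_on_target, stat_shifted)             # the shifted sample scores much higher
```

Because the statistic is built from column means, its cost grows linearly in the number of samples, and there is no kernel bandwidth to choose; the only tuning knob in this sketch is the order `r`.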

If this is right

The test can be applied directly to high-dimensional posteriors where quadratic-cost methods become prohibitive.
Hyperparameter selection for stochastic gradient Langevin dynamics and related algorithms can be performed with fewer evaluations of the target density.
The method provides a diagnostic that scales to larger sample sizes while retaining power against moment-based alternatives.
Practitioners obtain a tool that avoids the kernel bandwidth tuning required by most Stein discrepancy variants.
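
For contrast, the quadratic cost and the bandwidth dependence of a kernel Stein discrepancy can be seen in a minimal sketch. The code below computes a squared-KSD V-statistic for a one-dimensional standard normal target with a Gaussian (RBF) kernel; the function name, the choice of kernel and the bandwidth value are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def ksd_v_statistic(x, bandwidth):
    """Squared KSD V-statistic for a N(0, 1) target with an RBF kernel (1-D sketch)."""
    x = np.asarray(x, dtype=float)
    score = -x                                   # d/dx log p(x) for N(0, 1)
    diff = x[:, None] - x[None, :]               # N x N matrix of pairwise differences
    h2 = bandwidth**2
    k = np.exp(-diff**2 / (2.0 * h2))            # kernel values k(x_i, x_j)
    dk_dx = -diff / h2 * k                       # derivative in the first argument
    dk_dy = diff / h2 * k                        # derivative in the second argument
    d2k = (1.0 / h2 - diff**2 / h2**2) * k       # mixed second derivative
    stein_kernel = (score[:, None] * score[None, :] * k
                    + score[:, None] * dk_dy
                    + score[None, :] * dk_dx
                    + d2k)
    return float(stein_kernel.mean())            # average over all N**2 pairs

rng = np.random.default_rng(0)
ksd_good = ksd_v_statistic(rng.normal(0.0, 1.0, size=500), bandwidth=1.0)
ksd_bad = ksd_v_statistic(rng.normal(0.5, 1.0, size=500), bandwidth=1.0)
print(ksd_good, ksd_bad)
```

Every intermediate array here has `N * N` entries, so doubling the sample size quadruples both memory and time, and the result depends on the user-chosen `bandwidth`. These are the two costs that a statistic built from per-sample averages avoids.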

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If many practical posteriors are well approximated once low-order moments match, the PSD could serve as a default diagnostic in place of more expensive kernel methods.
The linear scaling opens the possibility of embedding the test inside online or streaming sampling procedures.
Extensions that replace the Gaussian assumption with moment conditions derived from the actual target density would broaden applicability without changing the core computational structure.
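
The streaming idea can be sketched as follows, assuming a statistic that is a norm of per-sample averages (an assumption of this illustration, with a standard normal target and a hypothetical class name). Only `r` running sums and a counter need to be stored, so samples can be discarded as soon as they are processed.

```python
import numpy as np

class StreamingSteinMoments:
    """Running average of Stein-operated monomials for a N(0, 1) target (hypothetical helper)."""

    def __init__(self, r):
        self.r = r
        self.count = 0
        self.total = np.zeros(r)                 # one running sum per monomial order

    def update(self, batch):
        x = np.atleast_1d(np.asarray(batch, dtype=float))
        for k in range(1, self.r + 1):
            # L x**k = k (k - 1) x**(k - 2) - k x**k when the score is -x
            second = k * (k - 1) * x ** (k - 2) if k >= 2 else np.zeros_like(x)
            self.total[k - 1] += np.sum(second - k * x**k)
        self.count += x.size

    def statistic(self):
        return float(np.linalg.norm(self.total / self.count))

rng = np.random.default_rng(0)
samples = rng.standard_normal(5_000)

tracker = StreamingSteinMoments(r=3)
for start in range(0, samples.size, 100):        # feed the samples in chunks of 100
    tracker.update(samples[start:start + 100])

batch = StreamingSteinMoments(r=3)
batch.update(samples)                            # same samples in a single pass
print(tracker.statistic(), batch.statistic())    # agree up to floating-point rounding
```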

Load-bearing premise

That agreement on the first r moments provides a practically useful indication of sample quality even when the target distribution is not Gaussian or when higher-order features matter for the downstream inference task.

What would settle it

A concrete counter-example in which samples match the first r moments of the target yet produce materially different posterior expectations or credible intervals, and the PSD test returns a non-significant result.
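
A toy version of such a counter-example is easy to construct in one dimension. The sketch below assumes a standard normal target and a statistic formed as the norm of averaged Stein-operated monomials (an illustrative stand-in, not the paper's test). A sample that puts equal mass on -1 and +1 has the same mean and variance as N(0, 1), so the order-2 statistic is exactly zero, yet its tail probabilities are badly wrong.

```python
import math
import numpy as np

def moment_stein_statistic(x, r):
    """Norm of averaged Stein-operated monomials for a N(0, 1) target (illustrative)."""
    x = np.asarray(x, dtype=float)
    means = []
    for k in range(1, r + 1):
        second = k * (k - 1) * x ** (k - 2) if k >= 2 else np.zeros_like(x)
        means.append(np.mean(second - k * x**k))   # L x**k with score(x) = -x
    return float(np.linalg.norm(means))

two_point = np.array([-1.0, 1.0] * 500)            # mean 0, second moment 1, fourth moment 1

stat_r2 = moment_stein_statistic(two_point, r=2)   # 0.0: first two moments match N(0, 1)
stat_r4 = moment_stein_statistic(two_point, r=4)   # 8.0: the fourth moment is 1, not 3

tail_sample = float(np.mean(np.abs(two_point) > 1.5))   # 0.0 for the two-point sample
tail_target = math.erfc(1.5 / math.sqrt(2.0))           # P(|X| > 1.5) under N(0, 1), about 0.13
print(stat_r2, stat_r4, tail_sample, tail_target)
```

In this toy case raising the order from 2 to 4 exposes the mismatch, which suggests that the choice of `r` relative to the features that matter downstream is the practical crux of the premise above.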

Figures

Figures reproduced from arXiv: 2412.05135 by Christopher Drovandi, Leah F South, Matthew Sutton, Narayan Srinivasan.

**Figure 2.** Approximate posterior for mixture example with SGLD for varying step sizes and when …

**Figure 3.** Step size selection results for SGLD using various methods.

**Figure 4.** Runtime for various testing methods where …

**Figure 5.** Type I error rate (a) and statistical power (b,c,d) for detecting discrepancies between the …

Original abstract

We propose a novel method for measuring the discrepancy between a set of samples and a desired posterior distribution for Bayesian inference. Classical methods for assessing sample quality like the effective sample size are not appropriate for scalable Bayesian sampling algorithms, such as stochastic gradient Langevin dynamics, that are asymptotically biased. Instead, the gold standard is to use the kernel Stein Discrepancy (KSD), which is itself not scalable given its quadratic cost in the number of samples. The KSD and its faster extensions also typically suffer from the curse of dimensionality and can require extensive tuning. To address these limitations, we develop the polynomial Stein discrepancy (PSD) and an associated goodness-of-fit test. While the new test is not fully convergence-determining, we prove that it detects differences in the first r moments for Gaussian targets. We empirically show that the test has higher power than its competitors in several examples, and at a lower computational cost. Finally, we demonstrate that the PSD can assist practitioners to select hyper-parameters of Bayesian sampling algorithms more efficiently than competitors.
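
The hyper-parameter selection use case can be sketched on a toy problem. The code below is an assumption-laden illustration, not the paper's experiment: it runs unadjusted Langevin iterations with an exact gradient (a simplification of SGLD, which would use minibatch gradients) on a standard normal target, scores each candidate step size with a moment-based Stein statistic, and keeps the step size with the lowest score. All names and the candidate grid are hypothetical.

```python
import numpy as np

def moment_stein_statistic(x, r=2):
    """Norm of averaged Stein-operated monomials for a N(0, 1) target (illustrative)."""
    x = np.asarray(x, dtype=float)
    means = []
    for k in range(1, r + 1):
        second = k * (k - 1) * x ** (k - 2) if k >= 2 else np.zeros_like(x)
        means.append(np.mean(second - k * x**k))   # L x**k with score(x) = -x
    return float(np.linalg.norm(means))

def langevin_chain(step, n_iter, x0, rng):
    """Unadjusted Langevin iterations for N(0, 1); exact gradient for simplicity."""
    noise = rng.standard_normal(n_iter)
    chain = np.empty(n_iter)
    x = x0
    for t in range(n_iter):
        x = x + 0.5 * step * (-x) + np.sqrt(step) * noise[t]
        chain[t] = x
    return chain

rng = np.random.default_rng(1)
candidate_steps = [1e-4, 1e-1, 1.9]                # too small, moderate, too large
scores = {
    step: moment_stein_statistic(langevin_chain(step, 5_000, 3.0, rng))
    for step in candidate_steps
}
best_step = min(scores, key=scores.get)
print(scores, best_step)
```

The smallest step barely moves away from the starting point within the iteration budget, and the largest inflates the stationary variance, so both are penalised through their first two moments; the moderate step is selected. Each score costs one pass over the chain, which is what makes a grid search of this kind cheap.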

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PSD gives a cheaper way to catch first-r moment mismatches on Gaussians but its use for general sample quality rests on an assumption the paper flags as unproven.

The paper introduces the polynomial Stein discrepancy and proves it detects differences in the first r moments when the target is Gaussian. It also reports experiments where the test shows higher power than kernel Stein methods at lower cost, and claims it helps pick hyperparameters for biased samplers like SGLD more efficiently. That is the core of what is new: a non-kernel construction with a specific moment guarantee that prior Stein discrepancy papers do not appear to contain. The motivation around scalability and the curse of dimensionality for KSD is clear and the empirical comparisons are presented as direct evidence of practical gains. The work stays inside the existing Stein framework rather than inventing new assumptions from scratch.

The main limitation is stated up front: the test is not fully convergence-determining. The moment result holds only for Gaussians, yet the applications shown involve general posteriors and hyperparameter tuning where other discrepancies (tails, higher moments, or non-moment features) could matter more. The paper treats moment detection as a useful proxy in those settings, but the evidence for that proxy is empirical and example-specific rather than general. Without the full derivations and experimental protocols it is difficult to judge how sensitive the power gains are to implementation choices or dimension.

This is aimed at people who run or tune asymptotically biased samplers and need something faster than KSD for routine checks. Readers already working with Stein discrepancies will see the most direct value in the new construction and the Gaussian result. The paper shows clear thinking on the problem it sets out to solve and engages the relevant literature, so it deserves a serious referee even if the scope of the claims needs tightening in revision.

Referee Report

2 major / 2 minor

Summary. The paper proposes the Polynomial Stein Discrepancy (PSD) as a scalable goodness-of-fit test for assessing samples from asymptotically biased Bayesian samplers such as SGLD. It proves that the PSD detects differences in the first r moments for Gaussian targets (while explicitly noting it is not fully convergence-determining), reports empirical results showing higher statistical power than KSD competitors at lower computational cost, and demonstrates utility for hyperparameter selection in sampling algorithms.

Significance. If the moment-detection result and empirical power advantages hold, the PSD would supply a practical, lower-cost alternative to KSD for moment-based sample-quality assessment in high-dimensional Bayesian settings. The transparent limitation to moment convergence is a positive feature that scopes the contribution appropriately; the work could be significant as a heuristic tool provided the first-r-moment proxy aligns with the dominant failure modes in the non-Gaussian regimes where it is applied.

major comments (2)

[Abstract, theoretical results section] Abstract and § on theoretical results: the central practical utility claims rest on the unverified assumption that first-r-moment discrepancies are the dominant failure mode for the non-Gaussian, high-dimensional, or biased-sampler regimes shown in the experiments. Because the test is explicitly not convergence-determining, the manuscript should either restrict its claims to Gaussian targets or add experiments that isolate tail/higher-moment discrepancies to test whether low PSD values can coexist with poor sample quality.
[Empirical evaluation section] Empirical evaluation section: the reported power gains and hyperparameter-selection improvements are presented without detailed experimental protocols, error bars, or controls for non-moment features; this makes it difficult to assess whether the advantages are robust or specific to the chosen examples where moment mismatch is the primary difference.
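
The kind of reporting requested here, rejection rates with error bars, can be sketched on a toy problem. The code below is not the paper's test: it assumes a standard normal target that can be sampled exactly, calibrates a rejection threshold by simulating the statistic under the target, and then estimates the Type I error rate and the power against a mean shift together with binomial standard errors. The statistic and all names are illustrative assumptions.

```python
import numpy as np

def moment_stein_statistic(x, r=2):
    """Norm of averaged Stein-operated monomials for a N(0, 1) target (illustrative)."""
    x = np.asarray(x, dtype=float)
    means = []
    for k in range(1, r + 1):
        second = k * (k - 1) * x ** (k - 2) if k >= 2 else np.zeros_like(x)
        means.append(np.mean(second - k * x**k))   # L x**k with score(x) = -x
    return float(np.linalg.norm(means))

def rejection_rate(draw_sample, threshold, n_rep):
    """Fraction of replicate samples whose statistic exceeds the threshold, with its standard error."""
    rejections = np.array(
        [moment_stein_statistic(draw_sample()) > threshold for _ in range(n_rep)]
    )
    rate = rejections.mean()
    std_err = np.sqrt(rate * (1.0 - rate) / n_rep)
    return float(rate), float(std_err)

rng = np.random.default_rng(2)
n = 200                                            # samples per replicate

# Simulated null distribution of the statistic and its 95% quantile.
null_stats = np.array([moment_stein_statistic(rng.standard_normal(n)) for _ in range(999)])
threshold = np.quantile(null_stats, 0.95)

type_one, type_one_se = rejection_rate(lambda: rng.standard_normal(n), threshold, 200)
power, power_se = rejection_rate(lambda: rng.normal(0.5, 1.0, n), threshold, 200)
print(f"Type I error: {type_one:.3f} +/- {type_one_se:.3f}")
print(f"Power:        {power:.3f} +/- {power_se:.3f}")
```

Simulating the null requires exact draws from the target, which is not available for the posteriors the paper is concerned with, so this calibration is only a device for the toy setting. The alternative used here differs from the target in its mean, so it does not address the referee's concern about non-moment features.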

minor comments (2)

[Methods section] Notation for the polynomial basis and the precise definition of the PSD should be introduced with an explicit equation early in the methods section to improve readability.
[Introduction or methods] The manuscript would benefit from a short table summarizing the computational complexity of PSD versus KSD and its extensions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, agreeing where revisions are needed to strengthen the manuscript while defending the scope of our contributions on substance.

Referee: [Abstract, theoretical results section] Abstract and § on theoretical results: the central practical utility claims rest on the unverified assumption that first-r-moment discrepancies are the dominant failure mode for the non-Gaussian, high-dimensional, or biased-sampler regimes shown in the experiments. Because the test is explicitly not convergence-determining, the manuscript should either restrict its claims to Gaussian targets or add experiments that isolate tail/higher-moment discrepancies to test whether low PSD values can coexist with poor sample quality.

Authors: The manuscript already explicitly states that the PSD is not convergence-determining and proves moment detection only for Gaussian targets. The empirical examples are chosen to illustrate utility for biased samplers (e.g., SGLD) where moment mismatch is a primary concern, consistent with the paper's focus. We agree that stronger scoping language would help. We will revise the abstract and theoretical results section to emphasize that the PSD serves as a moment-based diagnostic rather than a general convergence test, and add a brief discussion of this limitation for non-Gaussian regimes. Adding new isolating experiments on tails would be a substantial extension beyond the current scope; we will instead note this as future work.

Revision: partial

Referee: [Empirical evaluation section] Empirical evaluation section: the reported power gains and hyperparameter-selection improvements are presented without detailed experimental protocols, error bars, or controls for non-moment features; this makes it difficult to assess whether the advantages are robust or specific to the chosen examples where moment mismatch is the primary difference.

Authors: We acknowledge the need for greater transparency in the empirical section. Detailed protocols are included in the supplementary material, but we agree the main text should reference them more explicitly and include error bars. We will revise the empirical evaluation section to add error bars, summarize the protocols, and include a short discussion addressing potential non-moment features in the chosen examples to better demonstrate robustness.

Revision: yes

Circularity Check

0 steps flagged

No circularity: derivation grounded in existing Stein framework with independent moment proof

The paper defines the PSD as a polynomial-based variant of the kernel Stein discrepancy and proves its moment-detection property specifically for Gaussian targets via direct analysis. No step reduces a claimed prediction or uniqueness result to a fitted parameter, self-citation chain, or definitional tautology. The non-convergence-determining nature is explicitly stated rather than hidden, and empirical claims rest on separate experiments rather than algebraic identity with inputs. The central construction therefore remains self-contained against external Stein discrepancy literature.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the Stein discrepancy framework from prior literature plus the new polynomial construction; no free parameters are mentioned. The PSD itself is the primary invented entity.

axioms (1)

Domain assumption: Stein discrepancy properties and associated operators from prior literature.
PSD is defined as an extension of the kernel Stein discrepancy using polynomials.

invented entities (1)

Polynomial Stein Discrepancy (PSD) · no independent evidence
purpose: Measure discrepancy between samples and target distribution via polynomial basis for moment convergence assessment
New discrepancy measure introduced to address scalability and dimensionality issues of KSD.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages

[1] Anastasiou, A., Barp, A., Briol, F.-X., Ebner, B., Gaunt, R. E., Ghaderinezhad, F., Gorham, J., Gretton, A., Ley, C., Liu, Q., Mackey, L., Oates, C. J., Reinert, G., and Swan, Y. (2023). Stein's method meets computational statistics: A review of some recent developments. Statistical Science, 38(1):120--139.

[2] Arcones, M. A. and Gine, E. (1992). On the Bootstrap of U and V Statistics. The Annals of Statistics, 20(2):655--674.

[3] Assaraf, R. and Caffarel, M. (1999). Zero-variance principle for Monte Carlo algorithms. Physical Review Letters, 83(23):4682--4685.

[4] Barbour, A. D. (1990). Stein's method for diffusion approximations. Probability Theory and Related Fields, 84(3):297--322.

[5] Berlinet, A. and Thomas-Agnan, C. (2004). Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer US, Boston, MA.

[6] Bhattacharya, A., Linero, A., and Oates, C. J. (2024). Grand challenges in Bayesian computation.

[7] Chwialkowski, K., Strathmann, H., and Gretton, A. (2016). A kernel test of goodness of fit. In Balcan, M. F. and Weinberger, K. Q., editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 2606--2615, New York, New York, USA. PMLR.

[8] Chwialkowski, K. P., Ramdas, A., Sejdinovic, D., and Gretton, A. (2015). Fast two-sample testing with analytic representations of probability measures. In Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc.

[9] Gorham, J. and Mackey, L. (2015). Measuring sample quality with Stein's method. In Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc.

[10] Gorham, J. and Mackey, L. (2017). Measuring sample quality with kernels. In Precup, D. and Teh, Y. W., editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1292--1301. PMLR.

[11] Huggins, J. (2018). rfsd package. https://bitbucket.org/jhhuggins/random-feature-stein-discrepancies/src/master/

[12] Huggins, J. and Mackey, L. (2018). Random feature Stein discrepancies. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc.

[13] Hušková, M. and Janssen, P. (1993). Consistency of the generalized bootstrap for degenerate U-statistics. The Annals of Statistics, 21(4):1811--1823.

[14] Jitkrittum, W. (2019). kernel-gof package. https://github.com/wittawatj/kernel-gof

[15] Jitkrittum, W., Xu, W., Szabo, Z., Fukumizu, K., and Gretton, A. (2017). A linear-time kernel goodness-of-fit test. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.

[16] Kanagawa, H., Barp, A., Gretton, A., and Mackey, L. (2022). Controlling moments with kernel Stein discrepancies. arXiv preprint arXiv:2211.05408.

[17] Leucht, A. and Neumann, M. (2013). Dependent wild bootstrap for degenerate U- and V-statistics. Journal of Multivariate Analysis, 117:257--280.

[18] Liu, Q., Lee, J., and Jordan, M. (2016). A kernelized Stein discrepancy for goodness-of-fit tests. In Balcan, M. F. and Weinberger, K. Q., editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 276--284, New York, New York, USA. PMLR.

[19] Mira, A., Solgi, R., and Imparato, D. (2013). Zero variance Markov chain Monte Carlo for Bayesian estimators. Statistics and Computing, 23(5):653--662.

[20] Müller, A. (1997). Integral probability metrics and their generating classes of functions. Advances in Applied Probability, 29(2):429--443.

[21] Nemeth, C. and Fearnhead, P. (2021). Stochastic gradient Markov chain Monte Carlo. Journal of the American Statistical Association, 116(533):433--450.

[22] Rahimi, A. and Recht, B. (2007). Random features for large-scale kernel machines. In Platt, J., Koller, D., Singer, Y., and Roweis, S., editors, Advances in Neural Information Processing Systems, volume 20. Curran Associates, Inc.

[23] Roberts, G. O. and Tweedie, R. L. (1996). Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, 2(4):341--363.

[24] Serfling, R. J. (2009). Approximation theorems of mathematical statistics. John Wiley & Sons.

[25] South, L. F., Karvonen, T., Nemeth, C., Girolami, M., and Oates, C. J. (2022). Semi-exact control functionals from Sard's method. Biometrika, 109(2):351--367.

[26] South, L. F., Oates, C. J., Mira, A., and Drovandi, C. (2023). Regularized zero-variance control variates. Bayesian Analysis, 18(3):865--888.

[27] Stein, C. (1972). A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2: Probability Theory, volume 6.2, pages 583--603. University of California Press.

[28] Welling, M. and Teh, Y. (2011). Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning, pages 681--688.
