pith. sign in

arxiv: 2604.03398 · v1 · submitted 2026-04-03 · 📊 stat.ME · stat.AP· stat.CO

Robust Standard Errors for Bayesian Posterior Functionals via the Infinitesimal Jackknife

Pith reviewed 2026-05-13 18:16 UTC · model grok-4.3

classification 📊 stat.ME stat.APstat.CO
keywords infinitesimal jackkniferobust standard errorsBayesian posterior functionalsmodel misspecificationnonparametric bootstrapmediation analysismultilevel modelinginfluence functions
0
0 comments X

The pith

The infinitesimal jackknife supplies robust standard errors for any Bayesian posterior functional by approximating the nonparametric bootstrap from one MCMC run.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Bayesian models in the social sciences routinely produce nonlinear functionals such as indirect effects, standardized coefficients, and intraclass correlations. The default posterior standard deviation assumes the model is correct and tends to understate frequentist uncertainty when it is not. The infinitesimal jackknife computes a robust standard error by estimating influence functions from a single MCMC sample, without requiring repeated refits or new analytic derivatives for each functional. Simulations across mediation, ANOVA, and multilevel models show that the jackknife tracks the bootstrap closely under misspecification while agreeing with the posterior standard deviation when the model is correct. This combination makes reliable uncertainty quantification feasible for the nonlinear summaries that behavioral researchers actually report.

Core claim

The infinitesimal jackknife standard error approximates the bootstrap variance of any posterior functional through observation-level or cluster-level influence functions obtained from a single MCMC run. In four simulation studies covering six common functionals, the jackknife closely matched bootstrap standard errors under misspecification, while the posterior standard deviation substantially underestimated them. Under correct specification all three methods agreed, confirming that the jackknife adds no distortion when the working model is adequate.

What carries the argument

The infinitesimal jackknife standard error (IJSE), which recovers bootstrap variance by summing squared influence functions computed once from the MCMC samples.

If this is right

  • Researchers obtain robust standard errors for any nonlinear posterior functional without deriving custom gradients or rerunning MCMC hundreds of times.
  • Under misspecification the method prevents the undercoverage that occurs with the plain posterior standard deviation.
  • Cluster-level versions directly handle the multilevel data structures common in behavioral research.
  • When the model is correctly specified the jackknife coincides with the posterior standard deviation, preserving statistical efficiency.
  • The approach applies unchanged to indirect effects, intraclass correlations, and other functionals already in routine use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same single-run influence-function machinery could be applied to other posterior summaries such as predictive checks or decision-theoretic quantities.
  • Because the computational cost stays near that of one MCMC fit, the method lowers the barrier to routine robust reporting in large-scale applied studies.
  • Extensions to dependent data would require only a modified influence-function definition that respects the dependence structure.

Load-bearing premise

The influence functions extracted from a single MCMC chain accurately recover the bootstrap variance of the chosen functional even when the fitted model differs from the true data-generating process.

What would settle it

A controlled simulation or real-data example in which the infinitesimal jackknife standard error deviates materially from the standard error obtained by repeating full MCMC fits on many bootstrap resamples under known misspecification.

read the original abstract

Quantitative research in the social and behavioral sciences relies heavily on nonlinear posterior functionals such as indirect effects, standardized coefficients, effect sizes, intraclass correlations, and multilevel variance-explained measures. The posterior standard deviation (PostSD) is the default uncertainty summary for these quantities, yet it presupposes a correctly specified model. When the working model is wrong, as is common with behavioral data that exhibit heavy tails and heteroskedasticity, PostSD can severely underestimate the frequentist standard error. The nonparametric bootstrap offers robustness but requires repeated MCMC refits, while the delta method demands a separate analytic gradient derivation for every new functional. The infinitesimal jackknife standard error (Giordano & Broderick, 2023) sidesteps both limitations: it approximates the bootstrap variance through influence functions computed from a single MCMC run, applies to any posterior functional without modification, and requires no analytic derivatives. We discuss the use the IJSE methodology at both the observation level and the cluster level and evaluate it through four simulation studies covering six functionals from mediation analysis, ANOVA, and multilevel modeling, which are commonly used in the social and behavioral sciences. Under misspecification, PostSD substantially underestimated the true standard error across all settings, whereas IJSE closely tracked the bootstrap at a fraction of the computational cost. Under correct specification all three methods agreed, confirming that IJSE introduces no distortion when the model is right. These results show IJSE as a practical, general-purpose tool for robust uncertainty quantification in Bayesian workflows throughout the social and behavioral sciences

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes applying the infinitesimal jackknife standard error (IJSE) from Giordano & Broderick (2023) to obtain frequentist-robust standard errors for arbitrary nonlinear Bayesian posterior functionals (e.g., indirect effects, standardized coefficients, intraclass correlations) using a single MCMC run. Through four simulation studies involving six functionals from mediation analysis, ANOVA, and multilevel models, it shows that under misspecification (heavy tails, heteroskedasticity) the posterior standard deviation (PostSD) substantially underestimates the true frequentist SE while IJSE closely tracks the nonparametric bootstrap; under correct specification all three methods agree.

Significance. If the simulation results hold, the work supplies a computationally efficient, general-purpose tool for robust uncertainty quantification in Bayesian workflows that are standard in the social and behavioral sciences. It avoids both repeated MCMC refits required by the bootstrap and the need for analytic derivatives required by the delta method, while preserving agreement with PostSD when the model is correctly specified. The explicit treatment of both observation- and cluster-level versions broadens applicability to multilevel data.

major comments (2)
  1. [§4] §4 (Simulation design): the manuscript reports that IJSE 'closely tracked the bootstrap' across misspecification settings, yet the abstract and summary tables provide no numerical summaries (e.g., mean ratios of IJSE to bootstrap SE, coverage probabilities, or relative MSE) with error bars or replication counts; without these quantities the strength of the central tracking claim cannot be assessed quantitatively.
  2. [§3.2] §3.2 (Cluster-level IJSE): the aggregation of observation-level influence functions into cluster-level quantities is described only at a high level; an explicit equation or algorithm showing how the per-cluster jackknife weights are formed from the MCMC draws is needed to confirm that the approximation remains valid under the cluster-level misspecifications tested in the multilevel simulations.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'four simulation studies covering six functionals' should be accompanied by a one-sentence summary of the key quantitative result (e.g., average relative error of IJSE versus bootstrap) to give readers an immediate sense of effect size.
  2. [§2] Notation: the symbol for the posterior functional (denoted variously as θ or g(θ)) should be standardized throughout §2 and §3 to avoid confusion when the same symbol appears in the influence-function derivation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and constructive comments. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.

read point-by-point responses
  1. Referee: [§4] §4 (Simulation design): the manuscript reports that IJSE 'closely tracked the bootstrap' across misspecification settings, yet the abstract and summary tables provide no numerical summaries (e.g., mean ratios of IJSE to bootstrap SE, coverage probabilities, or relative MSE) with error bars or replication counts; without these quantities the strength of the central tracking claim cannot be assessed quantitatively.

    Authors: We agree that quantitative summaries are needed to substantiate the tracking claim. In the revised manuscript we will add a new table in Section 4 that reports, for each functional and misspecification setting, the mean ratio of IJSE to bootstrap SE, the empirical coverage of nominal 95% intervals, and the relative MSE of each estimator, together with Monte Carlo standard errors computed across the simulation replications. revision: yes

  2. Referee: [§3.2] §3.2 (Cluster-level IJSE): the aggregation of observation-level influence functions into cluster-level quantities is described only at a high level; an explicit equation or algorithm showing how the per-cluster jackknife weights are formed from the MCMC draws is needed to confirm that the approximation remains valid under the cluster-level misspecifications tested in the multilevel simulations.

    Authors: We appreciate the request for explicit detail. In the revision we will insert into Section 3.2 the precise aggregation formula that sums the observation-level influence functions within each cluster to obtain the cluster-level influence, together with a short algorithm (in pseudocode) that shows how the per-cluster jackknife weights are obtained directly from the MCMC draws. This will make the cluster-level procedure fully reproducible and confirm its validity for the multilevel simulations. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper applies the infinitesimal jackknife standard error (IJSE) from the cited prior work of Giordano & Broderick (2023) to posterior functionals and evaluates it via independent simulation studies that directly compare IJSE performance against bootstrap and PostSD under misspecification and correct specification. No equations, assumptions, or performance claims reduce by construction to quantities fitted or defined within the present manuscript; the central results rest on external simulation evidence rather than self-referential definitions, fitted-input predictions, or load-bearing self-citations. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper rests on the established infinitesimal jackknife approximation and standard MCMC sampling assumptions; no new free parameters, invented entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)
  • domain assumption The infinitesimal jackknife influence-function approximation recovers the nonparametric bootstrap variance for posterior functionals
    This is the central modeling assumption that allows IJSE to serve as a cheap surrogate for the bootstrap under misspecification.

pith-pipeline@v0.9.0 · 5573 in / 1358 out tokens · 39594 ms · 2026-05-13T18:16:28.694150+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages

  1. [1]

    Baron, R. M. and Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations.Journal of Personality and Social Psychology, 51(6):1173–1182

  2. [2]

    Bollen, K. A. and Stine, R. (1990). Direct and indirect effects: Classical and bootstrap estimates of variability.Sociological Methodology, 20:115–140

  3. [3]

    (2011).Handbook of Markov Chain Monte Carlo

    Brooks, S., Gelman, A., Jones, G., and Meng, X.-L. (2011).Handbook of Markov Chain Monte Carlo. CRC press

  4. [4]

    Davison, A. C. and Hinkley, D. V . (1997).Bootstrap Methods and Their Application. Cambridge University Press, Cambridge

  5. [5]

    Efron, B. (1979). Bootstrap methods: Another look at the jackknife.The Annals of Statistics, 7(1):1–26. PSYCHOMETRIKASUBMISSIONApril 7, 202634

  6. [6]

    (1982).The Jackknife, the Bootstrap, and Other Resampling Plans

    Efron, B. (1982).The Jackknife, the Bootstrap, and Other Resampling Plans. SIAM, Philadelphia

  7. [7]

    and Tibshirani, R

    Efron, B. and Tibshirani, R. J. (1994).An Introduction to the Bootstrap. Chapman & Hall/CRC, New York

  8. [8]

    Gelman, A. (2008). Scaling regression inputs by dividing by two standard deviations.Statistics in Medicine, 27(15):2865–2873

  9. [9]

    B., Stern, H

    Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013).Bayesian Data Analysis. Chapman & Hall/CRC, Boca Raton, FL, 3 edition

  10. [10]

    and Broderick, T

    Giordano, R. and Broderick, T. (2023). The bayesian infinitesimal jackknife for variance

  11. [11]

    I., and Broderick, T

    Giordano, R., Stephenson, W., Liu, R., Jordan, M. I., and Broderick, T. (2019). A swiss army infinitesimal jackknife.Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), pages 1139–1147

  12. [12]

    Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1:221–233

  13. [13]

    Jaeckel, L. A. (1972). The infinitesimal jackknife. Technical Report MM 72-1215-11, Bell Laboratories

  14. [14]

    Ji, F., Lee, J., and Rabe-Hesketh, S. (2024). Valid standard errors for bayesian quantile regression with clustered and independent data

  15. [15]

    Johnson, J. B. and Omland, K. S. (2004). Model selection in ecology and evolution.Trends in Ecology & Evolution, 19(2):101–108

  16. [16]

    Kleijn, B. J. K. and van der Vaart, A. W. (2012). The Bernstein–von Mises theorem under misspecification. Electronic Journal of Statistics, 6:354–381

  17. [17]

    and Finley, A

    Kroes, J. and Finley, A. (2025). Demystifying omega squared: practical guidance for using a less biased ANOV A effect size measure.Psychological Methods, 30(4):866–887

  18. [18]

    J., Preacher, K

    Lachowicz, M. J., Preacher, K. J., and Kelley, K. (2018). A novel measure of effect size for mediation analysis.Psychological Methods, 23(2):244–261. PSYCHOMETRIKASUBMISSIONApril 7, 202635

  19. [19]

    Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOV As.Frontiers in Psychology, 4:863

  20. [20]

    Levine, T. R. and Hullett, C. R. (2002). Eta squared, partial eta squared, and misreporting of effect size in communication research.Human Communication Research, 28(4):612–625

  21. [21]

    MacKinnon, D. P. (2008).Introduction to Statistical Mediation Analysis. Lawrence Erlbaum Associates, New York

  22. [22]

    P., Lockwood, C

    MacKinnon, D. P., Lockwood, C. M., and Williams, J. (2004). Confidence limits for the indirect effect: Distribution of the product and resampling methods.Multivariate Behavioral Research, 39(1):99–128

  23. [23]

    McGraw, K. O. and Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1):30–46

  24. [24]

    Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures.Psychological Bulletin, 105(1):156–166

  25. [25]

    P., White, I

    Morris, T. P., White, I. R., and Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods.Statistics in Medicine, 38(11):2074–2102. Müller, U. K. (2013). Risk of bayesian inference in misspecified models, and the sandwich covariance matrix.Econometrica, 81(5):1805–1849

  26. [26]

    and Schielzeth, H

    Nakagawa, S. and Schielzeth, H. (2013). A general and simple method for obtainingR 2 from generalized linear mixed-effects models.Methods in Ecology and Evolution, 4(2):133–142

  27. [27]

    Oehlert, G. W. (1992). A note on the delta method.The American Statistician, 46(1):27–29

  28. [28]

    Okada, K. (2013). Is omega squared less biased? a comparison of three major effect size indices in one-way ANOV A.Behaviormetrika, 40(2):129–144

  29. [29]

    and Algina, J

    Olejnik, S. and Algina, J. (2003). Generalized eta and omega squared statistics: measures of effect size for some common research designs.Psychological Methods, 8(4):434–447. PSYCHOMETRIKASUBMISSIONApril 7, 202636

  30. [30]

    Pesigan, I. J. A. and Cheung, S. F. (2020). Sem-based methods to form confidence intervals for indirect effect: Still applicable given nonnormality, under certain conditions.Frontiers in Psychology, 11:571928

  31. [31]

    Preacher, K. J. and Kelley, K. (2011). Effect size measures for mediation models: Quantitative strategies for communicating indirect effects.Psychological Methods, 16(2):93–115

  32. [32]

    Raudenbush, S. W. and Bryk, A. S. (2002).Hierarchical Linear Models: Applications and Data Analysis Methods. Sage, 2 edition

  33. [33]

    and Marcoulides, G

    Raykov, T. and Marcoulides, G. A. (2015). Intraclass correlation coefficients in hierarchical design models. Educational and Psychological Measurement, 75(6):1063–1070

  34. [34]

    Rosseel, Y . (2012). lavaan: An R package for structural equation modeling.Journal of Statistical Software, 48(2):1–36

  35. [35]

    Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. The Annals of Statistics, 12(4):1151–1172

  36. [36]

    Schielzeth, H. (2010). Simple means to improve the interpretability of regression coefficients.Methods in Ecology and Evolution, 1(2):103–113

  37. [37]

    Shrout, P. E. and Bolger, N. (2002). Mediation in experimental and nonexperimental studies: New procedures and recommendations.Psychological Methods, 7(4):422–445

  38. [38]

    Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428

  39. [39]

    Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. Sociological Methodology, 13:290–312. ten Hove, D. and others (2025). How to estimate the intraclass correlation coefficient: a systematic review of suggested methods in health behavior research.Health Psychology Review. van der Vaart, A. W. (1998).Asy...

  40. [40]

    White, H. (1982). Maximum likelihood estimation of misspecified models.Econometrica, 50(1):1–25

  41. [41]

    Wilcox, R. R. (2012).Introduction to Robust Estimation and Hypothesis Testing. Academic Press, 3 edition

  42. [42]

    and MacKinnon, D

    Yuan, Y . and MacKinnon, D. P. (2014). Robust mediation analysis based on median regression. Psychological Methods, 19(1):1–20