Robust Standard Errors for Bayesian Posterior Functionals via the Infinitesimal Jackknife
Pith reviewed 2026-05-13 18:16 UTC · model grok-4.3
The pith
The infinitesimal jackknife supplies robust standard errors for any Bayesian posterior functional by approximating the nonparametric bootstrap from one MCMC run.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The infinitesimal jackknife standard error approximates the bootstrap variance of any posterior functional through observation-level or cluster-level influence functions obtained from a single MCMC run. In four simulation studies covering six common functionals, the jackknife closely matched bootstrap standard errors under misspecification, while the posterior standard deviation substantially underestimated them. Under correct specification all three methods agreed, confirming that the jackknife adds no distortion when the working model is adequate.
What carries the argument
The infinitesimal jackknife standard error (IJSE), which recovers bootstrap variance by summing squared influence functions computed once from the MCMC samples.
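The computation can be sketched in a few lines. The sketch below is illustrative only: the function name `ij_standard_error` is hypothetical, the influence formula I_i ≈ N · Cov_t(L_i, g) is taken from the passage quoted further down this page, and the scaling SE = sqrt(Σ_i (I_i − Ī)²) / N is an assumed reading of "summing squared influence functions", not a formula confirmed here. The toy working model (normal mean with known unit variance and a flat prior) lets exact posterior draws stand in for MCMC output.

```python
import math
import random
import statistics

def ij_standard_error(loglik, g_draws):
    """Infinitesimal jackknife SE for the posterior mean of a functional.

    loglik  : list of T rows, each holding N per-observation log-likelihoods
    g_draws : list of T values of the functional g(theta), one per draw
    """
    T, N = len(loglik), len(loglik[0])
    g_bar = sum(g_draws) / T
    infl = []
    for i in range(N):
        col_mean = sum(row[i] for row in loglik) / T
        cov = sum((row[i] - col_mean) * (g - g_bar)
                  for row, g in zip(loglik, g_draws)) / (T - 1)
        infl.append(N * cov)  # I_i ~ N * Cov_t(L_i, g)
    i_bar = sum(infl) / N
    # Assumed scaling: centered sum of squares divided by N.
    se = math.sqrt(sum((v - i_bar) ** 2 for v in infl)) / N
    return se, infl

# Toy misspecification: the working model assumes sd = 1, the data have sd = 3.
rng = random.Random(0)
N, T = 100, 2000
x = [rng.gauss(0.0, 3.0) for _ in range(N)]
x_bar = sum(x) / N
# Exact posterior draws under the working model (stand-in for MCMC output).
theta = [rng.gauss(x_bar, 1.0 / math.sqrt(N)) for _ in range(T)]
# Per-observation log-likelihoods; additive constants do not affect covariances.
loglik = [[-0.5 * (xi - th) ** 2 for xi in x] for th in theta]

ijse, infl = ij_standard_error(loglik, theta)
post_sd = statistics.stdev(theta)
print("IJSE:", ijse, "PostSD:", post_sd)
```

In this toy setting the posterior standard deviation reflects the assumed unit variance, while the jackknife value should land near the sample standard deviation of `x` divided by sqrt(N), which is the behavior the page attributes to the method under misspecification.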
If this is right
- Researchers obtain robust standard errors for any nonlinear posterior functional without deriving custom gradients or rerunning MCMC hundreds of times.
- Under misspecification the method prevents the undercoverage that occurs with the plain posterior standard deviation.
- Cluster-level versions directly handle the multilevel data structures common in behavioral research.
- When the model is correctly specified the jackknife coincides with the posterior standard deviation, preserving statistical efficiency.
- The approach applies unchanged to indirect effects, intraclass correlations, and other functionals already in routine use.
Where Pith is reading between the lines
- The same single-run influence-function machinery could be applied to other posterior summaries such as predictive checks or decision-theoretic quantities.
- Because the computational cost stays near that of one MCMC fit, the method lowers the barrier to routine robust reporting in large-scale applied studies.
- Extensions to dependent data would require only a modified influence-function definition that respects the dependence structure.
Load-bearing premise
The influence functions extracted from a single MCMC chain accurately recover the bootstrap variance of the chosen functional even when the fitted model differs from the true data-generating process.
What would settle it
A controlled simulation or real-data example in which the infinitesimal jackknife standard error deviates materially from the standard error obtained by repeating full MCMC fits on many bootstrap resamples under known misspecification.
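A minimal sketch of the bootstrap side of such a check follows, under loudly stated assumptions: `bootstrap_se` and `toy_posterior_mean` are hypothetical names, and the toy working model (normal mean, unit variance, flat prior) has a closed-form posterior mean, so no MCMC refit is actually run. In a real study each call to `posterior_mean` would be a full MCMC fit on the resampled data.

```python
import math
import random
import statistics

def bootstrap_se(data, posterior_mean, n_boot=500, seed=0):
    """Nonparametric bootstrap SE of a posterior-mean functional."""
    rng = random.Random(seed)
    n = len(data)
    reps = []
    for _ in range(n_boot):
        # Resample observations with replacement, then "refit".
        resample = [data[rng.randrange(n)] for _ in range(n)]
        reps.append(posterior_mean(resample))
    return statistics.stdev(reps)

def toy_posterior_mean(sample):
    # Closed-form posterior mean under the toy working model (flat prior).
    return statistics.fmean(sample)

rng = random.Random(1)
data = [rng.gauss(0.0, 3.0) for _ in range(200)]  # true sd 3, model assumes 1
boot = bootstrap_se(data, toy_posterior_mean)
post_sd = 1.0 / math.sqrt(len(data))  # posterior SD under the working model
print("bootstrap SE:", boot, "PostSD:", post_sd)
```

The jackknife standard error from a single fit would then be compared against `boot`; a material gap under known misspecification is the outcome that would count against the method.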
Original abstract
Quantitative research in the social and behavioral sciences relies heavily on nonlinear posterior functionals such as indirect effects, standardized coefficients, effect sizes, intraclass correlations, and multilevel variance-explained measures. The posterior standard deviation (PostSD) is the default uncertainty summary for these quantities, yet it presupposes a correctly specified model. When the working model is wrong, as is common with behavioral data that exhibit heavy tails and heteroskedasticity, PostSD can severely underestimate the frequentist standard error. The nonparametric bootstrap offers robustness but requires repeated MCMC refits, while the delta method demands a separate analytic gradient derivation for every new functional. The infinitesimal jackknife standard error (Giordano & Broderick, 2023) sidesteps both limitations: it approximates the bootstrap variance through influence functions computed from a single MCMC run, applies to any posterior functional without modification, and requires no analytic derivatives. We discuss the use of the IJSE methodology at both the observation level and the cluster level and evaluate it through four simulation studies covering six functionals from mediation analysis, ANOVA, and multilevel modeling, which are commonly used in the social and behavioral sciences. Under misspecification, PostSD substantially underestimated the true standard error across all settings, whereas IJSE closely tracked the bootstrap at a fraction of the computational cost. Under correct specification all three methods agreed, confirming that IJSE introduces no distortion when the model is right. These results show IJSE as a practical, general-purpose tool for robust uncertainty quantification in Bayesian workflows throughout the social and behavioral sciences.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes applying the infinitesimal jackknife standard error (IJSE) from Giordano & Broderick (2023) to obtain frequentist-robust standard errors for arbitrary nonlinear Bayesian posterior functionals (e.g., indirect effects, standardized coefficients, intraclass correlations) using a single MCMC run. Through four simulation studies involving six functionals from mediation analysis, ANOVA, and multilevel models, it shows that under misspecification (heavy tails, heteroskedasticity) the posterior standard deviation (PostSD) substantially underestimates the true frequentist SE while IJSE closely tracks the nonparametric bootstrap; under correct specification all three methods agree.
Significance. If the simulation results hold, the work supplies a computationally efficient, general-purpose tool for robust uncertainty quantification in Bayesian workflows that are standard in the social and behavioral sciences. It avoids both repeated MCMC refits required by the bootstrap and the need for analytic derivatives required by the delta method, while preserving agreement with PostSD when the model is correctly specified. The explicit treatment of both observation- and cluster-level versions broadens applicability to multilevel data.
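To make the delta-method burden concrete, consider the indirect effect in a simple mediation model. The notation here is assumed for illustration and is not taken from the manuscript: g(a, b) = ab, with \Sigma the covariance matrix of the two path estimates. The first-order delta method needs the gradient of g, worked out by hand:

```latex
\nabla g(\hat a, \hat b) = \begin{pmatrix} \hat b \\ \hat a \end{pmatrix},
\qquad
\operatorname{Var}(\hat a \hat b)
  \approx \nabla g^{\top} \Sigma \, \nabla g
  = \hat b^{2} \sigma_a^{2} + \hat a^{2} \sigma_b^{2} + 2 \hat a \hat b \, \sigma_{ab}.
```

Each new functional (a standardized coefficient, an intraclass correlation) needs its own gradient of this kind, which is the step the single-run influence-function approach is said to avoid.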
major comments (2)
- [§4] §4 (Simulation design): the manuscript reports that IJSE 'closely tracked the bootstrap' across misspecification settings, yet the abstract and summary tables provide no numerical summaries (e.g., mean ratios of IJSE to bootstrap SE, coverage probabilities, or relative MSE) with error bars or replication counts; without these quantities the strength of the central tracking claim cannot be assessed quantitatively.
- [§3.2] §3.2 (Cluster-level IJSE): the aggregation of observation-level influence functions into cluster-level quantities is described only at a high level; an explicit equation or algorithm showing how the per-cluster jackknife weights are formed from the MCMC draws is needed to confirm that the approximation remains valid under the cluster-level misspecifications tested in the multilevel simulations.
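The numerical summaries requested in the first major comment could be computed along these lines. This is a hedged sketch, not the authors' analysis: `summarize_replications` and its argument names are hypothetical, Wald-type intervals (estimate ± 1.96 · IJSE) are assumed for coverage, and relative MSE is taken relative to the empirical standard deviation of the estimates across replications.

```python
import math
import statistics

def summarize_replications(ijse, boot_se, estimates, truth, z=1.96):
    """Summaries across R simulation replications (one list entry each)."""
    R = len(ijse)
    ratios = [i / b for i, b in zip(ijse, boot_se)]
    # Coverage of assumed Wald intervals: estimate +/- z * IJSE.
    covered = [abs(est - truth) <= z * se for est, se in zip(estimates, ijse)]
    coverage = sum(covered) / R
    # Empirical SE of the estimator, used as the target for relative MSE.
    emp_se = statistics.stdev(estimates)
    rel_mse = statistics.fmean((s - emp_se) ** 2 for s in ijse) / emp_se ** 2
    return {
        "mean_ratio": statistics.fmean(ratios),
        "mean_ratio_mcse": statistics.stdev(ratios) / math.sqrt(R),
        "coverage": coverage,
        "coverage_mcse": math.sqrt(coverage * (1.0 - coverage) / R),
        "rel_mse": rel_mse,
    }

# Made-up numbers, purely to show the call shape.
summary = summarize_replications(
    ijse=[1.0, 1.0, 1.0, 1.0],
    boot_se=[1.0, 2.0, 1.0, 2.0],
    estimates=[0.0, 1.0, 3.0, -1.0],
    truth=0.0,
)
print(summary)
```

Reporting the Monte Carlo standard errors alongside each summary is what lets a reader judge whether a ratio near 1 is distinguishable from simulation noise.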
minor comments (2)
- [Abstract] Abstract: the phrase 'four simulation studies covering six functionals' should be accompanied by a one-sentence summary of the key quantitative result (e.g., average relative error of IJSE versus bootstrap) to give readers an immediate sense of effect size.
- [§2] Notation: the symbol for the posterior functional (denoted variously as θ or g(θ)) should be standardized throughout §2 and §3 to avoid confusion when the same symbol appears in the influence-function derivation.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation and constructive comments. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.
- Referee: [§4] §4 (Simulation design): the manuscript reports that IJSE 'closely tracked the bootstrap' across misspecification settings, yet the abstract and summary tables provide no numerical summaries (e.g., mean ratios of IJSE to bootstrap SE, coverage probabilities, or relative MSE) with error bars or replication counts; without these quantities the strength of the central tracking claim cannot be assessed quantitatively.
  Authors: We agree that quantitative summaries are needed to substantiate the tracking claim. In the revised manuscript we will add a new table in Section 4 that reports, for each functional and misspecification setting, the mean ratio of IJSE to bootstrap SE, the empirical coverage of nominal 95% intervals, and the relative MSE of each estimator, together with Monte Carlo standard errors computed across the simulation replications.
  Revision: yes
- Referee: [§3.2] §3.2 (Cluster-level IJSE): the aggregation of observation-level influence functions into cluster-level quantities is described only at a high level; an explicit equation or algorithm showing how the per-cluster jackknife weights are formed from the MCMC draws is needed to confirm that the approximation remains valid under the cluster-level misspecifications tested in the multilevel simulations.
  Authors: We appreciate the request for explicit detail. In the revision we will insert into Section 3.2 the precise aggregation formula that sums the observation-level influence functions within each cluster to obtain the cluster-level influence, together with a short algorithm (in pseudocode) that shows how the per-cluster jackknife weights are obtained directly from the MCMC draws. This will make the cluster-level procedure fully reproducible and confirm its validity for the multilevel simulations.
  Revision: yes
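The aggregation described in the second response (summing observation-level influence values within each cluster) might look like the sketch below. It is an assumption-laden illustration, not the authors' promised algorithm: `cluster_ij_standard_error` is a hypothetical name, the influence values are made up, and the division by the number of observations N is an assumed scaling that matches treating each observation as its own cluster.

```python
import math
from collections import defaultdict

def cluster_ij_standard_error(infl, cluster_ids):
    """Cluster-level IJ standard error from observation-level influences.

    infl        : list of N observation-level influence values I_i
    cluster_ids : list of N hashable cluster labels, aligned with infl
    """
    n_obs = len(infl)
    totals = defaultdict(float)
    for value, cid in zip(infl, cluster_ids):
        totals[cid] += value  # sum influences within each cluster
    sums = list(totals.values())
    mean_sum = sum(sums) / len(sums)
    # Assumed scaling: centered sum of squares over clusters, divided by N.
    return math.sqrt(sum((s - mean_sum) ** 2 for s in sums)) / n_obs

# Hypothetical influence values with strong within-cluster similarity.
infl = [1.0, 1.0, -1.0, -1.0]
se_obs = cluster_ij_standard_error(infl, [0, 1, 2, 3])        # singletons
se_cluster = cluster_ij_standard_error(infl, ["a", "a", "b", "b"])
print("observation-level:", se_obs, "cluster-level:", se_cluster)
```

When influence values within a cluster share a sign, the cluster-level value exceeds the observation-level one, which is the usual reason for clustering standard errors in multilevel data.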
Circularity Check
No significant circularity
The paper applies the infinitesimal jackknife standard error (IJSE) from the cited prior work of Giordano & Broderick (2023) to posterior functionals and evaluates it via independent simulation studies that directly compare IJSE performance against bootstrap and PostSD under misspecification and correct specification. No equations, assumptions, or performance claims reduce by construction to quantities fitted or defined within the present manuscript; the central results rest on external simulation evidence rather than self-referential definitions, fitted-input predictions, or load-bearing self-citations. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The infinitesimal jackknife influence-function approximation recovers the nonparametric bootstrap variance for posterior functionals.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  The infinitesimal jackknife standard error (IJSE) approximates the bootstrap variance through influence functions computed from a single MCMC run... $I_i \approx N \cdot \widehat{\mathrm{Cov}}_t\big(L_i^{(t)},\, g(\theta^{(t)})\big)$
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Under misspecification... PostSD substantially underestimated the true standard error... IJSE closely tracked the bootstrap
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Baron, R. M. and Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6):1173–1182.
- [2] Bollen, K. A. and Stine, R. (1990). Direct and indirect effects: Classical and bootstrap estimates of variability. Sociological Methodology, 20:115–140.
- [3] Brooks, S., Gelman, A., Jones, G., and Meng, X.-L. (2011). Handbook of Markov Chain Monte Carlo. CRC Press.
- [4] Davison, A. C. and Hinkley, D. V. (1997). Bootstrap Methods and Their Application. Cambridge University Press, Cambridge.
- [5] Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1):1–26. [PSYCHOMETRIKA SUBMISSION, April 7, 2026]
- [6] Efron, B. (1982). The Jackknife, the Bootstrap, and Other Resampling Plans. SIAM, Philadelphia.
- [7] Efron, B. and Tibshirani, R. J. (1994). An Introduction to the Bootstrap. Chapman & Hall/CRC, New York.
- [8] Gelman, A. (2008). Scaling regression inputs by dividing by two standard deviations. Statistics in Medicine, 27(15):2865–2873.
- [9] Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. Chapman & Hall/CRC, Boca Raton, FL, 3rd edition.
- [10] Giordano, R. and Broderick, T. (2023). The Bayesian infinitesimal jackknife for variance.
- [11] Giordano, R., Stephenson, W., Liu, R., Jordan, M. I., and Broderick, T. (2019). A Swiss army infinitesimal jackknife. Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), pages 1139–1147.
- [12] Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1:221–233.
- [13] Jaeckel, L. A. (1972). The infinitesimal jackknife. Technical Report MM 72-1215-11, Bell Laboratories.
- [14] Ji, F., Lee, J., and Rabe-Hesketh, S. (2024). Valid standard errors for Bayesian quantile regression with clustered and independent data.
- [15] Johnson, J. B. and Omland, K. S. (2004). Model selection in ecology and evolution. Trends in Ecology & Evolution, 19(2):101–108.
- [16] Kleijn, B. J. K. and van der Vaart, A. W. (2012). The Bernstein–von Mises theorem under misspecification. Electronic Journal of Statistics, 6:354–381.
- [17] Kroes, J. and Finley, A. (2025). Demystifying omega squared: practical guidance for using a less biased ANOVA effect size measure. Psychological Methods, 30(4):866–887.
- [18] Lachowicz, M. J., Preacher, K. J., and Kelley, K. (2018). A novel measure of effect size for mediation analysis. Psychological Methods, 23(2):244–261.
- [19] Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4:863.
- [20] Levine, T. R. and Hullett, C. R. (2002). Eta squared, partial eta squared, and misreporting of effect size in communication research. Human Communication Research, 28(4):612–625.
- [21] MacKinnon, D. P. (2008). Introduction to Statistical Mediation Analysis. Lawrence Erlbaum Associates, New York.
- [22] MacKinnon, D. P., Lockwood, C. M., and Williams, J. (2004). Confidence limits for the indirect effect: Distribution of the product and resampling methods. Multivariate Behavioral Research, 39(1):99–128.
- [23] McGraw, K. O. and Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1):30–46.
- [24] Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105(1):156–166.
- [25] Morris, T. P., White, I. R., and Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11):2074–2102.
- Müller, U. K. (2013). Risk of Bayesian inference in misspecified models, and the sandwich covariance matrix. Econometrica, 81(5):1805–1849.
- [26] Nakagawa, S. and Schielzeth, H. (2013). A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4(2):133–142.
- [27] Oehlert, G. W. (1992). A note on the delta method. The American Statistician, 46(1):27–29.
- [28] Okada, K. (2013). Is omega squared less biased? a comparison of three major effect size indices in one-way ANOVA. Behaviormetrika, 40(2):129–144.
- [29] Olejnik, S. and Algina, J. (2003). Generalized eta and omega squared statistics: measures of effect size for some common research designs. Psychological Methods, 8(4):434–447.
- [30] Pesigan, I. J. A. and Cheung, S. F. (2020). SEM-based methods to form confidence intervals for indirect effect: Still applicable given nonnormality, under certain conditions. Frontiers in Psychology, 11:571928.
- [31] Preacher, K. J. and Kelley, K. (2011). Effect size measures for mediation models: Quantitative strategies for communicating indirect effects. Psychological Methods, 16(2):93–115.
- [32] Raudenbush, S. W. and Bryk, A. S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods. Sage, 2nd edition.
- [33] Raykov, T. and Marcoulides, G. A. (2015). Intraclass correlation coefficients in hierarchical design models. Educational and Psychological Measurement, 75(6):1063–1070.
- [34] Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2):1–36.
- [35] Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. The Annals of Statistics, 12(4):1151–1172.
- [36] Schielzeth, H. (2010). Simple means to improve the interpretability of regression coefficients. Methods in Ecology and Evolution, 1(2):103–113.
- [37] Shrout, P. E. and Bolger, N. (2002). Mediation in experimental and nonexperimental studies: New procedures and recommendations. Psychological Methods, 7(4):422–445.
- [38] Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.
- [39] Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. Sociological Methodology, 13:290–312.
- ten Hove, D. and others (2025). How to estimate the intraclass correlation coefficient: a systematic review of suggested methods in health behavior research. Health Psychology Review.
- van der Vaart, A. W. (1998). Asy...
- [40] White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50(1):1–25.
- [41] Wilcox, R. R. (2012). Introduction to Robust Estimation and Hypothesis Testing. Academic Press, 3rd edition.
- [42] Yuan, Y. and MacKinnon, D. P. (2014). Robust mediation analysis based on median regression. Psychological Methods, 19(1):1–20.