Robust Standard Errors for Bayesian Posterior Functionals via the Infinitesimal Jackknife

Feng Ji; Nanyu Luo

arXiv: 2604.03398 · v1 · submitted 2026-04-03 · 📊 stat.ME · stat.AP · stat.CO

Pith reviewed 2026-05-13 18:16 UTC · model grok-4.3

classification 📊 stat.ME stat.AP stat.CO

keywords infinitesimal jackknife, robust standard errors, Bayesian posterior functionals, model misspecification, nonparametric bootstrap, mediation analysis, multilevel modeling, influence functions

The pith

The infinitesimal jackknife supplies robust standard errors for any Bayesian posterior functional by approximating the nonparametric bootstrap from one MCMC run.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Bayesian models in the social sciences routinely produce nonlinear functionals such as indirect effects, standardized coefficients, and intraclass correlations. The default posterior standard deviation assumes the model is correct and tends to understate frequentist uncertainty when it is not. The infinitesimal jackknife computes a robust standard error by estimating influence functions from a single MCMC sample, without requiring repeated refits or new analytic derivatives for each functional. Simulations across mediation, ANOVA, and multilevel models show that the jackknife tracks the bootstrap closely under misspecification while agreeing with the posterior standard deviation when the model is correct. This combination makes reliable uncertainty quantification feasible for the nonlinear summaries that behavioral researchers actually report.

Core claim

The infinitesimal jackknife standard error approximates the bootstrap variance of any posterior functional through observation-level or cluster-level influence functions obtained from a single MCMC run. In four simulation studies covering six common functionals, the jackknife closely matched bootstrap standard errors under misspecification, while the posterior standard deviation substantially underestimated them. Under correct specification all three methods agreed, confirming that the jackknife adds no distortion when the working model is adequate.

What carries the argument

The infinitesimal jackknife standard error (IJSE) approximates the bootstrap variance by summing squared influence functions computed once from the MCMC samples.
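The computation can be sketched in a few lines. The block below is a minimal illustration, not the paper's implementation: it takes the influence of observation i to be N times the across-draw covariance between that observation's log-likelihood and the functional, and it assumes the IJSE is the square root of the sample variance of those influences divided by N. The function name `ij_standard_error`, the toy normal-mean model, and the use of exact posterior draws in place of MCMC output are all assumptions made for the example.

```python
import math
import random
import statistics


def ij_standard_error(loglik, g):
    """Infinitesimal jackknife SE from a single set of posterior draws.

    loglik[t][i]: log-likelihood of observation i at draw t (T rows, N columns).
    g[t]: value of the posterior functional at draw t.
    """
    n_draws, n_obs = len(loglik), len(loglik[0])
    g_bar = statistics.fmean(g)
    influence = []
    for i in range(n_obs):
        column = [row[i] for row in loglik]
        col_bar = statistics.fmean(column)
        cov = sum((c - col_bar) * (gt - g_bar)
                  for c, gt in zip(column, g)) / (n_draws - 1)
        influence.append(n_obs * cov)  # I_i = N * Cov_t(L_i, g)
    # Assumed scaling: sample variance of the influences, divided by N.
    return math.sqrt(statistics.variance(influence) / n_obs), influence


# Toy check (hypothetical setup): the working model says x_i ~ Normal(theta, 1)
# with a flat prior, but the data actually have standard deviation 3.
rng = random.Random(0)
n_obs, n_draws = 100, 5000
x = [rng.gauss(0.0, 3.0) for _ in range(n_obs)]
x_bar = statistics.fmean(x)

# Exact posterior draws stand in for MCMC output: theta | x ~ Normal(x_bar, 1/N).
theta = [rng.gauss(x_bar, 1.0 / math.sqrt(n_obs)) for _ in range(n_draws)]
# Pointwise log-likelihood up to an additive constant (constants drop out of covariances).
loglik = [[-0.5 * (xi - th) ** 2 for xi in x] for th in theta]

ijse, _ = ij_standard_error(loglik, theta)
post_sd = statistics.stdev(theta)
robust_se = statistics.stdev(x) / math.sqrt(n_obs)
print(f"PostSD={post_sd:.3f}  IJSE={ijse:.3f}  sd(x)/sqrt(N)={robust_se:.3f}")
```

In this toy setting the posterior standard deviation stays near 1/sqrt(N) because the working model fixes the variance at 1, while the IJSE should land near the usual robust standard error of a mean, sd(x)/sqrt(N).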

If this is right

Researchers obtain robust standard errors for any nonlinear posterior functional without deriving custom gradients or rerunning MCMC hundreds of times.
Under misspecification the method prevents the undercoverage that occurs with the plain posterior standard deviation.
Cluster-level versions directly handle the multilevel data structures common in behavioral research.
When the model is correctly specified the jackknife agrees closely with the posterior standard deviation, so robustness does not come at the cost of statistical efficiency.
The approach applies unchanged to indirect effects, intraclass correlations, and other functionals already in routine use.
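The "no custom gradients" point rests on the fact that a posterior functional only has to be evaluated draw by draw. The sketch below illustrates that step under stated assumptions: the parameter names (`a`, `b`, `tau2`, `sigma2`), the helper `functional_draws`, and the hard-coded draws are hypothetical placeholders, not output from any model in the paper. The indirect effect is taken as the product of the two path coefficients and the intraclass correlation as the between-cluster share of total variance.

```python
import statistics


def indirect_effect(a, b):
    """Product-of-coefficients indirect effect for a single mediator."""
    return a * b


def intraclass_correlation(tau2, sigma2):
    """Share of total variance attributable to the cluster level."""
    return tau2 / (tau2 + sigma2)


def functional_draws(draws, g):
    """Evaluate a functional g at every posterior draw (one dict per draw)."""
    return [g(**draw) for draw in draws]


# Hypothetical posterior draws; in practice these come from the MCMC output.
mediation_draws = [{"a": 0.5, "b": 0.4}, {"a": 1.0, "b": 0.2}, {"a": 0.75, "b": 0.4}]
variance_draws = [{"tau2": 1.0, "sigma2": 3.0}, {"tau2": 2.0, "sigma2": 2.0}]

ab = functional_draws(mediation_draws, indirect_effect)
icc = functional_draws(variance_draws, intraclass_correlation)

# Posterior mean and PostSD of the functional; a robust SE would reuse the same
# per-draw values together with the pointwise log-likelihoods.
print(statistics.fmean(ab), statistics.stdev(ab))
print(icc)
```

Because only the per-draw values of the functional enter the calculation, swapping one functional for another changes a single function and leaves the rest of the workflow as it is.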

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same single-run influence-function machinery could be applied to other posterior summaries such as predictive checks or decision-theoretic quantities.
Because the computational cost stays near that of one MCMC fit, the method lowers the barrier to routine robust reporting in large-scale applied studies.
Extensions to dependent data would require only a modified influence-function definition that respects the dependence structure.

Load-bearing premise

The influence functions extracted from a single MCMC chain accurately recover the bootstrap variance of the chosen functional even when the fitted model differs from the true data-generating process.
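A worked special case may make this premise concrete. The example is an illustration chosen here, not one taken from the paper: assume a normal location working model with unit variance and a flat prior, and take the functional to be g(θ) = θ. Writing L_i for the log-likelihood of observation i and I_i for its influence, and assuming the IJSE is scaled as the sample variance of the influences divided by N, the quantities have closed forms:

```latex
\begin{align*}
\theta \mid x &\sim \mathcal{N}\!\left(\bar{x},\, 1/N\right),
  \qquad L_i(\theta) = -\tfrac{1}{2}(x_i - \theta)^2 + \text{const}, \\
\operatorname{Cov}\bigl(L_i(\theta), \theta\bigr)
  &= x_i \operatorname{Var}(\theta) - \tfrac{1}{2}\operatorname{Cov}\bigl(\theta^2, \theta\bigr)
   = \frac{x_i}{N} - \frac{\bar{x}}{N}
   = \frac{x_i - \bar{x}}{N}, \\
I_i &= N \operatorname{Cov}\bigl(L_i(\theta), \theta\bigr) = x_i - \bar{x}, \\
\mathrm{IJSE} &= \sqrt{\frac{1}{N} \cdot \frac{1}{N-1} \sum_{i=1}^{N} \bigl(I_i - \bar{I}\bigr)^2}
   = \frac{s_x}{\sqrt{N}},
  \qquad \mathrm{PostSD} = \frac{1}{\sqrt{N}}.
\end{align*}
```

Here s_x is the sample standard deviation of the data. In this special case the IJSE reduces to the familiar robust standard error of a mean whatever the true variance is, while PostSD is fixed by the working model, and the two agree only when the true variance is close to 1.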

What would settle it

A controlled simulation or real-data example in which the infinitesimal jackknife standard error deviates materially from the standard error obtained by repeating full MCMC fits on many bootstrap resamples under known misspecification.
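The benchmark in such a test is the refit-based nonparametric bootstrap. The sketch below shows its shape under simplifying assumptions: `bootstrap_se` and `relative_gap` are hypothetical helper names, and `posterior_mean_normal` is a stand-in for a full MCMC refit (for a normal location model with a flat prior the posterior mean is just the sample mean). In a real study each call to `refit` would rerun the sampler on the resampled data, which is what makes the bootstrap expensive.

```python
import math
import random
import statistics


def bootstrap_se(data, refit, n_boot=500, seed=0):
    """Nonparametric bootstrap SE: resample rows, refit, take the SD of estimates."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(refit(resample))
    return statistics.stdev(estimates)


def posterior_mean_normal(sample):
    """Stand-in for an MCMC refit: posterior mean under a flat-prior normal model."""
    return statistics.fmean(sample)


def relative_gap(candidate_se, reference_se):
    """Signed relative difference of a candidate SE from the bootstrap reference."""
    return (candidate_se - reference_se) / reference_se


rng = random.Random(1)
x = [rng.gauss(0.0, 3.0) for _ in range(200)]

boot = bootstrap_se(x, posterior_mean_normal)
analytic = statistics.stdev(x) / math.sqrt(len(x))
print(f"bootstrap SE={boot:.3f}  analytic SE={analytic:.3f}  "
      f"gap={relative_gap(analytic, boot):+.1%}")
```

A material deviation would show up as a `relative_gap` between the infinitesimal jackknife standard error and the bootstrap reference that stays large compared with the Monte Carlo noise in both quantities.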

Original abstract

Quantitative research in the social and behavioral sciences relies heavily on nonlinear posterior functionals such as indirect effects, standardized coefficients, effect sizes, intraclass correlations, and multilevel variance-explained measures. The posterior standard deviation (PostSD) is the default uncertainty summary for these quantities, yet it presupposes a correctly specified model. When the working model is wrong, as is common with behavioral data that exhibit heavy tails and heteroskedasticity, PostSD can severely underestimate the frequentist standard error. The nonparametric bootstrap offers robustness but requires repeated MCMC refits, while the delta method demands a separate analytic gradient derivation for every new functional. The infinitesimal jackknife standard error (Giordano & Broderick, 2023) sidesteps both limitations: it approximates the bootstrap variance through influence functions computed from a single MCMC run, applies to any posterior functional without modification, and requires no analytic derivatives. We discuss the use of the IJSE methodology at both the observation level and the cluster level and evaluate it through four simulation studies covering six functionals from mediation analysis, ANOVA, and multilevel modeling, which are commonly used in the social and behavioral sciences. Under misspecification, PostSD substantially underestimated the true standard error across all settings, whereas IJSE closely tracked the bootstrap at a fraction of the computational cost. Under correct specification all three methods agreed, confirming that IJSE introduces no distortion when the model is right. These results show IJSE as a practical, general-purpose tool for robust uncertainty quantification in Bayesian workflows throughout the social and behavioral sciences.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

This paper applies the existing infinitesimal jackknife to nonlinear Bayesian functionals common in social sciences and shows via simulation that it tracks bootstrap SE under misspecification at low cost.

This paper takes the infinitesimal jackknife from Giordano and Broderick and tests it on nonlinear posterior functionals like indirect effects, standardized coefficients, effect sizes, and intraclass correlations. The simulations indicate that IJSE matches bootstrap results under misspecification such as heavy tails and heteroskedasticity, while agreeing with the usual posterior SD when the model is correct, all from a single MCMC run and without custom derivatives for each functional. They cover both observation-level and cluster-level versions across mediation, ANOVA, and multilevel settings. That is the core contribution: a practical check that the method transfers to these applied cases without distortion or high compute cost. The work is solid on its own terms because the simulations directly compare the three approaches and the logic is internally consistent. The main soft spot is that everything rests on simulation evidence, so the strength of the claim depends on how close the tracking actually is in the numbers and whether the designs cover enough realistic sample sizes and replication variability. It is an extension rather than a new derivation, which is fine but limits how much it shifts the literature. This is for applied researchers in psychology and behavioral sciences who already run Bayesian models and need reliable SE for nonlinear quantities. A reader facing the PostSD underestimation problem would find the comparison useful. I would send it to peer review because the question is relevant, the approach is grounded in prior formal work, and the simulations address the right comparisons even if they could be reported with more detail.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes applying the infinitesimal jackknife standard error (IJSE) from Giordano & Broderick (2023) to obtain frequentist-robust standard errors for arbitrary nonlinear Bayesian posterior functionals (e.g., indirect effects, standardized coefficients, intraclass correlations) using a single MCMC run. Through four simulation studies involving six functionals from mediation analysis, ANOVA, and multilevel models, it shows that under misspecification (heavy tails, heteroskedasticity) the posterior standard deviation (PostSD) substantially underestimates the true frequentist SE while IJSE closely tracks the nonparametric bootstrap; under correct specification all three methods agree.

Significance. If the simulation results hold, the work supplies a computationally efficient, general-purpose tool for robust uncertainty quantification in Bayesian workflows that are standard in the social and behavioral sciences. It avoids both repeated MCMC refits required by the bootstrap and the need for analytic derivatives required by the delta method, while preserving agreement with PostSD when the model is correctly specified. The explicit treatment of both observation- and cluster-level versions broadens applicability to multilevel data.

major comments (2)

[§4] §4 (Simulation design): the manuscript reports that IJSE 'closely tracked the bootstrap' across misspecification settings, yet the abstract and summary tables provide no numerical summaries (e.g., mean ratios of IJSE to bootstrap SE, coverage probabilities, or relative MSE) with error bars or replication counts; without these quantities the strength of the central tracking claim cannot be assessed quantitatively.
[§3.2] §3.2 (Cluster-level IJSE): the aggregation of observation-level influence functions into cluster-level quantities is described only at a high level; an explicit equation or algorithm showing how the per-cluster jackknife weights are formed from the MCMC draws is needed to confirm that the approximation remains valid under the cluster-level misspecifications tested in the multilevel simulations.

minor comments (2)

[Abstract] Abstract: the phrase 'four simulation studies covering six functionals' should be accompanied by a one-sentence summary of the key quantitative result (e.g., average relative error of IJSE versus bootstrap) to give readers an immediate sense of effect size.
[§2] Notation: the symbol for the posterior functional (denoted variously as θ or g(θ)) should be standardized throughout §2 and §3 to avoid confusion when the same symbol appears in the influence-function derivation.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and constructive comments. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.

Referee: [§4] §4 (Simulation design): the manuscript reports that IJSE 'closely tracked the bootstrap' across misspecification settings, yet the abstract and summary tables provide no numerical summaries (e.g., mean ratios of IJSE to bootstrap SE, coverage probabilities, or relative MSE) with error bars or replication counts; without these quantities the strength of the central tracking claim cannot be assessed quantitatively.

Authors: We agree that quantitative summaries are needed to substantiate the tracking claim. In the revised manuscript we will add a new table in Section 4 that reports, for each functional and misspecification setting, the mean ratio of IJSE to bootstrap SE, the empirical coverage of nominal 95% intervals, and the relative MSE of each estimator, together with Monte Carlo standard errors computed across the simulation replications. revision: yes
Referee: [§3.2] §3.2 (Cluster-level IJSE): the aggregation of observation-level influence functions into cluster-level quantities is described only at a high level; an explicit equation or algorithm showing how the per-cluster jackknife weights are formed from the MCMC draws is needed to confirm that the approximation remains valid under the cluster-level misspecifications tested in the multilevel simulations.

Authors: We appreciate the request for explicit detail. In the revision we will insert into Section 3.2 the precise aggregation formula that sums the observation-level influence functions within each cluster to obtain the cluster-level influence, together with a short algorithm (in pseudocode) that shows how the per-cluster jackknife weights are obtained directly from the MCMC draws. This will make the cluster-level procedure fully reproducible and confirm its validity for the multilevel simulations. revision: yes
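One way the cluster-level aggregation could look is sketched below. This is an illustration under stated assumptions rather than the authors' algorithm: it sums the pointwise log-likelihoods within each cluster at every draw, takes the cluster influence to be G times the across-draw covariance of that sum with the functional (G being the number of clusters), and scales the variance of the influences by 1/G. The function name `cluster_ij_standard_error`, the toy data, and the use of exact posterior draws in place of MCMC output are hypothetical.

```python
import math
import random
import statistics


def cluster_ij_standard_error(loglik, g, cluster_ids):
    """Cluster-level infinitesimal jackknife SE from one set of posterior draws.

    loglik[t][i]: log-likelihood of observation i at draw t.
    g[t]: functional at draw t. cluster_ids[i]: cluster label of observation i.
    """
    n_draws = len(loglik)
    labels = sorted(set(cluster_ids))
    n_clusters = len(labels)
    position = {label: k for k, label in enumerate(labels)}

    # Cluster log-likelihood at each draw: sum of its members' contributions.
    cluster_ll = [[0.0] * n_clusters for _ in range(n_draws)]
    for t, row in enumerate(loglik):
        for i, value in enumerate(row):
            cluster_ll[t][position[cluster_ids[i]]] += value

    g_bar = statistics.fmean(g)
    influence = []
    for k in range(n_clusters):
        column = [row[k] for row in cluster_ll]
        col_bar = statistics.fmean(column)
        cov = sum((c - col_bar) * (gt - g_bar)
                  for c, gt in zip(column, g)) / (n_draws - 1)
        influence.append(n_clusters * cov)  # assumed scaling by the cluster count
    return math.sqrt(statistics.variance(influence) / n_clusters), influence


# Toy check (hypothetical): 30 clusters of size 4 with a shared cluster effect,
# analysed with a working model that ignores clustering: x ~ Normal(theta, 1).
rng = random.Random(2)
n_clusters, size, n_draws = 30, 4, 4000
cluster_ids, x = [], []
for c in range(n_clusters):
    u = rng.gauss(0.0, 2.0)
    for _ in range(size):
        cluster_ids.append(c)
        x.append(u + rng.gauss(0.0, 1.0))

n_obs = len(x)
x_bar = statistics.fmean(x)
theta = [rng.gauss(x_bar, 1.0 / math.sqrt(n_obs)) for _ in range(n_draws)]
loglik = [[-0.5 * (xi - th) ** 2 for xi in x] for th in theta]

cluster_se, _ = cluster_ij_standard_error(loglik, theta, cluster_ids)
cluster_means = [statistics.fmean(x[c * size:(c + 1) * size]) for c in range(n_clusters)]
target = statistics.stdev(cluster_means) / math.sqrt(n_clusters)
print(f"cluster IJSE={cluster_se:.3f}  sd(cluster means)/sqrt(G)={target:.3f}")
```

Because covariance is linear, summing log-likelihoods within a cluster before taking the covariance gives the same result as summing the observation-level covariances. In this toy setting the cluster-level value should land near the standard deviation of the cluster means divided by sqrt(G).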

Circularity Check

0 steps flagged

No significant circularity

The paper applies the infinitesimal jackknife standard error (IJSE) from the cited prior work of Giordano & Broderick (2023) to posterior functionals and evaluates it via independent simulation studies that directly compare IJSE performance against bootstrap and PostSD under misspecification and correct specification. No equations, assumptions, or performance claims reduce by construction to quantities fitted or defined within the present manuscript; the central results rest on external simulation evidence rather than self-referential definitions, fitted-input predictions, or load-bearing self-citations. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the established infinitesimal jackknife approximation and standard MCMC sampling assumptions; no new free parameters, invented entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)

domain assumption: The infinitesimal jackknife influence-function approximation recovers the nonparametric bootstrap variance for posterior functionals.
This is the central modeling assumption that allows IJSE to serve as a cheap surrogate for the bootstrap under misspecification.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear

The infinitesimal jackknife standard error (IJSE) approximates the bootstrap variance through influence functions computed from a single MCMC run... $I_i \approx N \cdot \widehat{\mathrm{Cov}}_t\bigl(L_i^{(t)},\, g(\theta^{(t)})\bigr)$

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear

Under misspecification... PostSD substantially underestimated the true standard error... IJSE closely tracked the bootstrap

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages

[1] Baron, R. M. and Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6):1173–1182.

[2] Bollen, K. A. and Stine, R. (1990). Direct and indirect effects: Classical and bootstrap estimates of variability. Sociological Methodology, 20:115–140.

[3] Brooks, S., Gelman, A., Jones, G., and Meng, X.-L. (2011). Handbook of Markov Chain Monte Carlo. CRC Press.

[4] Davison, A. C. and Hinkley, D. V. (1997). Bootstrap Methods and Their Application. Cambridge University Press, Cambridge.

[5] Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1):1–26.

PSYCHOMETRIKA SUBMISSION, April 7, 2026

[6] Efron, B. (1982). The Jackknife, the Bootstrap, and Other Resampling Plans. SIAM, Philadelphia.

[7] Efron, B. and Tibshirani, R. J. (1994). An Introduction to the Bootstrap. Chapman & Hall/CRC, New York.

[8] Gelman, A. (2008). Scaling regression inputs by dividing by two standard deviations. Statistics in Medicine, 27(15):2865–2873.

[9] Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. Chapman & Hall/CRC, Boca Raton, FL, 3rd edition.

[10] Giordano, R. and Broderick, T. (2023). The Bayesian infinitesimal jackknife for variance.

[11] Giordano, R., Stephenson, W., Liu, R., Jordan, M. I., and Broderick, T. (2019). A Swiss Army infinitesimal jackknife. Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), pages 1139–1147.

[12] Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1:221–233.

[13] Jaeckel, L. A. (1972). The infinitesimal jackknife. Technical Report MM 72-1215-11, Bell Laboratories.

[14] Ji, F., Lee, J., and Rabe-Hesketh, S. (2024). Valid standard errors for Bayesian quantile regression with clustered and independent data.

[15] Johnson, J. B. and Omland, K. S. (2004). Model selection in ecology and evolution. Trends in Ecology & Evolution, 19(2):101–108.

[16] Kleijn, B. J. K. and van der Vaart, A. W. (2012). The Bernstein–von Mises theorem under misspecification. Electronic Journal of Statistics, 6:354–381.

[17] Kroes, J. and Finley, A. (2025). Demystifying omega squared: practical guidance for using a less biased ANOVA effect size measure. Psychological Methods, 30(4):866–887.

[18] Lachowicz, M. J., Preacher, K. J., and Kelley, K. (2018). A novel measure of effect size for mediation analysis. Psychological Methods, 23(2):244–261.

[19] Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4:863.

[20] Levine, T. R. and Hullett, C. R. (2002). Eta squared, partial eta squared, and misreporting of effect size in communication research. Human Communication Research, 28(4):612–625.

[21] MacKinnon, D. P. (2008). Introduction to Statistical Mediation Analysis. Lawrence Erlbaum Associates, New York.

[22] MacKinnon, D. P., Lockwood, C. M., and Williams, J. (2004). Confidence limits for the indirect effect: Distribution of the product and resampling methods. Multivariate Behavioral Research, 39(1):99–128.

[23] McGraw, K. O. and Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1):30–46.

[24] Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105(1):156–166.

[25] Morris, T. P., White, I. R., and Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11):2074–2102. Müller, U. K. (2013). Risk of Bayesian inference in misspecified models, and the sandwich covariance matrix. Econometrica, 81(5):1805–1849.

[26] Nakagawa, S. and Schielzeth, H. (2013). A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4(2):133–142.

[27] Oehlert, G. W. (1992). A note on the delta method. The American Statistician, 46(1):27–29.

[28] Okada, K. (2013). Is omega squared less biased? A comparison of three major effect size indices in one-way ANOVA. Behaviormetrika, 40(2):129–144.

[29] Olejnik, S. and Algina, J. (2003). Generalized eta and omega squared statistics: measures of effect size for some common research designs. Psychological Methods, 8(4):434–447.

[30] Pesigan, I. J. A. and Cheung, S. F. (2020). SEM-based methods to form confidence intervals for indirect effect: Still applicable given nonnormality, under certain conditions. Frontiers in Psychology, 11:571928.

[31] Preacher, K. J. and Kelley, K. (2011). Effect size measures for mediation models: Quantitative strategies for communicating indirect effects. Psychological Methods, 16(2):93–115.

[32] Raudenbush, S. W. and Bryk, A. S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods. Sage, 2nd edition.

[33] Raykov, T. and Marcoulides, G. A. (2015). Intraclass correlation coefficients in hierarchical design models. Educational and Psychological Measurement, 75(6):1063–1070.

[34] Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2):1–36.

[35] Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. The Annals of Statistics, 12(4):1151–1172.

[36] Schielzeth, H. (2010). Simple means to improve the interpretability of regression coefficients. Methods in Ecology and Evolution, 1(2):103–113.

[37] Shrout, P. E. and Bolger, N. (2002). Mediation in experimental and nonexperimental studies: New procedures and recommendations. Psychological Methods, 7(4):422–445.

[38] Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

[39] Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. Sociological Methodology, 13:290–312. ten Hove, D. and others (2025). How to estimate the intraclass correlation coefficient: a systematic review of suggested methods in health behavior research. Health Psychology Review. van der Vaart, A. W. (1998). Asy...

[40] White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50(1):1–25.

[41] Wilcox, R. R. (2012). Introduction to Robust Estimation and Hypothesis Testing. Academic Press, 3rd edition.

[42] Yuan, Y. and MacKinnon, D. P. (2014). Robust mediation analysis based on median regression. Psychological Methods, 19(1):1–20.
