Mixture priors for replication studies
Pith reviewed 2026-05-24 00:09 UTC · model grok-4.3
The pith
A mixture of the original study's posterior and a non-informative distribution serves as the prior for replication analysis, with the weight controlling pooling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Mixture priors provide a Bayesian framework for replication studies by taking the posterior from the original study and mixing it with a non-informative distribution to form the prior for the replication study. The mixing weight directly sets the degree to which the two datasets are combined. The method supports both fixed weights and a prior distribution over the weight, and it permits Bayes factor tests for the presence or absence of an effect as well as for whether the weight equals zero or one.
What carries the argument
Mixture prior formed from the original posterior and a non-informative distribution, with the weight governing the degree of pooling between studies.
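Written out, the construction might look as follows. The notation is assumed here for illustration and is not taken from the paper: \theta is the effect, x_o and x_r are the original and replication data, \pi_N is the non-informative component, w is the mixture weight, and x_r is taken to be independent of x_o given \theta.

```latex
\begin{align*}
\pi(\theta \mid x_o, w) &= w\,\pi(\theta \mid x_o) + (1 - w)\,\pi_N(\theta), \\
\pi(\theta \mid x_r, x_o, w) &= \tilde{w}\,\pi(\theta \mid x_o, x_r) + (1 - \tilde{w})\,\pi_N(\theta \mid x_r), \\
\tilde{w} &= \frac{w\,m_1}{w\,m_1 + (1 - w)\,m_0}, \\
m_1 &= \int f(x_r \mid \theta)\,\pi(\theta \mid x_o)\,d\theta, \qquad
m_0 = \int f(x_r \mid \theta)\,\pi_N(\theta)\,d\theta .
\end{align*}
```

Setting w = 1 gives \tilde{w} = 1 and full pooling of the two datasets; setting w = 0 gives \tilde{w} = 0 and discards the original data. The updated weight \tilde{w} is only well defined when m_0 is, which requires a proper \pi_N or an explicit convention for its normalizing constant.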
If this is right
- Fixed mixture weights give analysts direct control over how strongly the original result influences the replication analysis.
- Placing a prior on the mixture weight introduces uncertainty about the amount of pooling.
- Bayes factors can formally test whether the original data should be ignored, fully used, or partially pooled.
- The framework supplies an alternative to hierarchical models and power priors for combining original and replication data.
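The fixed-weight case can be sketched in a normal-normal setting. This is a minimal illustration under stated assumptions, not the repmix implementation: the original posterior is taken to be normal, the replication estimate is normal with known standard error, and the non-informative component is replaced by a proper diffuse normal (`diffuse_mean`, `diffuse_sd`) so that the marginal likelihoods exist. All function and parameter names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def conjugate_update(prior_mean, prior_sd, est, se):
    """Posterior mean and sd for a normal prior and a normal likelihood."""
    precision = 1 / prior_sd**2 + 1 / se**2
    mean = (prior_mean / prior_sd**2 + est / se**2) / precision
    return mean, math.sqrt(1 / precision)


def mixture_posterior(w, orig_mean, orig_sd, rep_est, rep_se,
                      diffuse_mean=0.0, diffuse_sd=10.0):
    """Posterior under the prior w * N(orig) + (1 - w) * N(diffuse).

    The diffuse normal is an assumed proper stand-in for the
    non-informative component.
    """
    # Marginal likelihood of the replication estimate under each component.
    m_orig = normal_pdf(rep_est, orig_mean, math.sqrt(orig_sd**2 + rep_se**2))
    m_diff = normal_pdf(rep_est, diffuse_mean, math.sqrt(diffuse_sd**2 + rep_se**2))
    # Updated mixture weight: prior weight times relative predictive fit.
    w_post = w * m_orig / (w * m_orig + (1 - w) * m_diff)
    pooled_mean, pooled_sd = conjugate_update(orig_mean, orig_sd, rep_est, rep_se)
    alone_mean, alone_sd = conjugate_update(diffuse_mean, diffuse_sd, rep_est, rep_se)
    return {
        "weight": w_post,
        "pooled": (pooled_mean, pooled_sd),
        "replication_only": (alone_mean, alone_sd),
        "mean": w_post * pooled_mean + (1 - w_post) * alone_mean,
    }
```

For example, `mixture_posterior(0.5, 0.2, 0.05, 0.1, 0.05)` returns the updated weight together with the two component posteriors; with `w=1.0` the posterior mean reduces to the usual precision-weighted pooled estimate.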
Where Pith is reading between the lines
- Sequential replication studies could update the mixture weight as each new dataset arrives.
- The weight might serve as a continuous summary statistic for replicability across many fields.
- Extending the non-informative component to reflect study-specific features could handle different kinds of replication designs.
Load-bearing premise
The mixture of the original study's posterior and a non-informative distribution appropriately represents the prior information for the replication study analysis.
What would settle it
The claim would be challenged if conclusions about effect presence or the appropriate degree of pooling changed materially when the same replication data were analyzed with a hierarchical model instead of the mixture prior.
Original abstract
Replication of scientific studies is important for assessing the credibility of their results. However, there is no consensus on how to quantify the extent to which a replication study replicates an original result. We propose a novel Bayesian approach for replication studies based on mixture priors. The idea is to use a mixture of the posterior distribution based on the original study and a non-informative distribution as the prior for the analysis of the replication study. The mixture weight then determines the extent to which the original and replication data are pooled. Two distinct strategies are presented: one with fixed mixture weights, and one that introduces uncertainty by assigning a prior distribution to the mixture weight itself. Furthermore, it is shown how within this framework Bayes factors can be used for formal testing of relevant scientific hypotheses, such as tests on the presence or absence of an effect or whether the mixture weight equals zero (completely discounting the original data) or one (fully pooling with the original data). To showcase the practical application of the methodology, we analyze data from three replication studies. Our findings suggest that mixture priors are a valuable and intuitive alternative to other Bayesian methods for analyzing replication studies, such as hierarchical models and power priors. We provide the free and open source R package repmix that implements the proposed methodology.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a Bayesian framework for replication studies that uses a mixture prior for the replication analysis: a weighted combination of the posterior from the original study and a non-informative distribution. The mixture weight controls the degree of pooling between studies. Two variants are considered (fixed weight; hierarchical prior on the weight), Bayes factors are derived for testing hypotheses including effect presence and w=0 versus w=1, and the method is illustrated on three replication datasets with an accompanying R package repmix.
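For the second variant, a rough sketch of how a prior on the weight is updated may help. It assumes a Beta(a, b) prior on w, which the summary above does not specify, and takes as inputs the marginal likelihoods of the replication data under the two mixture components (`m_orig`, `m_diff`; hypothetical names). Because the marginal likelihood given w is linear in w, namely w * m_orig + (1 - w) * m_diff, the posterior mean of w follows from the first two Beta moments.

```python
def beta_moments(a, b):
    """First and second raw moments of a Beta(a, b) distribution."""
    mean = a / (a + b)
    second = a * (a + 1) / ((a + b) * (a + b + 1))
    return mean, second


def weight_posterior_mean(m_orig, m_diff, a=1.0, b=1.0):
    """Posterior mean of w under an assumed Beta(a, b) prior.

    The likelihood of the replication data given w is
    w * m_orig + (1 - w) * m_diff, so the posterior density of w is
    proportional to Beta(w; a, b) * (w * m_orig + (1 - w) * m_diff).
    """
    e_w, e_w2 = beta_moments(a, b)
    normalizer = e_w * m_orig + (1 - e_w) * m_diff
    return (e_w2 * m_orig + (e_w - e_w2) * m_diff) / normalizer
```

In this sketch with a uniform prior (a = b = 1), the posterior mean of w lies between 1/3 and 2/3 whatever the data, because a single replication result enters only through a likelihood that is linear in w.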
Significance. If the propriety and marginal-likelihood issues can be resolved, the approach offers an intuitive, directly interpretable alternative to hierarchical or power-prior models for quantifying replication strength. The open-source repmix package is a concrete strength that supports reproducibility and adoption.
major comments (2)
- [Methodology / prior construction and Bayes-factor section] The central construction (mixture of original posterior with non-informative component) yields an improper prior whenever the mixture weight w < 1. Marginal likelihoods and the Bayes factors used to test H0: w=0 versus H1: w=1 are therefore formally undefined without an implicit proper approximation whose effect on the reported inferences is never quantified. This directly undermines the claim that the procedure supplies a coherent Bayesian alternative to hierarchical or power-prior methods.
- [Numerical implementation and application sections] The paper relies on the mixture prior both for posterior inference and for formal model comparison via Bayes factors; any regularization introduced in the numerical implementation must be shown to leave the qualitative conclusions (e.g., posterior on w, BF values) unchanged. No such sensitivity analysis is reported.
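The first objection can be made concrete with a short derivation. It assumes, for illustration only, a flat improper non-informative component and the same hypothetical notation as elsewhere (\theta the effect, x_r the replication data, m_1 and m_0 the marginal likelihoods under the original-posterior and non-informative components).

```latex
\begin{align*}
\pi_N(\theta) &= c \quad (c > 0 \text{ arbitrary}), \\
m_0 &= \int f(x_r \mid \theta)\,\pi_N(\theta)\,d\theta = c \int f(x_r \mid \theta)\,d\theta, \\
\mathrm{BF}_{w=1 \,:\, w=0} &= \frac{m_1}{m_0} = \frac{1}{c}\cdot\frac{m_1}{\int f(x_r \mid \theta)\,d\theta}.
\end{align*}
```

Since c is arbitrary, the Bayes factor is determined only up to that constant unless a proper approximation or another convention fixes it.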
minor comments (2)
- [Notation and definitions] Notation for the non-informative component should be made explicit (e.g., whether it is a proper approximation or left improper) and cross-referenced to the Bayes-factor derivations.
- [Applications] The three empirical examples would benefit from a brief table comparing the mixture-prior results to a standard hierarchical model on the same data.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the revisions we will incorporate.
- Referee: The central construction (mixture of original posterior with non-informative component) yields an improper prior whenever the mixture weight w < 1. Marginal likelihoods and the Bayes factors used to test H0: w=0 versus H1: w=1 are therefore formally undefined without an implicit proper approximation whose effect on the reported inferences is never quantified. This directly undermines the claim that the procedure supplies a coherent Bayesian alternative to hierarchical or power-prior methods.
  Authors: We acknowledge that the mixture prior is improper for w < 1 and that this raises a legitimate question about the formal definition of the marginal likelihoods and the associated Bayes factors. In the manuscript the Bayes factors for hypotheses involving w were obtained via the Savage-Dickey density ratio applied to the marginal posterior of w; this construction can be viewed as the limit of a sequence of proper priors. Nevertheless, we agree that the limiting argument and its numerical consequences should be stated explicitly. In the revision we will add a dedicated subsection that (i) clarifies the limiting procedure, (ii) supplies a concrete proper approximation (e.g., a very diffuse but proper normal or t distribution), and (iii) reports a small sensitivity study confirming that the reported Bayes-factor values change only negligibly under different proper approximations.
  Revision: yes
- Referee: The paper relies on the mixture prior both for posterior inference and for formal model comparison via Bayes factors; any regularization introduced in the numerical implementation must be shown to leave the qualitative conclusions (e.g., posterior on w, BF values) unchanged. No such sensitivity analysis is reported.
  Authors: We agree that an explicit sensitivity analysis with respect to any numerical regularization is required. We will add this analysis to the revised manuscript, repeating the posterior and Bayes-factor calculations under several degrees of regularization of the non-informative component and demonstrating that the qualitative conclusions (posterior mass on w near 0 or 1, and the ordering of the Bayes factors) remain unchanged.
  Revision: yes
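A sensitivity check of the kind promised in the rebuttal could be sketched as below. This is a toy normal-normal setting, not the authors' computation: the non-informative component is approximated by a proper normal with standard deviation `diffuse_sd`, and the Bayes factor for w = 1 against w = 0 is taken as the direct ratio of the two marginal likelihoods rather than a Savage-Dickey ratio. All names and numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def bf_pooling_vs_discounting(orig_mean, orig_sd, rep_est, rep_se,
                              diffuse_sd, diffuse_mean=0.0):
    """Ratio of marginal likelihoods of the replication estimate:
    original-posterior component (w = 1) over diffuse component (w = 0)."""
    m_orig = normal_pdf(rep_est, orig_mean, math.sqrt(orig_sd**2 + rep_se**2))
    m_diff = normal_pdf(rep_est, diffuse_mean, math.sqrt(diffuse_sd**2 + rep_se**2))
    return m_orig / m_diff


def sensitivity_table(orig_mean, orig_sd, rep_est, rep_se, diffuse_sds):
    """Bayes factor recomputed under each assumed diffuse standard deviation."""
    return {
        sd: bf_pooling_vs_discounting(orig_mean, orig_sd, rep_est, rep_se, sd)
        for sd in diffuse_sds
    }
```

Calling `sensitivity_table(0.2, 0.05, 0.1, 0.05, [1, 10, 100])` shows how the ratio moves with the choice of `diffuse_sd`. In this toy setting the ratio grows roughly in proportion to `diffuse_sd` once it is large, which is the kind of dependence a reported sensitivity study would need to rule out for the quantities the paper actually uses.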
Circularity Check
No circularity: direct modeling proposal with independent construction
The paper defines a mixture prior directly as w times the original posterior plus (1-w) times a non-informative component, then applies standard Bayesian updating and Bayes factor calculations to this prior. No derived quantity is obtained by fitting to a subset of the same data and relabeling the fit as a prediction; no uniqueness theorem or ansatz is imported via self-citation; and the central claims rest on the explicit mixture definition rather than on any reduction to prior fitted values. The framework is therefore self-contained as a modeling choice whose propriety and numerical implementation are separate issues from circularity.
Axiom & Free-Parameter Ledger
free parameters (1)
- mixture weight
axioms (1)
- domain assumption: The prior for the replication study is a mixture of the original posterior and a non-informative distribution