arxiv: 2406.19152 · v4 · pith:K2N2Y4KCnew · submitted 2024-06-27 · 📊 stat.ME · stat.AP

Mixture priors for replication studies

Pith reviewed 2026-05-24 00:09 UTC · model grok-4.3

classification 📊 stat.ME stat.AP
keywords mixture priors · replication studies · Bayesian analysis · Bayes factors · data pooling · prior distributions · statistical replication

The pith

A mixture of the original study's posterior and a non-informative distribution serves as the prior for replication analysis, with the weight controlling pooling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Replication studies need a way to combine original and new data while quantifying how much strength to borrow from the first result. The paper sets up the prior for the replication as a weighted mix of the original posterior and a vague distribution. The weight itself becomes the measure of how much the studies are pooled together. Bayes factors built on this mixture let researchers test whether an effect exists and whether the weight should be zero or one. The approach is applied to three real replication datasets.

Core claim

Mixture priors provide a Bayesian framework for replication studies by taking the posterior from the original study and mixing it with a non-informative distribution to form the prior for the replication study. The mixing weight directly sets the degree to which the two datasets are combined. The method supports both fixed weights and a prior distribution over the weight, and it permits Bayes factor tests for the presence or absence of an effect as well as for whether the weight equals zero or one.
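A minimal sketch of the fixed-weight case may help make the pooling mechanism concrete. It assumes the normal model for effect estimates with known standard errors, an original posterior of N(est_o, se_o^2) (as would arise from a flat initial prior), and a normal vague component. The function names, the vague variance of 4, and the numbers in the example call are all hypothetical; none of this is the repmix API.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def conjugate_update(prior_mean, prior_var, est, se):
    """Posterior mean and variance for a normal prior and a normal likelihood."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / se ** 2)
    post_mean = post_var * (prior_mean / prior_var + est / se ** 2)
    return post_mean, post_var


def mixture_posterior(est_o, se_o, est_r, se_r, weight,
                      vague_mean=0.0, vague_var=4.0):
    """Posterior under the prior weight * N(est_o, se_o^2) + (1 - weight) * N(vague_mean, vague_var).

    The normal vague component and its default variance are assumptions of this sketch.
    """
    # Marginal likelihood of the replication estimate under each prior component.
    m_orig = normal_pdf(est_r, est_o, se_o ** 2 + se_r ** 2)
    m_vague = normal_pdf(est_r, vague_mean, vague_var + se_r ** 2)
    # The prior weight is updated in proportion to how well each component predicts est_r.
    post_weight = weight * m_orig / (weight * m_orig + (1.0 - weight) * m_vague)
    # Each component is updated by the usual conjugate formulas.
    orig_mean, orig_var = conjugate_update(est_o, se_o ** 2, est_r, se_r)
    vague_post_mean, vague_post_var = conjugate_update(vague_mean, vague_var, est_r, se_r)
    post_mean = post_weight * orig_mean + (1.0 - post_weight) * vague_post_mean
    return {
        "weight": post_weight,
        "mean": post_mean,
        "original_component": (orig_mean, orig_var),
        "vague_component": (vague_post_mean, vague_post_var),
    }


# Hypothetical estimates, for illustration only.
example = mixture_posterior(est_o=0.2, se_o=0.1, est_r=0.1, se_r=0.1, weight=0.5)
print(example)
```

With weight = 1 the sketch reduces to full pooling (the inverse-variance combination of both estimates), and with weight = 0 the original study is discarded, which matches the two extremes described above.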

What carries the argument

Mixture prior formed from the original posterior and a non-informative distribution, with the weight governing the degree of pooling between studies.

If this is right

  • Fixed mixture weights give analysts direct control over how strongly the original result influences the replication analysis.
  • Placing a prior on the mixture weight introduces uncertainty about the amount of pooling.
  • Bayes factors can formally test whether the original data should be ignored, fully used, or partially pooled.
  • The framework supplies an alternative to hierarchical models and power priors for combining original and replication data.
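The second bullet, a prior on the mixture weight, can be sketched in closed form under simplifying assumptions: a normal model with known standard errors, a normal vague component, and a Beta(a, b) prior on the weight. Because the marginal likelihood is linear in the weight, the posterior of the weight needs no sampling in this setting. All function names, defaults, and numbers below are hypothetical and do not reflect the paper's implementation.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def component_marginals(est_o, se_o, est_r, se_r, vague_mean=0.0, vague_var=4.0):
    """Marginal likelihood of est_r under the original-posterior and vague components."""
    m_orig = normal_pdf(est_r, est_o, se_o ** 2 + se_r ** 2)
    m_vague = normal_pdf(est_r, vague_mean, vague_var + se_r ** 2)
    return m_orig, m_vague


def weight_posterior_density(weight, m_orig, m_vague, a=1.0, b=1.0):
    """Normalized posterior density of the weight under a Beta(a, b) prior."""
    beta_fn = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    prior = weight ** (a - 1.0) * (1.0 - weight) ** (b - 1.0) / beta_fn
    # The likelihood is linear in the weight, so the evidence is a prior expectation.
    evidence = (a * m_orig + b * m_vague) / (a + b)
    return prior * (weight * m_orig + (1.0 - weight) * m_vague) / evidence


def weight_posterior_mean(m_orig, m_vague, a=1.0, b=1.0):
    """Posterior mean of the weight, from the first two Beta moments."""
    numerator = m_orig * a * (a + 1.0) + m_vague * a * b
    denominator = (a + b + 1.0) * (m_orig * a + m_vague * b)
    return numerator / denominator


# Hypothetical estimates, for illustration only.
m_orig, m_vague = component_marginals(est_o=0.2, se_o=0.1, est_r=0.1, se_r=0.1)
print(weight_posterior_mean(m_orig, m_vague))
```

One consequence visible in this sketch: with a uniform prior and a single replication estimate, the posterior mean of the weight always lies between 1/3 and 2/3, so one study can shift belief about the weight only moderately.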

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Sequential replication studies could update the mixture weight as each new dataset arrives.
  • The weight might serve as a continuous summary statistic for replicability across many fields.
  • Extending the non-informative component to reflect study-specific features could handle different kinds of replication designs.

Load-bearing premise

The mixture of the original study's posterior and a non-informative distribution appropriately represents the prior information for the replication study analysis.

What would settle it

Finding that conclusions about effect presence or the appropriate degree of pooling change materially when the same replication data are analyzed with a hierarchical model instead of the mixture prior would challenge the claim.

Figures

Figures reproduced from arXiv: 2406.19152 by Leonardo Egidi, Leonhard Held, Roberto Macrì-Demartino, Samuel Pawel.

Figure 1: Effect size estimates (standardized mean difference) and 95% CI for the “Labels” original study, the three independent replications, and the pooled replication. … estimates is approximately normal, $\hat{\theta}_o \mid \theta \sim \mathrm{N}(\theta, \sigma_o^2)$, $\hat{\theta}_{r_i} \mid \theta \sim \mathrm{N}(\theta, \sigma_{r_i}^2)$, where $\sigma_i$ represents the standard error of an estimate, which is assumed to be known. There are circumstances under which the effect size might need a particular…
Figure 2
Figure 3: “Labels” experiment. Posterior median (points) and 95% highest posterior density interval (HPDI) of the effect size posterior against the mixture prior weight assigned to the original study component. The left and right sides of each panel show the corresponding replication study effect estimate and the original study effect estimate, each with its 95% confidence interval.
Figure 4 shows the contour plot of the joint posterior distribution for the effect size θ and the weight parameter ω, considering the data from the “Labels” experiment, its three replications, and the pooled replication. In our analysis, we employ a mixture prior, as in (4), in which the informative prior component is derived from the original study, while the non-informative prior is a unit-informative prior as in …
Figure 5: Marginal posterior distributions of the effect size θ (left) and the weight parameter ω (right), considering the data from the “Labels” experiment, its three external replications, and the pooled replication. The dashed lines represent the posterior density of the effect size θ derived exclusively from the replication data, without considering the original data and assuming a uniform prior for the effect …
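The captions refer to a "pooled replication" alongside the individual replications. The excerpt does not say how the pooling was done, so the following is only a sketch of one common choice under the normal model with known standard errors quoted in the Figure 1 excerpt: fixed-effect (inverse-variance) pooling. The function name and the input numbers are hypothetical.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance pooled estimate and its standard error.

    Assumes independent, normally distributed estimates of a common effect
    with known standard errors (an assumption of this sketch).
    """
    precisions = [1.0 / se ** 2 for se in std_errors]
    total_precision = sum(precisions)
    pooled = sum(p * est for p, est in zip(precisions, estimates)) / total_precision
    pooled_se = math.sqrt(1.0 / total_precision)
    return pooled, pooled_se


def confidence_interval(estimate, std_error, z=1.959964):
    """Two-sided 95% Wald interval for a normal estimate."""
    return estimate - z * std_error, estimate + z * std_error


# Hypothetical replication estimates, for illustration only.
pooled, pooled_se = pool_fixed_effect([0.10, 0.30, 0.20], [0.20, 0.20, 0.20])
print(pooled, pooled_se, confidence_interval(pooled, pooled_se))
```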
Original abstract

Replication of scientific studies is important for assessing the credibility of their results. However, there is no consensus on how to quantify the extent to which a replication study replicates an original result. We propose a novel Bayesian approach for replication studies based on mixture priors. The idea is to use a mixture of the posterior distribution based on the original study and a non-informative distribution as the prior for the analysis of the replication study. The mixture weight then determines the extent to which the original and replication data are pooled. Two distinct strategies are presented: one with fixed mixture weights, and one that introduces uncertainty by assigning a prior distribution to the mixture weight itself. Furthermore, it is shown how within this framework Bayes factors can be used for formal testing of relevant scientific hypotheses, such as tests on the presence or absence of an effect or whether the mixture weight equals zero (completely discounting the original data) or one (fully pooling with the original data). To showcase the practical application of the methodology, we analyze data from three replication studies. Our findings suggest that mixture priors are a valuable and intuitive alternative to other Bayesian methods for analyzing replication studies, such as hierarchical models and power priors. We provide the free and open source R package repmix that implements the proposed methodology.
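The test for the presence or absence of an effect mentioned in the abstract can be illustrated with a small sketch. It assumes a point null of zero effect, a normal likelihood with known standard error, and the mixture prior as the alternative with a normal vague component. The exact hypotheses and priors used in the paper may differ, and the names and numbers here are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bf_null_vs_mixture(est_o, se_o, est_r, se_r, weight,
                       vague_mean=0.0, vague_var=4.0):
    """Bayes factor for H0: effect = 0 against H1: effect ~ mixture prior.

    Values above 1 favor the absence of an effect. The vague component
    N(vague_mean, vague_var) is an assumption of this sketch.
    """
    # Likelihood of the replication estimate under the point null.
    m_null = normal_pdf(est_r, 0.0, se_r ** 2)
    # Marginal likelihood under the mixture prior: a weighted sum over components.
    m_orig = normal_pdf(est_r, est_o, se_o ** 2 + se_r ** 2)
    m_vague = normal_pdf(est_r, vague_mean, vague_var + se_r ** 2)
    m_mixture = weight * m_orig + (1.0 - weight) * m_vague
    return m_null / m_mixture


# Hypothetical estimates, for illustration only.
print(bf_null_vs_mixture(est_o=0.2, se_o=0.1, est_r=0.1, se_r=0.1, weight=0.5))
```

In this construction the weight also enters the effect test: a larger weight concentrates the alternative around the original estimate, so the same replication result can yield different evidence for or against an effect.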

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a Bayesian framework for replication studies that uses a mixture prior for the replication analysis: a weighted combination of the posterior from the original study and a non-informative distribution. The mixture weight controls the degree of pooling between studies. Two variants are considered (fixed weight; hierarchical prior on the weight), Bayes factors are derived for testing hypotheses including effect presence and w=0 versus w=1, and the method is illustrated on three replication datasets with an accompanying R package repmix.

Significance. If the propriety and marginal-likelihood issues can be resolved, the approach offers an intuitive, directly interpretable alternative to hierarchical or power-prior models for quantifying replication strength. The open-source repmix package is a concrete strength that supports reproducibility and adoption.

major comments (2)
  1. [Methodology / prior construction and Bayes-factor section] The central construction (mixture of original posterior with non-informative component) yields an improper prior whenever the mixture weight w < 1. Marginal likelihoods and the Bayes factors used to test H0: w=0 versus H1: w=1 are therefore formally undefined without an implicit proper approximation whose effect on the reported inferences is never quantified. This directly undermines the claim that the procedure supplies a coherent Bayesian alternative to hierarchical or power-prior methods.
  2. [Numerical implementation and application sections] The paper relies on the mixture prior both for posterior inference and for formal model comparison via Bayes factors; any regularization introduced in the numerical implementation must be shown to leave the qualitative conclusions (e.g., posterior on w, BF values) unchanged. No such sensitivity analysis is reported.
minor comments (2)
  1. [Notation and definitions] Notation for the non-informative component should be made explicit (e.g., whether it is a proper approximation or left improper) and cross-referenced to the Bayes-factor derivations.
  2. [Applications] The three empirical examples would benefit from a brief table comparing the mixture-prior results to a standard hierarchical model on the same data.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the revisions we will incorporate.

  1. Referee: The central construction (mixture of original posterior with non-informative component) yields an improper prior whenever the mixture weight w < 1. Marginal likelihoods and the Bayes factors used to test H0: w=0 versus H1: w=1 are therefore formally undefined without an implicit proper approximation whose effect on the reported inferences is never quantified. This directly undermines the claim that the procedure supplies a coherent Bayesian alternative to hierarchical or power-prior methods.

    Authors: We acknowledge that the mixture prior is improper for w < 1 and that this raises a legitimate question about the formal definition of the marginal likelihoods and the associated Bayes factors. In the manuscript the Bayes factors for hypotheses involving w were obtained via the Savage-Dickey density ratio applied to the marginal posterior of w; this construction can be viewed as the limit of a sequence of proper priors. Nevertheless, we agree that the limiting argument and its numerical consequences should be stated explicitly. In the revision we will add a dedicated subsection that (i) clarifies the limiting procedure, (ii) supplies a concrete proper approximation (e.g., a very diffuse but proper normal or t distribution) and (iii) reports a small sensitivity study confirming that the reported Bayes-factor values change only negligibly under different proper approximations. revision: yes

  2. Referee: The paper relies on the mixture prior both for posterior inference and for formal model comparison via Bayes factors; any regularization introduced in the numerical implementation must be shown to leave the qualitative conclusions (e.g., posterior on w, BF values) unchanged. No such sensitivity analysis is reported.

    Authors: We agree that an explicit sensitivity analysis with respect to any numerical regularization is required. We will add this analysis to the revised manuscript, repeating the posterior and Bayes-factor calculations under several degrees of regularization of the non-informative component and demonstrating that the qualitative conclusions (posterior mass on w near 0 or 1, and the ordering of the Bayes factors) remain unchanged. revision: yes
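The kind of sensitivity check discussed in this exchange can be sketched as follows. This is an editorial illustration, not the authors' analysis: it assumes a normal model with known standard errors, replaces the non-informative component with a proper N(0, vague_var) distribution, and tracks how the Bayes factor for w = 1 against w = 0 moves as vague_var grows. Names and numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bf_full_vs_no_pooling(est_o, se_o, est_r, se_r, vague_var, vague_mean=0.0):
    """Bayes factor for w = 1 (full pooling) against w = 0 (original data discarded)."""
    m_orig = normal_pdf(est_r, est_o, se_o ** 2 + se_r ** 2)
    m_vague = normal_pdf(est_r, vague_mean, vague_var + se_r ** 2)
    return m_orig / m_vague


def sensitivity_table(est_o, se_o, est_r, se_r, vague_vars):
    """Bayes factors over a list of variances for the proper vague approximation."""
    return [(v, bf_full_vs_no_pooling(est_o, se_o, est_r, se_r, v)) for v in vague_vars]


# Hypothetical estimates and variance grid, for illustration only.
for vague_var, bf in sensitivity_table(0.2, 0.1, 0.1, 0.1, [1.0, 10.0, 100.0, 1000.0]):
    print(f"vague_var={vague_var:g}  BF(w=1 vs w=0)={bf:.3f}")
```

In this sketch the Bayes factor keeps growing as the vague component becomes more diffuse, since the marginal likelihood under w = 0 shrinks toward zero. Whether the paper's reported values behave this way depends on the actual non-informative component used, which is exactly what a reported sensitivity study would show.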

Circularity Check

0 steps flagged

No circularity: direct modeling proposal with independent construction

The paper defines a mixture prior directly as w times the original posterior plus (1-w) times a non-informative component, then applies standard Bayesian updating and Bayes factor calculations to this prior. No derived quantity is obtained by fitting to a subset of the same data and relabeling the fit as a prediction; no uniqueness theorem or ansatz is imported via self-citation; and the central claims rest on the explicit mixture definition rather than on any reduction to prior fitted values. The framework is therefore self-contained as a modeling choice whose propriety and numerical implementation are separate issues from circularity.
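For reference, the construction described in this paragraph can be written out generically. The symbols $\pi_o$ (original posterior), $\pi_N$ (non-informative component), $m_o$, $m_N$, and $w^{*}$ are notation introduced here for illustration and may not match the paper's.

```latex
\begin{align*}
\pi(\theta \mid w) &= w\,\pi_o(\theta \mid \hat\theta_o) + (1 - w)\,\pi_N(\theta), \\
f(\hat\theta_r \mid w) &= w\,m_o + (1 - w)\,m_N,
  \qquad m_o = \int f(\hat\theta_r \mid \theta)\,\pi_o(\theta \mid \hat\theta_o)\,d\theta,
  \quad m_N = \int f(\hat\theta_r \mid \theta)\,\pi_N(\theta)\,d\theta, \\
\pi(\theta \mid \hat\theta_r, w) &= w^{*}\,\pi_o(\theta \mid \hat\theta_o, \hat\theta_r)
  + (1 - w^{*})\,\pi_N(\theta \mid \hat\theta_r),
  \qquad w^{*} = \frac{w\,m_o}{w\,m_o + (1 - w)\,m_N}.
\end{align*}
```

Here $\pi_o(\theta \mid \hat\theta_o, \hat\theta_r)$ and $\pi_N(\theta \mid \hat\theta_r)$ denote each component updated with the replication likelihood. The constant $m_N$ is well defined only when $\pi_N$ is proper, which is where the propriety question raised in the referee report enters.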

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Based solely on the abstract; the central claim rests on the modeling choice of mixing the original posterior with a non-informative prior, with the mixture weight as the key quantity. Full paper may specify additional assumptions.

free parameters (1)
  • mixture weight
    Controls the extent of pooling between original and replication data; can be fixed or assigned a prior distribution.
axioms (1)
  • domain assumption: The prior for the replication study is a mixture of the original posterior and a non-informative distribution
    This is the core idea stated in the abstract for the Bayesian approach.
