Mixture priors for replication studies

Leonardo Egidi; Leonhard Held; Roberto Macrì-Demartino; Samuel Pawel

arxiv: 2406.19152 · v4 · pith:K2N2Y4KCnew · submitted 2024-06-27 · 📊 stat.ME · stat.AP

Pith reviewed 2026-05-24 00:09 UTC · model grok-4.3

keywords: mixture priors · replication studies · Bayesian analysis · Bayes factors · data pooling · prior distributions · statistical replication

The pith

A mixture of the original study's posterior and a non-informative distribution serves as the prior for replication analysis, with the weight controlling pooling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Replication studies need a way to combine original and new data while quantifying how much strength to borrow from the first result. The paper sets up the prior for the replication as a weighted mix of the original posterior and a vague distribution. The weight itself becomes the measure of how much the studies are pooled together. Bayes factors built on this mixture let researchers test whether an effect exists and whether the weight should be zero or one. The approach is applied to three real replication datasets.

Core claim

Mixture priors provide a Bayesian framework for replication studies by taking the posterior from the original study and mixing it with a non-informative distribution to form the prior for the replication study. The mixing weight directly sets the degree to which the two datasets are combined. The method supports both fixed weights and a prior distribution over the weight, and it permits Bayes factor tests for the presence or absence of an effect as well as for whether the weight equals zero or one.
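
As a rough illustration of the fixed-weight case, the sketch below works through the mixture prior w · (original posterior) + (1 − w) · (vague component) under simplifying assumptions that are not taken from the paper: effect estimates are normal with known standard errors, the original posterior is N(theta_o, se_o²), and the vague component is a proper normal with hypothetical defaults `vague_mean` and `vague_sd`. All function and parameter names are illustrative, and this is not the repmix API.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_posterior(theta_r, se_r, theta_o, se_o, w, vague_mean=0.0, vague_sd=2.0):
    """Posterior for the effect size given one replication estimate.

    Prior (assumed form): w * N(theta_o, se_o^2) + (1 - w) * N(vague_mean, vague_sd^2).
    Returns a list of (posterior weight, posterior mean, posterior sd), one per component.
    """
    components = [(w, theta_o, se_o), (1.0 - w, vague_mean, vague_sd)]
    updated = []
    for prior_weight, mean, sd in components:
        # Marginal likelihood of the replication estimate under this component.
        marginal = normal_pdf(theta_r, mean, math.sqrt(sd**2 + se_r**2))
        # Conjugate normal update of this component.
        post_var = 1.0 / (1.0 / sd**2 + 1.0 / se_r**2)
        post_mean = post_var * (mean / sd**2 + theta_r / se_r**2)
        updated.append((prior_weight * marginal, post_mean, math.sqrt(post_var)))
    total = sum(weight for weight, _, _ in updated)
    return [(weight / total, mean, sd) for weight, mean, sd in updated]


# Hypothetical numbers: original estimate 0.5, replication estimate 0.1.
for weight, mean, sd in mixture_posterior(0.1, 0.1, 0.5, 0.1, w=0.5):
    print(f"weight = {weight:.3f}, mean = {mean:.3f}, sd = {sd:.3f}")
```

Under these assumptions the posterior is again a two-component mixture. Setting w = 1 reproduces full pooling (a precision-weighted average of the two estimates), and setting w = 0 discards the original study. For intermediate w, the updated component weights shift toward whichever component predicts the replication estimate better.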

What carries the argument

Mixture prior formed from the original posterior and a non-informative distribution, with the weight governing the degree of pooling between studies.

If this is right

Fixed mixture weights give analysts direct control over how strongly the original result influences the replication analysis.
Placing a prior on the mixture weight introduces uncertainty about the amount of pooling.
Bayes factors can formally test whether the original data should be ignored, fully used, or partially pooled.
The framework supplies an alternative to hierarchical models and power priors for combining original and replication data.
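
To make the second point concrete, here is a minimal sketch of how a prior on the weight is updated by a single replication estimate. It assumes a Beta(a, b) prior on the weight, normal estimates with known standard errors, and a proper normal vague component. These choices and all names are illustrative assumptions, not the paper's specification or the repmix API. Because the marginal likelihood is linear in the weight, the posterior of the weight is a two-component mixture of Beta(a + 1, b) and Beta(a, b + 1).

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def weight_posterior(theta_r, se_r, theta_o, se_o, a=1.0, b=1.0,
                     vague_mean=0.0, vague_sd=2.0):
    """Posterior summary for the mixture weight under a Beta(a, b) prior.

    Returns (q, post_mean): q is the posterior weight on the Beta(a + 1, b)
    component, and post_mean is the posterior mean of the mixture weight.
    """
    # Marginal likelihoods of the replication estimate under each prior component.
    m1 = normal_pdf(theta_r, theta_o, math.sqrt(se_o**2 + se_r**2))        # original posterior
    m0 = normal_pdf(theta_r, vague_mean, math.sqrt(vague_sd**2 + se_r**2))  # vague component
    q = a * m1 / (a * m1 + b * m0)
    # Mean of q * Beta(a + 1, b) + (1 - q) * Beta(a, b + 1).
    post_mean = (a + q) / (a + b + 1.0)
    return q, post_mean


# Hypothetical estimates: one replication agrees with the original, one conflicts.
print(weight_posterior(theta_r=0.2, se_r=0.1, theta_o=0.2, se_o=0.1))
print(weight_posterior(theta_r=-0.5, se_r=0.1, theta_o=0.5, se_o=0.1))
```

One consequence of this sketch is that, with a uniform prior and one replication estimate, the posterior mean of the weight stays between 1/3 and 2/3. A single study can therefore move the weight only so far.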

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Sequential replication studies could update the mixture weight as each new dataset arrives.
The weight might serve as a continuous summary statistic for replicability across many fields.
Extending the non-informative component to reflect study-specific features could handle different kinds of replication designs.

Load-bearing premise

The mixture of the original study's posterior and a non-informative distribution appropriately represents the prior information for the replication study analysis.

What would settle it

The claim would be challenged if conclusions about effect presence or the appropriate degree of pooling changed materially when the same replication data were analyzed with a hierarchical model instead of the mixture prior.

Figures

Figures reproduced from arXiv: 2406.19152 by Leonardo Egidi, Leonhard Held, Roberto Macrì-Demartino, Samuel Pawel.

**Figure 1.** Effect size estimates (standardized mean difference) and 95% CI for the “Labels” original study, the three independent replications, and the pooled replication. … estimates is approximately normal, $\hat{\theta}_o \mid \theta \sim \mathrm{N}(\theta, \sigma_o^2)$ and $\hat{\theta}_{ri} \mid \theta \sim \mathrm{N}(\theta, \sigma_{ri}^2)$, where $\sigma_i$ represents the standard error of an estimate, which is assumed to be known. There are circumstances under which the effect size might need a particular…

**Figure 2.** [Image: figures/full_fig_p007_2.png]

**Figure 3.** “Labels” experiment. Posterior median (points) and 95% highest posterior density interval (HPDI) of the effect size posterior against the mixture prior weight assigned to the original study component. The left and right sides of each panel show the corresponding replication study effect estimate and the original study effect estimate with 95% confidence intervals.

**Figure 4.** Contour plot of the joint posterior distribution for the effect size $\theta$ and the weight parameter $\omega$, considering the data from the “Labels” experiment, its three replications, and the pooled replication. In our analysis, we employ a mixture prior, as in (4), in which the informative prior component is derived from the original study, while the non-informative prior is a unit-informative prior as in …

**Figure 5.** Marginal posterior distributions of the effect size $\theta$ (left) and the weight parameter $\omega$ (right), considering the data from the “Labels” experiment, its three external replications, and the pooled replication. The dashed lines represent the posterior density of the effect size $\theta$, derived exclusively from the replication data, without considering the original data, and assuming a uniform prior for the effect …

Original abstract

Replication of scientific studies is important for assessing the credibility of their results. However, there is no consensus on how to quantify the extent to which a replication study replicates an original result. We propose a novel Bayesian approach for replication studies based on mixture priors. The idea is to use a mixture of the posterior distribution based on the original study and a non-informative distribution as the prior for the analysis of the replication study. The mixture weight then determines the extent to which the original and replication data are pooled. Two distinct strategies are presented: one with fixed mixture weights, and one that introduces uncertainty by assigning a prior distribution to the mixture weight itself. Furthermore, it is shown how within this framework Bayes factors can be used for formal testing of relevant scientific hypotheses, such as tests on the presence or absence of an effect or whether the mixture weight equals zero (completely discounting the original data) or one (fully pooling with the original data). To showcase the practical application of the methodology, we analyze data from three replication studies. Our findings suggest that mixture priors are a valuable and intuitive alternative to other Bayesian methods for analyzing replication studies, such as hierarchical models and power priors. We provide the free and open source R package repmix that implements the proposed methodology.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The mixture prior here mixes an original posterior with a non-informative component, which creates an improper prior for any w less than 1 and leaves the Bayes factors undefined.

The main takeaway is that the central construction does not produce a proper prior, so the reported Bayes factors for testing the mixture weight cannot be interpreted in the usual way. The paper sets the replication prior as w times the original posterior plus (1-w) times a non-informative distribution. When the non-informative piece is improper, the mixture is improper unless w equals exactly 1. Marginal likelihoods under improper priors are not defined, and the paper gives no regularization or approximation to fix this before computing the Bayes factors on w=0 versus w=1 or on the presence of an effect. That gap is load-bearing for the claims about formal testing of hypotheses on the weight.

The new element is the specific setup that puts a prior on the mixture weight itself and then uses Bayes factors to test whether that weight is zero or one. The authors also supply an R package, repmix, and run the method on three replication datasets. Those parts are concrete and could be useful to someone who wants to experiment with the idea. The applications themselves are standard and do not include direct comparisons against hierarchical models or power priors, so the claim that the approach is a valuable alternative rests on the framework rather than on demonstrated performance gains.

The propriety issue is not minor; it directly affects the coherence of the Bayes factor results that the paper highlights. Readers working on Bayesian replication methods might still want to look at the package for implementation ideas, but the paper needs a fix on the prior before the testing procedure can be taken at face value. I would send it to peer review only after the authors address how the marginal likelihoods are actually computed.
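
The scale-dependence behind this concern can be illustrated numerically. The sketch below assumes normal effect estimates with known standard errors and approximates the non-informative component by a proper normal whose standard deviation `vague_sd` is made progressively larger. The estimates, the function names, and the normal form of the vague component are all hypothetical and are not taken from the paper or from repmix.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bf_pool_vs_discount(theta_r, se_r, theta_o, se_o, vague_sd, vague_mean=0.0):
    """Bayes factor for w = 1 (full pooling) against w = 0 (full discounting).

    Assumes the original posterior is N(theta_o, se_o^2) and the vague
    component is N(vague_mean, vague_sd^2).
    """
    m_pool = normal_pdf(theta_r, theta_o, math.sqrt(se_o**2 + se_r**2))
    m_discount = normal_pdf(theta_r, vague_mean, math.sqrt(vague_sd**2 + se_r**2))
    return m_pool / m_discount


# Hypothetical estimates; only the scale of the vague component changes.
for vague_sd in (1.0, 10.0, 100.0, 1000.0):
    bf = bf_pool_vs_discount(theta_r=0.1, se_r=0.1, theta_o=0.2, se_o=0.1,
                             vague_sd=vague_sd)
    print(f"vague_sd = {vague_sd:7.1f}  BF(w=1 vs w=0) = {bf:10.2f}")
```

In this sketch the Bayes factor grows roughly in proportion to `vague_sd`, so its value is driven by the scale chosen for the vague component and has no limit as that component approaches a flat prior. A proper default with a fixed scale, such as the unit-information prior mentioned in the Figure 4 caption, would pin this down.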

Referee Report

2 major / 2 minor

Summary. The paper proposes a Bayesian framework for replication studies that uses a mixture prior for the replication analysis: a weighted combination of the posterior from the original study and a non-informative distribution. The mixture weight controls the degree of pooling between studies. Two variants are considered (fixed weight; hierarchical prior on the weight), Bayes factors are derived for testing hypotheses including effect presence and w=0 versus w=1, and the method is illustrated on three replication datasets with an accompanying R package repmix.

Significance. If the propriety and marginal-likelihood issues can be resolved, the approach offers an intuitive, directly interpretable alternative to hierarchical or power-prior models for quantifying replication strength. The open-source repmix package is a concrete strength that supports reproducibility and adoption.

major comments (2)

[Methodology / prior construction and Bayes-factor section] The central construction (mixture of original posterior with non-informative component) yields an improper prior whenever the mixture weight w < 1. Marginal likelihoods and the Bayes factors used to test H0: w=0 versus H1: w=1 are therefore formally undefined without an implicit proper approximation whose effect on the reported inferences is never quantified. This directly undermines the claim that the procedure supplies a coherent Bayesian alternative to hierarchical or power-prior methods.
[Numerical implementation and application sections] The paper relies on the mixture prior both for posterior inference and for formal model comparison via Bayes factors; any regularization introduced in the numerical implementation must be shown to leave the qualitative conclusions (e.g., posterior on w, BF values) unchanged. No such sensitivity analysis is reported.

minor comments (2)

[Notation and definitions] Notation for the non-informative component should be made explicit (e.g., whether it is a proper approximation or left improper) and cross-referenced to the Bayes-factor derivations.
[Applications] The three empirical examples would benefit from a brief table comparing the mixture-prior results to a standard hierarchical model on the same data.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the revisions we will incorporate.

Referee: The central construction (mixture of original posterior with non-informative component) yields an improper prior whenever the mixture weight w < 1. Marginal likelihoods and the Bayes factors used to test H0: w=0 versus H1: w=1 are therefore formally undefined without an implicit proper approximation whose effect on the reported inferences is never quantified. This directly undermines the claim that the procedure supplies a coherent Bayesian alternative to hierarchical or power-prior methods.

Authors: We acknowledge that the mixture prior is improper for w < 1 and that this raises a legitimate question about the formal definition of the marginal likelihoods and the associated Bayes factors. In the manuscript the Bayes factors for hypotheses involving w were obtained via the Savage-Dickey density ratio applied to the marginal posterior of w; this construction can be viewed as the limit of a sequence of proper priors. Nevertheless, we agree that the limiting argument and its numerical consequences should be stated explicitly. In the revision we will add a dedicated subsection that (i) clarifies the limiting procedure, (ii) supplies a concrete proper approximation (e.g., a very diffuse but proper normal or t distribution) and (iii) reports a small sensitivity study confirming that the reported Bayes-factor values change only negligibly under different proper approximations. revision: yes
Referee: The paper relies on the mixture prior both for posterior inference and for formal model comparison via Bayes factors; any regularization introduced in the numerical implementation must be shown to leave the qualitative conclusions (e.g., posterior on w, BF values) unchanged. No such sensitivity analysis is reported.

Authors: We agree that an explicit sensitivity analysis with respect to any numerical regularization is required. We will add this analysis to the revised manuscript, repeating the posterior and Bayes-factor calculations under several degrees of regularization of the non-informative component and demonstrating that the qualitative conclusions (posterior mass on w near 0 or 1, and the ordering of the Bayes factors) remain unchanged. revision: yes
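
The Savage-Dickey argument mentioned in the first response can be written out in a simple special case. The derivation below is a sketch under assumptions not stated in the text above: a uniform prior on w over [0, 1], and proper prior components with finite marginal likelihoods $m_1$ (original-posterior component) and $m_0$ (non-informative component) for the replication estimate $\hat{\theta}_r$.

```latex
\begin{align*}
f(\hat{\theta}_r \mid w) &= w\, m_1 + (1 - w)\, m_0, \\
\pi(w \mid \hat{\theta}_r)
  &= \frac{w\, m_1 + (1 - w)\, m_0}{\int_0^1 \{u\, m_1 + (1 - u)\, m_0\}\, du}
   = \frac{2\,\{w\, m_1 + (1 - w)\, m_0\}}{m_1 + m_0}, \\
\mathrm{BF}(w = 1 \text{ vs. } w \text{ free})
  &= \frac{\pi(w = 1 \mid \hat{\theta}_r)}{\pi(w = 1)}
   = \frac{2\, m_1}{m_1 + m_0}, \\
\mathrm{BF}(w = 0 \text{ vs. } w \text{ free})
  &= \frac{\pi(w = 0 \mid \hat{\theta}_r)}{\pi(w = 0)}
   = \frac{2\, m_0}{m_1 + m_0}.
\end{align*}
```

Both ratios depend on $m_0$, so they are well defined only when the non-informative component is proper. In this special case each ratio is also bounded above by 2, which limits how much evidence a single replication can provide for either extreme.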

Circularity Check

0 steps flagged

No circularity: direct modeling proposal with independent construction

The paper defines a mixture prior directly as w times the original posterior plus (1-w) times a non-informative component, then applies standard Bayesian updating and Bayes factor calculations to this prior. No derived quantity is obtained by fitting to a subset of the same data and relabeling the fit as a prediction; no uniqueness theorem or ansatz is imported via self-citation; and the central claims rest on the explicit mixture definition rather than on any reduction to prior fitted values. The framework is therefore self-contained as a modeling choice whose propriety and numerical implementation are separate issues from circularity.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Based solely on the abstract; the central claim rests on the modeling choice of mixing the original posterior with a non-informative prior, with the mixture weight as the key quantity. Full paper may specify additional assumptions.

free parameters (1)

mixture weight
Controls the extent of pooling between original and replication data; can be fixed or assigned a prior distribution.

axioms (1)

domain assumption: The prior for the replication study is a mixture of the original posterior and a non-informative distribution
This is the core idea stated in the abstract for the Bayesian approach.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages

[1]

and Mayoral, A

Bayarri, M. and Mayoral, A. (2002a). Bayesian analysis and design for comparison of effect-sizes. Journal of Statistical Planning and Inference , 103(1):225--243

work page
[2]

successful

Bayarri, M. J. and Mayoral, A. M. (2002b). Bayesian design of “successful” replications. The American Statistician , 56(3):207--214

work page
[3]

and Smith, A

Bernardo, J. and Smith, A. (1994). Bayesian Theory . Wiley

work page 1994
[4]

M., Broglio, K

Berry, S. M., Broglio, K. R., Groshen, S., and Berry, D. A. (2013). Bayesian hierarchical modeling of patient subpopulations: Efficient designs of phase II oncology clinical trials. Clinical Trials , 10(5):720--734. PMID: 23983156

work page 2013
[5]

G., Pouliquen, I

Best, N., Price, R. G., Pouliquen, I. J., and Keene, O. N. (2021). Assessing efficacy in important subgroups in confirmatory trials: An example using Bayesian dynamic borrowing. Pharmaceutical Statistics , 20(3):551--562

work page 2021
[6]

Camerer, C. F., Dreber, A., Forsell, E., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Almenberg, J., Altmejd, A., Chan, T., Heikensten, E., Holzmeister, F., Imai, T., Isaksson, S., Nave, G., Pfeiffer, T., Razen, M., and Wu, H. (2016). Evaluating replicability of laboratory experiments in economics. Science , 351(6280):1433--1436

work page 2016
[7]

F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Nave, G., Nosek, B

Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Nave, G., Nosek, B. A., Pfeiffer, T., Altmejd, A., Buttrick, N., Chan, T., Chen, Y., Forsell, E., Gampa, A., Heikensten, E., Hummer, L., Imai, T., Isaksson, S., Manfredi, D., Rose, J., Wagenmakers, E.-J., and Wu, H. (2018). Evaluating the replicability of soc...

work page 2018
[8]

and Ibrahim, J

Chen, M.-H. and Ibrahim, J. G. (2000). Power prior distributions for regression models . Statistical Science , 15(1):46 -- 60

work page 2000
[9]

and Egidi, L

Consonni, G. and Egidi, L. (2023). Assessing replication success via skeptical mixture priors. arXiv preprint arXiv:2401.00257

work page arXiv 2023
[10]

Consonni, G., Fouskakis, D., Liseo, B., and Ntzoufras, I. (2018). Prior distributions for objective Bayesian analysis . Bayesian Analysis , 13(2):627 -- 679

work page 2018
[11]

Duan, Y., Ye, K., and Smith, E. P. (2006). Evaluating water quality using power priors to incorporate historical information. Environmetrics , 17(1):95--106

work page 2006
[12]

Egidi, L., Pauli, F., and Torelli, N. (2022). Avoiding prior–data conflict in regression models via mixture priors. Canadian Journal of Statistics , 50(2):491--510

work page 2022
[13]

M., Mathur, M., Soderberg, C

Errington, T. M., Mathur, M., Soderberg, C. K., Denis, A., Perfito, N., Iorns, E., and Nosek, B. A. (2021). Investigating the replicability of preclinical cancer biology. eLife , 10:e71601

work page 2021
[14]

Good, I. (1950). Probability and the weighing of evidence. Journal of the Institute of Actuaries , 76(3):293–296

work page 1950
[15]

Good, I. J. (1958). Significance tests in parallel and in series. Journal of the American Statistical Association , 53(284):799--813

work page 1958
[16]

F., Sarafoglou, A., Matzke, D., Ly, A., Boehm, U., Marsman, M., Leslie, D

Gronau, Q. F., Sarafoglou, A., Matzke, D., Ly, A., Boehm, U., Marsman, M., Leslie, D. S., Forster, J. J., Wagenmakers, E.-J., and Steingroever, H. (2017). A tutorial on bridge sampling. Journal of Mathematical Psychology , 81:80--97

work page 2017
[17]

Harms, C. (2019). A Bayes factor for replications of ANOVA results . The American Statistician , 73(4):327--339

work page 2019
[18]

Hedges, L. V. and Schauer, J. M. (2019). More than one replication study is needed for unambiguous tests of replication. Journal of Educational and Behavioral Statistics , 44(5):543--570

work page 2019
[19]

Held, L. (2020). A new standard for the analysis and design of replication studies. Journal of the Royal Statistical Society Series A: Statistics in Society , 183(2):431--448

work page 2020
[20]

Held, L., Matthews, R., Ott, M., and Pawel, S. (2022a). Reverse- B ayes methods for evidence assessment and research synthesis. Research Synthesis Methods , 13(3):295--314

work page
[21]

Held, L., Micheloud, C., and Pawel, S. (2022b). The assessment of replication success based on relative effect size . The Annals of Applied Statistics , 16(2):706 -- 720

work page
[22]

Jeffreys, H. (1961). The Theory of Probability . Oxford University Press, third edition

work page 1961
[23]

E., Payne, R

Johnson, V. E., Payne, R. D., Wang, T., Asher, A., and Mandal, S. (2017). On the reproducibility of psychological science. Journal of the American Statistical Association , 112(517):1--10. PMID: 29861517

work page 2017
[24]

Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association , 90(430):773--795

work page 1995
[25]

Kass, R. E. and Wasserman, L. (1995). A reference B ayesian test for nested hypotheses and its relationship to the S chwarz criterion. J. Amer. Statist. Assoc. , 90(431):928--934

work page 1995
[26]

Lesaffre, E., Qi, H., Banbeta, A., and van Rosmalen, J. (2024). A review of dynamic borrowing methods with applications in pharmaceutical research. Brazilian Journal of Probability and Statistics , 38(1)

work page 2024
[27]

Mathur, M. B. and VanderWeele, T. J. (2020). New statistical metrics for multisite replication projects. Journal of the Royal Statistical Society Series A: Statistics in Society , 183(3):1145--1166

work page 2020
[28]

Micheloud, C., Balabdaoui, F., and Held, L. (2023). Assessing replicability with the sceptical p-value: Type- I error control and sample size planning. Statistica Neerlandica , 77(4):573--591

work page 2023
[29]

Reproducibility and Replicability in Science

National Academies of Sciences, Engineering, and Medicine (2019). Reproducibility and Replicability in Science . National Academies Press

work page 2019
[30]

Replication studies hold the key to generalization [editorial]

Nature Communications (2022). Replication studies hold the key to generalization [editorial]. Nature Communications , 13(1)

work page 2022
[31]

Neuenschwander, B., Capkun-Niggli, G., Branson, M., and Spiegelhalter, D. J. (2010). Summarizing historical information on controls in clinical trials. Clinical Trials , 7(1):5--18. PMID: 20156954

work page 2010
[32]

Neuenschwander, B., Wandel, S., Roychoudhury, S., and Schmidli, H. (2023). On fixed and uncertain mixture prior weights. arXiv preprint arXiv:2306.15197

work page arXiv 2023
[33]

Ntzoufras, I., Dellaportas, P., and Forster, J. J. (2003). Bayesian variable and link determination for generalised linear models. Journal of Statistical Planning and Inference , 111(1):165--180

work page 2003
[34]

Make replication studies a normal part of science

NWO (2016). Make replication studies a normal part of science

work page 2016
[35]

and Forster, J

O'Hagan, A. and Forster, J. (2004). Kendall's Advanced Theory of Statistic 2B . Wiley & Sons, Chichester, second edition

work page 2004
[36]

Estimating the reproducibility of psychological science

Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science , 349(6251):aac4716

work page 2015
[37]

Overstall, A. M. and Forster, J. J. (2010). Default Bayesian model determination methods for generalised linear mixed models. Computational Statistics & Data Analysis , 54(12):3269--3288

work page 2010
[38]

and Pericchi, L

O’Hagan, A. and Pericchi, L. (2012). Bayesian heavy-tailed models and conflict resolution: A review . Brazilian Journal of Probability and Statistics , 26(4):372 -- 401

work page 2012
[39]

D., and Leek, J

Patil, P., Peng, R. D., and Leek, J. T. (2016). What should researchers expect when they replicate studies? a statistical view of replicability in psychological science. Perspectives on Psychological Science , 11(4):539--544. PMID: 27474140

work page 2016
[40]

Pawel, S., Aust, F., Held, L., and Wagenmakers, E.-J. (2024). Power priors for replication studies. TEST , 33:127--154

work page 2024
[41]

and Held, L

Pawel, S. and Held, L. (2020). Probabilistic forecasting of replication studies. PLOS ONE , 15(4):1--23

work page 2020
[42]

and Held, L

Pawel, S. and Held, L. (2022). The sceptical Bayes factor for the assessment of replication success . Journal of the Royal Statistical Society Series B: Statistical Methodology , 84(3):879--911

work page 2022
[43]

D., Nosek, B

Protzko, J., Krosnick, J., Nelson, L. D., Nosek, B. A., Axt, J., Berent, M., Buttrick, N., DeBell, M., Ebersole, C. R., Lundmark, S., and et al. (2023). High replicability of newly discovered social-behavioural findings is achievable. Nature Human Behaviour

work page 2023
[44]

R: A Language and Environment for Statistical Computing

R Core Team (2023). R: A Language and Environment for Statistical Computing . R Foundation for Statistical Computing, Vienna, Austria

work page 2023
[45]

and Held, L

Saban \'e s Bov \'e , D. and Held, L. (2011). Hyper- g priors for generalized linear models . Bayesian Analysis , 6(3):387 -- 410

work page 2011
[46]

J., Nicenboim, B., B \"u rkner, P.-C., Betancourt, M., and Vasishth, S

Schad, D. J., Nicenboim, B., B \"u rkner, P.-C., Betancourt, M., and Vasishth, S. (2023). Workflow techniques for the robust use of Bayes factors. Psychological Methods , 28(6):1404--1426

work page 2023
[47]

Schmidli, H., Gsteiger, S., Roychoudhury, S., O'Hagan, A., Spiegelhalter, D., and Neuenschwander, B. (2014). Robust meta-analytic-predictive priors in clinical trials with historical control information. Biometrics , 70(4):1023--1032

work page 2014
[48]

Schönbrodt, F. D. and Wagenmakers, E.-J. (2018). Bayes factor design analysis: Planning for compelling evidence. Psychonomic Bulletin & Review , 25(1):128--142

work page 2018
[49]

Spiegelhalter, D., Abrams, K., and Myles, J. (2004). Bayesian Approaches to Clinical Trials and Health-Care Evaluation . Wiley, New York

work page 2004
[50]

F., Wathen, J

Thall, P. F., Wathen, J. K., Bekele, B. N., Champlin, R. E., Baker, L. H., and Benjamin, R. S. (2003). Hierarchical Bayesian approaches to phase II trials in diseases with multiple subtypes. Statistics in Medicine , 22(5):763--780

work page 2003
[51]

and Wagenmakers, E.-J

Verhagen, J. and Wagenmakers, E.-J. (2014). Bayesian tests to quantify the result of a replication attempt. Journal of Experimental Psychology: General , 143(4):1457--1475

work page 2014
[52]

Yang, P., Zhao, Y., Nie, L., Vallejo, J., and Yuan, Y. (2023). Sam: Self-adapting mixture prior to dynamically borrow information from historical data in clinical trials. Biometrics , 79(4):2857--2868

work page 2023

[1] [1]

and Mayoral, A

Bayarri, M. and Mayoral, A. (2002a). Bayesian analysis and design for comparison of effect-sizes. Journal of Statistical Planning and Inference , 103(1):225--243

work page

[2] [2]

successful

Bayarri, M. J. and Mayoral, A. M. (2002b). Bayesian design of “successful” replications. The American Statistician , 56(3):207--214

work page

[3] [3]

and Smith, A

Bernardo, J. and Smith, A. (1994). Bayesian Theory . Wiley

work page 1994

[4] [4]

M., Broglio, K

Berry, S. M., Broglio, K. R., Groshen, S., and Berry, D. A. (2013). Bayesian hierarchical modeling of patient subpopulations: Efficient designs of phase II oncology clinical trials. Clinical Trials , 10(5):720--734. PMID: 23983156

work page 2013

[5] [5]

G., Pouliquen, I

Best, N., Price, R. G., Pouliquen, I. J., and Keene, O. N. (2021). Assessing efficacy in important subgroups in confirmatory trials: An example using Bayesian dynamic borrowing. Pharmaceutical Statistics , 20(3):551--562

work page 2021

[6] [6]

Camerer, C. F., Dreber, A., Forsell, E., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Almenberg, J., Altmejd, A., Chan, T., Heikensten, E., Holzmeister, F., Imai, T., Isaksson, S., Nave, G., Pfeiffer, T., Razen, M., and Wu, H. (2016). Evaluating replicability of laboratory experiments in economics. Science , 351(6280):1433--1436

work page 2016

[7] [7]

F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Nave, G., Nosek, B

Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Nave, G., Nosek, B. A., Pfeiffer, T., Altmejd, A., Buttrick, N., Chan, T., Chen, Y., Forsell, E., Gampa, A., Heikensten, E., Hummer, L., Imai, T., Isaksson, S., Manfredi, D., Rose, J., Wagenmakers, E.-J., and Wu, H. (2018). Evaluating the replicability of soc...

work page 2018

[8] [8]

and Ibrahim, J

Chen, M.-H. and Ibrahim, J. G. (2000). Power prior distributions for regression models . Statistical Science , 15(1):46 -- 60

work page 2000

[9] [9]

and Egidi, L

Consonni, G. and Egidi, L. (2023). Assessing replication success via skeptical mixture priors. arXiv preprint arXiv:2401.00257

work page arXiv 2023

[10] [10]

Consonni, G., Fouskakis, D., Liseo, B., and Ntzoufras, I. (2018). Prior distributions for objective Bayesian analysis . Bayesian Analysis , 13(2):627 -- 679

work page 2018

[11] [11]

Duan, Y., Ye, K., and Smith, E. P. (2006). Evaluating water quality using power priors to incorporate historical information. Environmetrics , 17(1):95--106

work page 2006

[12] [12]

Egidi, L., Pauli, F., and Torelli, N. (2022). Avoiding prior–data conflict in regression models via mixture priors. Canadian Journal of Statistics , 50(2):491--510

work page 2022

[13] [13]

M., Mathur, M., Soderberg, C

Errington, T. M., Mathur, M., Soderberg, C. K., Denis, A., Perfito, N., Iorns, E., and Nosek, B. A. (2021). Investigating the replicability of preclinical cancer biology. eLife , 10:e71601

work page 2021

[14] [14]

Good, I. (1950). Probability and the weighing of evidence. Journal of the Institute of Actuaries , 76(3):293–296

work page 1950

[15] [15]

Good, I. J. (1958). Significance tests in parallel and in series. Journal of the American Statistical Association , 53(284):799--813

work page 1958

[16] [16]

F., Sarafoglou, A., Matzke, D., Ly, A., Boehm, U., Marsman, M., Leslie, D

Gronau, Q. F., Sarafoglou, A., Matzke, D., Ly, A., Boehm, U., Marsman, M., Leslie, D. S., Forster, J. J., Wagenmakers, E.-J., and Steingroever, H. (2017). A tutorial on bridge sampling. Journal of Mathematical Psychology , 81:80--97

work page 2017

[17] [17]

Harms, C. (2019). A Bayes factor for replications of ANOVA results . The American Statistician , 73(4):327--339

work page 2019

[18] [18]

Hedges, L. V. and Schauer, J. M. (2019). More than one replication study is needed for unambiguous tests of replication. Journal of Educational and Behavioral Statistics , 44(5):543--570

work page 2019

[19] [19]

Held, L. (2020). A new standard for the analysis and design of replication studies. Journal of the Royal Statistical Society Series A: Statistics in Society , 183(2):431--448

work page 2020

[20] [20]

Held, L., Matthews, R., Ott, M., and Pawel, S. (2022a). Reverse- B ayes methods for evidence assessment and research synthesis. Research Synthesis Methods , 13(3):295--314

work page

[21] [21]

Held, L., Micheloud, C., and Pawel, S. (2022b). The assessment of replication success based on relative effect size . The Annals of Applied Statistics , 16(2):706 -- 720

work page

[22] [22]

Jeffreys, H. (1961). The Theory of Probability . Oxford University Press, third edition

work page 1961

[23] [23]

E., Payne, R

Johnson, V. E., Payne, R. D., Wang, T., Asher, A., and Mandal, S. (2017). On the reproducibility of psychological science. Journal of the American Statistical Association , 112(517):1--10. PMID: 29861517

work page 2017

[24] [24]

Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association , 90(430):773--795

work page 1995

[25] [25]

Kass, R. E. and Wasserman, L. (1995). A reference B ayesian test for nested hypotheses and its relationship to the S chwarz criterion. J. Amer. Statist. Assoc. , 90(431):928--934

work page 1995

[26] [26]

Lesaffre, E., Qi, H., Banbeta, A., and van Rosmalen, J. (2024). A review of dynamic borrowing methods with applications in pharmaceutical research. Brazilian Journal of Probability and Statistics , 38(1)

work page 2024

[27] [27]

Mathur, M. B. and VanderWeele, T. J. (2020). New statistical metrics for multisite replication projects. Journal of the Royal Statistical Society Series A: Statistics in Society , 183(3):1145--1166

[28]

Micheloud, C., Balabdaoui, F., and Held, L. (2023). Assessing replicability with the sceptical p-value: Type-I error control and sample size planning. Statistica Neerlandica, 77(4):573--591

[29]

National Academies of Sciences, Engineering, and Medicine (2019). Reproducibility and Replicability in Science. National Academies Press

[30]

Nature Communications (2022). Replication studies hold the key to generalization [editorial]. Nature Communications, 13(1)

[31]

Neuenschwander, B., Capkun-Niggli, G., Branson, M., and Spiegelhalter, D. J. (2010). Summarizing historical information on controls in clinical trials. Clinical Trials, 7(1):5--18. PMID: 20156954

[32]

Neuenschwander, B., Wandel, S., Roychoudhury, S., and Schmidli, H. (2023). On fixed and uncertain mixture prior weights. arXiv preprint arXiv:2306.15197

[33]

Ntzoufras, I., Dellaportas, P., and Forster, J. J. (2003). Bayesian variable and link determination for generalised linear models. Journal of Statistical Planning and Inference, 111(1):165--180

[34]

NWO (2016). Make replication studies a normal part of science

[35]

O'Hagan, A. and Forster, J. (2004). Kendall's Advanced Theory of Statistics 2B. Wiley & Sons, Chichester, second edition

[36]

Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349(6251):aac4716

[37]

Overstall, A. M. and Forster, J. J. (2010). Default Bayesian model determination methods for generalised linear mixed models. Computational Statistics & Data Analysis, 54(12):3269--3288

[38]

O'Hagan, A. and Pericchi, L. (2012). Bayesian heavy-tailed models and conflict resolution: A review. Brazilian Journal of Probability and Statistics, 26(4):372--401

[39]

Patil, P., Peng, R. D., and Leek, J. T. (2016). What should researchers expect when they replicate studies? A statistical view of replicability in psychological science. Perspectives on Psychological Science, 11(4):539--544. PMID: 27474140

[40]

Pawel, S., Aust, F., Held, L., and Wagenmakers, E.-J. (2024). Power priors for replication studies. TEST, 33:127--154

[41]

Pawel, S. and Held, L. (2020). Probabilistic forecasting of replication studies. PLOS ONE, 15(4):1--23

[42]

Pawel, S. and Held, L. (2022). The sceptical Bayes factor for the assessment of replication success. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(3):879--911

[43]

Protzko, J., Krosnick, J., Nelson, L. D., Nosek, B. A., Axt, J., Berent, M., Buttrick, N., DeBell, M., Ebersole, C. R., Lundmark, S., et al. (2023). High replicability of newly discovered social-behavioural findings is achievable. Nature Human Behaviour

[44]

R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria

[45]

Sabanés Bové, D. and Held, L. (2011). Hyper-g priors for generalized linear models. Bayesian Analysis, 6(3):387--410

[46]

Schad, D. J., Nicenboim, B., Bürkner, P.-C., Betancourt, M., and Vasishth, S. (2023). Workflow techniques for the robust use of Bayes factors. Psychological Methods, 28(6):1404--1426

[47]

Schmidli, H., Gsteiger, S., Roychoudhury, S., O'Hagan, A., Spiegelhalter, D., and Neuenschwander, B. (2014). Robust meta-analytic-predictive priors in clinical trials with historical control information. Biometrics, 70(4):1023--1032

[48]

Schönbrodt, F. D. and Wagenmakers, E.-J. (2018). Bayes factor design analysis: Planning for compelling evidence. Psychonomic Bulletin & Review, 25(1):128--142

[49]

Spiegelhalter, D., Abrams, K., and Myles, J. (2004). Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Wiley, New York

[50]

Thall, P. F., Wathen, J. K., Bekele, B. N., Champlin, R. E., Baker, L. H., and Benjamin, R. S. (2003). Hierarchical Bayesian approaches to phase II trials in diseases with multiple subtypes. Statistics in Medicine, 22(5):763--780

[51]

Verhagen, J. and Wagenmakers, E.-J. (2014). Bayesian tests to quantify the result of a replication attempt. Journal of Experimental Psychology: General, 143(4):1457--1475

[52]

Yang, P., Zhao, Y., Nie, L., Vallejo, J., and Yuan, Y. (2023). SAM: Self-adapting mixture prior to dynamically borrow information from historical data in clinical trials. Biometrics, 79(4):2857--2868
