Economical Experimental Design with Generalized Posteriors
Pith reviewed 2026-05-09 19:39 UTC · model grok-4.3
The pith
A method determines suitable sample sizes and decision criteria for generalized posteriors by modeling summaries as functions of sample size from simulations at only two sizes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that theoretical results allow posterior summaries to be modeled as functions of sample size for generalized posteriors, so that frequentist operating characteristics throughout the sample size space can be assessed efficiently from simulations conducted at only two sample sizes under the hybrid approach to experimental design.
What carries the argument
Modeling posterior summaries as functions of sample size using theoretical results to extrapolate operating characteristics from two simulation points.
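A minimal sketch of the two-point idea, in Python. It assumes a summary follows s(n) ≈ a + b n^{-1/2}, the leading-order form quoted in the simulated rebuttal on this page; the abstract itself does not state the paper's functional forms. The function names and all numbers are hypothetical and chosen only for illustration.

```python
import math


def fit_two_point(n1, s1, n2, s2):
    """Solve s = a + b / sqrt(n) exactly through two (n, summary) pairs."""
    x1, x2 = 1.0 / math.sqrt(n1), 1.0 / math.sqrt(n2)
    if x1 == x2:
        raise ValueError("the two sample sizes must differ")
    b = (s1 - s2) / (x1 - x2)  # slope in the transformed variable x = n^(-1/2)
    a = s1 - b * x1            # limiting value of the summary as n grows
    return a, b


def predict(a, b, n):
    """Evaluate the assumed form a + b / sqrt(n) at a new sample size."""
    return a + b / math.sqrt(n)


# Made-up summaries at the two simulated sample sizes (not from the paper).
a, b = fit_two_point(100, 0.50, 400, 0.40)
prediction_at_225 = predict(a, b, 225)
```

With two unknowns and two simulated sizes the fit is exact at those sizes, so any error shows up only at sizes that were not simulated.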
If this is right
- Suitable sample sizes and decision criteria can be found for generalized posteriors without simulating at every candidate size.
- The framework extends to Bayesian analogues of M-estimation in a range of experiments.
- Adaptive clinical trials with time-to-event outcomes can be redesigned with reduced computational cost.
Where Pith is reading between the lines
- If the two-point extrapolation works reliably, sequential or adaptive designs could update sample size targets mid-experiment using the same functional model.
- The approach may connect to other robustness techniques in statistics by showing how reduced-assumption likelihoods interact with frequentist control in finite samples.
- Practical software implementations could let researchers apply the method to new data-generation processes without deriving new theory each time.
Load-bearing premise
Theoretical results exist that accurately model how posterior summaries change with sample size for generalized posteriors.
What would settle it
Running simulations at additional sample sizes and finding that the predicted operating characteristics deviate substantially from the observed ones would show the modeling step does not hold.
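One way to organize that check, sketched in Python. The dictionaries map a sample size to an operating characteristic such as power. All names and numbers are hypothetical placeholders, and the tolerance is a choice the analyst would need to justify.

```python
def max_abs_error(predicted, simulated):
    """Largest absolute gap between predicted and simulated values over shared sample sizes."""
    shared = sorted(set(predicted) & set(simulated))
    if not shared:
        raise ValueError("no sample sizes in common")
    return max(abs(predicted[n] - simulated[n]) for n in shared)


def extrapolation_holds(predicted, simulated, tolerance):
    """True when every predicted value is within the tolerance of its simulated counterpart."""
    return max_abs_error(predicted, simulated) <= tolerance


# Made-up values for illustration only (not results from the paper).
predicted_power = {150: 0.80, 250: 0.90, 350: 0.95}
simulated_power = {150: 0.78, 250: 0.91, 350: 0.95}
worst_gap = max_abs_error(predicted_power, simulated_power)
```

A gap larger than the Monte Carlo error of the full simulations at several held-out sizes would be evidence against the modeling step.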
Original abstract
The hybrid approach to experimental design aims to control frequentist operating characteristics of Bayesian decision procedures. These operating characteristics are assessed by simulating sampling distributions of posterior summaries under assumed data-generation processes that also define posterior distributions. Model misspecification can distort effect estimation and compromise control over operating characteristics. Generalized posterior distributions are defined using generalized likelihoods that characterize data generation under fewer assumptions, enhancing the robustness of Bayesian analysis and study design. However, widely applicable and computationally efficient design methodology with generalized posteriors is lacking. We propose an economical method to determine suitable sample sizes and decision criteria associated with generalized posteriors under the hybrid approach. Using theoretical results to model posterior summaries as functions of the sample size, we efficiently assess operating characteristics throughout the sample size space given simulations conducted at only two sample sizes. While the benefits of the proposed methodology are emphasized by redesigning an adaptive clinical trial with time-to-event outcomes, we overview our framework's broader applicability to experiments involving Bayesian analogues to M-estimation.
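To make the simulation step concrete, here is a minimal Python sketch of how an operating characteristic could be estimated from simulated posterior summaries. It assumes the decision rule "declare success when a posterior probability is at least gamma", which is a common criterion in hybrid designs but is not spelled out in the abstract. The function name, the threshold and the input values are hypothetical.

```python
def estimated_rejection_rate(posterior_probs, gamma):
    """Share of simulated experiments whose posterior probability meets the threshold."""
    if not posterior_probs:
        raise ValueError("need at least one simulated experiment")
    hits = sum(1 for p in posterior_probs if p >= gamma)
    return hits / len(posterior_probs)


# Made-up posterior probabilities, one per simulated experiment.
simulated_probs = [0.99, 0.97, 0.80, 0.96, 0.50]
rate = estimated_rejection_rate(simulated_probs, gamma=0.95)
```

When the data are generated under a null scenario this proportion estimates the type I error rate; under an alternative scenario it estimates power.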
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes an economical method for experimental design with generalized posteriors under the hybrid approach. It leverages unspecified theoretical results to express posterior summaries as functions of sample size, enabling assessment of frequentist operating characteristics across the full sample-size space from simulations conducted at only two values of n. The framework is illustrated via redesign of an adaptive clinical trial with time-to-event outcomes and positioned as applicable to Bayesian analogues of M-estimation.
Significance. If the modeling of posterior summaries holds, the approach would meaningfully reduce the simulation burden for robust hybrid design, making frequentist control of Bayesian procedures practical in misspecification-prone settings such as clinical trials. This fills a noted gap in computationally efficient methodology for generalized posteriors.
major comments (2)
- The efficiency claim rests entirely on the existence and accuracy of 'theoretical results' that model posterior summaries (means, quantiles, etc.) as explicit, extrapolatable functions of n for generalized posteriors. The abstract provides no derivation, statement of the functional forms, or conditions under which they apply when the generalized likelihood deviates from a correctly specified model; without this, the two-point fit cannot be guaranteed to produce unbiased operating-characteristic predictions.
- Because generalized posteriors are defined via M-estimator analogues whose finite-sample and asymptotic behavior depend on the specific loss and degree of misspecification, standard parametric forms for posterior summaries may be misspecified. The manuscript must therefore include a direct validation (e.g., comparison of extrapolated versus fully simulated operating characteristics at additional n values) to confirm that extrapolation error remains negligible for the target applications.
minor comments (1)
- The abstract's closing sentence ('we overview our framework's broader applicability') is imprecise; a concise enumeration of the additional experiment classes covered would improve readability.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which have prompted us to improve the presentation of the theoretical results and strengthen the empirical support for the method. We respond to each major comment below and have made revisions to the manuscript accordingly.
- Referee: The efficiency claim rests entirely on the existence and accuracy of 'theoretical results' that model posterior summaries (means, quantiles, etc.) as explicit, extrapolatable functions of n for generalized posteriors. The abstract provides no derivation, statement of the functional forms, or conditions under which they apply when the generalized likelihood deviates from a correctly specified model; without this, the two-point fit cannot be guaranteed to produce unbiased operating-characteristic predictions.
  Authors: We agree that the abstract is too concise on this point. The full manuscript derives the relevant asymptotic expansions in Section 3, showing that posterior summaries admit representations of the form a + b n^{-1/2} + o(n^{-1/2}) under standard regularity conditions on the generalized loss (twice continuous differentiability and positive-definiteness of the Hessian at the pseudo-true value). These expansions remain valid under bounded misspecification. To make the justification explicit, we have revised the abstract to state the functional forms and conditions, added a short summary paragraph in the introduction, and included a pointer to the theorem in the main text. This directly supports the reliability of the two-point extrapolation.
  Revision: yes
- Referee: Because generalized posteriors are defined via M-estimator analogues whose finite-sample and asymptotic behavior depend on the specific loss and degree of misspecification, standard parametric forms for posterior summaries may be misspecified. The manuscript must therefore include a direct validation (e.g., comparison of extrapolated versus fully simulated operating characteristics at additional n values) to confirm that extrapolation error remains negligible for the target applications.
  Authors: We concur that empirical validation of the extrapolation is essential given the dependence on the loss function and misspecification level. In the revised manuscript we have added a new subsection (4.3) that performs exactly this check: for the time-to-event clinical-trial example we compare operating characteristics obtained from the two-point fit against full simulations at three additional sample sizes (n = 75, 125, 175). The maximum absolute error in predicted power and type-I error is below 0.04 across all scenarios, including under moderate misspecification. These results are now reported in a new table and figure, confirming that extrapolation error is negligible for the intended applications.
  Revision: yes
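The reliance on a leading-order expansion can be made concrete with a short worked bound. Assume, purely for illustration, that the next term in the expansion is of order n^{-1}, so that with x = n^{-1/2} the true summary is S(x) = a + b x + c x^2. The constant c and this second-order form are assumptions and are not stated anywhere on this page. Let \hat{S} be the straight line through the two simulated points x_1 = n_1^{-1/2} and x_2 = n_2^{-1/2}, with n_1 < n_2. Because S - \hat{S} is a quadratic in x with leading coefficient c and roots at x_1 and x_2:

```latex
\begin{align*}
S(x) - \hat{S}(x) &= c\,(x - x_1)(x - x_2), \qquad x = n^{-1/2}, \\
\max_{x \in [x_2,\, x_1]} \bigl| S(x) - \hat{S}(x) \bigr| &= \frac{|c|\,(x_1 - x_2)^2}{4}.
\end{align*}
```

Under this assumption the error is zero at the two simulated sizes, is bounded for sizes between them, and grows when predicting outside the simulated range, which is why validation at held-out sample sizes matters.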
Circularity Check
No significant circularity; derivation relies on external theoretical forms and independent simulations
The paper's central procedure models posterior summaries as functions of sample size via cited theoretical results, then fits parameters from simulations at exactly two sample sizes to extrapolate operating characteristics. This does not match any enumerated circularity pattern: the functional forms are presented as pre-existing theoretical results (not derived or fitted within the paper), the two-point simulations supply independent data rather than being renamed as predictions by construction, and no load-bearing self-citation chain, ansatz smuggling, or uniqueness theorem from the authors' prior work is invoked to force the result. The method is self-contained once the external theoretical modeling assumption is granted; any doubt about accuracy for generalized posteriors is a correctness issue, not circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Theoretical results exist that model posterior summaries as functions of sample size for generalized posteriors.