When Representative Samples Produce Worse Outcomes: Scale-up Decisions and Testing in Small-Budget RCTs

Hannah Li; Hongseok Namkoong; Isaac Scheinfeld

arxiv: 2606.13531 · v1 · pith:J6NERKUFnew · submitted 2026-06-11 · 📊 stat.ME

Pith reviewed 2026-06-27 05:45 UTC · model grok-4.3

classification 📊 stat.ME

keywords pilot RCTs · sample representativeness · small-budget experiments · heterogeneous treatment effects · significance testing · scale-up decisions · optimal experimental design

The pith

In small-budget pilot RCTs, sampling from one homogeneous subpopulation can maximize expected downstream impact more than representative sampling when decisions rely on significance tests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Small randomized controlled trials screen interventions before larger studies, where errors in scaling decisions carry high costs. The paper demonstrates that representative samples are not always optimal in these pilots. When budgets are tight and scale-up decisions rest on statistical significance tests, the design maximizing expected improvement draws the entire sample from a single homogeneous subpopulation. The choice of which subpopulation depends on sampling costs and prior beliefs about treatment effect variation across groups. With large budgets the optimum shifts toward a representative sample of the full target population.

Core claim

When an RCT paired with a non-adaptive significance test determines whether an intervention receives any downstream payoff, the pilot sample composition maximizing expected impact consists of a single homogeneous subpopulation in the small-budget regime; the subpopulation is selected according to sampling costs and the designer's priors on heterogeneous treatment effects. In the large-budget limit this composition converges to a representative sample of the target population.

What carries the argument

The budget-constrained pilot sample allocation that maximizes the expected value of the downstream payoff under a fixed significance-test decision rule.
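
As a hedged sketch, the allocation problem described above could be written as below. The symbols $s_1$ (pilot sample composition), $s_3$ (target population), and $\tau$ (conditional average treatment effects) follow the figure captions; the cost vector $c$, budget $B$, test statistic $T$, critical value $t_\alpha$, and the generic payoff term are notation assumed here for illustration, not taken from the paper.

```latex
\max_{s_1 \ge 0,\; c^\top s_1 \le B}\;
\mathbb{E}\!\left[\, \mathbf{1}\{\, T(s_1) > t_\alpha \,\} \cdot \mathrm{Payoff}(\tau, s_3) \,\right]
```

The expectation is taken over the designer's prior on $\tau$ and over sampling noise in the pilot. The indicator is the significance-test gate: the payoff is received only if the pilot passes.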

If this is right

The optimal pilot composition is not fixed but depends on the available budget size.
In the small-budget regime homogeneous sampling from one subpopulation outperforms representative sampling.
The preferred subpopulation is determined jointly by sampling costs and prior beliefs on treatment effect heterogeneity.
The small-budget result extends to any setting where a significance test on RCT data decides receipt of a non-adaptive downstream payoff.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Decision rules that adaptively incorporate pilot data or use Bayesian updating may alter the optimal sampling strategy.
The result implies that low-budget experimentation in other domains could also favor non-representative designs when tests gate payoffs.
Accurate priors on effect heterogeneity become especially valuable when budgets constrain pilot size.

Load-bearing premise

Downstream decisions are made by a non-adaptive significance test applied to the pilot RCT data.

What would settle it

A simulation or empirical study in which, under small budgets and significance-test decisions, a representative sample produces strictly higher expected downstream impact than the single-subpopulation design.
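
A minimal sketch of such a simulation is given below, under strong simplifying assumptions that are not the paper's model: the follow-up RCT is collapsed into the pilot gate, priors on type-level effects are independent normals, outcome noise is the same for every unit, the pilot uses a pooled difference in means with a one-sided z-test, and fractional sample counts are allowed. All function names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the z-test


def pass_probability(tau_design, n, sigma=1.0, alpha=0.05):
    """Chance that a one-sided z-test rejects, given the sampled mix's true effect."""
    if n <= 0:
        return 0.0  # no data, so the intervention never advances
    se = 2.0 * sigma / n ** 0.5  # difference in means with n/2 units per arm
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)
    return STD_NORMAL.cdf(tau_design / se - z_crit)


def single_type_design(budget, costs, k):
    """Spend the whole budget on type k (fractional counts allowed)."""
    counts = [0.0] * len(costs)
    counts[k] = budget / costs[k]
    return counts


def representative_design(budget, costs, weights):
    """Sample types in proportion to their target-population weights."""
    cost_per_unit = sum(w * c for w, c in zip(weights, costs))
    n = budget / cost_per_unit
    return [n * w for w in weights]


def expected_impact(counts, weights, prior_mean, prior_sd, draws=20000, seed=0):
    """Monte Carlo over the prior of E[target ATE * P(test passes)]."""
    rng = random.Random(seed)
    n = sum(counts)
    total = 0.0
    for _ in range(draws):
        # Draw type-level effects from the (assumed) independent normal prior.
        tau = [rng.gauss(m, s) for m, s in zip(prior_mean, prior_sd)]
        tau_design = sum(c * t for c, t in zip(counts, tau)) / n if n > 0 else 0.0
        tau_target = sum(w * t for w, t in zip(weights, tau))
        total += tau_target * pass_probability(tau_design, n)
    return total / draws


if __name__ == "__main__":
    # Hypothetical two-type example; every number here is illustrative.
    weights, costs = [0.5, 0.5], [1.0, 1.0]
    prior_mean, prior_sd = [0.1, 0.1], [0.05, 0.5]
    for budget in (10, 100, 10000):
        designs = {
            "type 0 only": single_type_design(budget, costs, 0),
            "type 1 only": single_type_design(budget, costs, 1),
            "representative": representative_design(budget, costs, weights),
        }
        for name, counts in designs.items():
            impact = expected_impact(counts, weights, prior_mean, prior_sd)
            print(budget, name, round(impact, 4))
```

Running the script prints the estimated expected impact of each design at each budget, so the ranking of designs can be compared across budget sizes. Because this toy model drops the second-stage test and uses an assumed prior, any ranking it produces is only suggestive and would not by itself confirm or refute the paper's theorem.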

Figures

Figures reproduced from arXiv: 2606.13531 by Hannah Li, Hongseok Namkoong, Isaac Scheinfeld.

**Figure 1.** The three-stage experiment pipeline we model. The intervention is first evaluated in a budget-constrained pilot RCT, where the result of a significance test determines whether it advances to a follow-up RCT. A second significance test determines adoption of the intervention in the target population, leading to a potential improvement in average outcomes if successful. We show that when optimizing for expec…

**Figure 2.** Optimal design depends on pilot resources. We plot the expected downstream impact achieved by the optimal pilot design, the best small-budget single-subpopulation design, and a representative sample. In budget-constrained settings, optimal pilots sample from a single homogeneous subpopulation, while representative sampling is best for well-resourced trials. To gain insight into how treatment effect heterog…

**Figure 3.** The three-stage experimental pipeline. The pilot design s1 is highlighted in blue. The samples s1 and s2 in the first two stages determine the distribution of the average treatment effect estimates, and the population s3 in the third stage determines the change in outcomes if both significance tests pass.

**Figure 4.** Why a representative sample is optimal for large budgets. When the sample is non-representative, i.e. s1 ∝̸ s3, the pilot decision boundary is misaligned with the boundary separating positive and negative average treatment effects in the target population. Even with an infinite sampling investment, there are conditional average treatment effects τ for which the pilot incorrectly classifies the sign of the …

**Figure 5.** The small-budget index for two types under a large (…

**Figure 6.** Pilot RCT Impact. Comparison of the small-budget and limiting large-budget optimal pilot designs for a range of sample sizes in a semi-synthetic scenario calibrated to the NSLM study. In this instance where costs across types are assumed constant, the budget is equivalent to a constraint on the number of students, and the small-budget optimal design is equivalent to sampling only from the schools with the …

Original abstract

Small randomized controlled trials are often used to screen interventions before running larger follow-up studies. This is a critical phase of experimentation, as missing effective interventions or scaling up harmful ones can be very costly. A common proposal to mitigate these errors is to recruit samples that are representative of the target population, but this is often challenging in resource-constrained pilots. We challenge the narrative that representative samples are always superior by showing that when statistical significance testing determines whether interventions receive further study, the pilot trial composition that maximizes the downstream expected improvement in outcomes depends critically on its budget size. In the large-budget limit, the optimal pilot design converges to a sample that is representative of the target population. However, in the small-budget regime, the pilot designer maximizes expected impact by sampling only from a single homogeneous sub-population, chosen in a manner that depends on sampling costs and the designer's prior beliefs about heterogeneous treatment effects. Our proof of the small-budget result applies more generally when an RCT and significance test are used to decide whether to receive any non-adaptive downstream payoff, a result that may be applicable to other settings with constrained experimentation budgets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

For small-budget pilots gated by significance tests, sampling one subpopulation beats representative sampling, but the result is tied to that specific decision rule.

The punchline is that when your pilot is small and scale-up hinges on clearing a significance threshold, the math says to put all samples into one homogeneous group whose treatment effect your prior likes, rather than spreading them for representativeness. In the large-budget limit the optimum flips back to a representative draw.

What is new is the explicit dependence on budget size: the switch does not appear in the standard literature on representative sampling or small RCT design. The paper also gives a clean generalization that the same logic holds for any non-adaptive payoff decided by such a test.

The argument is internally consistent once you accept the model of heterogeneous effects, group-specific costs, and the threshold created by the test. That threshold is what rewards concentration in low-power regimes.

The main limitation is that the optimality result depends entirely on the non-adaptive significance test. A Bayesian rule, or one that acts on the point estimate or the expected value of information, would likely favor spreading samples to reduce variance across groups, so the single-subpopulation prescription is narrower than the abstract may suggest. The result is also sensitive to the priors on treatment effects and the exact form of the payoff; the paper acknowledges this, but it limits immediate applicability.

This is useful for anyone who runs tight-budget pilots in policy or development and must decide follow-up with a p-value gate. A reader working on optimal design under resource constraints will find the derivation worth checking. It is coherent enough on its own terms to deserve referee time, even if revisions will need to clarify the scope and assumptions.

Referee Report

0 major / 3 minor

Summary. The manuscript claims that when small-budget RCTs are used to screen interventions before larger studies, with a significance test determining whether to scale up, the pilot sampling design that maximizes expected downstream impact is to draw all samples from a single homogeneous subpopulation (chosen based on group-specific sampling costs and the designer's priors on heterogeneous treatment effects). In the large-budget limit the optimum converges to a representative sample of the target population. The small-budget result is derived for the case of any non-adaptive downstream payoff decided by such a significance test.

Significance. If the central derivation holds, the result supplies a precise theoretical counter-example to the default recommendation for representative sampling in pilot RCTs, showing that the optimality of representativeness is budget-dependent and decision-rule-dependent. The explicit generalization of the small-budget proof to arbitrary non-adaptive payoffs decided by a significance test is a clear strength, as is the clean separation between the small-budget and large-budget regimes. The scoping to non-adaptive significance testing is stated up front, so the stress-test concern about other decision rules (Bayesian, magnitude-based, etc.) does not undermine the manuscript's internal claims.

minor comments (3)

The precise functional form of the significance test (e.g., one-sided t-test, exact threshold) and the downstream payoff function should be stated explicitly in the main text before the small-budget theorem, rather than only in the appendix, to make the load-bearing threshold effect transparent.
Notation for the heterogeneous treatment effects and the prior distribution over them is introduced gradually; a single early display equation collecting all primitives would improve readability.
Figure 2 (or equivalent) comparing optimal allocation across budget sizes would benefit from an explicit legend indicating the prior parameters used in each panel.
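
To illustrate the kind of explicit statement the first comment asks for, here is one possible form of the decision rule, assuming a one-sided z-test on a difference in means. The estimator $\hat\tau$, its standard error, the level $\alpha$, and the shorthand $\tau_{s_1}$ for the average effect in the sampled composition are assumptions made for this sketch; the manuscript's actual test may differ.

```latex
\hat\tau = \bar Y_{\text{treated}} - \bar Y_{\text{control}},
\qquad
\text{advance} \iff \frac{\hat\tau}{\widehat{\mathrm{se}}(\hat\tau)} > z_{1-\alpha},
\qquad
\Pr(\text{advance} \mid \tau_{s_1}) \approx
\Phi\!\left( \frac{\tau_{s_1}}{\mathrm{se}(\hat\tau)} - z_{1-\alpha} \right)
```

Under the normal approximation, the last expression is the pass probability given the true effect in the sampled mix. Because the standard error shrinks as the sample grows, a small pilot passes with probability close to $\alpha$ unless the sampled effect is large relative to the noise.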

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading, positive summary of the manuscript's contributions, and recommendation of minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain.

The paper presents a mathematical derivation of optimal pilot sampling under a model with heterogeneous treatment effects, sampling costs, and a non-adaptive significance-testing decision rule. The small-budget result (single-subpopulation sampling) follows from the model's assumptions and proof, without any quoted reduction of the optimum to a fitted parameter, self-defined quantity, or self-citation chain. The decision rule is an explicit modeling choice rather than a hidden tautology. This is a standard non-circular theoretical result.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on a model of heterogeneous treatment effects, group-specific sampling costs, and a non-adaptive significance test that gates a downstream payoff; these are standard domain assumptions rather than new entities.

free parameters (2)

prior beliefs on heterogeneous treatment effects
Designer's priors determine which subpopulation is chosen in the small-budget optimum.
group-specific sampling costs
Costs enter the optimization that selects the single subpopulation.

axioms (2)

domain assumption: Downstream payoff is non-adaptive and determined solely by whether the pilot passes a significance test.
Stated in the abstract as the setting where the result applies.
domain assumption: Treatment effects are heterogeneous across subpopulations.
Required for the single-subpopulation optimum to differ from representative sampling.

work page doi:10.1111/insr.12107.url: 2016
[53]

Hoiyi Ng and Guido Imbens.Scalable Decisions Using a Bayesian Decision-Theoretic Approach. Jan. 27, 2026.doi: 10.48550/arXiv.2601.20031 . arXiv: 2601.20031 [stat].url: http: //arxiv.org/abs/2601.20031(visited on 02/07/2026). Pre-published

work page doi:10.48550/arxiv.2601.20031 2026
[54]

A Nonconcavity in the Value of Information

Roy Radner and Joseph Stiglitz. “A Nonconcavity in the Value of Information”. In:Bayesian models in economic theory5 (1984), pp. 33–52.url: https : / / pages . stern . nyu . edu / ~rradner/publishedpapers/50Nonconcavity.pdf(visited on 04/11/2025)

1984
[55]

A Tight Sufficient Condition for Radner–Stiglitz Nonconcavity in the Value of Information

Michel De Lara and Laurent Gilotte. “A Tight Sufficient Condition for Radner–Stiglitz Nonconcavity in the Value of Information”. In:Journal of Economic Theory137.1 (Nov. 2007), pp. 696–708.issn: 00220531.doi: 10 . 1016 / j . jet . 2007 . 01 . 014.url: https : //linkinghub.elsevier.com/retrieve/pii/S0022053107000373(visited on 04/11/2025)

2007
[56]

Value of Information Analysis for Research Decisions-An Introduction: Report 1 of the ISPOR Value of Information Analysis Emerging Good Practices Task Force

Elisabeth Fenwick et al. “Value of Information Analysis for Research Decisions-An Introduction: Report 1 of the ISPOR Value of Information Analysis Emerging Good Practices Task Force”. In: Value in Health: The Journal of the International Society for Pharmacoeconomics and Outcomes Research23.2 (Feb. 2020), pp. 139–150.issn: 1524-4733.doi:10.1016/j.jval.20...

work page doi:10.1016/j.jval.2020.01.001 2020
[57]

Calculating Expected Value of Sample Information Adjusting for Imperfect Im- plementation

Anna Heath. “Calculating Expected Value of Sample Information Adjusting for Imperfect Im- plementation”. In:Medical Decision Making42.5 (July 1, 2022), pp. 626–636.issn: 0272-989X. doi: 10.1177/0272989X211073098.url: https://doi.org/10.1177/0272989X211073098 (visited on 05/11/2024). 54

work page doi:10.1177/0272989x211073098.url: 2022
[58]

Expected Value of Sample Information to Guide the Design of Group Sequential Clinical Trials

Laura Flight et al. “Expected Value of Sample Information to Guide the Design of Group Sequential Clinical Trials”. In:Medical Decision Making42.4 (May 2022), pp. 461–473.issn: 0272-989X, 1552-681X.doi: 10.1177/0272989X211045036.url: http://journals.sagepub. com/doi/10.1177/0272989X211045036(visited on 05/11/2024)

work page doi:10.1177/0272989x211045036.url: 2022
[59]

Version 1

Michael Gechter et al.Selecting Experimental Sites for External Validity. Version 1. May 21, 2024.doi: 10.48550/arXiv.2405.13241. arXiv: 2405.13241 [econ].url: http://arxiv. org/abs/2405.13241(visited on 02/07/2026). Pre-published

work page doi:10.48550/arxiv.2405.13241 2024
[61]

July 7, 2023.doi: 10

Stephen Bates et al.Incentive-Theoretic Bayesian Inference for Collaborative Science. July 7, 2023.doi: 10 . 48550 / arXiv . 2307 . 03748. arXiv: 2307 . 03748 [cs, stat] .url: http : //arxiv.org/abs/2307.03748(visited on 10/24/2023). Pre-published

arXiv 2023
[62]

Screening for Experiments

Daehong Min. “Screening for Experiments”. In:Games and Economic Behavior142 (Nov. 1, 2023), pp. 73–100.issn: 0899-8256.doi:10.1016/j.geb.2023.07.009.url: https://www. sciencedirect.com/science/article/pii/S0899825623001021(visited on 12/24/2024)

work page doi:10.1016/j.geb.2023.07.009.url: 2023
[63]

Stability analysis of fluid flows using Lagrangian Perturbation Theory (LPT): application to the plane Couette flow

Ronald L. Wasserstein and Nicole A. Lazar. “The ASA Statement onp-Values: Context, Process, and Purpose”. In:The American Statistician70.2 (Apr. 2, 2016), pp. 129–133. issn: 0003-1305, 1537-2731.doi: 10.1080/00031305.2016.1154108 .url: https://www. tandfonline.com/doi/full/10.1080/00031305.2016.1154108(visited on 02/08/2026)

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1080/00031305.2016.1154108 2016
[64]

A Theory of Experimenters: Robustness, Randomization, and Balance

Abhijit V. Banerjee et al. “A Theory of Experimenters: Robustness, Randomization, and Balance”. In:American Economic Review110.4 (Apr. 1, 2020), pp. 1206–1230.issn: 0002- 8282.doi: 10.1257/aer.20171634 .url: https://pubs.aeaweb.org/doi/10.1257/aer. 20171634(visited on 02/07/2026)

work page doi:10.1257/aer.20171634 2020
[65]

Benjamin Recht.A Bureaucratic Theory of Statistics. Jan. 7, 2025.doi:10.48550/arXiv. 2501.03457. arXiv: 2501.03457 [stat].url: http://arxiv.org/abs/2501.03457 (visited on 01/08/2025). Pre-published

work page internal anchor Pith review doi:10.48550/arxiv 2025
[66]

Foundations of a General Theory of Sequential Decision Functions

Abraham Wald. “Foundations of a General Theory of Sequential Decision Functions”. In: Econometrica15.4 (1947), pp. 279–313.issn: 0012-9682.doi: 10.2307/1905331. JSTOR: 1905331.url:https://www.jstor.org/stable/1905331(visited on 02/07/2026)

work page doi:10.2307/1905331 1947
[67]

Ethan Che and Hongseok Namkoong.Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches. Aug. 14, 2023.doi:10.48550/arXiv.2303.11582. arXiv: 2303.11582 [cs, stat] .url: http://arxiv.org/abs/2303.11582 (visited on 09/11/2023). Pre-published

work page doi:10.48550/arxiv.2303.11582 2023
[68]

An Empirical Evaluation of Thompson Sampling

Olivier Chapelle and Lihong Li. “An Empirical Evaluation of Thompson Sampling”. In:Ad- vances in Neural Information Processing Systems. Vol. 24. Curran Associates, Inc., 2011.url: https://papers.nips.cc/paper_files/paper/2011/hash/e53a0a2978c28872a4505bdb51db06dc- Abstract.html(visited on 02/07/2026)

2011
[69]

Designing Reinforcement Learning Algorithms for Digital Interventions: Pre-Implementation Guidelines

Anna L. Trella et al. “Designing Reinforcement Learning Algorithms for Digital Interventions: Pre-Implementation Guidelines”. In:Algorithms15.8 (Aug. 2022), p. 255.issn: 1999-4893.doi: 10.3390/a15080255. PMID: 36713810.url: https://pmc.ncbi.nlm.nih.gov/articles/ PMC9881427/(visited on 02/07/2026). 55

work page doi:10.3390/a15080255 2022
[70]

SQR: Balancing Speed, Quality and Risk in Online Experiments

Ya Xu, Weitao Duan, and Shaochen Huang. “SQR: Balancing Speed, Quality and Risk in Online Experiments”. In:Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. KDD ’18: The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. London United Kingdom: ACM, July 19, 2018, pp. 895–904.isb...

work page doi:10.1145/3219819.3219875.url: 2018
[71]

Bayesian Optimal Design for Phase II Screening Trials

Meichun Ding, Gary L. Rosner, and Peter Müller. “Bayesian Optimal Design for Phase II Screening Trials”. In:Biometrics64.3 (Sept. 2008), pp. 886–894.issn: 0006-341X, 1541- 0420.doi: 10 . 1111 / j . 1541 - 0420 . 2007 . 00951 . x.url: https : / / academic . oup . com / biometrics/article/64/3/886-894/7331561(visited on 02/07/2026)

2008
[72]

Utility-based Optimization of Phase II/III Programs

Marietta Kirchner et al. “Utility-based Optimization of Phase II/III Programs”. In:Statistics in Medicine35.2 (Jan. 30, 2016), pp. 305–316.issn: 0277-6715, 1097-0258.doi:10.1002/ sim.6624.url: https://onlinelibrary.wiley.com/doi/10.1002/sim.6624 (visited on 02/07/2026)

work page doi:10.1002/sim.6624 2016
[73]

Optimal Designs for Phase II/III Drug Development Programs Including Methods for Discounting of Phase II Results

Stella Erdmann et al. “Optimal Designs for Phase II/III Drug Development Programs Including Methods for Discounting of Phase II Results”. In:BMC Medical Research Methodology20.1 (Dec. 2020), p. 253.issn: 1471-2288.doi: 10 . 1186 / s12874 - 020 - 01093 - w.url: https : / / bmcmedresmethodol . biomedcentral . com / articles / 10 . 1186 / s12874 - 020 - 0109...

2020
[74]

How Generalizable Is Your Experiment? An Index for Comparing Ex- perimental Samples and Populations

Elizabeth Tipton. “How Generalizable Is Your Experiment? An Index for Comparing Ex- perimental Samples and Populations”. In:Journal of Educational and Behavioral Statistics 39.6 (Dec. 2014), pp. 478–501.issn: 1076-9986, 1935-1054.doi:10.3102/1076998614558486. url: https : / / journals . sagepub . com / doi / 10 . 3102 / 1076998614558486(visited on 02/09/2026)

work page doi:10.3102/1076998614558486 2014
[75]

G. B. Folland.Real Analysis: Modern Techniques and Their Applications. 2nd ed. Pure and Applied Mathematics. New York: Wiley, 1999. 386 pp.isbn: 978-0-471-31716-6

1999
[76]

G. B. Folland.Advanced Calculus. 2nd ed. Aug. 4, 2023.url: https : / / sites . math . washington.edu//~folland/AdvCalc24.pdf(visited on 04/08/2026)

2023
[77]

Tyrrell Rockafellar and Roger Wets.Variational Analysis

R. Tyrrell Rockafellar and Roger Wets.Variational Analysis. Vol. 317. Grundlehren Der Mathematischen Wissenschaften. Springer Verlag, 2009. 56

2009


[41] [41]

Optimized Adaptive Enrichment Designs for Three-Arm Trials: Learning Which Subpopulations Benefit from Different Treatments

Jon Arni Steingrimsson et al. “Optimized Adaptive Enrichment Designs for Three-Arm Trials: Learning Which Subpopulations Benefit from Different Treatments”. In:Biostatistics22.2 (Apr. 10, 2021), pp. 283–297.issn: 1465-4644, 1468-4357.doi:10.1093/biostatistics/ kxz030.url: https://academic.oup.com/biostatistics/article/22/2/283/5550955 (visited on 12/31/2024)

work page doi:10.1093/biostatistics/ 2021

[42] [42]

A Review of Statistical Methods for Generalizing From Evaluations of Educational Interventions

Elizabeth Tipton and Robert B. Olsen. “A Review of Statistical Methods for Generalizing From Evaluations of Educational Interventions”. In:Educational Researcher47.8 (Nov. 2018), pp. 516–524.issn: 0013-189X, 1935-102X.doi: 10.3102/0013189X18781522 .url: http: //journals.sagepub.com/doi/10.3102/0013189X18781522(visited on 04/11/2024)

work page doi:10.3102/0013189x18781522 2018

[43] [44]

An Overview of Current Methods for Real-world Applications to Gener- alize or Transport Clinical Trial Findings to Target Populations of Interest

Albee Y. Ling et al. “An Overview of Current Methods for Real-world Applications to Gener- alize or Transport Clinical Trial Findings to Target Populations of Interest”. In:Epidemiology 34.5 (Sept. 1, 2023), pp. 627–636.issn: 1531-5487.doi:10.1097/EDE.0000000000001633. PMID:37255252

work page doi:10.1097/ede.0000000000001633 2023

[44] [45]

Edinburgh: Oliver and Boyd, 1949

Ronald Aylmer Fisher.The Design of Experiments. Edinburgh: Oliver and Boyd, 1949

1949

[45] [46]

Classic ed., unabridged republ

Friedrich Pukelsheim.Optimal Design of Experiments. Classic ed., unabridged republ. of the work first publ. by Wiley, 1993. Classics in Applied Mathematics 50. Philadelphia, Pa: SIAM, Soc. for Industrial and Applied Mathematics, 2006. 454 pp.isbn: 978-0-89871-604-7

1993

[46] [47]

Bayesian Experimental Design: A Review

Kathryn Chaloner and Isabella Verdinelli. “Bayesian Experimental Design: A Review”. In: Statistical Science10.3 (Aug. 1, 1995).issn: 0883-4237.doi: 10 . 1214 / ss / 1177009939. url: https://projecteuclid.org/journals/statistical-science/volume-10/issue- 3/Bayesian-Experimental-Design-A-Review/10.1214/ss/1177009939.full (visited on 01/02/2025). 53

work page doi:10.1214/ss/1177009939.full 1995

[47] [48]

Tom Rainforth et al.Modern Bayesian Experimental Design. Nov. 29, 2023. arXiv:2302.14545 [cs, stat] .url: http : / / arxiv . org / abs / 2302 . 14545(visited on 06/27/2024). Pre- published

arXiv 2023

[48] [49]

Comparison of Experiments

David Blackwell. “Comparison of Experiments”. In:Proceedings of the Second Berkeley Sympo- sium on Mathematical Statistics and Probability. Vol. 2. University of California Press, Jan. 1, 1951, pp. 93–103.url:https://projecteuclid.org/ebooks/berkeley- symposium- on- mathematical- statistics- and- probability/Proceedings- of- the- Second- Berkeley- Symposi...

arXiv 1951

[49] [50]

Causal Decision Making and Causal Effect Estimation Are Not the Same...and Why It Matters

Carlos Fernández-Loría and Foster Provost. “Causal Decision Making and Causal Effect Estimation Are Not the Same...and Why It Matters”. In:INFORMS Journal on Data Science1.1 (Apr. 2022), pp. 4–16.issn: 2694-4022, 2694-4030.doi:10.1287/ijds.2021.0006. url: https : / / pubsonline . informs . org / doi / 10 . 1287 / ijds . 2021 . 0006(visited on 09/12/2025)

work page doi:10.1287/ijds.2021.0006 2022

[50] [51]

Commentary on “Causal Decision Making and Causal Effect Estimation Are Not the Same...and Why It Matters

Dean Eckles. “Commentary on “Causal Decision Making and Causal Effect Estimation Are Not the Same...and Why It Matters”: On Loss Functions and Bias–Variance Tradeoffs in Causal Estimation and Decisions”. In:INFORMS Journal on Data Science1.1 (Apr. 2022), pp. 17–18.issn: 2694-4022, 2694-4030.doi:10.1287/ijds.2022.0012.url: https: //pubsonline.informs.org/d...

work page doi:10.1287/ijds.2022.0012.url: 2022

[51] [52]

A Review of Modern Computational Algorithms for Bayesian Optimal Design

Elizabeth G. Ryan et al. “A Review of Modern Computational Algorithms for Bayesian Optimal Design”. In:International Statistical Review84.1 (2016), pp. 128–154.issn: 1751- 5823.doi: 10.1111/insr.12107.url: https://onlinelibrary.wiley.com/doi/abs/10. 1111/insr.12107(visited on 02/07/2026)

work page doi:10.1111/insr.12107.url: 2016

[52] [53]

Hoiyi Ng and Guido Imbens.Scalable Decisions Using a Bayesian Decision-Theoretic Approach. Jan. 27, 2026.doi: 10.48550/arXiv.2601.20031 . arXiv: 2601.20031 [stat].url: http: //arxiv.org/abs/2601.20031(visited on 02/07/2026). Pre-published

work page doi:10.48550/arxiv.2601.20031 2026

[53] [54]

A Nonconcavity in the Value of Information

Roy Radner and Joseph Stiglitz. “A Nonconcavity in the Value of Information”. In:Bayesian models in economic theory5 (1984), pp. 33–52.url: https : / / pages . stern . nyu . edu / ~rradner/publishedpapers/50Nonconcavity.pdf(visited on 04/11/2025)

1984

[54] [55]

A Tight Sufficient Condition for Radner–Stiglitz Nonconcavity in the Value of Information

Michel De Lara and Laurent Gilotte. “A Tight Sufficient Condition for Radner–Stiglitz Nonconcavity in the Value of Information”. In: Journal of Economic Theory 137.1 (Nov. 2007), pp. 696–708. issn: 00220531. doi: 10.1016/j.jet.2007.01.014. url: https://linkinghub.elsevier.com/retrieve/pii/S0022053107000373 (visited on 04/11/2025)

2007

[55] [56]

Value of Information Analysis for Research Decisions-An Introduction: Report 1 of the ISPOR Value of Information Analysis Emerging Good Practices Task Force

Elisabeth Fenwick et al. “Value of Information Analysis for Research Decisions-An Introduction: Report 1 of the ISPOR Value of Information Analysis Emerging Good Practices Task Force”. In: Value in Health: The Journal of the International Society for Pharmacoeconomics and Outcomes Research 23.2 (Feb. 2020), pp. 139–150. issn: 1524-4733. doi: 10.1016/j.jval.20...

work page doi:10.1016/j.jval.2020.01.001 2020

[56] [57]

Calculating Expected Value of Sample Information Adjusting for Imperfect Implementation

Anna Heath. “Calculating Expected Value of Sample Information Adjusting for Imperfect Implementation”. In: Medical Decision Making 42.5 (July 1, 2022), pp. 626–636. issn: 0272-989X. doi: 10.1177/0272989X211073098. url: https://doi.org/10.1177/0272989X211073098 (visited on 05/11/2024)

work page doi:10.1177/0272989x211073098 2022

[57] [58]

Expected Value of Sample Information to Guide the Design of Group Sequential Clinical Trials

Laura Flight et al. “Expected Value of Sample Information to Guide the Design of Group Sequential Clinical Trials”. In: Medical Decision Making 42.4 (May 2022), pp. 461–473. issn: 0272-989X, 1552-681X. doi: 10.1177/0272989X211045036. url: http://journals.sagepub.com/doi/10.1177/0272989X211045036 (visited on 05/11/2024)

work page doi:10.1177/0272989x211045036 2022

[58] [59]

Selecting Experimental Sites for External Validity

Michael Gechter et al. Selecting Experimental Sites for External Validity. Version 1. May 21, 2024. doi: 10.48550/arXiv.2405.13241. arXiv: 2405.13241 [econ]. url: http://arxiv.org/abs/2405.13241 (visited on 02/07/2026). Pre-published

work page doi:10.48550/arxiv.2405.13241 2024

[59] [61]

Incentive-Theoretic Bayesian Inference for Collaborative Science

Stephen Bates et al. Incentive-Theoretic Bayesian Inference for Collaborative Science. July 7, 2023. doi: 10.48550/arXiv.2307.03748. arXiv: 2307.03748 [cs, stat]. url: http://arxiv.org/abs/2307.03748 (visited on 10/24/2023). Pre-published

arXiv 2023

[60] [62]

Screening for Experiments

Daehong Min. “Screening for Experiments”. In: Games and Economic Behavior 142 (Nov. 1, 2023), pp. 73–100. issn: 0899-8256. doi: 10.1016/j.geb.2023.07.009. url: https://www.sciencedirect.com/science/article/pii/S0899825623001021 (visited on 12/24/2024)

work page doi:10.1016/j.geb.2023.07.009 2023

[61] [63]

The ASA Statement on p-Values: Context, Process, and Purpose

Ronald L. Wasserstein and Nicole A. Lazar. “The ASA Statement on p-Values: Context, Process, and Purpose”. In: The American Statistician 70.2 (Apr. 2, 2016), pp. 129–133. issn: 0003-1305, 1537-2731. doi: 10.1080/00031305.2016.1154108. url: https://www.tandfonline.com/doi/full/10.1080/00031305.2016.1154108 (visited on 02/08/2026)

work page doi:10.1080/00031305.2016.1154108 2016

[62] [64]

A Theory of Experimenters: Robustness, Randomization, and Balance

Abhijit V. Banerjee et al. “A Theory of Experimenters: Robustness, Randomization, and Balance”. In: American Economic Review 110.4 (Apr. 1, 2020), pp. 1206–1230. issn: 0002-8282. doi: 10.1257/aer.20171634. url: https://pubs.aeaweb.org/doi/10.1257/aer.20171634 (visited on 02/07/2026)

work page doi:10.1257/aer.20171634 2020

[63] [65]

Benjamin Recht. A Bureaucratic Theory of Statistics. Jan. 7, 2025. doi: 10.48550/arXiv.2501.03457. arXiv: 2501.03457 [stat]. url: http://arxiv.org/abs/2501.03457 (visited on 01/08/2025). Pre-published

work page doi:10.48550/arxiv.2501.03457 2025

[64] [66]

Foundations of a General Theory of Sequential Decision Functions

Abraham Wald. “Foundations of a General Theory of Sequential Decision Functions”. In: Econometrica 15.4 (1947), pp. 279–313. issn: 0012-9682. doi: 10.2307/1905331. JSTOR: 1905331. url: https://www.jstor.org/stable/1905331 (visited on 02/07/2026)

work page doi:10.2307/1905331 1947

[65] [67]

Ethan Che and Hongseok Namkoong. Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches. Aug. 14, 2023. doi: 10.48550/arXiv.2303.11582. arXiv: 2303.11582 [cs, stat]. url: http://arxiv.org/abs/2303.11582 (visited on 09/11/2023). Pre-published

work page doi:10.48550/arxiv.2303.11582 2023

[66] [68]

An Empirical Evaluation of Thompson Sampling

Olivier Chapelle and Lihong Li. “An Empirical Evaluation of Thompson Sampling”. In: Advances in Neural Information Processing Systems. Vol. 24. Curran Associates, Inc., 2011. url: https://papers.nips.cc/paper_files/paper/2011/hash/e53a0a2978c28872a4505bdb51db06dc-Abstract.html (visited on 02/07/2026)

2011

[67] [69]

Designing Reinforcement Learning Algorithms for Digital Interventions: Pre-Implementation Guidelines

Anna L. Trella et al. “Designing Reinforcement Learning Algorithms for Digital Interventions: Pre-Implementation Guidelines”. In: Algorithms 15.8 (Aug. 2022), p. 255. issn: 1999-4893. doi: 10.3390/a15080255. PMID: 36713810. url: https://pmc.ncbi.nlm.nih.gov/articles/PMC9881427/ (visited on 02/07/2026)

work page doi:10.3390/a15080255 2022

[68] [70]

SQR: Balancing Speed, Quality and Risk in Online Experiments

Ya Xu, Weitao Duan, and Shaochen Huang. “SQR: Balancing Speed, Quality and Risk in Online Experiments”. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. KDD ’18: The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. London United Kingdom: ACM, July 19, 2018, pp. 895–904. isb...

work page doi:10.1145/3219819.3219875 2018

[69] [71]

Bayesian Optimal Design for Phase II Screening Trials

Meichun Ding, Gary L. Rosner, and Peter Müller. “Bayesian Optimal Design for Phase II Screening Trials”. In: Biometrics 64.3 (Sept. 2008), pp. 886–894. issn: 0006-341X, 1541-0420. doi: 10.1111/j.1541-0420.2007.00951.x. url: https://academic.oup.com/biometrics/article/64/3/886-894/7331561 (visited on 02/07/2026)

2008

[70] [72]

Utility-based Optimization of Phase II/III Programs

Marietta Kirchner et al. “Utility-based Optimization of Phase II/III Programs”. In: Statistics in Medicine 35.2 (Jan. 30, 2016), pp. 305–316. issn: 0277-6715, 1097-0258. doi: 10.1002/sim.6624. url: https://onlinelibrary.wiley.com/doi/10.1002/sim.6624 (visited on 02/07/2026)

work page doi:10.1002/sim.6624 2016

[71] [73]

Optimal Designs for Phase II/III Drug Development Programs Including Methods for Discounting of Phase II Results

Stella Erdmann et al. “Optimal Designs for Phase II/III Drug Development Programs Including Methods for Discounting of Phase II Results”. In: BMC Medical Research Methodology 20.1 (Dec. 2020), p. 253. issn: 1471-2288. doi: 10.1186/s12874-020-01093-w. url: https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-020-0109...

2020

[72] [74]

How Generalizable Is Your Experiment? An Index for Comparing Experimental Samples and Populations

Elizabeth Tipton. “How Generalizable Is Your Experiment? An Index for Comparing Experimental Samples and Populations”. In: Journal of Educational and Behavioral Statistics 39.6 (Dec. 2014), pp. 478–501. issn: 1076-9986, 1935-1054. doi: 10.3102/1076998614558486. url: https://journals.sagepub.com/doi/10.3102/1076998614558486 (visited on 02/09/2026)

work page doi:10.3102/1076998614558486 2014

[73] [75]

G. B. Folland. Real Analysis: Modern Techniques and Their Applications. 2nd ed. Pure and Applied Mathematics. New York: Wiley, 1999. 386 pp. isbn: 978-0-471-31716-6

1999

[74] [76]

G. B. Folland. Advanced Calculus. 2nd ed. Aug. 4, 2023. url: https://sites.math.washington.edu//~folland/AdvCalc24.pdf (visited on 04/08/2026)

2023

[75] [77]

Variational Analysis

R. Tyrrell Rockafellar and Roger Wets. Variational Analysis. Vol. 317. Grundlehren Der Mathematischen Wissenschaften. Springer Verlag, 2009.

2009