
arxiv: 2606.13531 · v1 · pith:J6NERKUFnew · submitted 2026-06-11 · 📊 stat.ME

When Representative Samples Produce Worse Outcomes: Scale-up Decisions and Testing in Small-Budget RCTs

Pith reviewed 2026-06-27 05:45 UTC · model grok-4.3

classification 📊 stat.ME
keywords pilot RCTs · sample representativeness · small-budget experiments · heterogeneous treatment effects · significance testing · scale-up decisions · optimal experimental design

The pith

In small-budget pilot RCTs, sampling from one homogeneous subpopulation can yield higher expected downstream impact than representative sampling when scale-up decisions rely on significance tests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Small randomized controlled trials screen interventions before larger studies, where errors in scaling decisions carry high costs. The paper demonstrates that representative samples are not always optimal in these pilots. When budgets are tight and scale-up decisions rest on statistical significance tests, the design maximizing expected improvement draws the entire sample from a single homogeneous subpopulation. The choice of which subpopulation depends on sampling costs and prior beliefs about treatment effect variation across groups. With large budgets the optimum shifts toward a representative sample of the full target population.

Core claim

When an RCT paired with a non-adaptive significance test determines whether an intervention receives any downstream payoff, the pilot sample composition maximizing expected impact consists of a single homogeneous subpopulation in the small-budget regime; the subpopulation is selected according to sampling costs and the designer's priors on heterogeneous treatment effects. In the large-budget limit this composition converges to a representative sample of the target population.

What carries the argument

The budget-constrained pilot sample allocation that maximizes the expected value of the downstream payoff under a fixed significance-test decision rule.
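
One way to write this objective down, offered as a sketch rather than the paper's own formulation: collapse the pipeline to a single significance gate, let $n_k$ be the number of pilot units drawn from type $k$ at per-unit cost $c_k$ under budget $B$, let $s_1(n) = n / \sum_k n_k$ be the pilot mix and $s_3$ the target-population mix, and let $\tau$ be the vector of conditional average treatment effects drawn from the designer's prior. The symbols $n$, $c$, $B$, $\sigma$, $\alpha$, the Gaussian test statistic, and the one-sided z-test are illustrative assumptions; only $s_1$, $s_3$, and $\tau$ appear in the figure captions.

```latex
\max_{n \in \mathbb{Z}_{\ge 0}^{K}} \;
  \mathbb{E}_{\tau}\!\left[
    \left(s_3^{\top}\tau\right)\,
    \Phi\!\left(
      \frac{s_1(n)^{\top}\tau}{2\sigma / \sqrt{\sum_{k} n_k}} - z_{1-\alpha}
    \right)
  \right]
\quad \text{subject to} \quad
\sum_{k=1}^{K} c_k n_k \le B .
```

Here $\Phi$ is the standard normal CDF and $2\sigma/\sqrt{\sum_k n_k}$ is the standard error of a difference in means with equal-sized arms and outcome standard deviation $\sigma$. The payoff $s_3^{\top}\tau$ is collected only when the test rejects, which is the threshold effect the small-budget result depends on.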

If this is right

  • The optimal pilot composition is not fixed but depends on the available budget size.
  • In the small-budget regime homogeneous sampling from one subpopulation outperforms representative sampling.
  • The preferred subpopulation is determined jointly by sampling costs and prior beliefs on treatment effect heterogeneity.
  • The small-budget result extends to any setting where a significance test on RCT data decides receipt of a non-adaptive downstream payoff.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Decision rules that adaptively incorporate pilot data or use Bayesian updating may alter the optimal sampling strategy.
  • The result implies that low-budget experimentation in other domains could also favor non-representative designs when tests gate payoffs.
  • Accurate priors on effect heterogeneity become especially valuable when budgets constrain pilot size.

Load-bearing premise

Downstream decisions are made by a non-adaptive significance test applied to the pilot RCT data.

What would settle it

A simulation or empirical study in which, under small budgets and significance-test decisions, a representative sample produces strictly higher expected downstream impact than the single-subpopulation design.
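
A check of this kind could be set up along the following lines. This is a minimal sketch under a toy model that is an assumption of this note, not the paper's: a single significance gate instead of the three-stage pipeline, independent Gaussian priors per type, a known outcome standard deviation `sigma`, a pooled difference-in-means estimate, and a one-sided z-test. All function names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def pass_probability(n, tau, sigma=1.0, alpha=0.05):
    """P(one-sided z-test rejects) for a pilot with n[k] units from type k.

    Toy assumption: the pooled difference-in-means estimate is
    Normal(mean = sum_k (n_k / N) * tau_k, sd = 2 * sigma / sqrt(N)).
    """
    total = sum(n)
    if total == 0:
        return 0.0  # no data, so the test cannot reject
    mean = sum(nk * tk for nk, tk in zip(n, tau)) / total
    se = 2.0 * sigma / total ** 0.5
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)
    return STD_NORMAL.cdf(mean / se - z_crit)


def expected_impact(n, target_mix, prior_draws, sigma=1.0, alpha=0.05):
    """Average over prior draws of (target-population ATE) * P(pass)."""
    total = 0.0
    for tau in prior_draws:
        target_ate = sum(s * t for s, t in zip(target_mix, tau))
        total += target_ate * pass_probability(n, tau, sigma, alpha)
    return total / len(prior_draws)


def draw_prior(num_draws, means, sds, seed=0):
    """Independent Gaussian prior draws of the per-type effects (an assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(m, s) for m, s in zip(means, sds)]
            for _ in range(num_draws)]


def candidate_designs(budget, costs, target_mix):
    """Every single-type design plus one representative design (rounded down)."""
    designs = {}
    for k, c in enumerate(costs):
        n = [0] * len(costs)
        n[k] = int(budget // c)
        designs[f"only type {k}"] = n
    # Representative: n_k proportional to target_mix, scaled to fit the budget.
    unit_cost = sum(s * c for s, c in zip(target_mix, costs))
    scale = budget / unit_cost
    designs["representative"] = [int(scale * s) for s in target_mix]
    return designs


if __name__ == "__main__":
    target_mix = [0.5, 0.5]
    costs = [1.0, 1.0]
    draws = draw_prior(20000, means=[0.05, 0.05], sds=[0.05, 0.30])
    for budget in (20, 20000):
        for name, n in candidate_designs(budget, costs, target_mix).items():
            value = expected_impact(n, target_mix, draws)
            print(budget, name, n, round(value, 4))
```

Comparing the printed values at a small and a large budget shows which design comes out ahead under these particular assumptions. It does not by itself confirm or refute the paper's theorem, which is stated for the paper's own model.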

Figures

Figures reproduced from arXiv: 2606.13531 by Hannah Li, Hongseok Namkoong, Isaac Scheinfeld.

Figure 1. The three-stage experiment pipeline we model. The intervention is first evaluated in a budget-constrained pilot RCT, where the result of a significance test determines whether it advances to a follow-up RCT. A second significance test determines adoption of the intervention in the target population, leading to a potential improvement in average outcomes if successful. We show that when optimizing for expec…
Figure 2. Optimal design depends on pilot resources. We plot the expected downstream impact achieved by the optimal pilot design, the best small-budget single-subpopulation design, and a representative sample. In budget-constrained settings, optimal pilots sample from a single homogeneous subpopulation, while representative sampling is best for well-resourced trials. To gain insight into how treatment effect heterog…
Figure 3. The three-stage experimental pipeline. The pilot design s1 is highlighted in blue. The samples s1 and s2 in the first two stages determine the distribution of the average treatment effect estimates, and the population s3 in the third stage determines the change in outcomes if both significance tests pass.
Figure 4. Why a representative sample is optimal for large budgets. When the sample is non-representative, i.e. s1 ∝̸ s3, the pilot decision boundary is misaligned with the boundary separating positive and negative average treatment effects in the target population. Even with an infinite sampling investment, there are conditional average treatment effects τ for which the pilot incorrectly classifies the sign of the …
Figure 5. The small-budget index for two types under a large (…
Figure 6. Pilot RCT Impact. Comparison of the small-budget and limiting large-budget optimal pilot designs for a range of sample sizes in a semi-synthetic scenario calibrated to the NSLM study. In this instance where costs across types are assumed constant, the budget is equivalent to a constraint on the number of students, and the small-budget optimal design is equivalent to sampling only from the schools with the …
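
The misalignment described in the Figure 4 caption can be illustrated with a two-type example. The sketch below assumes that, with unlimited pilot data, a one-sided test passes exactly when the pilot-mix average effect is positive. The function names and the numeric effects are hypothetical and chosen only to make the disagreement visible.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def infinite_sample_decision(pilot_mix, tau):
    """Limiting decision: the test passes iff the pilot-mix ATE is positive."""
    return dot(pilot_mix, tau) > 0


def misclassified(pilot_mix, target_mix, tau):
    """True when the limiting pilot decision disagrees with the sign of the target ATE."""
    return infinite_sample_decision(pilot_mix, tau) != (dot(target_mix, tau) > 0)


target_mix = [0.5, 0.5]
tau = [0.1, -0.3]  # hypothetical effects: helps type 0, harms type 1

print(misclassified([1.0, 0.0], target_mix, tau))  # pilot drawn only from type 0 -> True
print(misclassified([0.5, 0.5], target_mix, tau))  # representative pilot -> False
```

A pilot drawn only from type 0 sees a positive effect and passes, even though the target-population average effect is negative. When the pilot mix is proportional to the target mix the two signs always agree, so this kind of error disappears as the sample grows.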
Original abstract

Small randomized controlled trials are often used to screen interventions before running larger follow-up studies. This is a critical phase of experimentation, as missing effective interventions or scaling up harmful ones can be very costly. A common proposal to mitigate these errors is to recruit samples that are representative of the target population, but this is often challenging in resource-constrained pilots. We challenge the narrative that representative samples are always superior by showing that when statistical significance testing determines whether interventions receive further study, the pilot trial composition that maximizes the downstream expected improvement in outcomes depends critically on its budget size. In the large-budget limit, the optimal pilot design converges to a sample that is representative of the target population. However, in the small-budget regime, the pilot designer maximizes expected impact by sampling only from a single homogeneous sub-population, chosen in a manner that depends on sampling costs and the designer's prior beliefs about heterogeneous treatment effects. Our proof of the small-budget result applies more generally when an RCT and significance test are used to decide whether to receive any non-adaptive downstream payoff, a result that may be applicable to other settings with constrained experimentation budgets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript claims that when small-budget RCTs are used to screen interventions before larger studies, with a significance test determining whether to scale up, the pilot sampling design that maximizes expected downstream impact is to draw all samples from a single homogeneous subpopulation (chosen based on group-specific sampling costs and the designer's priors on heterogeneous treatment effects). In the large-budget limit the optimum converges to a representative sample of the target population. The small-budget result is derived for the case of any non-adaptive downstream payoff decided by such a significance test.

Significance. If the central derivation holds, the result supplies a precise theoretical counter-example to the default recommendation for representative sampling in pilot RCTs, showing that the optimality of representativeness is budget-dependent and decision-rule-dependent. The explicit generalization of the small-budget proof to arbitrary non-adaptive payoffs decided by a significance test is a clear strength, as is the clean separation between the small-budget and large-budget regimes. The scoping to non-adaptive significance testing is stated up front, so the stress-test concern about other decision rules (Bayesian, magnitude-based, etc.) does not undermine the manuscript's internal claims.

minor comments (3)
  1. The precise functional form of the significance test (e.g., one-sided t-test, exact threshold) and the downstream payoff function should be stated explicitly in the main text before the small-budget theorem, rather than only in the appendix, to make the load-bearing threshold effect transparent.
  2. Notation for the heterogeneous treatment effects and the prior distribution over them is introduced gradually; a single early display equation collecting all primitives would improve readability.
  3. Figure 2 (or equivalent) comparing optimal allocation across budget sizes would benefit from an explicit legend indicating the prior parameters used in each panel.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading, positive summary of the manuscript's contributions, and recommendation of minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain.

full rationale

The paper presents a mathematical derivation of optimal pilot sampling under a model with heterogeneous treatment effects, sampling costs, and a non-adaptive significance-testing decision rule. The small-budget result (single-subpopulation sampling) follows from the model's assumptions and proof, without any quoted reduction of the optimum to a fitted parameter, self-defined quantity, or self-citation chain. The decision rule is an explicit modeling choice rather than a hidden tautology. This is a standard non-circular theoretical result.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on a model of heterogeneous treatment effects, group-specific sampling costs, and a non-adaptive significance test that gates a downstream payoff; these are standard domain assumptions rather than new entities.

free parameters (2)
  • prior beliefs on heterogeneous treatment effects
    Designer's priors determine which subpopulation is chosen in the small-budget optimum.
  • group-specific sampling costs
    Costs enter the optimization that selects the single subpopulation.
axioms (2)
  • domain assumption: Downstream payoff is non-adaptive and determined solely by whether the pilot passes a significance test.
    Stated in the abstract as the setting where the result applies.
  • domain assumption: Treatment effects are heterogeneous across subpopulations.
    Required for the single-subpopulation optimum to differ from representative sampling.

pith-pipeline@v0.9.1-grok · 5733 in / 1309 out tokens · 15753 ms · 2026-06-27T05:45:02.859473+00:00 · methodology


Reference graph

Works this paper leans on

75 extracted references · 50 canonical work pages · 2 internal anchors

  1. [1]

    Imbens and Donald B

    Guido W. Imbens and Donald B. Rubin.Causal Inference for Statistics, Social, and Biomed- ical Sciences: An Introduction. Cambridge: Cambridge University Press, 2015.isbn: 978- 0-521-88588-1.doi: 10 . 1017 / CBO9781139025751.url: https : / / www . cambridge . org / core/books/causal- inference- for- statistics- social- and- biomedical- sciences/ 71126BE90C...

  2. [2]

    Determining Optimal Sample Sizes for Multistage Adaptive Randomized Clinical Trials from an Industry Perspective Using Value of Information Methods

    Maggie H Chen and Andrew R Willan. “Determining Optimal Sample Sizes for Multistage Adaptive Randomized Clinical Trials from an Industry Perspective Using Value of Information Methods”. In:Clinical Trials10.1 (Feb. 1, 2013), pp. 54–62.issn: 1740-7745.doi: 10 . 1177 / 1740774512467404.url: https : / / doi . org / 10 . 1177 / 1740774512467404(visited on 05/11/2024)

  3. [3]

    Adaptive Treatment Assignment in Experiments for Policy Choice

    Maximilian Kasy and Anja Sautmann. “Adaptive Treatment Assignment in Experiments for Policy Choice”. In:Econometrica89.1 (2021), pp. 113–132.issn: 0012-9682.doi: 10. 3982/ECTA17527.url: https://www.econometricsociety.org/doi/10.3982/ECTA17527 (visited on 02/08/2026). 49

  4. [4]

    E8(R1) GENERAL CONSIDERATIONS FOR CLINICAL STUD- IES

    FDA, CDER, and CBER. “E8(R1) GENERAL CONSIDERATIONS FOR CLINICAL STUD- IES”. In: (Apr. 2022)

  5. [5]

    Department of Education et al.The Future of Education Research at IES: Advancing an Equity-Oriented Science

    Committee on the Future of Education Research at the Institute of Education Sciences in the U.S. Department of Education et al.The Future of Education Research at IES: Advancing an Equity-Oriented Science. Ed. by Adam Gamoran and Kenne Dibner. Washington, D.C.: National Academies Press, July 1, 2022, p. 26428.isbn: 978-0-309-27539-2.doi:10.17226/ 26428.ur...

  6. [6]

    Screening Designs for Drug Development

    D. Rossell, P. Muller, and G. L. Rosner. “Screening Designs for Drug Development”. In: Biostatistics8.3 (July 1, 2007), pp. 595–608.issn: 1465-4644, 1468-4357.doi: 10 . 1093 / biostatistics / kxl031.url: https : / / academic . oup . com / biostatistics / article - lookup/doi/10.1093/biostatistics/kxl031(visited on 02/07/2026)

  7. [7]

    Beyond Generalization of the ATE: Designing Randomized Trials to Understand Treatment Effect Heterogeneity

    Elizabeth Tipton. “Beyond Generalization of the ATE: Designing Randomized Trials to Understand Treatment Effect Heterogeneity”. In:Journal of the Royal Statistical Society Series A: Statistics in Society184.2 (Apr. 1, 2021), pp. 504–521.issn: 0964-1998, 1467-985X. doi: 10.1111/rssa.12629 .url: https://academic.oup.com/jrsssa/article/184/2/ 504/7056369(vis...

  8. [8]

    FDA et al.Enhancing the Diversity of Clinical Trial Populations. Nov. 2020

  9. [9]

    Toward a System of Evidence for All: Current Practices and Future Opportunities in 37 Randomized Trials

    Elizabeth Tipton et al. “Toward a System of Evidence for All: Current Practices and Future Opportunities in 37 Randomized Trials”. In:Educational Researcher50.3 (Apr. 2021), pp. 145– 156.issn: 0013-189X, 1935-102X.doi: 10.3102/0013189X20960686.url: http://journals. sagepub.com/doi/10.3102/0013189X20960686(visited on 04/29/2024)

  10. [10]

    Elements of External Validity: Framework, Design, and Analysis

    Naoki Egami and Erin Hartman. “Elements of External Validity: Framework, Design, and Analysis”. In:American Political Science Review117.3 (Aug. 2023), pp. 1070–1088.issn: 0003-0554, 1537-5943.doi: 10.1017/S0003055422000880.url: https://www.cambridge. org/core/product/identifier/S0003055422000880/type/journal_article (visited on 11/16/2023)

  11. [11]

    Behavioural Science Is Unlikely to Change the World without a Heterogeneity Revolution

    Christopher J. Bryan, Elizabeth Tipton, and David S. Yeager. “Behavioural Science Is Unlikely to Change the World without a Heterogeneity Revolution”. In:Nature Human Behaviour 5.8 (July 22, 2021), pp. 980–989.issn: 2397-3374.doi:10.1038/s41562-021-01143-3.url: https://www.nature.com/articles/s41562-021-01143-3(visited on 04/29/2024)

  12. [12]

    External Validity of Randomised Controlled Trials: “To Whom Do the Results of This Trial Apply?

    Peter M Rothwell. “External Validity of Randomised Controlled Trials: “To Whom Do the Results of This Trial Apply?”” In:The Lancet365.9453 (Jan. 2005), pp. 82–93.issn: 01406736. doi: 10 . 1016 / S0140 - 6736(04 ) 17670 - 8.url: https : / / linkinghub . elsevier . com / retrieve/pii/S0140673604176708(visited on 04/25/2026)

  13. [13]

    The Weirdest People in the World?

    Joseph Henrich, Steven J. Heine, and Ara Norenzayan. “The Weirdest People in the World?” In:Behavioral and Brain Sciences33.2–3 (June 2010), pp. 61–83.issn: 0140-525X, 1469-1825. doi: 10.1017/S0140525X0999152X .url: https://www.cambridge.org/core/product/ identifier/S0140525X0999152X/type/journal_article(visited on 04/25/2026)

  14. [14]

    Reproducibility of Preclinical Animal Research Improves with Het- erogeneity of Study Samples

    Bernhard Voelkl et al. “Reproducibility of Preclinical Animal Research Improves with Het- erogeneity of Study Samples”. In:PLOS Biology16.2 (Feb. 22, 2018). Ed. by Eric-Jan Wagenmakers, e2003693.issn: 1545-7885.doi: 10 . 1371 / journal . pbio . 2003693.url: https://dx.plos.org/10.1371/journal.pbio.2003693(visited on 04/25/2026). 50

  15. [15]

    Defining Feasibility and Pilot Studies in Preparation for Randomised Controlled Trials: Development of a Conceptual Framework

    Sandra M. Eldridge et al. “Defining Feasibility and Pilot Studies in Preparation for Randomised Controlled Trials: Development of a Conceptual Framework”. In:PLOS ONE11.3 (Mar. 15, 2016). Ed. by Chiara Lazzeri, e0150205.issn: 1932-6203.doi: 10 . 1371 / journal . pone . 0150205.url: https : / / dx . plos . org / 10 . 1371 / journal . pone . 0150205(visited...

  16. [16]

    FDA.Demonstrating Substantial Evidence of Effectiveness for Human Drug and Biological Products. Dec. 2019

  17. [17]

    Experimental Design for Drug Development: A Bayesian Approach

    Donald A. Berry. “Experimental Design for Drug Development: A Bayesian Approach”. In: Journal of Biopharmaceutical Statistics1.1 (Jan. 1, 1991), pp. 81–101.issn: 1054-3406, 1520- 5711.doi: 10.1080/10543409108835007.url: https://www.tandfonline.com/doi/full/ 10.1080/10543409108835007(visited on 01/02/2025)

  18. [18]

    Stratified Sampling Using Cluster Analysis: A Sample Selection Strategy for Improved Generalizations from Experiments

    Elizabeth Tipton. “Stratified Sampling Using Cluster Analysis: A Sample Selection Strategy for Improved Generalizations from Experiments”. In:Evaluation Review37.2 (Apr. 2013), pp. 109–139.issn: 1552-3926.doi:10.1177/0193841X13516324. PMID:24647924

  19. [19]

    Naoki Egami and Diana Da In Lee.Designing Multi-Context Studies for External Validity: Site Selection via Synthetic Purposive Sampling. Aug. 24, 2023.url:https://naokiegami. com/paper/sps.pdf(visited on 11/22/2023). Pre-published

  20. [20]

    Adam Bouyamourn.Where to Experiment? Site Selection Under Distribution Shift via Optimal Transport and Wasserstein DRO. Nov. 6, 2025.doi:10.48550/arXiv.2511.04658 . arXiv: 2511.04658 [stat] .url: http://arxiv.org/abs/2511.04658 (visited on 02/07/2026). Pre-published

  21. [21]

    Minimax-Regret Sample Selection in Randomized Experiments

    Yuchen Hu et al. “Minimax-Regret Sample Selection in Randomized Experiments”. In:Pro- ceedings of the 25th ACM Conference on Economics and Computation. EC ’24. New York, NY, USA: Association for Computing Machinery, Dec. 17, 2024, pp. 1209–1235.isbn: 979- 8-4007-0704-9.doi: 10.1145/3670865.3673458.url: https://dl.acm.org/doi/10.1145/ 3670865.3673458(visit...

  22. [22]

    José Luis Montiel Olea et al.Externally Valid Selection of Experimental Sites via the K-Median Problem. Aug. 29, 2025.doi:10.48550/arXiv.2408.09187. arXiv: 2408.09187 [econ].url: http://arxiv.org/abs/2408.09187(visited on 02/04/2026). Pre-published

  23. [23]

    Monographs on Statistics and Applied Probability 36

    KaitaiFang,SamuelKotz,andKaiWangNg.Symmetric Multivariate and Related Distributions. Monographs on Statistics and Applied Probability 36. London ; New York: Chapman and Hall, 1990. 220 pp.isbn: 978-0-412-31430-8

  24. [24]

    Oracle Estimation of a Change Point in High-Dimensional Quantile Regression , year =

    Xinran Li and Peng Ding. “General Forms of Finite Population Central Limit Theorems with Applications to Causal Inference”. In:Journal of the American Statistical Association112.520 (Oct. 2, 2017), pp. 1759–1769.issn: 0162-1459.doi:10.1080/01621459.2017.1295865.url: https://doi.org/10.1080/01621459.2017.1295865(visited on 01/04/2026)

  25. [25]

    Peng Ding.A First Course in Causal Inference. Oct. 3, 2023.doi:10.48550/arXiv.2305. 18793. arXiv: 2305.18793 [stat] .url: http://arxiv.org/abs/2305.18793 (visited on 01/21/2026). Pre-published. 51

  26. [26]

    The Knowledge-Gradient Policy for Correlated Normal Beliefs

    Peter Frazier, Warren Powell, and Savas Dayanik. “The Knowledge-Gradient Policy for Correlated Normal Beliefs”. In:INFORMS Journal on Computing21.4 (Nov. 2009), pp. 599– 613.issn: 1091-9856, 1526-5528.doi: 10.1287/ijoc.1080.0314.url: https://pubsonline. informs.org/doi/10.1287/ijoc.1080.0314(visited on 03/18/2024)

  27. [27]

    Bayesian Sequential Learning for Clinical Trials of Multiple Correlated Medical Interventions

    Stephen E. Chick, Noah Gans, and Özge Yapar. “Bayesian Sequential Learning for Clinical Trials of Multiple Correlated Medical Interventions”. In:Management Science68.7 (July 2022), pp. 4919–4938.issn: 0025-1909.doi: 10.1287/mnsc.2021.4137.url: https://pubsonline. informs.org/doi/10.1287/mnsc.2021.4137(visited on 10/21/2025)

  28. [28]

    A/B Testing with Fat Tails

    Eduardo M. Azevedo et al. “A/B Testing with Fat Tails”. In:Journal of Political Economy 128.12 (Dec. 1, 2020), pp. 4614–000.issn: 0022-3808, 1537-534X.doi:10.1086/710607.url: https://www.journals.uchicago.edu/doi/10.1086/710607(visited on 03/01/2024)

  29. [29]

    Enrichment Strategies for Clinical Trials to Support Determination of Effectiveness of Human Drugs and Biological Products Guidance for Industry

    FDA, CDER, and CBER. “Enrichment Strategies for Clinical Trials to Support Determination of Effectiveness of Human Drugs and Biological Products Guidance for Industry”. In: (2019)

  30. [30]

    Human HDAC6 senses valine abundancy to regulate DNA damage

    David S. Yeager et al. “A National Experiment Reveals Where a Growth Mindset Improves Achievement”. In:Nature573.7774 (Sept. 2019), pp. 364–369.issn: 0028-0836, 1476-4687.doi: 10.1038/s41586- 019- 1466- y.url: https://www.nature.com/articles/s41586- 019- 1466-y(visited on 04/29/2024)

  31. [31]

    Interpreting Effect Sizes of Education Interventions

    Matthew A. Kraft. “Interpreting Effect Sizes of Education Interventions”. In:Educational Researcher49.4 (May 2020), pp. 241–253.issn: 0013-189X, 1935-102X.doi: 10 . 3102 / 0013189X20912798.url: https://journals.sagepub.com/doi/10.3102/0013189X20912798 (visited on 02/02/2026)

  32. [32]

    Translating Evidence into Practice: Eligibility Criteria Fail to Eliminate Clinically Significant Differences between Real-World and Study Populations

    Amelia J. Averitt et al. “Translating Evidence into Practice: Eligibility Criteria Fail to Eliminate Clinically Significant Differences between Real-World and Study Populations”. In: NPJ digital medicine3 (2020), p. 67.issn: 2398-6352.doi: 10.1038/s41746-020-0277-8 . PMID:32411828

  33. [33]

    Understanding the Average Impact of Microcredit Expansions: A Bayesian Hierarchical Analysis of Seven Randomized Experiments

    Rachael Meager. “Understanding the Average Impact of Microcredit Expansions: A Bayesian Hierarchical Analysis of Seven Randomized Experiments”. In:American Economic Journal: Applied Economics11.1 (Jan. 1, 2019), pp. 57–91.issn: 1945-7782, 1945-7790.doi:10.1257/ app.20170299.url: https://pubs.aeaweb.org/doi/10.1257/app.20170299 (visited on 01/12/2024)

  34. [34]

    Heterogeneity in Mathematics Intervention Effects: Evidence from a Meta-Analysis of 191 Randomized Experiments

    Ryan Williams et al. “Heterogeneity in Mathematics Intervention Effects: Evidence from a Meta-Analysis of 191 Randomized Experiments”. In:Journal of Research on Educational Effectiveness15.3 (July 3, 2022), pp. 584–634.issn: 1934-5747, 1934-5739.doi:10.1080/ 19345747 . 2021 . 2009072.url: https : / / www . tandfonline . com / doi / full / 10 . 1080 / 1934...

  35. [35]

    URL https://www.science.org/doi/abs/10.1126/science

    Open Science Collaboration. “Estimating the Reproducibility of Psychological Science”. In: Science349.6251 (Aug. 28, 2015), aac4716.issn: 0036-8075, 1095-9203.doi:10.1126/science. aac4716.url: https://www.science.org/doi/10.1126/science.aac4716 (visited on 02/06/2026)

  36. [36]

    Evaluating Replicability of Laboratory Experiments in Economics

    Colin F. Camerer et al. “Evaluating Replicability of Laboratory Experiments in Economics”. In:Science351.6280 (Mar. 25, 2016), pp. 1433–1436.issn: 0036-8075, 1095-9203.doi: 10. 1126/science.aaf0918.url: https://www.science.org/doi/10.1126/science.aaf0918 (visited on 02/06/2026). 52

  37. [37]

    RaiseStandardsforPreclinicalCancerResearch

    C.GlennBegleyandLeeM.Ellis.“RaiseStandardsforPreclinicalCancerResearch”.In:Nature 483.7391 (Mar. 29, 2012), pp. 531–533.issn: 0028-0836, 1476-4687.doi:10.1038/483531a. url:https://www.nature.com/articles/483531a(visited on 02/06/2026)

  38. [38]

    Washington, D.C.: National Academies Press, Sept

    Committee on Reproducibility and Replicability in Science et al.Reproducibility and Repli- cability in Science. Washington, D.C.: National Academies Press, Sept. 20, 2019, p. 25303. isbn: 978-0-309-48616-3.doi: 10.17226/25303 .url: https://www.nationalacademies. org/publications/25303(visited on 02/08/2026)

  39. [39]

    Site Selection in Experiments: An Assessment of Site Recruitment and Generalizability in Two Scale-up Studies

    Elizabeth Tipton et al. “Site Selection in Experiments: An Assessment of Site Recruitment and Generalizability in Two Scale-up Studies”. In:Journal of Research on Educational Effectiveness 9 (sup1 Oct. 3, 2016), pp. 209–228.issn: 1934-5747, 1934-5739.doi:10.1080/19345747. 2015.1105895.url: https://www.tandfonline.com/doi/full/10.1080/19345747.2015. 110589...

  40. [40]

    Tett, Margaret J

    Peter F. Thall. “Adaptive Enrichment Designs in Clinical Trials”. In:Annual review of statistics and its application8.1 (Mar. 2021), pp. 393–411.issn: 2326-8298.doi:10.1146/annurev- statistics- 040720- 032818. PMID: 36212769.url: https://www.ncbi.nlm.nih.gov/ pmc/articles/PMC9544313/(visited on 12/31/2024)

  41. [41]

    Optimized Adaptive Enrichment Designs for Three-Arm Trials: Learning Which Subpopulations Benefit from Different Treatments

    Jon Arni Steingrimsson et al. “Optimized Adaptive Enrichment Designs for Three-Arm Trials: Learning Which Subpopulations Benefit from Different Treatments”. In:Biostatistics22.2 (Apr. 10, 2021), pp. 283–297.issn: 1465-4644, 1468-4357.doi:10.1093/biostatistics/ kxz030.url: https://academic.oup.com/biostatistics/article/22/2/283/5550955 (visited on 12/31/2024)

  42. [42]

    A Review of Statistical Methods for Generalizing From Evaluations of Educational Interventions

    Elizabeth Tipton and Robert B. Olsen. “A Review of Statistical Methods for Generalizing From Evaluations of Educational Interventions”. In:Educational Researcher47.8 (Nov. 2018), pp. 516–524.issn: 0013-189X, 1935-102X.doi: 10.3102/0013189X18781522 .url: http: //journals.sagepub.com/doi/10.3102/0013189X18781522(visited on 04/11/2024)

  43. [44]

    An Overview of Current Methods for Real-world Applications to Gener- alize or Transport Clinical Trial Findings to Target Populations of Interest

    Albee Y. Ling et al. “An Overview of Current Methods for Real-world Applications to Gener- alize or Transport Clinical Trial Findings to Target Populations of Interest”. In:Epidemiology 34.5 (Sept. 1, 2023), pp. 627–636.issn: 1531-5487.doi:10.1097/EDE.0000000000001633. PMID:37255252

  44. [45]

    Edinburgh: Oliver and Boyd, 1949

    Ronald Aylmer Fisher.The Design of Experiments. Edinburgh: Oliver and Boyd, 1949

  45. [46]

    Classic ed., unabridged republ

    Friedrich Pukelsheim.Optimal Design of Experiments. Classic ed., unabridged republ. of the work first publ. by Wiley, 1993. Classics in Applied Mathematics 50. Philadelphia, Pa: SIAM, Soc. for Industrial and Applied Mathematics, 2006. 454 pp.isbn: 978-0-89871-604-7

  46. [47]

    Bayesian Experimental Design: A Review

    Kathryn Chaloner and Isabella Verdinelli. “Bayesian Experimental Design: A Review”. In: Statistical Science10.3 (Aug. 1, 1995).issn: 0883-4237.doi: 10 . 1214 / ss / 1177009939. url: https://projecteuclid.org/journals/statistical-science/volume-10/issue- 3/Bayesian-Experimental-Design-A-Review/10.1214/ss/1177009939.full (visited on 01/02/2025). 53

  47. [48]

    Tom Rainforth et al.Modern Bayesian Experimental Design. Nov. 29, 2023. arXiv:2302.14545 [cs, stat] .url: http : / / arxiv . org / abs / 2302 . 14545(visited on 06/27/2024). Pre- published

  48. [49]

    Comparison of Experiments

    David Blackwell. “Comparison of Experiments”. In:Proceedings of the Second Berkeley Sympo- sium on Mathematical Statistics and Probability. Vol. 2. University of California Press, Jan. 1, 1951, pp. 93–103.url:https://projecteuclid.org/ebooks/berkeley- symposium- on- mathematical- statistics- and- probability/Proceedings- of- the- Second- Berkeley- Symposi...

  49. [50]

    Causal Decision Making and Causal Effect Estimation Are Not the Same...and Why It Matters

    Carlos Fernández-Loría and Foster Provost. “Causal Decision Making and Causal Effect Estimation Are Not the Same...and Why It Matters”. In:INFORMS Journal on Data Science1.1 (Apr. 2022), pp. 4–16.issn: 2694-4022, 2694-4030.doi:10.1287/ijds.2021.0006. url: https : / / pubsonline . informs . org / doi / 10 . 1287 / ijds . 2021 . 0006(visited on 09/12/2025)

  50. [51]

    Commentary on “Causal Decision Making and Causal Effect Estimation Are Not the Same...and Why It Matters

    Dean Eckles. “Commentary on “Causal Decision Making and Causal Effect Estimation Are Not the Same...and Why It Matters”: On Loss Functions and Bias–Variance Tradeoffs in Causal Estimation and Decisions”. In:INFORMS Journal on Data Science1.1 (Apr. 2022), pp. 17–18.issn: 2694-4022, 2694-4030.doi:10.1287/ijds.2022.0012.url: https: //pubsonline.informs.org/d...

  51. [52]

    A Review of Modern Computational Algorithms for Bayesian Optimal Design

    Elizabeth G. Ryan et al. “A Review of Modern Computational Algorithms for Bayesian Optimal Design”. In:International Statistical Review84.1 (2016), pp. 128–154.issn: 1751- 5823.doi: 10.1111/insr.12107.url: https://onlinelibrary.wiley.com/doi/abs/10. 1111/insr.12107(visited on 02/07/2026)

  52. [53]

    Hoiyi Ng and Guido Imbens.Scalable Decisions Using a Bayesian Decision-Theoretic Approach. Jan. 27, 2026.doi: 10.48550/arXiv.2601.20031 . arXiv: 2601.20031 [stat].url: http: //arxiv.org/abs/2601.20031(visited on 02/07/2026). Pre-published

  53. [54]

    A Nonconcavity in the Value of Information

    Roy Radner and Joseph Stiglitz. “A Nonconcavity in the Value of Information”. In:Bayesian models in economic theory5 (1984), pp. 33–52.url: https : / / pages . stern . nyu . edu / ~rradner/publishedpapers/50Nonconcavity.pdf(visited on 04/11/2025)

  54. [55]

    A Tight Sufficient Condition for Radner–Stiglitz Nonconcavity in the Value of Information

    Michel De Lara and Laurent Gilotte. “A Tight Sufficient Condition for Radner–Stiglitz Nonconcavity in the Value of Information”. In:Journal of Economic Theory137.1 (Nov. 2007), pp. 696–708.issn: 00220531.doi: 10 . 1016 / j . jet . 2007 . 01 . 014.url: https : //linkinghub.elsevier.com/retrieve/pii/S0022053107000373(visited on 04/11/2025)

  55. [56]

    Value of Information Analysis for Research Decisions-An Introduction: Report 1 of the ISPOR Value of Information Analysis Emerging Good Practices Task Force

    Elisabeth Fenwick et al. “Value of Information Analysis for Research Decisions-An Introduction: Report 1 of the ISPOR Value of Information Analysis Emerging Good Practices Task Force”. In: Value in Health: The Journal of the International Society for Pharmacoeconomics and Outcomes Research 23.2 (Feb. 2020), pp. 139–150. issn: 1524-4733. doi: 10.1016/j.jval.20...

  56. [57]

    Calculating Expected Value of Sample Information Adjusting for Imperfect Implementation

    Anna Heath. “Calculating Expected Value of Sample Information Adjusting for Imperfect Implementation”. In: Medical Decision Making 42.5 (July 1, 2022), pp. 626–636. issn: 0272-989X. doi: 10.1177/0272989X211073098. url: https://doi.org/10.1177/0272989X211073098 (visited on 05/11/2024)

  57. [58]

    Expected Value of Sample Information to Guide the Design of Group Sequential Clinical Trials

    Laura Flight et al. “Expected Value of Sample Information to Guide the Design of Group Sequential Clinical Trials”. In: Medical Decision Making 42.4 (May 2022), pp. 461–473. issn: 0272-989X, 1552-681X. doi: 10.1177/0272989X211045036. url: http://journals.sagepub.com/doi/10.1177/0272989X211045036 (visited on 05/11/2024)

  58. [59]

    Selecting Experimental Sites for External Validity

    Michael Gechter et al. Selecting Experimental Sites for External Validity. Version 1. May 21, 2024. doi: 10.48550/arXiv.2405.13241. arXiv: 2405.13241 [econ]. url: http://arxiv.org/abs/2405.13241 (visited on 02/07/2026). Pre-published

  59. [61]

    Incentive-Theoretic Bayesian Inference for Collaborative Science

    Stephen Bates et al. Incentive-Theoretic Bayesian Inference for Collaborative Science. July 7, 2023. doi: 10.48550/arXiv.2307.03748. arXiv: 2307.03748 [cs, stat]. url: http://arxiv.org/abs/2307.03748 (visited on 10/24/2023). Pre-published

  60. [62]

    Screening for Experiments

    Daehong Min. “Screening for Experiments”. In: Games and Economic Behavior 142 (Nov. 1, 2023), pp. 73–100. issn: 0899-8256. doi: 10.1016/j.geb.2023.07.009. url: https://www.sciencedirect.com/science/article/pii/S0899825623001021 (visited on 12/24/2024)

  61. [63]

    The ASA Statement on p-Values: Context, Process, and Purpose

    Ronald L. Wasserstein and Nicole A. Lazar. “The ASA Statement on p-Values: Context, Process, and Purpose”. In: The American Statistician 70.2 (Apr. 2, 2016), pp. 129–133. issn: 0003-1305, 1537-2731. doi: 10.1080/00031305.2016.1154108. url: https://www.tandfonline.com/doi/full/10.1080/00031305.2016.1154108 (visited on 02/08/2026)

  62. [64]

    A Theory of Experimenters: Robustness, Randomization, and Balance

    Abhijit V. Banerjee et al. “A Theory of Experimenters: Robustness, Randomization, and Balance”. In: American Economic Review 110.4 (Apr. 1, 2020), pp. 1206–1230. issn: 0002-8282. doi: 10.1257/aer.20171634. url: https://pubs.aeaweb.org/doi/10.1257/aer.20171634 (visited on 02/07/2026)

  63. [65]

    Benjamin Recht. A Bureaucratic Theory of Statistics. Jan. 7, 2025. doi: 10.48550/arXiv.2501.03457. arXiv: 2501.03457 [stat]. url: http://arxiv.org/abs/2501.03457 (visited on 01/08/2025). Pre-published

  64. [66]

    Foundations of a General Theory of Sequential Decision Functions

    Abraham Wald. “Foundations of a General Theory of Sequential Decision Functions”. In: Econometrica 15.4 (1947), pp. 279–313. issn: 0012-9682. doi: 10.2307/1905331. JSTOR: 1905331. url: https://www.jstor.org/stable/1905331 (visited on 02/07/2026)

  65. [67]

    Ethan Che and Hongseok Namkoong. Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches. Aug. 14, 2023. doi: 10.48550/arXiv.2303.11582. arXiv: 2303.11582 [cs, stat]. url: http://arxiv.org/abs/2303.11582 (visited on 09/11/2023). Pre-published

  66. [68]

    An Empirical Evaluation of Thompson Sampling

    Olivier Chapelle and Lihong Li. “An Empirical Evaluation of Thompson Sampling”. In: Advances in Neural Information Processing Systems. Vol. 24. Curran Associates, Inc., 2011. url: https://papers.nips.cc/paper_files/paper/2011/hash/e53a0a2978c28872a4505bdb51db06dc-Abstract.html (visited on 02/07/2026)

  67. [69]

    Designing Reinforcement Learning Algorithms for Digital Interventions: Pre-Implementation Guidelines

    Anna L. Trella et al. “Designing Reinforcement Learning Algorithms for Digital Interventions: Pre-Implementation Guidelines”. In: Algorithms 15.8 (Aug. 2022), p. 255. issn: 1999-4893. doi: 10.3390/a15080255. PMID: 36713810. url: https://pmc.ncbi.nlm.nih.gov/articles/PMC9881427/ (visited on 02/07/2026)

  68. [70]

    SQR: Balancing Speed, Quality and Risk in Online Experiments

    Ya Xu, Weitao Duan, and Shaochen Huang. “SQR: Balancing Speed, Quality and Risk in Online Experiments”. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. KDD ’18: The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. London United Kingdom: ACM, July 19, 2018, pp. 895–904. isb...

  69. [71]

    Bayesian Optimal Design for Phase II Screening Trials

    Meichun Ding, Gary L. Rosner, and Peter Müller. “Bayesian Optimal Design for Phase II Screening Trials”. In: Biometrics 64.3 (Sept. 2008), pp. 886–894. issn: 0006-341X, 1541-0420. doi: 10.1111/j.1541-0420.2007.00951.x. url: https://academic.oup.com/biometrics/article/64/3/886-894/7331561 (visited on 02/07/2026)

  70. [72]

    Utility-based Optimization of Phase II/III Programs

    Marietta Kirchner et al. “Utility-based Optimization of Phase II/III Programs”. In: Statistics in Medicine 35.2 (Jan. 30, 2016), pp. 305–316. issn: 0277-6715, 1097-0258. doi: 10.1002/sim.6624. url: https://onlinelibrary.wiley.com/doi/10.1002/sim.6624 (visited on 02/07/2026)

  71. [73]

    Optimal Designs for Phase II/III Drug Development Programs Including Methods for Discounting of Phase II Results

    Stella Erdmann et al. “Optimal Designs for Phase II/III Drug Development Programs Including Methods for Discounting of Phase II Results”. In: BMC Medical Research Methodology 20.1 (Dec. 2020), p. 253. issn: 1471-2288. doi: 10.1186/s12874-020-01093-w. url: https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-020-0109...

  72. [74]

    How Generalizable Is Your Experiment? An Index for Comparing Experimental Samples and Populations

    Elizabeth Tipton. “How Generalizable Is Your Experiment? An Index for Comparing Experimental Samples and Populations”. In: Journal of Educational and Behavioral Statistics 39.6 (Dec. 2014), pp. 478–501. issn: 1076-9986, 1935-1054. doi: 10.3102/1076998614558486. url: https://journals.sagepub.com/doi/10.3102/1076998614558486 (visited on 02/09/2026)

  73. [75]

    G. B. Folland. Real Analysis: Modern Techniques and Their Applications. 2nd ed. Pure and Applied Mathematics. New York: Wiley, 1999. 386 pp. isbn: 978-0-471-31716-6

  74. [76]

    G. B. Folland. Advanced Calculus. 2nd ed. Aug. 4, 2023. url: https://sites.math.washington.edu//~folland/AdvCalc24.pdf (visited on 04/08/2026)

  75. [77]

    Variational Analysis

    R. Tyrrell Rockafellar and Roger Wets. Variational Analysis. Vol. 317. Grundlehren Der Mathematischen Wissenschaften. Springer Verlag, 2009.