Decision Making in Drug Development via Inference on Power

Geoffrey S Johnson

arxiv: 2005.04721 · v20 · submitted 2020-05-10 · 📊 stat.AP · stat.ME

Pith reviewed 2026-05-24 14:17 UTC · model grok-4.3

keywords power calculation · probability of success · p-value function · Go/No-Go decision · drug development · risk management · assurance · inference on power

The pith

Go/No-Go decisions in drug development should use inference on power instead of point estimates from power or probability of success calculations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a typical power calculation replaces unknown population quantities with values observed in external studies, yielding a single assumed value of power. Probability of success, or assurance, averages power over a prior or posterior distribution to capture uncertainty around the true treatment effect, but it still reduces to a single number. Both approaches are reframed via p-value functions as merely different point estimates of power. Decisions based on either point estimate fail to quantify and control the risk of incorrect Go or No-Go choices. The author argues that full inference on power, using the p-value function, enables better risk management in drug development decisions.
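To make the contrast concrete, the sketch below computes both numbers for a one-sided z-test. It is an illustration under stated assumptions, not the paper's own calculation: the normal model with known standard error, the normal distribution placed on the treatment effect, the function names `power` and `assurance`, and all numeric inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power(theta, se, alpha=0.025, theta0=0.0):
    """Power of a one-sided z-test of H0: theta <= theta0 when the true effect is theta.

    `se` is the standard error of the planned trial's effect estimator (assumed known).
    """
    z_crit = Z.inv_cdf(1 - alpha)
    return 1 - Z.cdf(z_crit - (theta - theta0) / se)


def assurance(mean, sd, se, alpha=0.025, theta0=0.0):
    """Power averaged over theta ~ N(mean, sd^2).

    Closed form for the normal model: marginally, the future estimate is
    N(mean, se^2 + sd^2), and success means it exceeds theta0 + z_crit * se.
    """
    z_crit = Z.inv_cdf(1 - alpha)
    return 1 - Z.cdf((theta0 + z_crit * se - mean) / sqrt(se**2 + sd**2))


# Hypothetical inputs: an external estimate of 0.30 with standard error 0.15,
# and a planned trial whose estimator has standard error 0.10.
plug_in = power(0.30, se=0.10)
pos = assurance(0.30, 0.15, se=0.10)
print(f"plug-in power: {plug_in:.3f}  probability of success: {pos:.3f}")
```

With these inputs the probability of success is lower than the plug-in power, and it collapses to the plug-in power as the spread of the distribution on θ shrinks to zero. Either way, the output is one number with no statement of how far it might be from the true power.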

Core claim

We use p-value functions to frame both the probability of success calculation and the typical power calculation as merely producing two different point estimates of power. We demonstrate that Go/No-Go decisions based on either point estimate of power do not adequately quantify and control the risk involved, and instead we argue for Go/No-Go decisions that utilize inference on power for better risk management and decision making.

What carries the argument

p-value functions that represent both classical power calculations and probability of success calculations as point estimates of power
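A worked version of this device, written out under a simple normal approximation, may help. The notation H(θ) and h(β) = dH(θ)/dβ(θ) follows the figure captions reproduced further down the page; the normal model, the symbols θ̂, σ, σ₃, and the closed forms are assumptions made here for illustration only.

```latex
% Upper-tailed p-value function (confidence distribution) for theta,
% assuming \hat\theta \sim N(\theta, \sigma^2) with known \sigma:
H(\theta) = \Phi\!\left(\frac{\theta - \hat\theta}{\sigma}\right)

% Power curve of a future one-sided level-\alpha test of H_0:\theta \le \theta_0
% whose estimator has standard error \sigma_3:
\beta(\theta) = 1 - \Phi\!\left(z_{1-\alpha} - \frac{\theta - \theta_0}{\sigma_3}\right)

% \beta is strictly increasing in \theta, so confidence statements about
% \theta transfer directly to power:
H_\beta(b) = H\!\left(\beta^{-1}(b)\right),
\qquad
h(\beta) = \frac{dH(\theta)}{d\beta(\theta)}
         = \frac{dH(\theta)/d\theta}{d\beta(\theta)/d\theta}

% One-sided 100(1-c)\% lower confidence limit for power:
\beta_{\mathrm{lower}} = \beta\!\left(\hat\theta - z_{1-c}\,\sigma\right)
```

In this normal model H(θ̂) = 1/2, so the plug-in value β(θ̂) is the median of the confidence distribution for power: a point estimate, with the rest of H_β describing the uncertainty around it.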

If this is right

Go/No-Go decisions based on point estimates of power fail to quantify and control risk adequately.
Inference on power using the full p-value function supplies better risk management for drug development decisions.
Probability of success calculations are equivalent to one specific point estimate of power under this framing.
Replacing point estimates with inference on power changes how uncertainty around the treatment effect is handled in practice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same p-value function approach could be tested in non-drug contexts such as clinical trial design outside pharmaceutical settings.
Simulation studies could directly compare error rates of point-estimate decisions versus inference-based decisions on synthetic trial data.
Regulatory bodies might evaluate whether requiring inference on power alters the balance between false positives and false negatives in approval decisions.
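The simulation idea in the second item can be sketched in a few lines. This is a toy setup rather than anything taken from the paper: the normal model, the standard errors, the 0.80 power target, and the rule "Go only if the one-sided 80% lower confidence limit for power reaches the target" are all illustrative assumptions, and the function names are hypothetical.

```python
import random
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power(theta, se, alpha=0.025, theta0=0.0):
    """Power of a one-sided z-test of H0: theta <= theta0 at true effect theta."""
    return 1 - Z.cdf(Z.inv_cdf(1 - alpha) - (theta - theta0) / se)


def lower_limit_for_power(theta_hat, se_obs, se_next, conf=0.80):
    """One-sided lower confidence limit for power.

    Maps the lower confidence limit for theta through the power curve,
    which is valid because the power curve is increasing in theta.
    """
    theta_lower = theta_hat - Z.inv_cdf(conf) * se_obs
    return power(theta_lower, se_next)


def simulate(theta_true, se_obs=0.15, se_next=0.10, target=0.80,
             n_sims=20000, seed=1):
    """Return the Go rates of the point-estimate rule and the lower-limit rule."""
    rng = random.Random(seed)
    go_point = go_inference = 0
    for _ in range(n_sims):
        theta_hat = rng.gauss(theta_true, se_obs)  # simulated earlier-phase estimate
        go_point += power(theta_hat, se_next) >= target
        go_inference += lower_limit_for_power(theta_hat, se_obs, se_next) >= target
    return go_point / n_sims, go_inference / n_sims


# True effect chosen so that the true next-phase power is 0.75, below the
# 0.80 target. Every Go in this scenario is therefore a wrong Go.
theta_true = 0.10 * (Z.inv_cdf(0.975) + Z.inv_cdf(0.75))
point_rate, inference_rate = simulate(theta_true)
print(f"wrong-Go rate, point-estimate rule: {point_rate:.3f}")
print(f"wrong-Go rate, lower-limit rule:    {inference_rate:.3f}")
```

The design choice worth noting is that the lower-limit rule bounds the wrong-Go rate by one minus the confidence level whenever the true power is below the target, whereas the point-estimate rule can approach a 50% wrong-Go rate when the true power sits just under the target.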

Load-bearing premise

P-value functions provide a valid and neutral way to represent both classical power and Bayesian assurance calculations as point estimates without introducing additional assumptions that affect the risk assessment.

What would settle it

A reanalysis of historical drug development programs in which Go/No-Go decisions based on point estimates of power are compared to decisions based on the full p-value function for power, measuring whether the latter yields measurably different risk profiles such as altered rates of program termination or success.

Figures

Figures reproduced from arXiv: 2005.04721 by Geoffrey S Johnson.

**Figure 1.** Phase 2 likelihood ratio test of H0: θ ≤ −0.05 with N=90 per arm at α=0.2. Phase 3 likelihood ratio test of H0: θ ≤ −0.12 with N=365 per arm at α=0.025. While it is important to have a clear definition of technical success before conducting a trial, … [PITH_FULL_IMAGE:figures/full_fig_p007_1.png]

**Figure 2.** Power curves for the success criteria outlined in Section 3.1, the combined power curve (product) for success in both phase 2 and phase 3, and the elicited confidence curve for the difference in proportions described above. The power curves in … [PITH_FULL_IMAGE:figures/full_fig_p008_2.png]

**Figure 3.** Solid lines depict resulting confidence curves for power in phase 2, phase 3, and overall based on the elicitation. Peaks correspond to maximum likelihood estimates of power. (Adjacent plot text: Phase 3 Power against Sample Size per Arm, N=45 to N=765; series: Maximum Likelihood Estimate, Probability of Success Estimate.) [PITH_FULL_IMAGE:figures/full_fig_p009_3.png]

**Figure 4.** [PITH_FULL_IMAGE:figures/full_fig_p009_4.png]

**Figure 5.** [PITH_FULL_IMAGE:figures/full_fig_p010_5.png]

**Figure 6.** (i) Elicited confidence curve. (ii) Confidence curve for θ from the approximate phase 2 power curve testing H0: θ ≤ −0.05 with N=90 per arm at α=0.2. (iii) Multiplication of elicited H(θ) and phase 2 power curve, displayed as a confidence curve. (iv) Convolution of elicited H(θ) and approximate phase 2 power curve, displayed as a con… (Legend text: α=0.025 for phase 3 LR test against difference=−0.12 with N=365 per arm.)

**Figure 7.** Sampling distributions of the maximum likelihood and probability of success estimators of power over 10,000 simulations. [PITH_FULL_IMAGE:figures/full_fig_p013_7.png]

**Figure 8.** Sampling distribution of the Φ⁻¹ transformed maximum likelihood estimator of power over 10,000 simulations. (Adjacent section text, B.7 Extrapolation Between Endpoints or Control Groups Across Phases: In the examples thus far the phase 2 study used the same endpoint and treatment groups planned for phase 3. Depending on the therapeutic area and endpoint this may not be feasible. In such cases the phase 3 treatment effect, and henc…)

**Figure 9.** Solid lines depict power curves for a likelihood ratio test of the difference in proportions in phase 2, phase 3, and overall. Confidence bands depict extrapolation modeling uncertainty. Dashed line depicts the confidence density for θ based on historical data and expert opinion. (Legend text: α=0.025 for phase 3 LR test against difference=−0.12 with N=365 per arm. Axis label: Power.)

**Figure 11.** Phase 2 power curve testing H0: θ ≤ −0.05 with N=90 per arm at α=0.2. Phase 3 power curve testing H0: θ ≤ −0.12 with N=365 per arm at α=0.025. Confidence density for θ based on historical data and expert opinion. (Plot text: axes Power and Confidence Density; legend: Phase 2 Power mle, Phase 3 Power mle, Phase 2 and 3 Power ml…)

**Figure 12.** Solid lines depict resulting confidence distributions for power, h(β) = dH(θ)/dβ(θ), in phase 2, phase 3, and overall. Dotted lines depict maximum likelihood estimates of power. [PITH_FULL_IMAGE:figures/full_fig_p023_12.png]

**Figure 13.** Estimated phase 3 power testing H0: θ ≤ −0.12 at α=0.025 at various sample sizes with 80% confidence limits based on the elicitation (wide). (Adjacent plot text from appendix E, Additional Figures: axes True Difference in Proportions, Power, and Confidence Density; legend: (i) Elicited Confidence Density, (ii) Minimum Phase 2 Success, (iii) Multiplication, (iv) Convolution, (v) Phase 3 Power.)

**Figure 14.** (i) Elicited confidence density (wide). (ii) Confidence density for θ from differentiating the approximate phase 2 power curve testing H0: θ ≤ −0.05 with N=225 per arm at α=0.025. (iii) Multiplication of elicited H(θ) and phase 2 power curve, differentiated. (iv) Convolution of elicited H(θ) and approximate phase 2 power curve, diff… (Legend text: α=0.025 for phase 3 LR test against difference=−0.12 with N=365 per arm.)

**Figure 15.** (i) Elicited confidence density (narrow). (ii) Confidence density for θ from differentiating the approximate phase 2 power curve testing H0: θ ≤ −0.05 with N=225 per arm at α=0.025. (iii) Multiplication of elicited H(θ) and phase 2 power curve, differentiated. (iv) Convolution of elicited H(θ) and approximate phase 2 power curve, di… (Legend text: α=0.025 for phase 3 LR test against difference=−0.12 with N=365 per arm.)

**Figure 16.** (i) Elicited confidence density (wide). (ii) Confidence density for θ from differentiating the approximate phase 2 power curve testing H0: θ ≤ −0.05 with N=90 per arm at α=0.2. (iii) Multiplication of elicited H(θ) and phase 2 power curve, differentiated. (iv) Convolution of elicited H(θ) and approximate phase 2 power curve, differe… (Legend text: α=0.025 for phase 3 LR test against difference=−0.12 with N=365 per arm.)

**Figure 17.** Exact frequentist and Bayesian inference on a binomial proportion θ based on a sample of size n = 20. Let X1, ..., Xn ∼ Bernoulli(θ). The confidence curve and 95% confidence interval in … [PITH_FULL_IMAGE:figures/full_fig_p028_17.png]

**Figure 18.** (a) Plug-in estimated sampling distribution for the MLE of the mean supported by x̄ for exponentially distributed data with n = 5, replacing the unknown fixed true θ with θ̂mle = 1.5. (b) Bayesian posterior from vague conjugate prior supported by θ. (c) Confidence distribution (density) based on the likelihood ratio test supported by θ. (d) Confidence distribution (density) based on the exact likelihood rat…

**Figure 20.** Exact null sampling distribution of θ̂MLE = X̄ for testing H0: θ ≤ 0.75. H(θ) captures the upper-tailed p-value for every value of θ in the parameter space, and dH(θ)/dθ is the resulting confidence density. The confidence density in … [PITH_FULL_IMAGE:figures/full_fig_p030_20.png]

**Figure 21.** (a) Informative Bayesian prior distribution based on historical likelihood and vague conjugate prior for binomial proportion, θ̂ Hist Bayes = 0.90, n = 50. (b) Confidence distribution (likelihood ratio test) based on historical data for binomial proportion, θ̂ Hist mle = 0.90, n = 50. (c) Bayesian posterior based on current likelihood and vague conjugate prior, θ̂ Current Bayes = 0.87, n = 30. (d) Confide…
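The success criteria quoted in the Figure 1 caption and the "combined power curve (product)" mentioned in the Figure 2 caption can be sketched numerically. The block below is a rough illustration, not a reproduction of the figures: it swaps the likelihood ratio test for a one-sided Wald (normal approximation) test, and the control-arm response rate `P_CONTROL` is a hypothetical value that does not come from the source. Only the margins, sample sizes, and α levels are taken from the captions.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

P_CONTROL = 0.70  # hypothetical control-arm response rate (assumption)


def power_diff(theta, n_per_arm, margin, alpha, p_control=P_CONTROL):
    """Approximate power of a one-sided Wald test of H0: theta <= margin,
    where theta = p_treatment - p_control is the true difference in proportions.
    """
    p_trt = p_control + theta
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_trt * (1 - p_trt) / n_per_arm)
    return 1 - Z.cdf(Z.inv_cdf(1 - alpha) - (theta - margin) / se)


def phase2(theta):
    # H0: theta <= -0.05, N=90 per arm, alpha=0.2 (from the Figure 1 caption)
    return power_diff(theta, n_per_arm=90, margin=-0.05, alpha=0.2)


def phase3(theta):
    # H0: theta <= -0.12, N=365 per arm, alpha=0.025 (from the Figure 1 caption)
    return power_diff(theta, n_per_arm=365, margin=-0.12, alpha=0.025)


def combined(theta):
    """Probability of success in both phases at a fixed theta, taken as the
    product of the two power curves (independent trials given theta)."""
    return phase2(theta) * phase3(theta)


for theta in (-0.10, -0.05, 0.00, 0.05, 0.10):
    print(f"theta={theta:+.2f}  phase 2={phase2(theta):.3f}  "
          f"phase 3={phase3(theta):.3f}  combined={combined(theta):.3f}")
```

Each curve equals its α level at its own null boundary, and the combined curve never exceeds either single-phase curve, which is why success across both phases is harder to achieve than either power calculation alone suggests.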

Original abstract

A typical power calculation is performed by replacing unknown population-level quantities in the power function with what is observed in external studies. Many authors and practitioners view this as an assumed value of power and offer the Bayesian quantity probability of success or assurance as an alternative. The claim is by averaging over a prior or posterior distribution, probability of success transcends power by capturing the uncertainty around the unknown true treatment effect and any other population-level parameters. We use p-value functions to frame both the probability of success calculation and the typical power calculation as merely producing two different point estimates of power. We demonstrate that Go/No-Go decisions based on either point estimate of power do not adequately quantify and control the risk involved, and instead we argue for Go/No-Go decisions that utilize inference on power for better risk management and decision making.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper reframes power and assurance as point estimates from a p-value function and pushes for inference on power in Go/No-Go decisions, but the abstract gives no derivations or examples to check the construction.

The main thing here is a reframing: both the usual plug-in power calculation and the Bayesian probability of success get treated as different point estimates pulled from the same p-value function for power. The author then says that decisions based on those point estimates do not control risk well and that inference on the full function would do better for drug development choices. That unification step is the piece that is not standard in the literature the paper cites, so it is the actual new angle on offer. It is a clean way to put the two approaches on the same footing without claiming one is always superior on its own terms.

The paper does a reasonable job of stating the practical problem in pharma decision making and why point estimates can leave risk unquantified. That part lands as a fair observation rather than a stretch.

The soft spot is that the abstract supplies no equations, no worked numerical example, and no derivation showing how the p-value function is constructed so that both the conditional power and the marginal assurance fall out as specific evaluations. Without that, it is impossible to see whether the mapping preserves the risk interpretation across frequentist and Bayesian settings or whether it depends on choices about the test statistic or nuisance parameters. The stress-test note flags exactly this construction as non-standard, and the provided text does not resolve it. The argument that inference on power succeeds where point estimates fail therefore stays at the level of assertion until the details are shown.

This is the kind of targeted methodological note that could be useful to statisticians who already work on assurance calculations and Go/No-Go rules inside pharmaceutical development. A reader who wants a short discussion of risk control in that setting might get something out of it, but anyone looking for a new theorem or a data-backed demonstration will not. It is coherent on its own terms and engages the existing literature without circularity or invented entities, so it clears the bar for a serious referee even though the central claim still needs the supporting math and examples filled in. I would send it to peer review with a request for the derivations and at least one concrete numerical case.

Referee Report

2 major / 1 minor

Summary. The paper claims that standard power calculations (plug-in estimates for unknown parameters) and Bayesian probability of success/assurance calculations are both merely different point estimates of power when viewed through p-value functions. It argues that Go/No-Go decisions based on either point estimate fail to quantify and control risk adequately, and proposes instead using inference on power for improved risk management in drug development.

Significance. If the p-value function framing is shown to be neutral and the inference approach demonstrably improves risk control over point estimates, the work could influence decision frameworks in clinical development by emphasizing uncertainty quantification beyond single numbers. The unification of frequentist and Bayesian power concepts via p-value functions offers a potentially useful perspective if the technical mapping holds without hidden assumptions.

major comments (2)

[Abstract; Section 2 (p-value function construction)] The central unification in the abstract and early sections treats both plug-in power and assurance as recoverable point evaluations of the same p-value function for power; however, the manuscript must explicitly define this function (including how the sampling distribution maps to power values) and verify that the construction is invariant to choice of test statistic and nuisance-parameter handling, as any dependence would undermine the subsequent claim that point-estimate decisions fail to control risk while inference succeeds.
[Section 4 (decision examples)] The demonstration that Go/No-Go decisions based on point estimates of power do not control risk (abstract) requires a concrete counter-example or simulation study showing a scenario where the point-estimate rule accepts a program whose true risk exceeds a pre-specified threshold while the inference-on-power rule correctly rejects; without such a load-bearing example tied to the p-value function, the risk-management advantage remains unproven.

minor comments (1)

[Section 2] Notation for the p-value function should be introduced with a single consistent symbol and distinguished from the usual p-value function for the treatment effect.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major comment below, indicating where revisions will be made to the manuscript.

Referee: [Abstract; Section 2 (p-value function construction)] The central unification in the abstract and early sections treats both plug-in power and assurance as recoverable point evaluations of the same p-value function for power; however, the manuscript must explicitly define this function (including how the sampling distribution maps to power values) and verify that the construction is invariant to choice of test statistic and nuisance-parameter handling, as any dependence would undermine the subsequent claim that point-estimate decisions fail to control risk while inference succeeds.

Authors: We agree that an explicit definition of the p-value function and a check on invariance are required to support the unification. Section 2 constructs the function by mapping the observed test statistic to the corresponding power value via the sampling distribution under the alternative (i.e., power equals the probability that the test statistic exceeds the critical value when the parameter equals the value implied by the observed statistic). To address the referee's concern, we will add an explicit statement of this mapping and a short verification subsection confirming invariance to standard choices of test statistic and nuisance-parameter handling within the normal-mean and binomial settings used in the paper. These clarifications will be incorporated in the revision.
revision: yes
Referee: [Section 4 (decision examples)] The demonstration that Go/No-Go decisions based on point estimates of power do not control risk (abstract) requires a concrete counter-example or simulation study showing a scenario where the point-estimate rule accepts a program whose true risk exceeds a pre-specified threshold while the inference-on-power rule correctly rejects; without such a load-bearing example tied to the p-value function, the risk-management advantage remains unproven.

Authors: We accept that a concrete counter-example or simulation is needed to make the risk-control claim load-bearing. We will add a simulation study to Section 4 that generates data under a true effect size distribution, applies both point-estimate rules (plug-in and assurance) and the inference-on-power rule derived from the p-value function, and shows a case in which the point-estimate rules accept the program while the true risk (computed from the full power distribution) exceeds the threshold and the inference rule rejects. The example will be explicitly linked to the p-value function construction.
revision: yes

Circularity Check

0 steps flagged

No circularity detected; reframing of power and assurance via p-value functions is interpretive and does not reduce to self-definition or fitted inputs by construction.

The paper's central move is to use p-value functions as a device for viewing both plug-in power and assurance/PoS as point estimates of the same underlying quantity, then to advocate inference on that quantity for decision-making. This is presented as a conceptual unification rather than a derivation in which one quantity is defined in terms of the other via the authors' own equations or a fitted parameter renamed as a prediction. No load-bearing self-citation, uniqueness theorem, or ansatz imported from prior work by the same authors appears in the abstract or described chain. The argument remains self-contained against external benchmarks of power and assurance calculations; the p-value function serves as an external representational tool, not a tautological re-expression of the paper's inputs. Therefore the derivation does not collapse by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities are identifiable.
