arxiv: 2604.02526 · v1 · submitted 2026-04-02 · 📊 stat.AP


Applied Statistics Requires Scientific Context

Ashley I Naimi


Pith reviewed 2026-05-13 19:54 UTC · model grok-4.3

classification 📊 stat.AP

keywords p-value · scientific context · significance thresholds · randomized trials · genome-wide association studies · statistical validity · aspirin trial · ankylosing spondylitis


The pith

Applied statistics needs nuanced scientific context rather than any universal significance threshold.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Statistical methods are essential for scientific inference, yet their validity depends on the specific background assumptions and substantive features of the field in which they are applied. The paper reviews a re-formulation of the p-value as a measure of divergence between an observed dataset and the assumptions used to construct the statistical measure. This framework is illustrated with two randomized trials—one examining low-dose aspirin for pregnancy loss and another testing an inhibitor of a biochemical pathway in ankylosing spondylitis—showing how context shapes interpretation. The paper further notes that low significance thresholds succeeded in genome-wide association studies and high-energy particle physics largely because of the extensive validity-checking procedures that accompanied them. These points support abandoning any universal threshold as a reform goal and instead requiring careful attention to scientific context for reliable results.

Core claim

The application and interpretation of statistical methods require careful consideration of foundational contextual issues, which include both elusive background assumptions and quantifiable features of a study area. A recent re-formulation of the p-value as a measure of divergence between observed data and modeling assumptions is used to demonstrate this role in two randomized trials. Success with low significance thresholds in genome-wide association studies and particle physics is attributed to the accompanying validity-checking gauntlets and contextual considerations rather than to the thresholds themselves. Therefore, the adoption of a universal threshold should be abandoned as a goal of statistics reform.

What carries the argument

Re-formulation of the p-value as a measure of divergence between an observed dataset and the set of assumptions used to construct the statistical measure.
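As a small sketch of what reading a p-value as a divergence measure can look like in practice, the snippet below converts p-values to s-values (Shannon surprisal, $s = -\log_2 p$), a transform used in the compatibility literature that such reformulations draw on. The review itself does not mention s-values, and the function name `s_value` is illustrative rather than taken from the paper. Under this reading, p = 0.05 carries about 4.3 bits of information against the full set of assumptions used to compute it, roughly as surprising as four heads in a row from a fair coin; the number alone does not say which assumption failed.

```python
import math

def s_value(p: float) -> float:
    """Shannon surprisal of a p-value in bits: s = -log2(p).

    Read as information against the whole set of assumptions used to
    compute p (effect size, randomization, model form, absence of bias),
    not as the probability that any single hypothesis is true.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log2(p)

# Print the surprisal for a few illustrative p-values.
for p in (0.5, 0.05, 5e-8):
    print(f"p = {p:g} -> {s_value(p):.1f} bits")
```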

If this is right

Ignoring foundational context can lead to misinterpretation of results even when low p-values are obtained.
Reform efforts in statistics should prioritize integration of domain-specific assumptions over standardization of thresholds.
The two randomized trial examples show that different scientific contexts produce different valid interpretations of the same statistical output.
Validity-checking procedures must be tailored to the specific assumptions of each field rather than applied uniformly.
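To make the first point concrete, here is a minimal simulation sketch under an assumed, hypothetical cluster-randomized design that is not one of the paper's trials. The true treatment effect is zero, but outcomes share a site-level shift and whole sites are assigned to one arm. A z-test that assumes independent outcomes then yields small p-values far more often than its nominal 5%, so the low p-values signal divergence from the independence assumption rather than a treatment effect. All names and settings (`cluster_sd`, 10 clusters of 20 per arm, 2000 replicates) are illustrative.

```python
import math
import random

def mean_and_var(xs):
    """Sample mean and unbiased sample variance of a list of floats."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

def naive_z_pvalue(treated, control):
    """Two-sided p-value from a z statistic that ASSUMES independent outcomes."""
    m1, v1 = mean_and_var(treated)
    m0, v0 = mean_and_var(control)
    z = (m1 - m0) / math.sqrt(v1 / len(treated) + v0 / len(control))
    return math.erfc(abs(z) / math.sqrt(2.0))

def simulate_null_trial(rng, n_clusters=10, cluster_size=20, cluster_sd=0.0):
    """One hypothetical trial with zero treatment effect.

    Whole clusters are assigned to an arm; cluster_sd > 0 adds a shared
    cluster shift, violating the independence assumed by naive_z_pvalue.
    """
    arms = []
    for _ in range(2):  # both arms are generated identically (null is true)
        outcomes = []
        for _ in range(n_clusters):
            shift = rng.gauss(0.0, cluster_sd)  # shared by every member of the cluster
            outcomes.extend(shift + rng.gauss(0.0, 1.0) for _ in range(cluster_size))
        arms.append(outcomes)
    return naive_z_pvalue(arms[1], arms[0])

def rejection_rate(cluster_sd, n_sims=2000, alpha=0.05, seed=1):
    """Share of null trials with p < alpha under the naive independence model."""
    rng = random.Random(seed)
    hits = sum(simulate_null_trial(rng, cluster_sd=cluster_sd) < alpha
               for _ in range(n_sims))
    return hits / n_sims
```

Calling `rejection_rate(0.0)` should return roughly 0.05, while `rejection_rate(0.5)` comes out several times larger, even though no treatment effect exists in either case.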

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Greater collaboration between statisticians and domain scientists would be needed to identify the relevant contextual assumptions for each application.
Fields without strong pre-existing validity gauntlets might benefit from higher thresholds to reduce false positives until such checks are developed.
This view implies that statistical education should emphasize case-by-case contextual reasoning over mastery of fixed rules.

Load-bearing premise

The success of low significance thresholds in genome-wide association studies and particle physics stems primarily from the accompanying validity-checking gauntlets rather than from the thresholds themselves.
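For scale, the sketch below puts two conventional cutoffs on a common z scale: the particle-physics 5σ discovery criterion (a one-sided tail probability) and the GWAS genome-wide threshold of 5 × 10⁻⁸ (often motivated as a Bonferroni correction, 0.05 divided by roughly one million independent common variants). These conventional values are standard in their fields but are not quoted in this review, and the helper names are hypothetical. The conversion only shows that the thresholds themselves are simple numbers; on the paper's reading, their reliability came from the validity checking around them.

```python
import math
from statistics import NormalDist

def one_sided_p_from_sigma(k: float) -> float:
    """Upper-tail standard-normal probability of a k-sigma excess."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))

def two_sided_z_from_p(p: float) -> float:
    """Smallest |z| at which a two-sided normal test reaches p."""
    return -NormalDist().inv_cdf(p / 2.0)

print(f"5 sigma (one-sided): p = {one_sided_p_from_sigma(5.0):.3g}")
for alpha in (0.05, 5e-8):
    print(f"two-sided alpha = {alpha:g}: |z| >= {two_sided_z_from_p(alpha):.2f}")
```

This gives p of about 2.9 × 10⁻⁷ for 5σ and |z| of about 5.45 for 5 × 10⁻⁸, so the two conventions are of similar stringency.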

What would settle it

Finding a new scientific domain that achieves reliable discoveries with low significance thresholds while lacking extensive validity-checking procedures would challenge the claim that context and checks, not the thresholds, drive success.

Figures

Figures reproduced from arXiv: 2604.02526 by Ashley I Naimi.

**Figure 1.** Geometric interpretation of the p-value. The left panel shows the observed data point $z = (\bar{Y}_1, \bar{Y}_0)$ and its orthogonal projection onto the model manifold $\mathcal{M}$, represented by the solid diagonal line, yielding an empirical measure of discrepancy between $z$ and $\mathcal{M}$, denoted $d(z;\mathcal{M})$ and indexed by the dashed line. The right panel shows the reference $\chi^2_1$ distribution of $T$, the variance-standardized measure of d(…
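A minimal sketch of the Figure 1 geometry, assuming the model manifold $\mathcal{M}$ is the no-effect line $\bar{Y}_1 = \bar{Y}_0$ and that $T$ standardizes the discrepancy by the sum of the two arms' sampling variances (the caption is truncated, so this standardization is an assumption). The projection of $z$ onto that line has both coordinates equal to the midpoint $(\bar{Y}_1 + \bar{Y}_0)/2$, so $d(z;\mathcal{M}) = |\bar{Y}_1 - \bar{Y}_0|/\sqrt{2}$; under $\mathcal{M}$ together with the other modeling assumptions, $T$ follows a $\chi^2_1$ distribution. The function name and input values are hypothetical.

```python
import math

def divergence_p_value(ybar1, ybar0, var1, var0):
    """Sketch of the Figure 1 geometry for a two-arm comparison.

    ybar1, ybar0: observed arm means, i.e. the point z = (ybar1, ybar0).
    var1, var0: assumed sampling variances of the two means (e.g. s^2 / n).
    The model manifold M is the line ybar1 == ybar0 (no treatment effect).
    """
    # Orthogonal projection of z onto the diagonal line y1 = y0.
    midpoint = (ybar1 + ybar0) / 2.0
    # Euclidean distance d(z; M); equals |ybar1 - ybar0| / sqrt(2).
    d = math.hypot(ybar1 - midpoint, ybar0 - midpoint)
    # Variance-standardized discrepancy: 2 * d**2 == (ybar1 - ybar0)**2.
    t_stat = 2.0 * d ** 2 / (var1 + var0)
    # Upper tail of chi^2_1: P(chi^2_1 >= t) = erfc(sqrt(t / 2)).
    p = math.erfc(math.sqrt(t_stat / 2.0))
    return d, t_stat, p
```

With hypothetical arm means 0.6 and 0.5 and sampling variances of 0.0025 each, `divergence_p_value(0.6, 0.5, 0.0025, 0.0025)` gives $T \approx 2.0$ and a divergence p-value of about 0.157.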

Original abstract

Statistical methods are indispensable to scientific inference. However, there exists a longstanding tension across a wide range of scientific disciplines about the role that "context" should play in the application of statistical methods and the interpretation of statistical results. Though frequently invoked, the notion of "scientific context" refers to at least two distinct concepts: a set of foundational, nuanced, and elusive background assumptions and substantive features of a given area of study that shape the validity and reliability of statistical methods; and more quantifiable contextual issues that affect the performance of statistical methods and the interpretation of statistical results. I argue here that the application and interpretation of statistical methods require careful consideration of foundational contextual issues. To motivate the arguments, I review a recent re-formulation of the $p$-value as a measure of divergence between an observed dataset and a set of assumptions used to construct statistical measures. I use this framework to illustrate the role that context plays in two randomized trials: one on low-dose aspirin for pregnancy loss, and one on a new inhibitor of a key biochemical pathway affecting ankylosing spondylitis. Finally, I note that the adoption of low significance thresholds in genome-wide association studies and high-energy particle physics has been successful more because of the extensive validity-checking gauntlets and contextual considerations that have accompanied these low thresholds than because of the low thresholds themselves. I use these illustrations and arguments to suggest that (i) the adoption of a universal threshold for significance testing should be abandoned as a goal of statistics reform; and (ii) the validity and optimal use of applied statistical tools require careful consideration of nuanced scientific context.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this section is the friction.

Desk Editor's Note · private letter to a colleague

The paper usefully stresses that statistical results need scientific context and gives clear trial examples, but its case against universal thresholds rests on an attribution for GWAS and particle physics that lacks separating evidence.


The main point is that p-values and other statistical tools only make sense inside the specific scientific background of a field, so a single universal significance threshold is the wrong goal. The paper motivates this with a re-reading of the p-value as divergence from assumptions, then applies it to two randomized trials: low-dose aspirin for pregnancy loss and a new inhibitor for ankylosing spondylitis. In both cases the same numerical result would be read differently once the substantive details of the disease and prior evidence are brought in. It also notes that low thresholds worked in GWAS and particle physics largely because of the surrounding validity checks rather than the threshold number itself. That framing is the central recommendation: drop the search for a universal cutoff and keep context front and center.

The trial examples are the strongest part. They are concrete, easy to follow, and show why context is not just a slogan but changes what counts as evidence. The distinction between foundational background assumptions and more measurable contextual factors is also drawn cleanly.

The softer part is the GWAS and particle-physics claim. The abstract states that success came more from the gauntlets than from the low thresholds, but the text does not supply a quantitative comparison or counterfactual that would separate the two. Without that, the recommendation to abandon universal thresholds stays more plausible than demonstrated.

The paper is a position piece rather than a new derivation or dataset, so it will mainly interest readers already working on statistical reform or applied methodology. It engages the literature it cites without internal contradictions and shows clear thinking on its own terms. I would bring it to a reading group to discuss the examples. It deserves peer review because the topic is active and the argument is coherent enough to benefit from referee scrutiny on the attribution step.

Referee Report

2 major / 2 minor

Summary. The manuscript argues that applied statistical methods require careful consideration of scientific context—both foundational background assumptions and quantifiable features—for valid application and interpretation. It reviews a recent reformulation of the p-value as a divergence measure between data and assumptions, uses this to analyze two randomized trials (low-dose aspirin for pregnancy loss and a biochemical inhibitor for ankylosing spondylitis), and claims that the success of low significance thresholds in GWAS and particle physics arises primarily from accompanying validity-checking gauntlets rather than the thresholds themselves. On this basis, it recommends abandoning universal significance thresholds as a goal of statistics reform and prioritizing context-specific use of tools.

Significance. If the arguments hold, the paper could usefully redirect statistics reform discussions away from fixed thresholds toward context-aware practices, with potential benefits for reliability in applied fields. The concrete trial examples provide clear illustrations of how context shapes interpretation, and the emphasis on validity-checking procedures in high-stakes domains is a constructive observation.

major comments (2)

[Abstract and GWAS/particle physics discussion] The claim that low thresholds succeeded 'more so because of extensive validity-checking gauntlets and contextual considerations that have accompanied these low thresholds, not because of the low thresholds themselves' is load-bearing for the central recommendation to abandon universal thresholds. No quantitative decomposition, counterfactual, or separating evidence is supplied to isolate the contribution of the threshold value from that of the gauntlets; the two randomized-trial examples show that context affects interpretation but do not address this attribution.
[p-value reformulation review] The framework is invoked to illustrate the role of context, yet the manuscript provides no formal derivation, simulation study, or direct comparison showing how the divergence measure alters conclusions relative to standard p-value usage in the cited trials.

minor comments (2)

[Abstract] The abstract introduces two distinct concepts of 'context' but does not explicitly label or separate them in the subsequent trial analyses; doing so would improve clarity for readers.
No tables or figures are referenced in the provided text; if any are present, ensure they directly support the trial interpretations or the GWAS contrast.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for these constructive comments. We address each major point below with planned revisions where feasible.


Referee: [Abstract and GWAS/particle physics discussion] The claim that low thresholds succeeded 'more so because of extensive validity-checking gauntlets and contextual considerations that have accompanied these low thresholds, not because of the low thresholds themselves' is load-bearing for the central recommendation to abandon universal thresholds. No quantitative decomposition, counterfactual, or separating evidence is supplied to isolate the contribution of the threshold value from the gauntlets; the two randomized-trial examples show context affects interpretation but do not address this attribution.

Authors: We acknowledge that the manuscript supplies no quantitative decomposition or counterfactual to isolate the threshold value from the validity-checking procedures. The claim rests on historical observation: GWAS and particle physics apply low thresholds exclusively within integrated validation frameworks, and we argue this integration, rather than the threshold alone, drives reliability. The trial examples illustrate context's general role in interpretation but do not quantify the attribution. In revision we will add a paragraph clarifying the evidence as observational and historical, explicitly noting the absence of counterfactual analysis as a limitation while retaining the recommendation to prioritize context-specific practices. (Revision: partial.)
Referee: [p-value reformulation review] Section reviewing the p-value reformulation: The framework is invoked to illustrate context's role, yet the manuscript provides no formal derivation, simulation study, or direct comparison within the paper to demonstrate how the divergence measure alters conclusions relative to standard p-value usage in the cited trials.

Authors: The section reviews an existing reformulation from prior literature to supply a conceptual lens for discussing context; no new derivation is offered. To make the illustration more concrete, we will add a brief simulation or side-by-side comparison in the revised manuscript that applies both the standard p-value and the divergence measure to the cited trial data, highlighting how contextual assumptions change conclusions. (Revision: yes.)

Standing simulated objections (not resolved)

No quantitative decomposition or counterfactual evidence is supplied to isolate the contribution of low thresholds from validity-checking gauntlets in the success of GWAS and particle physics.

Circularity Check

0 steps flagged

No significant circularity; conceptual argument is self-contained


The paper advances a conceptual position that scientific context must inform statistical application and that universal significance thresholds should be abandoned. It motivates this via a reviewed p-value reformulation (treated as external input) and two trial illustrations plus cross-field examples. No equations, fitted quantities, or self-referential definitions appear; the central claims do not reduce to their own inputs by construction. External examples supply the evidentiary load rather than any tautological restatement or self-citation chain. The derivation chain therefore remains independent of the paper's own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on domain assumptions about how statistical validity depends on unquantified scientific background knowledge; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: Statistical methods depend on foundational, nuanced background assumptions specific to each scientific domain.
Invoked to argue that context cannot be fully replaced by universal thresholds

pith-pipeline@v0.9.0 · 5567 in / 1123 out tokens · 36751 ms · 2026-05-13T19:54:55.323129+00:00 · methodology

