Evaluating HWE and Association in Genome Wide Association Studies: A Unified Procedure
Pith reviewed 2026-06-30 05:13 UTC · model grok-4.3
The pith
A conditional chi-square test unifies Hardy-Weinberg equilibrium checking with association testing in GWAS case-control studies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce a conditional genotype-based test that conditions the Pearson χ²-statistic from the 3x2 contingency table on the χ²-statistic for HWE in the control group, deriving the relevant asymptotic distribution theory. This test is shown through simulations to have higher power than two competing retrospective procedures in most scenarios, and it leads to better SNP ranking in GWAS by accounting for HWE in association p-values, as demonstrated in an alopecia data set.
What carries the argument
The conditional Pearson chi-square statistic obtained by conditioning the association test statistic on the observed HWE statistic in controls.
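As a minimal sketch of the two ingredients named above, the following Python computes the standard Pearson χ² statistic for a 3×2 genotype-by-status table and the standard one-degree-of-freedom HWE goodness-of-fit χ² in controls. It does not implement the paper's conditional p-value, because the conditional null distribution is not reproduced on this page. The function names and the example counts are illustrative assumptions, not taken from the paper.

```python
import math


def pearson_chi2_3x2(cases, controls):
    """Pearson chi-square for a 3x2 genotype-by-status table (2 df).

    cases, controls: genotype counts ordered (AA, Aa, aa).
    """
    n_cases, n_controls = sum(cases), sum(controls)
    total = n_cases + n_controls
    stat = 0.0
    for g in range(3):
        row = cases[g] + controls[g]
        for observed, col in ((cases[g], n_cases), (controls[g], n_controls)):
            expected = row * col / total
            if expected > 0:  # skip empty genotype rows
                stat += (observed - expected) ** 2 / expected
    return stat


def hwe_chi2(counts):
    """One-df HWE goodness-of-fit chi-square from (AA, Aa, aa) counts."""
    n_hom_a, n_het, _ = counts
    n = sum(counts)
    p = (2 * n_hom_a + n_het) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected) if e > 0)


def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 2 are supported in this sketch")


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
cases = (30, 40, 30)
controls = (20, 60, 20)

assoc = pearson_chi2_3x2(cases, controls)
hwe = hwe_chi2(controls)
print("association chi2:", assoc, "p =", chi2_sf(assoc, 2))
print("control HWE chi2:", hwe, "p =", chi2_sf(hwe, 1))
```

In the usual two-step workflow these two p-values are used separately: the HWE p-value as a filter and the association p-value for ranking. The paper's proposal replaces that pair with a single p-value from the conditional distribution of the first statistic given the second.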
If this is right
- Separate HWE testing becomes superfluous because the unified procedure already accounts for equilibrium deviations.
- SNP p-values and rankings improve because HWE information is incorporated directly rather than applied as a post-hoc filter.
- Replication studies become more cost-effective due to higher power and better prioritization of true signals.
- Subsequent fine-mapping steps benefit from a cleaner and more interpretable set of candidate SNPs.
Where Pith is reading between the lines
- The same conditioning idea could be applied to other association tests such as logistic regression or score tests that currently ignore HWE.
- In multi-ethnic or admixed cohorts the conditional distribution might need adjustment for population structure.
- Downstream analyses that use ranked SNP lists, such as gene-set enrichment or polygenic scoring, could show measurable gains if the new ranking is adopted.
Load-bearing premise
The asymptotic distribution theory developed for the conditional Pearson chi-square statistic is accurate under the case-control sampling model, and the simulation scenarios adequately cover the range of realistic GWAS conditions including varying minor allele frequencies and effect sizes.
What would settle it
A simulation experiment in which the conditional test exhibits lower or equal power than the two competing retrospective procedures across a broad grid of minor allele frequencies, effect sizes, and sample sizes would falsify the power advantage.
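A simulation of this kind could be organized along the following lines. This is a hedged sketch, not the authors' design: it estimates the rejection rate of the baseline 2-df Pearson test only (the conditional test would be plugged in at the marked line once its null distribution is available), it assumes a rare disease so that controls follow population HWE frequencies, and it assumes a multiplicative per-allele relative risk `rr` for cases. All names, grid values, and sample sizes are hypothetical.

```python
import math
import random


def genotype_freqs(maf, rr=1.0):
    """Frequencies of 0, 1, 2 minor alleles: HWE reweighted by rr**g (assumed model)."""
    q, p = maf, 1.0 - maf
    population = (p * p, 2 * p * q, q * q)
    weights = [f * rr ** g for g, f in enumerate(population)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_counts(freqs, n, rng):
    """Draw n genotypes and return their counts as [n0, n1, n2]."""
    counts = [0, 0, 0]
    for g in rng.choices((0, 1, 2), weights=freqs, k=n):
        counts[g] += 1
    return counts


def pearson_chi2_3x2(cases, controls):
    """Pearson chi-square for the 3x2 genotype-by-status table."""
    n_cases, n_controls = sum(cases), sum(controls)
    total = n_cases + n_controls
    stat = 0.0
    for g in range(3):
        row = cases[g] + controls[g]
        for observed, col in ((cases[g], n_cases), (controls[g], n_controls)):
            expected = row * col / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def rejection_rate(maf, rr, n_cases, n_controls, reps=300, alpha=0.05, seed=1):
    """Fraction of replicates in which the baseline test rejects at level alpha."""
    rng = random.Random(seed)
    # Chi-square(2 df) has survival function exp(-x/2), so the cutoff is closed-form.
    critical = -2.0 * math.log(alpha)
    case_freqs = genotype_freqs(maf, rr)
    control_freqs = genotype_freqs(maf)  # rare-disease assumption: controls in HWE
    hits = 0
    for _ in range(reps):
        cases = sample_counts(case_freqs, n_cases, rng)
        controls = sample_counts(control_freqs, n_controls, rng)
        # A competing or conditional test statistic would replace this line.
        if pearson_chi2_3x2(cases, controls) > critical:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for maf in (0.1, 0.3):
        for rr in (1.0, 1.5):  # rr = 1.0 gives the type-I error rate
            rate = rejection_rate(maf, rr, n_cases=300, n_controls=300)
            print(f"maf={maf} rr={rr} rejection rate={rate:.3f}")
```

Running the same grid for each procedure with shared seeds would give paired power estimates, and the `rr = 1.0` rows double as a calibration check before any power comparison is read.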
Original abstract
In genome wide association studies (GWASs) based on a case-control design, single nucleotide polymorphisms (SNPs) are typically evaluated for an association test and a Hardy-Weinberg equilibrium (HWE) goodness-of-fit test. SNPs are then excluded from analysis based on a HWE cutoff to avoid false positives. In order to avoid cutoffs based on arbitrary threshold values, we propose a conditional genotype-based test that conditions the Pearson $\chi^2$-statistic in the 3x2 contingency table on the $\chi^2$-statistic for HWE in the control group, and develop the relevant asymptotic distribution theory. We show by simulations that our test in most scenarios is more powerful than two competing retrospective procedures. Another important advantage of the proposed method is a better ranking of SNPs in GWASs as HWE is accounted for in computing p-values of SNP association. We demonstrate this effect on a data set in an alopecia study. In conclusion, our test makes separate HWE testing superfluous by providing a unified framework and strictly improves on the standard procedure in terms of power and interpretability, thereby making replication more cost effective and improving subsequent fine mapping.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a conditional Pearson χ² test for SNP association in case-control GWAS that conditions the 3×2 genotype table statistic on the HWE χ² statistic computed in controls only. It develops the corresponding asymptotic distribution theory, reports simulation results claiming higher power than two unspecified retrospective procedures in most scenarios, demonstrates improved SNP ranking on an alopecia dataset, and concludes that the unified procedure renders separate HWE testing superfluous.
Significance. A correctly derived conditional asymptotic null distribution together with comprehensive simulations would supply a principled way to integrate HWE information directly into association p-values, potentially improving power and ranking without arbitrary HWE thresholds.
Major comments (2)
- [asymptotic theory section (exact location not numbered in abstract)] The central validity claim rests on the asymptotic distribution of the conditional Pearson χ² statistic under case-control sampling. The derivation must explicitly establish that the joint limiting distribution of the 3×2 association table and the control HWE margin yields the claimed conditional null distribution; any unaccounted dependence would miscalibrate p-values and undermine the assertion that separate HWE testing becomes superfluous.
- [simulation section] Simulation evidence is cited for power superiority, yet no parameters (MAF range, prevalence, effect sizes, sample sizes, or number of replicates) are supplied in the abstract or visible summary; without these details it is impossible to judge whether the scenarios adequately cover realistic GWAS conditions or control type-I error across the relevant parameter space.
Minor comments (1)
- [abstract and introduction] The abstract refers to “two competing retrospective procedures” without naming them; the main text should identify the comparators (e.g., by citation or method) at first mention.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address each major comment below and will revise the manuscript to improve clarity and accessibility where appropriate.
- Referee: [asymptotic theory section (exact location not numbered in abstract)] The central validity claim rests on the asymptotic distribution of the conditional Pearson χ² statistic under case-control sampling. The derivation must explicitly establish that the joint limiting distribution of the 3×2 association table and the control HWE margin yields the claimed conditional null distribution; any unaccounted dependence would miscalibrate p-values and undermine the assertion that separate HWE testing becomes superfluous.
  Authors: We appreciate the referee's emphasis on explicitness in the asymptotic derivation. The manuscript develops the conditional test by establishing the joint asymptotic behavior of the association and control HWE statistics under the null and then deriving the conditional distribution. To strengthen the presentation and directly address potential concerns about unaccounted dependence, we will expand the asymptotic theory section with a more detailed step-by-step derivation of the joint limiting distribution under case-control sampling, including the covariance structure and the resulting conditional null distribution (chi-square with 2 df). This revision will make the conditioning argument fully transparent.
  Revision: yes
- Referee: [simulation section] Simulation evidence is cited for power superiority, yet no parameters (MAF range, prevalence, effect sizes, sample sizes, or number of replicates) are supplied in the abstract or visible summary; without these details it is impossible to judge whether the scenarios adequately cover realistic GWAS conditions or control type-I error across the relevant parameter space.
  Authors: The simulation parameters and design (including MAF ranges, sample sizes, prevalence, effect sizes, and replicate counts) are fully specified in the simulation section of the manuscript, along with results on type-I error control. We agree, however, that these details are not summarized in the abstract. We will revise the abstract to include a concise overview of the simulation settings and will add a brief statement confirming type-I error calibration across the explored parameter space.
  Revision: yes
Circularity Check
No significant circularity; derivation is self-contained
The paper derives the asymptotic distribution of the conditional Pearson chi-square statistic (association test conditioned on control-group HWE) under the case-control sampling model and uses simulations only for power comparisons against competing procedures. This is a standard mathematical derivation of limiting distributions for a new test statistic, not a reduction of any claimed result to its own inputs by construction. No self-definitional steps, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described claims. The central assertion that the unified test makes separate HWE testing superfluous follows directly from the derived null distribution and is externally benchmarked by simulation; it does not collapse to a tautology or prior author result.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: The conditional distribution of the association chi-square statistic given the HWE chi-square statistic admits an asymptotic approximation suitable for p-value calculation.