
arxiv: 2606.30311 · v1 · pith:6MKNJ2NBnew · submitted 2026-06-29 · 📊 stat.ME · stat.AP

Evaluating HWE and Association in Genome Wide Association Studies: A Unified Procedure

Pith reviewed 2026-06-30 05:13 UTC · model grok-4.3

classification 📊 stat.ME stat.AP
keywords GWAS · Hardy-Weinberg equilibrium · case-control study · conditional test · association testing · chi-square statistic · SNP ranking · unified procedure

The pith

A conditional chi-square test unifies Hardy-Weinberg equilibrium checking with association testing in GWAS case-control studies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a single statistical procedure that performs both association testing and Hardy-Weinberg equilibrium assessment for SNPs without separate cutoffs or tests. It conditions the Pearson chi-square statistic from the genotype-by-disease table on the chi-square statistic measuring HWE deviation in controls only, then derives the correct asymptotic distribution under this conditioning. Simulations across varied minor allele frequencies and effect sizes show the resulting test has higher power than two existing retrospective methods in most scenarios. The approach also changes SNP ranking because HWE information enters the association p-value directly, as illustrated on alopecia data. The authors conclude that the unified test removes the need for arbitrary HWE thresholds while improving power and downstream interpretability for replication and fine mapping.

Core claim

The authors introduce a conditional genotype-based test that conditions the Pearson χ²-statistic from the 3x2 contingency table on the χ²-statistic for HWE in the control group, deriving the relevant asymptotic distribution theory. This test is shown through simulations to have higher power than two competing retrospective procedures in most scenarios, and it leads to better SNP ranking in GWAS by accounting for HWE in association p-values, as demonstrated in an alopecia data set.

What carries the argument

The conditional Pearson chi-square statistic obtained by conditioning the association test statistic on the observed HWE statistic in controls.
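
As a rough illustration of the two ingredients named above, the sketch below computes the Pearson χ² statistic for a 3×2 genotype-by-disease table and the one-degree-of-freedom χ² statistic for HWE deviation in controls. The function names and the example counts are hypothetical, and the sketch does not implement the paper's conditional distribution, which the visible text does not spell out.

```python
def pearson_chi2_3x2(cases, controls):
    """Pearson chi-square statistic for a 3x2 genotype-by-disease table.

    cases, controls: three genotype counts each, ordered (AA, Aa, aa).
    """
    n_cases, n_controls = sum(cases), sum(controls)
    n_total = n_cases + n_controls
    stat = 0.0
    for obs_case, obs_ctrl in zip(cases, controls):
        row_total = obs_case + obs_ctrl
        if row_total == 0:
            continue  # an empty genotype row contributes nothing
        for obs, col_total in ((obs_case, n_cases), (obs_ctrl, n_controls)):
            expected = row_total * col_total / n_total
            stat += (obs - expected) ** 2 / expected
    return stat


def hwe_chi2(genotypes):
    """Chi-square statistic (1 df) for HWE deviation in one group."""
    n_hom_major, n_het, n_hom_minor = genotypes
    n = n_hom_major + n_het + n_hom_minor
    p = (2 * n_hom_major + n_het) / (2 * n)  # estimated major-allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Cells with zero expected count are skipped to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(genotypes, expected) if e > 0)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    example_cases = (300, 500, 200)
    example_controls = (360, 480, 160)
    print("association statistic:", pearson_chi2_3x2(example_cases, example_controls))
    print("control HWE statistic:", hwe_chi2(example_controls))
```

The unified procedure described in the paper would take both numbers as input; how it combines them is left to the paper's asymptotic theory.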

If this is right

  • Separate HWE testing becomes superfluous because the unified procedure already accounts for equilibrium deviations.
  • SNP p-values and rankings improve because HWE information is incorporated directly rather than applied as a post-hoc filter.
  • Replication studies become more cost-effective due to higher power and better prioritization of true signals.
  • Subsequent fine-mapping steps benefit from a cleaner and more interpretable set of candidate SNPs.
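
For contrast, the conventional filter-then-test workflow that the bullets above describe as superfluous can be sketched as follows. The 1e-4 HWE cutoff, the SNP names, and the statistics are illustrative assumptions, not values from the paper; p-values use the closed-form chi-square tails for 1 and 2 degrees of freedom.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch supports df = 1 or df = 2 only")


def filter_then_rank(snps, hwe_cutoff=1e-4):
    """Two-step procedure: drop SNPs failing the HWE cutoff, rank the rest.

    snps: dict mapping SNP name -> (association statistic, control HWE statistic).
    Returns (names ranked by association p-value, names dropped by the filter).
    """
    kept, dropped = [], []
    for name, (assoc_stat, hwe_stat) in snps.items():
        if chi2_sf(hwe_stat, 1) < hwe_cutoff:
            dropped.append(name)  # excluded before any association ranking
        else:
            kept.append((chi2_sf(assoc_stat, 2), name))
    kept.sort()
    return [name for _, name in kept], dropped


if __name__ == "__main__":
    # Hypothetical statistics; snp_b and snp_c differ only slightly in HWE.
    example_snps = {
        "snp_a": (25.0, 0.5),
        "snp_b": (30.0, 15.0),
        "snp_c": (30.0, 16.0),
    }
    ranked, dropped = filter_then_rank(example_snps)
    print(ranked, dropped)  # ['snp_b', 'snp_a'] ['snp_c']
```

In this toy input, snp_b and snp_c share the same association statistic and have nearly identical HWE statistics, yet only snp_c is discarded. That is the kind of threshold effect a single p-value incorporating HWE is meant to avoid.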

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same conditioning idea could be applied to other association tests such as logistic regression or score tests that currently ignore HWE.
  • In multi-ethnic or admixed cohorts the conditional distribution might need adjustment for population structure.
  • Downstream analyses that use ranked SNP lists, such as gene-set enrichment or polygenic scoring, could show measurable gains if the new ranking is adopted.

Load-bearing premise

The asymptotic distribution theory developed for the conditional Pearson chi-square statistic is accurate under the case-control sampling model, and the simulation scenarios adequately cover the range of realistic GWAS conditions including varying minor allele frequencies and effect sizes.
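
One informal way to probe this premise is to simulate null data and compare how often the unconditional Pearson test rejects among replicates with small versus large control HWE statistics. The sketch below does that with a median split. It is a Monte Carlo diagnostic under assumed settings (the MAF, sample sizes, and replicate count are hypothetical), not the asymptotic theory the paper develops.

```python
import math
import random

CRIT_2DF = -2.0 * math.log(0.05)  # upper 5% point of chi-square with 2 df


def draw_counts(rng, probs, n):
    """Sample n genotypes and return counts ordered (AA, Aa, aa)."""
    draws = rng.choices((0, 1, 2), weights=probs, k=n)
    return [draws.count(g) for g in (0, 1, 2)]


def pearson_chi2(cases, controls):
    """Pearson chi-square statistic for the 3x2 genotype-by-disease table."""
    n_cases, n_controls = sum(cases), sum(controls)
    n_total = n_cases + n_controls
    stat = 0.0
    for a, b in zip(cases, controls):
        if a + b == 0:
            continue
        for obs, col in ((a, n_cases), (b, n_controls)):
            expected = (a + b) * col / n_total
            stat += (obs - expected) ** 2 / expected
    return stat


def hwe_chi2(genotypes):
    """Chi-square statistic (1 df) for HWE deviation in one group."""
    n = sum(genotypes)
    p = (2 * genotypes[0] + genotypes[1]) / (2 * n)
    expected = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2)
    return sum((o - e) ** 2 / e for o, e in zip(genotypes, expected) if e > 0)


def null_rates_by_hwe_stratum(maf=0.3, n_cases=500, n_controls=500,
                              reps=400, seed=0):
    """Null rejection rates of the unconditional test, split at the median
    of the control HWE statistic. Returns (rate_low_hwe, rate_high_hwe)."""
    rng = random.Random(seed)
    probs = ((1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2)  # HWE, no association
    records = []
    for _ in range(reps):
        cases = draw_counts(rng, probs, n_cases)
        controls = draw_counts(rng, probs, n_controls)
        records.append((hwe_chi2(controls), pearson_chi2(cases, controls) > CRIT_2DF))
    records.sort(key=lambda rec: rec[0])
    half = reps // 2
    low, high = records[:half], records[half:]
    return (sum(rej for _, rej in low) / len(low),
            sum(rej for _, rej in high) / len(high))


if __name__ == "__main__":
    print(null_rates_by_hwe_stratum())
```

If the two rates differ noticeably in a larger run, that would suggest the association statistic is not independent of the control HWE statistic under the null, which is the situation a conditional null distribution has to handle.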

What would settle it

A simulation experiment in which the conditional test exhibits lower or equal power than the two competing retrospective procedures across a broad grid of minor allele frequencies, effect sizes, and sample sizes would falsify the power advantage.
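
A power experiment of the kind described above could be set up along the following lines. This is a minimal sketch for the unconditional Pearson test only, since the conditional test and the two competing procedures are not specified in the visible text. The disease model (baseline risk and genotype relative risks), sample sizes, and replicate count are assumptions chosen for illustration.

```python
import math
import random

CRIT_2DF = -2.0 * math.log(0.05)  # upper 5% point of chi-square with 2 df


def case_control_freqs(maf, baseline_risk, rr_het, rr_hom):
    """Genotype frequencies in cases and controls, assuming population HWE."""
    population = ((1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2)
    risk = (baseline_risk, baseline_risk * rr_het, baseline_risk * rr_hom)
    case_w = [g * f for g, f in zip(population, risk)]
    ctrl_w = [g * (1 - f) for g, f in zip(population, risk)]
    return ([w / sum(case_w) for w in case_w],
            [w / sum(ctrl_w) for w in ctrl_w])


def draw_counts(rng, probs, n):
    """Sample n genotypes and return counts ordered (AA, Aa, aa)."""
    draws = rng.choices((0, 1, 2), weights=probs, k=n)
    return [draws.count(g) for g in (0, 1, 2)]


def pearson_chi2(cases, controls):
    """Pearson chi-square statistic for the 3x2 genotype-by-disease table."""
    n_cases, n_controls = sum(cases), sum(controls)
    n_total = n_cases + n_controls
    stat = 0.0
    for a, b in zip(cases, controls):
        if a + b == 0:
            continue
        for obs, col in ((a, n_cases), (b, n_controls)):
            expected = (a + b) * col / n_total
            stat += (obs - expected) ** 2 / expected
    return stat


def rejection_rate(maf, rr_het, rr_hom, n_cases=500, n_controls=500,
                   reps=200, baseline_risk=0.05, seed=0):
    """Fraction of simulated data sets in which the 5%-level test rejects."""
    rng = random.Random(seed)
    case_p, ctrl_p = case_control_freqs(maf, baseline_risk, rr_het, rr_hom)
    hits = 0
    for _ in range(reps):
        cases = draw_counts(rng, case_p, n_cases)
        controls = draw_counts(rng, ctrl_p, n_controls)
        hits += pearson_chi2(cases, controls) > CRIT_2DF
    return hits / reps


if __name__ == "__main__":
    # Small hypothetical grid over MAF and a multiplicative effect size.
    for maf in (0.1, 0.3):
        for rr in (1.0, 1.25, 1.5):
            print(maf, rr, rejection_rate(maf, rr, rr * rr))
```

Setting both relative risks to 1.0 gives the type-I error rate, so the same loop can serve as a calibration check. Comparing methods would require plugging each competing test in place of `pearson_chi2` on the same simulated tables.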

Figures

Figures reproduced from arXiv: 2606.30311 by Hajo Holzmann, Stefan Böhringer.

Figure 1: Q-Q-plots for p-values derived from the obesity data set comparing an empirical … (caption truncated in source)
Original abstract

In genome wide association studies (GWASs) based on a case-control design, single nucleotide polymorphisms (SNPs) are typically evaluated for an association test and a Hardy-Weinberg equilibrium (HWE) goodness-of-fit test. SNPs are then excluded from analysis based on a HWE cutoff to avoid false positives. In order to avoid cutoffs based on arbitrary threshold values, we propose a conditional genotype-based test that conditions the Pearson χ²-statistic in the 3x2 contingency table on the χ²-statistic for HWE in the control group, and develop the relevant asymptotic distribution theory. We show by simulations that our test in most scenarios is more powerful than two competing retrospective procedures. Another important advantage of the proposed method is a better ranking of SNPs in GWASs as HWE is accounted for in computing p-values of SNP association. We demonstrate this effect on a data set in an alopecia study. In conclusion, our test makes separate HWE testing superfluous by providing a unified framework and strictly improves on the standard procedure in terms of power and interpretability, thereby making replication more cost effective and improving subsequent fine mapping.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a conditional Pearson χ² test for SNP association in case-control GWAS that conditions the 3×2 genotype table statistic on the HWE χ² statistic computed in controls only. It develops the corresponding asymptotic distribution theory, reports simulation results claiming higher power than two unspecified retrospective procedures in most scenarios, demonstrates improved SNP ranking on an alopecia dataset, and concludes that the unified procedure renders separate HWE testing superfluous.

Significance. A correctly derived conditional asymptotic null distribution together with comprehensive simulations would supply a principled way to integrate HWE information directly into association p-values, potentially improving power and ranking without arbitrary HWE thresholds.

major comments (2)
  1. [asymptotic theory section (exact location not numbered in abstract)] The central validity claim rests on the asymptotic distribution of the conditional Pearson χ² statistic under case-control sampling. The derivation must explicitly establish that the joint limiting distribution of the 3×2 association table and the control HWE margin yields the claimed conditional null distribution; any unaccounted dependence would miscalibrate p-values and undermine the assertion that separate HWE testing becomes superfluous.
  2. [simulation section] Simulation evidence is cited for power superiority, yet no parameters (MAF range, prevalence, effect sizes, sample sizes, or number of replicates) are supplied in the abstract or visible summary; without these details it is impossible to judge whether the scenarios adequately cover realistic GWAS conditions or control type-I error across the relevant parameter space.
minor comments (1)
  1. [abstract and introduction] The abstract refers to “two competing retrospective procedures” without naming them; the main text should identify the comparators (e.g., by citation or method) at first mention.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major comment below and will revise the manuscript to improve clarity and accessibility where appropriate.

  1. Referee: [asymptotic theory section (exact location not numbered in abstract)] The central validity claim rests on the asymptotic distribution of the conditional Pearson χ² statistic under case-control sampling. The derivation must explicitly establish that the joint limiting distribution of the 3×2 association table and the control HWE margin yields the claimed conditional null distribution; any unaccounted dependence would miscalibrate p-values and undermine the assertion that separate HWE testing becomes superfluous.

    Authors: We appreciate the referee's emphasis on explicitness in the asymptotic derivation. The manuscript develops the conditional test by establishing the joint asymptotic behavior of the association and control HWE statistics under the null and then deriving the conditional distribution. To strengthen the presentation and directly address potential concerns about unaccounted dependence, we will expand the asymptotic theory section with a more detailed step-by-step derivation of the joint limiting distribution under case-control sampling, including the covariance structure and the resulting conditional null distribution (chi-square with 2 df). This revision will make the conditioning argument fully transparent.
    Revision: yes

  2. Referee: [simulation section] Simulation evidence is cited for power superiority, yet no parameters (MAF range, prevalence, effect sizes, sample sizes, or number of replicates) are supplied in the abstract or visible summary; without these details it is impossible to judge whether the scenarios adequately cover realistic GWAS conditions or control type-I error across the relevant parameter space.

    Authors: The simulation parameters and design (including MAF ranges, sample sizes, prevalence, effect sizes, and replicate counts) are fully specified in the simulation section of the manuscript, along with results on type-I error control. We agree, however, that these details are not summarized in the abstract. We will revise the abstract to include a concise overview of the simulation settings and will add a brief statement confirming type-I error calibration across the explored parameter space.
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained


The paper derives the asymptotic distribution of the conditional Pearson chi-square statistic (association test conditioned on control-group HWE) under the case-control sampling model and uses simulations only for power comparisons against competing procedures. This is a standard mathematical derivation of limiting distributions for a new test statistic, not a reduction of any claimed result to its own inputs by construction. No self-definitional steps, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described claims. The central assertion that the unified test makes separate HWE testing superfluous follows directly from the derived null distribution and is externally benchmarked by simulation; it does not collapse to a tautology or prior author result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, invented entities, or non-standard axioms are mentioned. The approach relies on standard large-sample chi-square theory for contingency tables.

axioms (1)
  • [standard math] The conditional distribution of the association chi-square statistic given the HWE chi-square statistic admits an asymptotic approximation suitable for p-value calculation.
    The paper states it develops the relevant asymptotic distribution theory for the conditional test.


