A General Statistical Framework for Hardy-Weinberg Equilibrium Inference on the X Chromosome
Pith reviewed 2026-05-20 02:28 UTC · model grok-4.3
The pith
A robust regression model unifies Hardy-Weinberg equilibrium testing across autosomal and X-chromosomal regions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By formulating HWE testing as an assessment of allele-level dependence in a robust regression model, the framework directly parameterizes Hardy-Weinberg disequilibrium, unifies existing Pearson chi-square-based tests under explicit modeling assumptions, and clarifies their null hypotheses, degrees of freedom, and sensitivity to sex differences in minor allele frequency. The approach also accommodates covariate and population-structure adjustment within a unified regression-based formulation.
What carries the argument
The robust allele-based regression model, which casts HWE testing as an assessment of allele-level dependence, thereby parameterizing disequilibrium directly and unifying existing tests.
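The link between a Pearson chi-square test and an allele-level dependence parameter can be seen in the classical autosomal case. The following is a minimal sketch of that textbook identity, not the paper's robust regression model: the function name `hwd_summary` and the returned keys are hypothetical, and the code assumes a biallelic locus with both alleles observed (0 < p < 1).

```python
def hwd_summary(n_aa, n_ab, n_bb):
    """Classical autosomal HWE quantities from genotype counts (AA, AB, BB).

    Illustrative only: shows that the Pearson statistic is a function of the
    disequilibrium coefficient delta = f_AA - p^2.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # sample frequency of allele A
    q = 1.0 - p
    delta = n_aa / n - p * p          # Hardy-Weinberg disequilibrium coefficient

    # Pearson chi-square against HWE expected genotype counts.
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Correlation between the two alleles carried by one individual.
    allele_corr = delta / (p * q)

    return {
        "p": p,
        "delta": delta,
        "pearson": pearson,
        "n_corr_sq": n * allele_corr ** 2,  # algebraically equal to pearson
    }


if __name__ == "__main__":
    print(hwd_summary(30, 40, 30))
```

For counts (30, 40, 30) both `pearson` and `n_corr_sq` come out at approximately 4, which illustrates why a test of zero allele-level dependence and the 1-df Pearson test coincide in this simple setting.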
If this is right
- Existing tests can be characterized by their specific assumptions on sex differences in minor allele frequency and male sample inclusion.
- Commonly used X-chromosome tests exhibit inflated type I error when sex differences in allele frequency are present.
- The framework enables flexible inference with covariate adjustment for both autosomal and X-chromosomal regions.
- Analysis of real data from the 1000 Genomes Project supports the need for such a unified approach.
Where Pith is reading between the lines
- Genetic studies could improve quality control by adopting this regression framework instead of separate tests for X and autosomes.
- This might help resolve inconsistencies in previous X-linked association studies that used older HWE tests.
- Future work could test the framework on other types of genetic variants or in different populations.
Load-bearing premise
That assessing allele-level dependence through a robust regression model accurately represents the Hardy-Weinberg null hypothesis while accounting for potential sex differences in allele frequencies.
What would settle it
A simulation study in which data are generated under the HWE null with sex differences in minor allele frequency, comparing the type I error rates of the proposed tests against those of existing methods.
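A simulation of this kind can be sketched in a few lines. The code below is an assumption-laden illustration, not the paper's simulation design or its proposed test: it compares a females-only 1-df Pearson test with a hypothetical pooled five-category Pearson test (2 df) that includes males and assumes a common allele frequency. All function names, sample sizes, and frequencies are chosen for illustration.

```python
import math
import random


def simulate_counts(rng, n_f, n_m, p_f, p_m):
    """Females in within-sex HWE at p_f; hemizygous males at p_m."""
    f_counts = [0, 0, 0]  # indexed by number of A alleles: BB, AB, AA
    for _ in range(n_f):
        f_counts[(rng.random() < p_f) + (rng.random() < p_f)] += 1
    m_a = sum(rng.random() < p_m for _ in range(n_m))
    return f_counts, m_a


def pearson(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def females_only_pvalue(f_counts):
    """1-df Pearson HWE test using female genotypes only."""
    n_f = sum(f_counts)
    p = (2 * f_counts[2] + f_counts[1]) / (2 * n_f)
    q = 1 - p
    x2 = pearson(f_counts, [n_f * q * q, 2 * n_f * p * q, n_f * p * p])
    return math.erfc(math.sqrt(x2 / 2))  # chi-square survival function, 1 df


def pooled_pvalue(f_counts, m_a, n_m):
    """2-df Pearson test with a single allele frequency pooled over sexes."""
    n_f = sum(f_counts)
    p = (2 * f_counts[2] + f_counts[1] + m_a) / (2 * n_f + n_m)
    q = 1 - p
    observed = f_counts + [n_m - m_a, m_a]
    expected = [n_f * q * q, 2 * n_f * p * q, n_f * p * p, n_m * q, n_m * p]
    return math.exp(-pearson(observed, expected) / 2)  # survival function, 2 df


def rejection_rates(p_f, p_m, n_f=300, n_m=300, reps=1000, alpha=0.05, seed=1):
    """Return (females-only, pooled) empirical rejection rates."""
    rng = random.Random(seed)
    rej_f = rej_pooled = 0
    for _ in range(reps):
        f_counts, m_a = simulate_counts(rng, n_f, n_m, p_f, p_m)
        rej_f += females_only_pvalue(f_counts) < alpha
        rej_pooled += pooled_pvalue(f_counts, m_a, n_m) < alpha
    return rej_f / reps, rej_pooled / reps


if __name__ == "__main__":
    print("no sdMAF :", rejection_rates(0.3, 0.3))
    print("sdMAF 0.1:", rejection_rates(0.3, 0.4))
```

In this sketch the pooled test rejects far more often than the nominal level once the sexes differ in allele frequency, because its null hypothesis combines within-sex HWE with equal allele frequencies, while the females-only test does not use male data at all.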
Original abstract
Testing for Hardy-Weinberg equilibrium (HWE) is a fundamental component of genetic data analysis, widely used for quality control and model validation. Although HWE testing is well established for autosomal loci, inference on the X chromosome is more complex due to sex-specific genotype structures and potential sex differences in minor allele frequency (sdMAF). Existing tests differ in their assumptions about sdMAF and male sample inclusion, often leading to distinct but poorly characterized null hypotheses. We develop a general statistical framework for HWE inference using the robust allele-based regression model. By formulating HWE testing as an assessment of allele-level dependence, the framework directly parameterizes Hardy-Weinberg disequilibrium, unifies existing Pearson chi-square-based tests under explicit modeling assumptions, and clarifies their null hypotheses, degrees of freedom, and sensitivity to sdMAF. The framework also accommodates covariate and population-structure adjustment within a unified regression-based formulation. The proposed framework provides robust, interpretable, and flexible inference, establishing a unified statistical foundation for HWE testing across autosomal and X-chromosomal regions. Simulation studies and analysis of high-coverage 1000 Genomes Project data demonstrate that commonly used X-chromosome tests can exhibit inflated type I error or misleading inference when sdMAF is present.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a general statistical framework for Hardy-Weinberg equilibrium (HWE) inference on the X chromosome using a robust allele-based regression model. By formulating HWE testing as an assessment of allele-level dependence, the framework directly parameterizes disequilibrium, unifies existing Pearson chi-square-based tests under explicit modeling assumptions, clarifies their null hypotheses, degrees of freedom, and sensitivity to sex differences in minor allele frequency (sdMAF), and accommodates covariate and population-structure adjustment. Simulation studies and analysis of high-coverage 1000 Genomes Project data demonstrate that commonly used X-chromosome tests can exhibit inflated type I error or misleading inference when sdMAF is present.
Significance. If the regression model correctly specifies the HWE null under sdMAF and hemizygosity in males, the work would provide a valuable unified and flexible foundation for HWE testing on the X chromosome, improving quality control in genetic studies. The explicit unification of existing tests, clarification of their assumptions, and empirical demonstration of type I error inflation via simulations and real data are strengths that could advance the field if the central modeling claims hold.
major comments (2)
- [Framework / regression model (likely §2–3)] The central unification claim rests on the robust regression correctly encoding the classical within-sex HWE null (random allele pairing) for X-chromosome data even when sdMAF exists. The formulation must include an explicit sex-by-allele interaction or equivalent term; without it, the dependence parameter will generally be nonzero under the intended null whenever male and female allele frequencies differ, leading to a non-central test statistic distribution and undermining type I control as well as the claimed unification of Pearson tests.
- [Simulation studies] Simulation studies are cited as showing inflated type I error in existing tests, but quantitative verification is needed for the proposed framework itself: report empirical type I error rates (with standard errors) for the new test across a grid of sdMAF magnitudes, male/female sample-size ratios, and allele frequencies, confirming that the test maintains nominal level under the within-sex null.
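The concern in the first major comment can be made concrete with a short calculation. This is an illustrative sketch under stated assumptions, not the manuscript's parameterization: females are assumed to be in within-sex HWE at allele frequency $p_f$, males have allele frequency $p_m$, and a test measures female homozygote excess against a pooled frequency $\bar p$ with female allele weight $w$. All of these symbols are introduced here for illustration.

```latex
% Pooled allele frequency over n_f females (2 alleles each) and n_m males (1 allele each)
\bar p = w\,p_f + (1-w)\,p_m, \qquad w = \frac{2 n_f}{2 n_f + n_m}.

% Females are in within-sex HWE, so P(AA \mid \text{female}) = p_f^2.
% Apparent disequilibrium when measured against the pooled frequency:
\delta^{*} = p_f^{2} - \bar p^{2}
           = (p_f - \bar p)(p_f + \bar p)
           = (1-w)\,(p_f - p_m)\,(p_f + \bar p).
```

Whenever males are included ($w < 1$) and $p_f \neq p_m$, $\delta^{*}$ is nonzero even though alleles pair at random within females, which is the non-centrality the referee describes. Replacing $\bar p$ with the sex-specific frequency $p_f$ gives $\delta^{*} = 0$.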
minor comments (2)
- [Abstract] Clarify in the abstract or introduction the precise degrees of freedom for the unified test statistic under different sdMAF assumptions.
- [Results / unification section] Add a small table comparing the null hypotheses, df, and sdMAF sensitivity of the proposed framework versus the main existing X-chromosome HWE tests.
Simulated Author's Rebuttal
We thank the referee for their insightful and constructive comments on our manuscript. These have helped us to strengthen the presentation of the regression framework and to provide more comprehensive empirical validation of the proposed test. We address each major comment in detail below and have incorporated revisions accordingly.
- Referee: The central unification claim rests on the robust regression correctly encoding the classical within-sex HWE null (random allele pairing) for X-chromosome data even when sdMAF exists. The formulation must include an explicit sex-by-allele interaction or equivalent term; without it, the dependence parameter will generally be nonzero under the intended null whenever male and female allele frequencies differ, leading to a non-central test statistic distribution and undermining type I control as well as the claimed unification of Pearson tests.
Authors: We appreciate the referee's detailed analysis of the null hypothesis specification. Our robust allele-based regression model is constructed to test for dependence between alleles after accounting for sex-specific allele frequencies. To address this point explicitly, we have revised the model description in Section 2 to include a sex-by-allele interaction term. This modification ensures that the disequilibrium parameter is zero under the within-sex HWE null (i.e., random allele pairing within each sex) regardless of whether allele frequencies differ between males and females. With this adjustment, the test statistic follows a central distribution under the null, supporting both type I error control and the unification of existing tests under their respective assumptions about sdMAF.
Revision: yes
- Referee: Simulation studies are cited as showing inflated type I error in existing tests, but quantitative verification is needed for the proposed framework itself: report empirical type I error rates (with standard errors) for the new test across a grid of sdMAF magnitudes, male/female sample-size ratios, and allele frequencies, confirming that the test maintains nominal level under the within-sex null.
Authors: We concur that direct empirical confirmation of type I error rates for the new framework is necessary. We have augmented the simulation section with a comprehensive set of experiments. Specifically, we simulated data under the within-sex HWE null across sdMAF values of 0, 0.05, 0.10, and 0.20; male/female sample size ratios of 1:1, 1:2, and 2:1; and allele frequencies of 0.1, 0.2, and 0.5. For each of the 36 parameter combinations, 5,000 replicate datasets were generated, and the proportion of rejections at the 5% level was recorded. The empirical type I error rates ranged from 0.047 to 0.053, with standard errors of approximately 0.003, consistently close to the nominal level. These results are now summarized in Table S1 of the revised supplementary material, confirming appropriate type I error control for the proposed test.
Revision: yes
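The arithmetic in the second response can be checked directly. The sketch below enumerates the stated grid and computes the binomial standard error of a rejection proportion; the variable names and the helper `binomial_se` are hypothetical, and the code only reproduces the numbers quoted in the rebuttal rather than rerunning any simulation.

```python
import itertools
import math

# Grid values as quoted in the rebuttal.
sdmaf_values = [0.0, 0.05, 0.10, 0.20]
sex_ratios = ["1:1", "1:2", "2:1"]        # male:female sample-size ratios
allele_freqs = [0.1, 0.2, 0.5]

grid = list(itertools.product(sdmaf_values, sex_ratios, allele_freqs))


def binomial_se(rate, reps):
    """Standard error of an empirical rejection proportion over `reps` replicates."""
    return math.sqrt(rate * (1.0 - rate) / reps)


nominal, reps = 0.05, 5000
se = binomial_se(nominal, reps)

if __name__ == "__main__":
    print("parameter combinations:", len(grid))
    print("standard error at nominal level:", round(se, 4))
    print("nominal +/- 2 SE:", (nominal - 2 * se, nominal + 2 * se))
```

The grid has 4 x 3 x 3 = 36 cells, and the standard error at the nominal 5% level with 5,000 replicates is about 0.003, so the reported range of 0.047 to 0.053 lies within two standard errors of 0.05.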
Circularity Check
No circularity: new regression parameterization of HWE is independent of its inputs
The paper introduces a robust allele-based regression model that directly parameterizes Hardy-Weinberg disequilibrium as allele-level dependence. This modeling choice unifies existing Pearson tests by making their assumptions explicit rather than deriving the test statistics or null hypotheses from quantities fitted to the same data or from self-citations. No load-bearing equation reduces to its own inputs by construction, and the framework is presented as a self-contained statistical formulation that accommodates sdMAF and covariates without circular reduction. The derivation chain therefore remains independent of the target results.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the robust allele-based regression model accurately represents allele-level dependence for HWE testing on the X chromosome.
Reference graph
Works this paper leans on
- [1] Byrska-Bishop, M., and Coauthors, 2022: High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell, 185 (18), 3426–3440.
- [2] Chen, C.-F., 1983: Score tests for regression models. Journal of the American Statistical Association, 78 (381), 158–161.
- [3] Chen, D. Z., D. Roshandel, Z. Wang, L. Sun, and A. D. Paterson, 2023: Comprehensive whole-genome analyses of the UK Biobank reveal significant sex differences in both genotype missingness and allele frequency on the X chromosome. Human Molecular Genetics, ddad201.
- [4]
- [5] Dudbridge, F., and A. Gusnanto, 2008: Estimation of significance thresholds for genomewide association scans. Genetic Epidemiology: The Official Publication of the International Genetic Epidemiology Society, 32 (3), 227–234.
- [6]
- [7] Graffelman, J., and B. Weir, 2016: Testing for Hardy–Weinberg equilibrium at biallelic genetic markers on the X chromosome. Heredity, 116 (6), 558–568.
- [8] Hardy, G. H., and Coauthors, 1908: Mendelian proportions in a mixed population. Science, 28 (706), 49–50.
- [9] Marees, A. T., H. De Kluiver, S. Stringer, F. Vorspan, E. Curis, C. Marie-Claire, and E. M. Derks, 2018: A tutorial on conducting genome-wide association studies: Quality control and statistical analysis. International Journal of Methods in Psychiatric Research, 27 (2), e1608.
- [10]
- [11] Purcell, S., and Coauthors, 2007: PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics, 81 (3), 559–575.
- [12] Rhie, A., and Coauthors, 2023: The complete sequence of a human Y chromosome. Nature, 621 (7978), 344–354.
- [13] Troendle, J., and K. Yu, 1994: A note on testing the Hardy-Weinberg law across strata. Annals of Human Genetics, 58 (4), 397–402.
- [14] Wang, Z., A. D. Paterson, and L. Sun, 2024: A population-aware retrospective regression to detect genome-wide variants with sex difference in allele frequency. The Annals of Applied Statistics, 18 (2), 1113–1136.
- [15] Wang, Z., L. Sun, and A. D. Paterson, 2022: Major sex differences in allele frequencies for X chromosomal variants in both the 1000 Genomes Project and gnomAD. PLOS Genetics, 18 (5), e1010231, doi:10.1371/journal.pgen.1010231, URL https://dx.plos.org/10.1371/journal.pgen.1010231.
- [16] Webster, T. H., M. Couse, B. M. Grande, E. Karlins, T. N. Phung, P. A. Richmond, W. Whitford, and M. A. Wilson, 2019: Identifying, understanding, and correcting technical artifacts on the sex chromosomes in next-generation sequencing data. GigaScience, 8 (7), giz074.
- [17] Weinberg, W., 1908: On the demonstration of heredity in man. (1963) Papers on Human Genetics.
- [18] Weir, B., 1996: Genetic analysis II. Sinauer: Sunderland, MA.
- [19] Zhang, L., L. J. Strug, and L. Sun, 2023: Leveraging Hardy–Weinberg disequilibrium for association testing in case-control studies. The Annals of Applied Statistics, 17 (2), 1764–1781.