A Three-groups Non-local Model for Combining Heterogeneous Data Sources to Identify Genes Associated with Parkinson's Disease
Pith reviewed 2026-05-23 23:51 UTC · model grok-4.3
The pith
A hierarchical three-group mixture model combines any number of data modalities such as GWAS and RNA-seq into one probability model to identify genes linked to Parkinson's disease.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the three-group formalism does two things. Apportioning prior probability across groups via a Dirichlet distribution makes the posterior group probabilities account automatically for multiplicity. Specifying conditional distributions for each experiment type given the group label permits any number of data modalities to be combined inside one probability model, so that information is shared across them. Together these features are claimed to yield fewer false positives and greater detection power than separate analyses.
What carries the argument
The hierarchical three-group mixture model with Dirichlet priors on group membership probabilities and conditional likelihoods for each data modality given the group label.
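One way to write this structure down is sketched below. The notation is assumed for illustration and is not taken from the paper: $G$ genes, $M$ modalities, hyperparameters $\alpha$, group probabilities $\pi$, latent labels $z_g$, observations $y_{g,m}$, and conditional densities $f_{m,k}$ with parameters $\theta_{m,k}$. Conditional independence of the modalities given the label is also an assumption of this sketch.

```latex
\begin{align}
\pi = (\pi_0, \pi_{-}, \pi_{+}) &\sim \mathrm{Dirichlet}(\alpha_0, \alpha_{-}, \alpha_{+}), \\
z_g \mid \pi &\sim \mathrm{Categorical}(\pi), \qquad g = 1, \dots, G, \\
y_{g,m} \mid z_g = k &\sim f_{m,k}(\,\cdot \mid \theta_{m,k}), \qquad m = 1, \dots, M, \\
\Pr(z_g = k \mid y_g, \pi, \theta) &=
  \frac{\pi_k \prod_{m=1}^{M} f_{m,k}(y_{g,m} \mid \theta_{m,k})}
       {\sum_{j \in \{0,-,+\}} \pi_j \prod_{m=1}^{M} f_{m,j}(y_{g,m} \mid \theta_{m,j})}.
\end{align}
```

Here the subscripts $0$, $-$ and $+$ stand for the null, deleterious and beneficial groups. The last line is Bayes' rule applied to one gene with $\pi$ and $\theta$ held fixed; a full analysis would also integrate over their posterior.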
If this is right
- Additional data modalities can be added by specifying only their conditional distributions given the three group labels.
- Posterior probabilities for each gene being deleterious or beneficial serve as direct, multiplicity-adjusted measures of evidence.
- The same model structure can be reused for other complex traits by swapping in the appropriate experiment types.
- Simulations confirm that the shared information improves power relative to analyzing each modality in isolation.
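The first two points in the list above can be illustrated with a minimal sketch. It assumes Gaussian summary statistics for each modality, fixed group probabilities, and conditional independence of modalities given the group label; the function names, the `MODALITIES` table and every number in it are hypothetical and are not values from the paper.

```python
import math

GROUPS = ("null", "deleterious", "beneficial")


def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd) distribution evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical conditional models: for each modality, the (mean, sd) of a
# Gaussian summary statistic under each group label. Illustrative only.
MODALITIES = {
    "gwas": {"null": (0.0, 1.0), "deleterious": (2.0, 1.0), "beneficial": (-2.0, 1.0)},
    "rnaseq": {"null": (0.0, 1.0), "deleterious": (1.5, 1.0), "beneficial": (-1.5, 1.0)},
}


def posterior_group_probs(observations, pi, modalities=MODALITIES):
    """Posterior P(group | data) for a single gene.

    observations: dict mapping modality name -> observed statistic.
    pi: dict mapping group -> prior probability (held fixed here; the
        model described above would place a Dirichlet prior on it).
    Modalities are treated as conditionally independent given the group.
    """
    weights = {}
    for group in GROUPS:
        w = pi[group]
        for name, value in observations.items():
            mean, sd = modalities[name][group]
            w *= normal_pdf(value, mean, sd)
        weights[group] = w
    total = sum(weights.values())
    return {group: w / total for group, w in weights.items()}


pi = {"null": 0.90, "deleterious": 0.05, "beneficial": 0.05}
gwas_only = posterior_group_probs({"gwas": 1.8}, pi)
combined = posterior_group_probs({"gwas": 1.8, "rnaseq": 1.6}, pi)
```

In this toy setting, a gene with moderate evidence in both modalities receives a higher posterior probability of being deleterious from the combined data than from GWAS alone. Adding a further modality amounts to adding one more entry to the table of conditional models.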
Where Pith is reading between the lines
- The conditional-modeling approach could be tested on other multi-omics problems such as combining proteomics with methylation data.
- If the three-group assumption is relaxed to more categories, the Dirichlet construction would still provide automatic multiplicity control.
- The framework suggests that public repositories of GWAS and expression data could be jointly re-analyzed for many diseases with minimal new experimental cost.
Load-bearing premise
The three-group labels and the conditional distributions of each data type given those labels correctly capture the relevant biology.
What would settle it
A dataset containing known Parkinson's genes in which the model either misses a substantial fraction of them or returns many genes later shown by orthogonal experiments to have no association.
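A check of this kind could be scored roughly as follows. This is a sketch only: the gene identifiers and posterior probabilities are made-up placeholders, the 0.5 threshold is an arbitrary assumption, and `evaluate_calls` is a hypothetical helper rather than anything from the paper.

```python
def evaluate_calls(posterior_non_null, known_genes, threshold=0.5):
    """Compare model calls against a reference set of known genes.

    posterior_non_null: dict mapping gene -> posterior P(non-null).
    known_genes: iterable of genes with established association.
    Returns the called set, recall over the known genes that were
    analysed, and the called genes that are not in the reference set.
    """
    called = {g for g, p in posterior_non_null.items() if p >= threshold}
    known = set(known_genes) & set(posterior_non_null)
    recall = len(called & known) / len(known) if known else float("nan")
    return {"called": called, "recall": recall, "unconfirmed": called - known}


# Placeholder inputs, not real results.
posteriors = {"gene_a": 0.97, "gene_b": 0.40, "gene_c": 0.88, "gene_d": 0.03}
report = evaluate_calls(posteriors, known_genes=["gene_a", "gene_b"])
```

Genes in the `unconfirmed` set are not necessarily false positives; deciding that would require the orthogonal experiments mentioned above.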
Original abstract
We seek to identify genes involved in Parkinson's Disease (PD) by combining information across different experiment types. Each experiment, taken individually, may contain too little information to distinguish some important genes from incidental ones. However, when experiments are combined using the proposed statistical framework, additional power emerges. The fundamental building block of the family of statistical models that we propose is a hierarchical three-group mixture of distributions. Each gene is modeled probabilistically as belonging to either a null group that is unassociated with PD, a deleterious group, or a beneficial group. This three-group formalism has two key features. By apportioning prior probability of group assignments with a Dirichlet distribution, the resultant posterior group probabilities automatically account for the multiplicity inherent in analyzing many genes simultaneously. By building models for experimental outcomes conditionally on the group labels, any number of data modalities may be combined in a single coherent probability model, allowing information sharing across experiment types. These two features result in parsimonious inference with few false positives, while simultaneously enhancing power to detect signals. Simulations show that our three-groups approach performs at least as well as commonly-used tools for GWAS and RNA-seq, and in some cases it performs better. We apply our proposed approach to publicly-available GWAS and RNA-seq datasets, discovering novel genes that are potential therapeutic targets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a hierarchical three-group mixture model (null, deleterious, beneficial) with a Dirichlet prior on group probabilities for integrating heterogeneous data modalities, such as GWAS and RNA-seq, to identify genes associated with Parkinson's disease. By modeling experimental outcomes conditionally on latent group labels, the framework enables information sharing across experiment types, while the Dirichlet prior provides automatic multiplicity correction. Simulations are reported to show performance at least as good as standard tools, and a real-data application yields novel candidate genes.
Significance. If the modeling assumptions hold, the approach supplies a coherent joint probability model for multi-modal integration that automatically handles multiple testing and can improve power for gene discovery. The simulations benchmarked against common GWAS and RNA-seq tools and the real-data application are concrete strengths that allow direct assessment of operating characteristics.
major comments (2)
- [Abstract and Methods] The central claim that conditioning on the three-group labels 'result[s] in parsimonious inference with few false positives, while simultaneously enhancing power' rests on the assumption that the three discrete groups exhaust the relevant biology and that the chosen conditional distributions p(data|group) for each modality are correctly specified. The manuscript provides no model diagnostics, posterior predictive checks, or sensitivity analyses that test this assumption for PD biology.
- [Simulations] Performance comparisons to standard tools are presented, but the manuscript does not report whether the simulated data-generating processes match the model's assumed conditional distributions, nor does it quantify any mismatch. It is therefore unclear whether the reported gains in power and false-positive control are attributable to the three-group structure or to favorable simulation design.
minor comments (2)
- [Title] The title uses 'non-local model' while the abstract describes a standard hierarchical mixture; clarify whether 'non-local' refers to a specific prior construction or is used in a different sense.
- [Abstract] The abstract states that the Dirichlet prior 'automatically account[s] for the multiplicity'; a brief derivation or reference to the implied false-discovery-rate control would strengthen the claim.
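The mechanism behind the multiplicity claim can be illustrated with the standard Dirichlet-categorical conjugate update: conditional on the group labels, the group probabilities follow a Dirichlet distribution with parameters equal to the prior parameters plus the label counts. The sketch below is a textbook conjugacy argument, not the manuscript's own derivation; the symmetric prior and the counts are illustrative assumptions.

```python
def predictive_group_probs(alpha, counts):
    """Probability that one further gene falls in each group.

    alpha: Dirichlet prior parameters (null, deleterious, beneficial).
    counts: number of genes currently labelled in each group.
    Uses the conjugate update pi | labels ~ Dirichlet(alpha + counts),
    whose mean gives the predictive probability for the next gene.
    """
    total = sum(alpha) + sum(counts)
    return [(a + n) / total for a, n in zip(alpha, counts)]


alpha = (1.0, 1.0, 1.0)  # symmetric prior; an illustrative assumption

# Same number of apparent signals, but many more genes examined.
small_study = predictive_group_probs(alpha, (95, 3, 2))
large_study = predictive_group_probs(alpha, (9995, 3, 2))
```

With the same handful of apparent signals, the probability assigned to the non-null groups for any further gene is much smaller when 10,000 genes were examined than when 100 were. This is the sense in which learning the group probabilities from the data penalizes large searches without a separate correction step.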
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address each major comment below and agree that the points raised identify areas where the manuscript can be strengthened through additional clarification and analyses.
- Referee: [Abstract and Methods] The central claim that conditioning on the three-group labels 'result[s] in parsimonious inference with few false positives, while simultaneously enhancing power' rests on the assumption that the three discrete groups exhaust the relevant biology and that the chosen conditional distributions p(data|group) for each modality are correctly specified. The manuscript provides no model diagnostics, posterior predictive checks, or sensitivity analyses that test this assumption for PD biology.
  Authors: We agree that the validity of the central claim depends on the three-group structure and conditional distributions being reasonable approximations to the underlying biology. The groups (null, deleterious, beneficial) are chosen to reflect the primary modes of gene association with PD, and the conditional distributions follow standard parametric forms used in GWAS (effect-size or z-score models) and RNA-seq (count-based models). The submitted manuscript does not contain formal posterior predictive checks or sensitivity analyses. We will add these in revision, including sensitivity to the Dirichlet hyperparameters and to alternative conditional distributions, together with posterior predictive diagnostics on both simulated and real data.
  Revision: yes
- Referee: [Simulations] Performance comparisons to standard tools are presented, but the manuscript does not report whether the simulated data-generating processes match the model's assumed conditional distributions, nor does it quantify any mismatch. It is therefore unclear whether the reported gains in power and false-positive control are attributable to the three-group structure or to favorable simulation design.
  Authors: The simulations were generated under a range of scenarios intended to include both close alignment with the model's conditional distributions and moderate misspecification. The manuscript, however, does not explicitly report the degree of match or mismatch between the simulation data-generating processes and the model's assumptions. We will revise the Simulations section to describe the data-generation mechanism in detail, quantify the alignment (or deviation) for each scenario, and present performance results stratified by the degree of model match.
  Revision: yes
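The kind of stratified comparison discussed in this exchange can be sketched in a few lines. The example assumes a Gaussian three-group working model with fixed group probabilities and compares null statistics drawn from the matching normal distribution against heavier-tailed Student-t draws. All settings (`pi_null`, `shift`, the 0.5 threshold, 3 degrees of freedom) are hypothetical choices for illustration and do not reproduce the paper's simulations.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log density of Normal(mean, sd) at x (log scale avoids underflow)."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def posterior_non_null(x, pi_null=0.9, shift=2.5):
    """P(non-null | x) under a Gaussian three-group working model."""
    half = (1.0 - pi_null) / 2.0
    logs = [
        math.log(pi_null) + log_normal_pdf(x, 0.0, 1.0),   # null
        math.log(half) + log_normal_pdf(x, shift, 1.0),    # deleterious
        math.log(half) + log_normal_pdf(x, -shift, 1.0),   # beneficial
    ]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    return (weights[1] + weights[2]) / sum(weights)


def draw_null(rng, df=None):
    """Null statistic: standard normal if df is None, else Student-t(df)."""
    z = rng.gauss(0.0, 1.0)
    if df is None:
        return z
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def false_call_rate(df, n_genes=20000, threshold=0.5, seed=1):
    """Fraction of truly null genes whose posterior exceeds the threshold."""
    rng = random.Random(seed)
    calls = sum(
        posterior_non_null(draw_null(rng, df)) >= threshold for _ in range(n_genes)
    )
    return calls / n_genes


matched = false_call_rate(df=None)   # data generated as the model assumes
heavy_tailed = false_call_rate(df=3)  # misspecified: heavier tails than assumed
```

Reporting `matched` and `heavy_tailed` side by side, and repeating the exercise over a grid of mismatch levels, shows how much of the false-positive control depends on the simulated data agreeing with the assumed conditional distributions.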
Circularity Check
No circularity: model is a standard hierarchical construction with independent simulation validation
The paper defines a new hierarchical three-group mixture model with Dirichlet prior on group probabilities and modality-specific conditional distributions given latent group labels. This structure is constructed directly from the modeling assumptions rather than derived from or reduced to any fitted input or self-citation. Simulations and real-data application are presented as external checks, not as part of the derivation chain. No equations reduce by construction to their inputs, and no load-bearing self-citations or ansatzes are invoked. The derivation is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- Dirichlet concentration parameters
- Parameters of conditional distributions
axioms (3)
- domain assumption: Genes belong to exactly one of three groups: null, deleterious, or beneficial
- domain assumption: Dirichlet distribution appropriately apportions prior group probabilities to handle multiplicity
- domain assumption: Conditional distributions for experimental outcomes given group labels can be specified for any data modality