pith. machine review for the scientific record.

arxiv: 2603.06768 · v2 · submitted 2026-03-06 · 🧬 q-bio.GN

Recognition: no theorem link

Benchmarking end-to-end genotype-to-phenotype prediction workflows across 80 openSNP phenotypes


Pith reviewed 2026-05-15 15:37 UTC · model grok-4.3

classification 🧬 q-bio.GN
keywords genotype-to-phenotype prediction · polygenic scores · machine learning · deep learning · benchmarking · openSNP · case-control discrimination · workflow comparison

The pith

No workflow family dominates end-to-end genotype-to-phenotype prediction across 80 openSNP phenotypes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper benchmarks 29 machine-learning algorithms, 80 deep-learning variants, and three polygenic score tools across 675 clumping and pruning configurations on 80 binary phenotypes from openSNP. It finds that polygenic score workflows reach the highest discrimination on 53 phenotypes while machine-learning or deep-learning workflows do so on 27, yet 41.2 percent of head-to-head comparisons are practical ties within five discrimination points. Results vary strongly by phenotype and are sensitive to modeling and preprocessing choices. Distinct failure modes also appear: unstable behavior in PRSice and collapse to non-informative predictions in lassosum for 13 phenotypes. Higher peak performance is concentrated in smaller phenotypes, and the cohort is predominantly of European ancestry.

Core claim

No workflow family dominates universally across the 80 phenotypes. Polygenic score methods deliver the single highest observed discrimination for 53 phenotypes, machine-learning or deep-learning methods do so for 27, and 41.2 percent of direct comparisons register as practical ties within five discrimination points; performance remains strongly phenotype-dependent and sensitive to preprocessing and modeling decisions.
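
The counting behind these figures can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes discrimination is expressed on a 0-100 scale so that "five points" is a margin of 5, that the margin is inclusive, and that a practical tie is flagged alongside the nominal winner (which would be consistent with 53 and 27 summing to all 80 phenotypes). The function name and the example scores are hypothetical.

```python
def compare_families(pgs_best, mldl_best, tie_margin=5):
    """Return (nominal_winner, is_practical_tie) for one phenotype.

    Scores are assumed to be on a 0-100 scale (an assumption, not from the paper).
    """
    if pgs_best > mldl_best:
        winner = "pgs"
    elif mldl_best > pgs_best:
        winner = "ml_dl"
    else:
        winner = "exact_tie"
    # A practical tie is any gap no larger than the margin (inclusive by assumption).
    is_tie = abs(pgs_best - mldl_best) <= tie_margin
    return winner, is_tie


# Hypothetical best-per-family scores (PGS, ML/DL); not taken from the paper.
example_scores = {
    "phenotype_a": (71, 64),
    "phenotype_b": (58, 61),
    "phenotype_c": (55, 66),
    "phenotype_d": (62, 60),
}

results = {name: compare_families(*pair) for name, pair in example_scores.items()}
pgs_wins = sum(1 for winner, _ in results.values() if winner == "pgs")
mldl_wins = sum(1 for winner, _ in results.values() if winner == "ml_dl")
tie_rate = 100 * sum(1 for _, tie in results.values() if tie) / len(results)
print(pgs_wins, mldl_wins, tie_rate)
```

Calling `compare_families` with a different `tie_margin` shows how strongly the tie rate depends on the chosen margin.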

What carries the argument

End-to-end case-control discrimination measured across machine-learning, deep-learning, and polygenic score workflows on the same 80 curated binary phenotypes from openSNP, using 675 clumping and pruning settings.

If this is right

  • Workflow selection for genotype-to-phenotype prediction must be guided by the specific phenotype rather than by a universal ranking.
  • Practical ties in 41.2 percent of comparisons imply that simpler or faster workflows can often be substituted without meaningful loss of discrimination.
  • Distinct failure modes in individual tools, such as instability or collapse to non-informative output, require explicit checks before deployment.
  • Peak performance concentrated in smaller phenotypes signals that claims based on limited data need cautious interpretation.
  • The openSNP resource functions as a stress-test environment for evaluating new workflows under realistic data scarcity and heterogeneity.
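
A pre-deployment check of the kind the third point calls for might look like the sketch below. The paper does not say how collapse or instability was detected, so the criteria used here (non-finite scores, or a score range below a small tolerance) and the function name are assumptions made for illustration.

```python
import math


def check_predictions(scores, min_spread=1e-8):
    """Classify a vector of predicted scores as 'empty', 'non_finite', 'collapsed', or 'ok'.

    The categories and the tolerance are illustrative assumptions.
    """
    if not scores:
        return "empty"
    # NaN or infinite scores point to numerical instability upstream.
    if any(math.isnan(s) or math.isinf(s) for s in scores):
        return "non_finite"
    # Scores that are all (nearly) identical cannot rank cases above controls.
    if max(scores) - min(scores) < min_spread:
        return "collapsed"
    return "ok"


print(check_predictions([0.5, 0.5, 0.5]))
print(check_predictions([0.2, 0.8, 0.4]))
```

A workflow whose output is flagged as collapsed ranks no case above any control, so its discrimination carries no information and it is better excluded or re-tuned than compared.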

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Workflow choice may need to be phenotype-specific in clinical or research pipelines rather than fixed in advance.
  • Future benchmarks could test whether combining top workflows per phenotype yields gains beyond the single best method.
  • The observed sensitivity to preprocessing suggests that automated configuration search may be necessary for reliable application.
  • Results from a predominantly European-ancestry cohort leave open whether the same relative pattern holds in other ancestral groups.

Load-bearing premise

The 80 curated openSNP phenotypes and the predominantly European-ancestry participants form a representative test bed that does not introduce major biases from shared data, ancestry imbalance, or phenotype heterogeneity that would invert the relative performance rankings.

What would settle it

A repeat of the full benchmark on an independent, larger, non-European-ancestry cohort that produces a different ordering of workflow families or fewer ties.

Original abstract

Genotype-to-phenotype prediction is a central goal of statistical genetics, yet practical comparisons of prediction workflows remain limited in small, heterogeneous, participant-shared genomic datasets. Here, we benchmarked end-to-end case-control prediction across 80 curated binary phenotypes from openSNP using machine learning, deep learning, and polygenic score workflows. We evaluated 29 machine-learning algorithms, 80 deep-learning model variants, and 3 polygenic score tools across 675 clumping and pruning configurations. No workflow family dominated universally. Polygenic score workflows achieved the highest observed discrimination for 53 phenotypes, whereas machine-learning or deep-learning workflows achieved the highest for 27. However, many apparent phenotype-level wins were modest, with 41.2% of comparisons representing practical ties within five discrimination points. Performance was strongly phenotype-dependent and sensitive to modeling and preprocessing choices. Distinct workflow-specific failure modes were also observed, including unstable behaviour in PRSice and non-informative collapse in lassosum for 13 phenotypes. Higher peak performance was concentrated in smaller phenotypes, reinforcing the need for cautious interpretation in limited-data settings. The cohort was predominantly of European ancestry, restricting generalisability. Together, these results position openSNP as a useful stress-test environment for genomic prediction and support benchmark-guided workflow selection under realistic conditions of data scarcity, phenotype heterogeneity, and ancestry imbalance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper benchmarks end-to-end case-control genotype-to-phenotype prediction across 80 curated binary phenotypes from openSNP. It evaluates 29 machine-learning algorithms, 80 deep-learning model variants, and 3 polygenic score tools over 675 clumping/pruning configurations. No workflow family dominates universally: polygenic score workflows achieve the highest observed discrimination for 53 phenotypes while machine-learning or deep-learning workflows do so for 27 phenotypes, with 41.2% of comparisons classified as practical ties (within five discrimination points). Performance is strongly phenotype-dependent, higher peak values concentrate in smaller phenotypes, and distinct failure modes are reported (e.g., unstable PRSice behaviour and lassosum collapse for 13 phenotypes). The cohort is predominantly European-ancestry, limiting generalisability.

Significance. If the relative rankings and failure-mode observations hold after appropriate statistical controls, the work provides a useful empirical stress-test of genomic prediction methods under realistic constraints of small sample sizes, phenotype heterogeneity, participant-shared data, and ancestry imbalance. It also offers concrete guidance on workflow selection in data-scarce settings and positions openSNP as a reproducible benchmark resource, which is a practical contribution to statistical genetics.

major comments (2)
  1. [Results (53-vs-27 split and tie-rate paragraph)] Results section reporting the 53-vs-27 split and 41.2% tie rate: the assignment of 'highest observed discrimination' per phenotype is presented without per-phenotype statistical tests (DeLong, bootstrap CIs, or permutation p-values) or error bars on the discrimination metric. Given the small, heterogeneous openSNP cohorts, many apparent wins are likely indistinguishable from ties within sampling error; the five-point tie threshold is uncalibrated to the metric's standard error, so the headline counts may not survive noise-aware re-analysis.
  2. [Results (phenotype-size paragraph)] Methods and results on phenotype-size dependence: the observation that higher peak performance concentrates in smaller phenotypes is reported but not accompanied by a formal test of the size-performance relationship or sensitivity analysis excluding the smallest cohorts; this weakens the claim that the benchmark is representative for typical GWAS-scale phenotypes.
minor comments (3)
  1. [Abstract] Abstract: specify the exact discrimination metric (AUC-ROC, AUPRC, etc.) and whether any multiple-testing correction was applied across the 80 phenotypes.
  2. [Methods/Results tables] Table or figure legends: ensure all 29 ML algorithms, 80 DL variants, and 3 PGS tools are listed with version numbers and hyper-parameter ranges so that the 675 configurations are fully reproducible.
  3. [Discussion] Discussion: the statement that openSNP forms a 'useful stress-test environment' would be strengthened by a short quantitative comparison of openSNP cohort sizes and ancestry composition against a standard biobank such as UK Biobank.
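
One way to attach uncertainty to a head-to-head comparison, as the first major comment requests, is a paired percentile bootstrap on the difference in discrimination. The sketch below assumes the metric is AUC-ROC (the abstract does not name it), uses only the Python standard library, and invents its function names and toy data. It is not the authors' procedure.

```python
import random


def auc(labels, scores):
    """AUC as the share of case-control pairs ranked correctly (score ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_diff(labels, scores_a, scores_b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for AUC(a) - AUC(b), resampling the same individuals for both."""
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # skip resamples that lost one class entirely
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(auc(y, a) - auc(y, b))
    diffs.sort()
    lower = diffs[max(0, int(alpha / 2 * n_boot))]
    upper = diffs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data for two hypothetical workflows scored on the same ten individuals.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
workflow_a = [0.9, 0.8, 0.7, 0.4, 0.6, 0.5, 0.3, 0.2, 0.1, 0.35]
workflow_b = [0.7, 0.6, 0.8, 0.5, 0.3, 0.4, 0.45, 0.2, 0.25, 0.1]
print(bootstrap_auc_diff(labels, workflow_a, workflow_b))
```

If the interval for the difference contains zero, the apparent win is not distinguishable from a tie at that confidence level, whatever the fixed five-point margin says.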

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on statistical interpretation and robustness. We address each major point below, proposing targeted revisions to strengthen the manuscript while preserving the descriptive nature of the benchmark.

Point-by-point responses
  1. Referee: [Results (53-vs-27 split and tie-rate paragraph)] Results section reporting the 53-vs-27 split and 41.2% tie rate: the assignment of 'highest observed discrimination' per phenotype is presented without per-phenotype statistical tests (DeLong, bootstrap CIs, or permutation p-values) or error bars on the discrimination metric. Given the small, heterogeneous openSNP cohorts, many apparent wins are likely indistinguishable from ties within sampling error; the five-point tie threshold is uncalibrated to the metric's standard error, so the headline counts may not survive noise-aware re-analysis.

    Authors: We agree that uncertainty quantification would enhance interpretability. In revision we will add bootstrap confidence intervals (1000 resamples) for the discrimination metric of each workflow per phenotype and report them alongside the observed values. We will also include a sensitivity table showing how the 53-vs-27 split and tie rate change under tie thresholds of 3, 5, and 7 points. The five-point threshold was chosen as a conservative practical margin reflecting typical small-sample variability in AUC-like metrics; the counts remain descriptive of observed performance rather than formal superiority claims. revision: partial

  2. Referee: [Results (phenotype-size paragraph)] Methods and results on phenotype-size dependence: the observation that higher peak performance concentrates in smaller phenotypes is reported but not accompanied by a formal test of the size-performance relationship or sensitivity analysis excluding the smallest cohorts; this weakens the claim that the benchmark is representative for typical GWAS-scale phenotypes.

    Authors: We accept that a formal test and sensitivity check are warranted. We will add a Spearman rank-correlation analysis between case count and peak discrimination, together with a sensitivity analysis that repeats the size-performance summary after excluding phenotypes with fewer than 50 cases. These additions will clarify the trend within the data-scarce regime that openSNP represents while acknowledging limits for larger GWAS cohorts. revision: yes
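
The Spearman analysis and the exclusion of phenotypes with fewer than 50 cases proposed in the second response can be sketched as below. The implementation is a plain standard-library version written for illustration, and the case counts and peak scores are made-up values, not results from the paper.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical (case count, peak discrimination) pairs; not from the paper.
case_counts = [12, 25, 40, 60, 150, 300, 80]
peak_scores = [0.88, 0.91, 0.74, 0.70, 0.62, 0.66, 0.69]

rho_all = spearman(case_counts, peak_scores)

# Sensitivity analysis: repeat after dropping phenotypes with fewer than 50 cases.
kept = [(c, p) for c, p in zip(case_counts, peak_scores) if c >= 50]
rho_kept = spearman([c for c, _ in kept], [p for _, p in kept])
print(rho_all, rho_kept)
```

A negative coefficient that weakens or disappears once the smallest phenotypes are removed would suggest that the peak values are driven by small-sample noise rather than by genuine signal.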

Circularity Check

0 steps flagged

No significant circularity; pure empirical benchmarking with no derivations

Full rationale

The paper reports direct empirical comparisons of 29 ML algorithms, 80 DL variants, and 3 PGS tools across 675 configurations on 80 held-out openSNP binary phenotypes. Central claims consist of observed counts (PGS highest for 53 phenotypes, ML/DL for 27, 41.2% practical ties) and phenotype-dependent failure modes. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citation chains appear in the derivation chain. All results reduce to standard performance metrics computed on external data splits, rendering the work self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on standard domain assumptions of statistical genetics and machine learning without new free parameters or invented entities.

axioms (1)
  • domain assumption: Standard assumptions in machine learning and statistical genetics for case-control prediction hold, such as approximate sample independence after quality control and appropriate handling of population structure.
    Invoked implicitly throughout the benchmarking of prediction workflows on openSNP data.

pith-pipeline@v0.9.0 · 5559 in / 1354 out tokens · 44872 ms · 2026-05-15T15:37:49.671014+00:00 · methodology


Reference graph

Works this paper leans on

91 extracted references · 91 canonical work pages

  1. [1]

    Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes

    Rostam Abdollahi-Arpanahi, Daniel Gianola, and Francisco Pe˜ nagaricano. Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes. Genetics Selection Evolution, 52(1), February 2020

  2. [2]

    Performing post-genome-wide association study analysis: overview, challenges and recommendations.F1000Research, 10:1002, October 2021

    Yagoub Adam, Chaimae Samtal, Jean tristan Brandenburg, Oluwadamilare Falola, and Ezekiel Adebiyi. Performing post-genome-wide association study analysis: overview, challenges and recommendations.F1000Research, 10:1002, October 2021

  3. [3]

    Deep learning improves pancreatic cancer diagnosis using rna-based variants.Cancers, 13(11):2654, 2021

    Ali Al-Fatlawi, Negin Malekian, Sebasti´ an Garc´ ıa, Andreas Henschel, Ilwook Kim, Andreas Dahl, Beatrix Jahnke, Peter Bailey, Sarah Naomi Bolz, Anna R Poetsch, et al. Deep learning improves pancreatic cancer diagnosis using rna-based variants.Cancers, 13(11):2654, 2021

  4. [4]

    A design of polygenic risk model with deep learning for colorectal cancer in multiethnic indonesians.Procedia Computer Science, 179:632–639, 2021

    Steven Amadeus, Tjeng Wawan Cenggoro, Arif Budiarto, and Bens Pardamean. A design of polygenic risk model with deep learning for colorectal cancer in multiethnic indonesians.Procedia Computer Science, 179:632–639, 2021

  5. [5]

    Data quality control in genetic case-control association studies.Nature Protocols, 5(9):1564–1573, August 2010

    Carl A Anderson, Fredrik H Pettersson, Geraldine M Clarke, Lon R Cardon, Andrew P Morris, and Krina T Zondervan. Data quality control in genetic case-control association studies.Nature Protocols, 5(9):1564–1573, August 2010

  6. [6]

    Reynolds, and Chongle Pan

    Adrien Badr´ e, Li Zhang, Wellington Muchero, Justin C. Reynolds, and Chongle Pan. Deep neural network improves the estimation of polygenic risk scores for breast cancer. Journal of Human Genetics, 66(4):359–369, October 2020

  7. [7]

    Denny, and Dan M

    Lisa Bastarache, Joshua C. Denny, and Dan M. Roden. Phenome-wide association studies.JAMA, 327(1):75, January 2022

  8. [8]

    Machine learning for genetic prediction of psychiatric disorders: a systematic review.Molecular Psychiatry, 26(1):70–79, June 2020

    Matthew Bracher-Smith, Karen Crawford, and Valentina Escott-Price. Machine learning for genetic prediction of psychiatric disorders: a systematic review.Molecular Psychiatry, 26(1):70–79, June 2020

  9. [9]

    Genomic selection: A tool for accelerating the efficiency of molecular breeding for development of climate-resilient crops.Frontiers in Genetics, 13, February 2022

    Neeraj Budhlakoti, Amar Kant Kushwaha, Anil Rai, K K Chaturvedi, Anuj Kumar, Anjan Kumar Pradhan, Uttam Kumar, Rajeev Ranjan Kumar, Philomin Juliana, D C Mishra, and Sundeep Kumar. Genomic selection: A tool for accelerating the efficiency of molecular breeding for development of climate-resilient crops.Frontiers in Genetics, 13, February 2022

  10. [10]

    XGBoost: A scalable tree boosting system

    Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 785–794, New York, NY, USA,

  11. [11]

    Visscher, Zhihong Zhu, and Jian Yang

    Wenhan Chen, Yang Wu, Zhili Zheng, Ting Qi, Peter M. Visscher, Zhihong Zhu, and Jian Yang. Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors.Nature Communications, 12(1), December 2021

  12. [12]

    O’Reilly

    Shing Wan Choi, Timothy Shin-Heng Mak, and Paul F. O’Reilly. Tutorial: a guide to performing polygenic risk score analyses.Nature Protocols, 15(9):2759–2772, July 2020

  13. [13]

    Collister, Xiaonan Liu, and Lei Clifton

    Jennifer A. Collister, Xiaonan Liu, and Lei Clifton. Calculating polygenic risk scores (PRS) in UK biobank: A practical guide for epidemiologists.Frontiers in Genetics, 13, February 2022

  14. [14]

    Cope, Hannes A

    Justin L. Cope, Hannes A. Baukmann, J¨ orn E. Klinger, Charles N. J. Ravarani, Erwin P. B¨ ottinger, Stefan Konigorski, and Marco F. Schmidt. Interaction-based feature selection algorithm outperforms polygenic risk score in predicting parkinson’s disease status.Frontiers in Genetics, 12, October 2021

  15. [15]

    J. Crossa. Methodologies for estimating the sample size required for genetic conservation of outbreeding crops. Theoretical and Applied Genetics, 77(2):153–161, February 1989

  16. [16]

    Pal, Kunal Kundu, Yizhou Yin, John Moult, Yuxiang Jiang, Vikas Pejaver, Kymberleigh A

    Roxana Daneshjou, Yanran Wang, Yana Bromberg, Samuele Bovo, Pier L Martelli, Giulia Babbi, Pietro Di Lena, Rita Casadio, Matthew Edwards, David Gifford, David T Jones, Laksshman Sundaram, Rajendra Rana Bhat, Xiaolin Li, Lipika R. Pal, Kunal Kundu, Yizhou Yin, John Moult, Yuxiang Jiang, Vikas Pejaver, Kymberleigh A. Pagel, Biao Li, Sean D. Mooney, Predrag ...

  17. [17]

    Danilevicz, Mitchell Gill, Robyn Anderson, Jacqueline Batley, Mohammed Bennamoun, Philipp E

    Monica F. Danilevicz, Mitchell Gill, Robyn Anderson, Jacqueline Batley, Mohammed Bennamoun, Philipp E. Bayer, and David Edwards. Plant genotype to phenotype prediction using machine learning.Frontiers in Genetics, 13, May 2022

  18. [18]

    Vallejo, and Jose Gerardo Genomics, Proteomics & Bioinformatics, 2026, Volume XX, Issue x 9 Tamez-Pena

    Javier de Velasco Oriol, Antonio Martinez-Torteya, Victor Trevino, Israel Alanis, Edgar E. Vallejo, and Jose Gerardo Genomics, Proteomics & Bioinformatics, 2026, Volume XX, Issue x 9 Tamez-Pena. Benchmarking machine learning models for the analysis of genetic data using FRESA.CAD Binary Classification Benchmarking.bioRxiv, 2019

  19. [19]

    Rahul Dey and Fathi M. Salem. Gate-variants of gated recurrent unit (gru) neural networks. In2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), pages 1597–1600, 2017

  20. [20]

    Danielle M. Dick. Gene-environment interaction in psychological traits and disorders.Annual Review of Clinical Psychology, 7(1):383–409, April 2011

  21. [21]

    Precision medicine via the integration of phenotype-genotype information in neonatal genome project

    Xinran Dong, Tiantian Xiao, Bin Chen, Yulan Lu, and Wenhao Zhou. Precision medicine via the integration of phenotype-genotype information in neonatal genome project. Fundamental Research, 2(6):873–884, November 2022

  22. [22]

    Enoma, Janet Bishung, Theresa Abiodun, Olubanke Ogunlana, and Victor Chukwudi Osamor

    David O. Enoma, Janet Bishung, Theresa Abiodun, Olubanke Ogunlana, and Victor Chukwudi Osamor. Machine learning approaches to genome-wide association studies. Journal of King Saud University - Science, 34(4):101847, June 2022

  23. [23]

    Lewis, and Paul F

    Jack Euesden, Cathryn M. Lewis, and Paul F. O’Reilly. PRSice: Polygenic risk score software.Bioinformatics, 31(9):1466–1468, December 2014

  24. [24]

    The (in)famous GWAS p-value threshold revisited and updated for low-frequency variants.European Journal of Human Genetics, 24(8):1202–1205, January 2016

    Jo˜ ao Fadista, Alisa K Manning, Jose C Florez, and Leif Groop. The (in)famous GWAS p-value threshold revisited and updated for low-frequency variants.European Journal of Human Genetics, 24(8):1202–1205, January 2016

  25. [25]

    Machine learning approach to single nucleotide polymorphism-based asthma prediction.PLOS ONE, 14(12):e0225574, December 2019

    Joverlyn Gaudillo, Jae Joseph Russell Rodriguez, Allen Nazareno, Lei Rigi Baltazar, Julianne Vilela, Rommel Bulalacao, Mario Domingo, and Jason Albia. Machine learning approach to single nucleotide polymorphism-based asthma prediction.PLOS ONE, 14(12):e0225574, December 2019

  26. [26]

    Nguyen, Jacqueline Batley, Philipp E

    Mitchell Gill, Robyn Anderson, Haifei Hu, Mohammed Bennamoun, Jakob Petereit, Babu Valliyodan, Henry T. Nguyen, Jacqueline Batley, Philipp E. Bayer, and David Edwards. Machine learning models outperform deep learning models, provide interpretation and facilitate feature selection for soybean trait prediction.BMC Plant Biology, 22(1), April 2022

  27. [27]

    Damian Gola, Jeannette Erdmann, Bertram M¨ uller-Myhsok, Heribert Schunkert, and Inke R. K¨ onig. Polygenic risk scores outperform machine learning methods in predicting coronary artery disease status.Genetic Epidemiology, 44(2):125–138, January 2020

  28. [28]

    Bayer, Helge Rausch, and Julia Reda

    Bastian Greshake, Philipp E. Bayer, Helge Rausch, and Julia Reda. openSNP–a crowdsourced web resource for personal genomics.PLoS ONE, 9(3):e89204, March 2014

  29. [29]

    Grinberg, Oghenejokpeme I

    Nastasiya F. Grinberg, Oghenejokpeme I. Orhobor, and Ross D. King. An evaluation of machine-learning for predicting phenotype: studies in yeast, rice, and wheat. Machine Learning, 109(2):251–277, October 2019

  30. [30]

    Machine learning for predicting phenotype from genotype and environment

    Tingting Guo and Xianran Li. Machine learning for predicting phenotype from genotype and environment. Current Opinion in Biotechnology, 79:102853, February 2023

  31. [31]

    Krawitz, Susanne B

    Yaron Gurovich, Yair Hanani, Omri Bar, Guy Nadav, Nicole Fleischer, Dekel Gelbman, Lina Basel-Salmon, Peter M. Krawitz, Susanne B. Kamphausen, Martin Zenker, Lynne M. Bird, and Karen W. Gripp. Identifying facial phenotypes of genetic disorders using deep learning.Nature Medicine, 25(1):60–64, January 2019

  32. [32]

    A machine learning pipeline for quantitative phenotype prediction from genotype data.BMC Bioinformatics, 11(S8), October 2010

    Giorgio Guzzetta, Giuseppe Jurman, and Cesare Furlanello. A machine learning pipeline for quantitative phenotype prediction from genotype data.BMC Bioinformatics, 11(S8), October 2010

  33. [33]

    Long short-term memory.Neural Computation, 9(8):1735–1780, November 1997

    Sepp Hochreiter and J¨ urgen Schmidhuber. Long short-term memory.Neural Computation, 9(8):1735–1780, November 1997

  34. [34]

    Classification based on decision tree algorithm for machine learning

    Bahzad Jijo and Adnan Mohsin Abdulazeez. Classification based on decision tree algorithm for machine learning. Journal of Applied Science and Technology Trends, 2:20– 28, 01 2021

  35. [35]

    Kaler and Larry C

    Avjinder S. Kaler and Larry C. Purcell. Estimation of a significance threshold for genome-wide association studies. BMC Genomics, 20(1), July 2019

  36. [36]

    Nikoletta Katsaouni, Araek Tashkandi, Lena Wiese, and Marcel H. Schulz. Machine learning based disease prediction from genotype data.Biological Chemistry, 402(8):871–885, July 2021

  37. [37]

    Conceptualizing human variation.Nature Genetics, 36(S11):S17–S20, October 2004

    S O Y Keita, R A Kittles, C D M Royal, G E Bonney, P Furbert-Harris, G M Dunston, and C N Rotimi. Conceptualizing human variation.Nature Genetics, 36(S11):S17–S20, October 2004

  38. [38]

    Khera, Mark Chaffin, Krishna G

    Amit V. Khera, Mark Chaffin, Krishna G. Aragam, Mary E. Haas, Carolina Roselli, Seung Hoan Choi, Pradeep Natarajan, Eric S. Lander, Steven A. Lubitz, Patrick T. Ellinor, and Sekar Kathiresan. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations.Nature Genetics, 50(9):1219–1224, August 2018

  39. [39]

    Yield prediction through integration of genetic, environment, and management data through deep learning

    Daniel R Kick, Jason G Wallace, James C Schnable, Judith M Kolkman, Barı¸ s Alaca, Timothy M Beissinger, Jode Edwards, David Ertl, Sherry Flint-Garcia, Joseph L Gage, Candice N Hirsch, Joseph E Knoll, Natalia de Leon, Dayane C Lima, Danilo E Moreta, Maninder P Singh, Addie Thompson, Teclemariam Weldekidan, and Jacob D Washburn. Yield prediction through in...

  40. [40]

    The discovery of human genetic variations and their use as disease markers: past, present and future

    Chee Seng Ku, En Yun Loy, Agus Salim, Yudi Pawitan, and Kee Seng Chia. The discovery of human genetic variations and their use as disease markers: past, present and future. Journal of Human Genetics, 55(7):403–415, May 2010

  41. [41]

    Laurie, Kimberly F

    Cathy C. Laurie, Kimberly F. Doheny, Daniel B. Mirel, Elizabeth W. Pugh, Laura J. Bierut, Tushar Bhangale, Frederick Boehm, Neil E. Caporaso, Marilyn C. Cornelis, Howard J. Edenberg, Stacy B. Gabriel, Emily L. Harris, Frank B. Hu, Kevin B. Jacobs, Peter Kraft, Maria Teresa Landi, Thomas Lumley, Teri A. Manolio, Caitlin McHugh, Ian Painter, Justin Paschall...

  42. [42]

    Ross KK Leung, Ying Wang, Ronald CW Ma, Andrea OY Luk, Vincent Lam, Maggie Ng, Wing Yee So, Stephen KW Tsui, and Juliana CN Chan. Using a multi-staged strategy based on machine learning and mathematical modeling to predict genotype-phenotype risk patterns in diabetic kidney disease: a prospective case–control cohort analysis.BMC Nephrology, 14(1), July 20...

  43. [43]

    Lewis and Evangelos Vassos

    Cathryn M. Lewis and Evangelos Vassos. Polygenic risk scores: from research tools to clinical instruments.Genome Medicine, 12(1), May 2020

  44. [44]

    Transfer learning in genome-wide association studies with knockoffs.Sankhya B, November 2022

    Shuangning Li, Zhimei Ren, Chiara Sabatti, and Matteo Sesia. Transfer learning in genome-wide association studies with knockoffs.Sankhya B, November 2022

  45. [45]

    Phenotype prediction and genome-wide association study using deep convolutional neural network of soybean.Frontiers in Genetics, 10, November 2019

    Yang Liu, Duolin Wang, Fei He, Juexin Wang, Trupti Joshi, and Dong Xu. Phenotype prediction and genome-wide association study using deep convolutional neural network of soybean.Frontiers in Genetics, 10, November 2019

  46. [46]

    A deep convolutional neural network approach for predicting phenotypes from genotypes

    Wenlong Ma, Zhixu Qiu, Jie Song, Jiajia Li, Qian Cheng, Jingjing Zhai, and Chuang Ma. A deep convolutional neural network approach for predicting phenotypes from genotypes. Planta, 248(5):1307–1318, August 2018

  47. [47]

    Genetic prediction of complex traits with polygenic scores: a statistical review.Trends in Genetics, 37(11):995–1011, November 2021

    Ying Ma and Xiang Zhou. Genetic prediction of complex traits with polygenic scores: a statistical review.Trends in Genetics, 37(11):995–1011, November 2021

  48. [48]

    Polygenic scores via penalized regression on summary statistics.Genetic Epidemiology, 41(6):469–480, May 2017

    Timothy Shin Heng Mak, Robert Milan Porsch, Shing Wan Choi, Xueya Zhou, and Pak Chung Sham. Polygenic scores via penalized regression on summary statistics.Genetic Epidemiology, 41(6):469–480, May 2017

  49. [49]

    Deep learning of individual aesthetics.Neural Computing and Applications, 33(1):3–17, October 2020

    Jon McCormack and Andy Lomas. Deep learning of individual aesthetics.Neural Computing and Applications, 33(1):3–17, October 2020

  50. [50]

    A logical calculus of the ideas immanent in nervous activity.The bulletin of mathematical biophysics, 5(4):115–133, 1943

    Warren S McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity.The bulletin of mathematical biophysics, 5(4):115–133, 1943

  51. [51]

    Human genotype-to-phenotype predictions: Boosting accuracy with nonlinear models.PLOS ONE, 17(8):e0273293, August 2022

    Aleksandr Medvedev, Satyarth Mishra Sharma, Evgenii Tsatsorin, Elena Nabieva, and Dmitry Yarotsky. Human genotype-to-phenotype predictions: Boosting accuracy with nonlinear models.PLOS ONE, 17(8):e0273293, August 2022

  52. [52]

    Kang, Peter Kraft, Liming Liang, Qi Sun, Paul W

    Jordi Merino, Marta Guasch-Ferr´ e, Jun Li, Wonil Chung, Yang Hu, Baoshan Ma, Yanping Li, Jae H. Kang, Peter Kraft, Liming Liang, Qi Sun, Paul W. Franks, JoAnn E. Manson, Walter C. Willet, Jose C. Florez, and Frank B. Hu. Polygenic scores, diet quality, and type 2 diabetes risk: An observational study among 35, 759 adults from 3 US cohorts. PLOS Medicine,...

  53. [53]

    A machine learning method to identify genetic variants potentially associated with alzheimer’s disease.Frontiers in Genetics, 12, June 2021

    Bradley Monk, Andrei Rajkovic, Semar Petrus, Aleks Rajkovic, Terry Gaasterland, and Roberto Malinow. A machine learning method to identify genetic variants potentially associated with alzheimer’s disease.Frontiers in Genetics, 12, June 2021

  54. [54]

    Transfer learning for genotype–phenotype prediction using deep learning models.BMC Bioinformatics, 23(1), November 2022

    Muhammad Muneeb, Samuel Feng, and Andreas Henschel. Transfer learning for genotype–phenotype prediction using deep learning models.BMC Bioinformatics, 23(1), November 2022

  55. [55]

    Feng, and Andreas Henschel

    Muhammad Muneeb, Samuel F. Feng, and Andreas Henschel. Can we convert genotype sequences into images for cases/controls classification?Frontiers in Bioinformatics, 2, June 2022

  56. [56]

    Feng, and Andreas Henschel

    Muhammad Muneeb, Samuel F. Feng, and Andreas Henschel. Heritability, genetic variation, and the number of risk SNPs effect on deep learning and polygenic risk scores AUC. In2022 14th International Conference on Bioinformatics and Biomedical Technology. ACM, May 2022

  57. [57]

    Feng, and Andreas Henschel

    Muhammad Muneeb, Samuel F. Feng, and Andreas Henschel. Tutorial on 8 genotype files conversion. In 2022 10th International Conference on Bioinformatics and Computational Biology (ICBCB), pages 13–17, 2022

  58. [58]

    Eye-color and type-2 diabetes phenotype prediction from genotype data using deep learning methods.BMC Bioinformatics, 22(1), April 2021

    Muhammad Muneeb and Andreas Henschel. Eye-color and type-2 diabetes phenotype prediction from genotype data using deep learning methods.BMC Bioinformatics, 22(1), April 2021

  59. [59]

    Phenotype prediction from genome- wide genotyping data: a crowdsourcing experiment.bioRxiv, August 2020

    Olivier Naret, David AA Baranger, Sharada Prasanna Mohanty, Bastian Greshake Tzovaras, Marcel Salath´ e, and Jacques Fellay and. Phenotype prediction from genome- wide genotyping data: a crowdsourcing experiment.bioRxiv, August 2020

  60. [60]

    Regularized machine learning in the genetic prediction of complex traits

    Sebastian Okser, Tapio Pahikkala, Antti Airola, Tapio Salakoski, Samuli Ripatti, and Tero Aittokallio. Regularized machine learning in the genetic prediction of complex traits. PLoS genetics, 10(11):e1004754, 2014

  61. [61]

    Jack W. O’Sullivan, Sridharan Raghavan, Carla Marquez-Luna, Jasmine A. Luzum, Scott M. Damrauer, Euan A. Ashley, Christopher J. O’Donnell, Cristen J. Willer, and Pradeep Natarajan. Polygenic risk scores for cardiovascular disease: A scientific statement from the American Heart Association. Circulation, 146(8), August 2022

  62. [62]

    F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011

  63. [63]

    Pérez-Enciso and Zingaretti. A guide for using deep learning for complex trait genomic prediction. Genes, 10(7):553, July 2019

  64. [64]

    Douglas E. V. Pires, Jing Chen, Tom L. Blundell, and David B. Ascher. In silico functional dissection of saturation mutagenesis: Interpreting the relationship between phenotypes and changes in protein stability, interactions and activity. Scientific Reports, 6(1), January 2016

  65. [65]

    Farhad Pouladi, Hojjat Salehinejad, and Amir Mohammad Gilani. Recurrent neural networks for sequential phenotype prediction in genomics. In 2015 International Conference on Developments of E-Systems Engineering (DeSE). IEEE, December 2015

  66. [66]

    Santiago Alvarez Prado, Isabelle Sanchez, Llorenç Cabrera-Bosquet, Antonin Grau, Claude Welcker, François Tardieu, and Nadine Hilgert. To clean or not to clean phenotypic datasets for outlier plants in genetic analyses? Journal of Experimental Botany, 70(15):3693–3698, April 2019

  67. [67]

    Florian Privé, Bjarni J. Vilhjálmsson, Hugues Aschard, and Michael G.B. Blum. Making the most of clumping and thresholding for polygenic scores. The American Journal of Human Genetics, 105(6):1213–1221, December 2019

  68. [68]

    Shaun Purcell, Benjamin Neale, Kathe Todd-Brown, Lori Thomas, Manuel A.R. Ferreira, David Bender, Julian Maller, Pamela Sklar, Paul I.W. de Bakker, Mark J. Daly, and Pak C. Sham. PLINK: A tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics, 81(3):559–575, September 2007

  69. [69]

    Wan-Mei Qu, Ni Liang, Zi-Ku Wu, You-Gang Zhao, and Dong Chu. Minimum sample sizes for invasion genomics: Empirical investigation in an invasive whitefly. Ecology and Evolution, 10(1):38–49, October 2019

  70. [70]

    Daniele Raimondi, Gabriele Orlando, Nora Verplaetse, Piero Fariselli, and Yves Moreau. Editorial: Towards genome interpretation: Computational methods to model the genotype-phenotype relationship. Frontiers in Bioinformatics, 2, November 2022

  71. [71]

    G. Rajesh, X. Mercilin Raajini, K. Martin Sagayam, and Hien Dang. A statistical approach for high order epistasis interaction detection for prediction of diabetic macular edema. Informatics in Medicine Unlocked, 20:100362, 2020

  72. [72]

    Ebba Du Rietz, Jonathan Coleman, Kylie Glanville, Shing Wan Choi, Paul F. O’Reilly, and Jonna Kuntsi. Association of polygenic risk for attention-deficit/hyperactivity disorder with co-occurring traits and disorders. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 3(7):635–643, July 2018

  73. [73]

    Subrata Saha, Himanshu Narayan Singh, Ahmed Soliman, and Sanguthevar Rajasekaran. A novel computational methodology for GWAS multi-locus analysis based on graph theory and machine learning. medRxiv, October 2021

  74. [74]

    M. Schuster and K.K. Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681, 1997

  75. [75]

    Shivani Sehrawat, Keyhan Najafian, and Lingling Jin. Predicting phenotypes from novel genomic markers using deep learning. Bioinformatics Advances, 3(1), January 2023

  76. [76]

    J.R. Shaffer, E. Feingold, and M.L. Marazita. Genome-wide association studies. Journal of Dental Research, 91(7):637–641, May 2012

  77. [77]

    Johnathon Shook, Tryambak Gangopadhyay, Linjiang Wu, Baskar Ganapathysubramanian, Soumik Sarkar, and Asheesh K. Singh. Crop yield prediction integrating genotype and weather variables using deep learning. PLOS ONE, 16(6):e0252402, June 2021

  78. [78]

    Princess P. Silva, Joverlyn D. Gaudillo, Julianne A. Vilela, Ranzivelle Marianne L. Roxas-Villanueva, Beatrice J. Tiangco, Mario R. Domingo, and Jason R. Albia. A machine learning-based SNP-set analysis approach for identifying disease-associated susceptibility loci. Scientific Reports, 12(1), September 2022

  79. [79]

    Arti Singh, Baskar Ganapathysubramanian, Asheesh Kumar Singh, and Soumik Sarkar. Machine learning for high-throughput stress phenotyping in plants. Trends in Plant Science, 21(2):110–124, February 2016

  80. [80]

    Parvathaneni Naga Srinivasu, Jana Shafi, T Balamurali Krishna, Canavoy Narahari Sujatha, S Phani Praveen, and Muhammad Fazal Ijaz. Using recurrent neural networks for predicting type-2 diabetes from genomic and tabular data. Diagnostics, 12(12):3067, December 2022
