plmmr: an R package to fit penalized linear mixed models for genome-wide association data with complex correlation structure

Anna C. Reisetter; Oscar A. Rysavy; Patrick J. Breheny; Tabitha K. Peter; Yujing Lu

arxiv: 2502.01577 · v1 · submitted 2025-02-03 · 📊 stat.CO


Tabitha K. Peter, Anna C. Reisetter, Yujing Lu, Oscar A. Rysavy, Patrick J. Breheny

Pith reviewed 2026-05-23 04:21 UTC · model grok-4.3

classification 📊 stat.CO

keywords plmmr · penalized linear mixed models · GWAS · correlation structure · best linear unbiased predictor · memory-mapping · R package · genome-wide association


The pith

plmmr fits penalized linear mixed models to GWAS data by estimating correlations among observations to improve BLUP predictions while using memory-mapping for data larger than RAM.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents plmmr, an R package for fitting penalized linear mixed models to high-dimensional genome-wide association data. It estimates the correlation structure among observations directly from the data and incorporates those estimates into the best linear unbiased predictor to refine predictions. The package relies on memory-mapping and file-backing so that genome-scale matrices can be analyzed on ordinary computers even when they exceed available RAM. A reader would care because correlations frequently confound regression in genetic studies, and the tool makes penalized mixed-model analysis practical without specialized hardware. The manuscript describes the underlying methods, workflow, and two real-data examples to illustrate its use.

Core claim

plmmr implements penalized linear mixed models that estimate correlation among observations in high-dimensional data and use those estimates to improve prediction with the best linear unbiased predictor, supported by memory-mapping that allows genome-scale data to be analyzed on ordinary machines even when the data size exceeds RAM, as demonstrated through the package's methods, workflow, file-backing approach, and examples from real GWAS data.

What carries the argument

Penalized linear mixed model that estimates the correlation structure and applies the best linear unbiased predictor, combined with memory-mapping for file-backed storage of large genomic matrices.
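As an illustration of the file-backing idea, the following Python sketch stores a matrix in a flat binary file and computes column means through a memory map, copying only a few rows into RAM at a time. It is not plmmr code: the package is written in R, and the function names `write_matrix` and `column_means`, the row-major float64 layout, and the chunk size are all assumptions made for this example.

```python
import mmap
import os
import struct
import tempfile


def write_matrix(path, rows):
    """Write a row-major matrix of float64 values to a flat binary file."""
    with open(path, "wb") as fh:
        for row in rows:
            fh.write(struct.pack(f"{len(row)}d", *row))


def column_means(path, n_rows, n_cols, chunk_rows=2):
    """Column means of a file-backed matrix, reading chunk_rows rows at a time."""
    sums = [0.0] * n_cols
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            view = memoryview(mm).cast("d")  # the file viewed as doubles, no copy
            try:
                for start in range(0, n_rows, chunk_rows):
                    stop = min(start + chunk_rows, n_rows)
                    # Only this chunk is copied into ordinary memory.
                    chunk = view[start * n_cols:stop * n_cols].tolist()
                    for k, value in enumerate(chunk):
                        sums[k % n_cols] += value
            finally:
                view.release()  # must be released before the map is closed
    return [s / n_rows for s in sums]


# Hypothetical 3 x 2 matrix standing in for a genotype matrix.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "X.bin")
    write_matrix(demo_path, [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    print(column_means(demo_path, n_rows=3, n_cols=2))  # [3.0, 4.0]
```

The operating system pages data in from disk as the view is indexed, so peak memory is governed by the chunk size rather than by the size of the file.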

If this is right

Genome-scale datasets can be analyzed on standard laptops or desktops without requiring data to fit entirely in RAM.
Prediction in GWAS settings improves when the estimated correlation among observations is incorporated via the best linear unbiased predictor.
Users gain a complete workflow in R for fitting these models with file-backing and memory-mapping.
The approach handles complex correlation structures that arise in real genome-wide association studies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same memory-mapping strategy could be applied to other high-dimensional regression problems outside genetics where correlation among samples matters.
Direct comparisons of out-of-sample prediction error between plmmr and ordinary penalized regression on held-out GWAS cohorts would quantify the practical gain from the correlation adjustment.
The package's file-backing layer might integrate with existing genomic data pipelines to reduce preprocessing steps for very large cohorts.

Load-bearing premise

The correlation structure estimated from the data is accurate and stable enough that feeding it into the best linear unbiased predictor produces meaningful gains in the penalized setting.
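To make the premise concrete, the standard form of the best linear unbiased predictor for new observations can be written as follows. The notation is an assumption for this sketch and is not taken from the text above: $\Sigma_{11}$ is the covariance among the training outcomes $y$, $\Sigma_{21}$ is the covariance between new and training outcomes, and $\hat\beta$ is the penalized estimate of the fixed effects.

```latex
\hat{y}_{\text{new}}
  = X_{\text{new}}\hat\beta
  + \Sigma_{21}\,\Sigma_{11}^{-1}\,\bigl(y - X\hat\beta\bigr)
```

The second term is where the estimated correlation enters. If $\Sigma_{21}$ and $\Sigma_{11}$ are estimated poorly, that term adds noise instead of signal, which is the failure mode the premise describes.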

What would settle it

A simulation or real GWAS analysis in which predictions from the plmmr model show no improvement or perform worse than a standard penalized regression that ignores the estimated correlations.

Figures

Figures reproduced from arXiv: 2502.01577 by Anna C. Reisetter, Oscar A. Rysavy, Patrick J. Breheny, Tabitha K. Peter, Yujing Lu.

**Figure 2.** Total pipeline time.

**Figure 3.** Time spent in each stage of pipeline.

3.2 Orofacial clefting GWAS. To illustrate plmmr at work with more complex correlation structures, we used the plmmr pipeline to analyze data from the Pittsburgh Orofacial Cleft (POFC) study [Marazita and Weinberg, 2024] as our second example. The POFC study was a global, family-based GWAS in which the phenotype of focus was orofacial cleft (e.g., cleft palate). The GWA…

**Figure 4.** Plot of coefficient paths, POFC data.

**Figure 5.** Plot of cross-validation error, POFC data.

Original abstract

Correlation among the observations in high-dimensional regression modeling can be a major source of confounding. We present a new open-source package, plmmr, to implement penalized linear mixed models in R. This R package estimates correlation among observations in high-dimensional data and uses those estimates to improve prediction with the best linear unbiased predictor. The package uses memory-mapping so that genome-scale data can be analyzed on ordinary machines even if the size of data exceeds RAM. We present here the methods, workflow, and file-backing approach upon which plmmr is built, and we demonstrate its computational capabilities with two examples from real GWAS data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

plmmr is a practical R package that makes penalized LMMs runnable on large GWAS data via file-backing, but the statistical core is standard implementation rather than new work.


The main thing to know is that this paper ships plmmr, an R package for penalized linear mixed models aimed at GWAS with correlated observations. It estimates the correlation structure, uses that estimate to precondition the data before the penalized fit, applies it again through the BLUP at prediction time, and relies on memory-mapping so the data can exceed RAM on ordinary hardware. The two real-data examples show that it can process genome-scale matrices without special servers, which addresses a real engineering pain point for practitioners.

Referee Report

0 major / 2 minor

Summary. The manuscript presents plmmr, an R package for fitting penalized linear mixed models to genome-wide association data. It estimates correlation structures in high-dimensional data and incorporates the best linear unbiased predictor (BLUP) to enhance prediction accuracy within a penalized regression framework. The package utilizes memory-mapping to enable analysis of genome-scale datasets on standard hardware without requiring data to fit in RAM. The paper details the underlying methods, workflow, and file-backing approach, and illustrates the package's capabilities through examples on real GWAS datasets.

Significance. This work provides a practical software tool that addresses challenges in statistical genetics involving correlated observations. By combining penalized LMMs with BLUP and scalable data handling, plmmr could facilitate more robust analyses in GWAS where ignoring correlations might lead to biased results. The open-source implementation and demonstration on real data are strengths that support its potential utility in the field.

minor comments (2)

[Abstract] The claim that the package 'uses those estimates to improve prediction with the best linear unbiased predictor' is central, but the provided text gives no quantitative metrics (e.g., prediction error reduction or cross-validation scores) from the two real-data examples; adding a concise summary of these results would strengthen the abstract without altering scope.
The workflow description would benefit from an explicit algorithmic outline (e.g., steps for correlation estimation followed by penalized objective with BLUP adjustment) to make the integration of standard methods fully transparent for users.
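One way such an outline could look is sketched below in Python with NumPy. It is a reading of the workflow as summarized on this page, not the package's implementation. The relatedness estimate K = XXᵀ/p, the covariance form Σ = ηK + (1 − η)I, the hand-fixed value of η, the absence of an intercept, and the names `lasso_cd` and `fit_and_predict` are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]   # partial residual without feature j
            beta[j] = soft_threshold(X[:, j] @ resid / n, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]   # restore the full residual
    return beta


def fit_and_predict(X, y, X_new, eta, lam):
    """Step 1: estimate correlation. Step 2: precondition and fit. Step 3: BLUP."""
    n, p = X.shape
    K = X @ X.T / p                                # assumed relatedness estimate
    s, U = np.linalg.eigh(K)
    w = 1.0 / np.sqrt(eta * s + (1.0 - eta))       # eigenvalues of Sigma^{-1/2}
    W = (U * w) @ U.T                              # Sigma^{-1/2}
    beta = lasso_cd(W @ X, W @ y, lam)             # lasso on preconditioned data
    Sigma = eta * K + (1.0 - eta) * np.eye(n)
    Sigma_new = eta * (X_new @ X.T) / p            # covariance of new vs. training rows
    adjustment = Sigma_new @ np.linalg.solve(Sigma, y - X @ beta)
    return beta, X_new @ beta + adjustment, W, Sigma


# Small synthetic demo; the data and tuning values are arbitrary.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 2.0 * X[:, 0] + rng.standard_normal(30)
y = y - y.mean()
X_new = rng.standard_normal((5, 10))
beta, y_hat, W, Sigma = fit_and_predict(X, y, X_new, eta=0.5, lam=0.1)
print("nonzero coefficients:", int(np.count_nonzero(beta)))
```

In this sketch η is supplied by the caller. An actual workflow would estimate it from the data, and would choose the penalty weight by cross-validation.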

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary, recognition of the package's utility for correlated GWAS data, and recommendation of minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; software implementation of established methods

full rationale

The manuscript describes the plmmr R package for penalized linear mixed models on GWAS data, focusing on correlation estimation, BLUP-based prediction, and file-backed matrix handling for scale. No derivation chain is presented that reduces any claimed prediction or result to its own inputs by construction. The workflow follows standard LMM theory without self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations for uniqueness. The paper is self-contained as an implementation description with runtime examples on real data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The package rests on standard assumptions of linear mixed models and penalized regression; no new free parameters, axioms, or invented entities are introduced beyond the software implementation itself.

axioms (1)

domain assumption: Linear mixed model assumptions hold for the correlation structure in GWAS data. Invoked in the description of estimating correlations to improve BLUP predictions.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


unclear
Relation between the paper passage and the cited Recognition theorem.

preconditioning ... Σ^{-1/2}y ∼ N((Σ^{-1/2}X)β, I) ... penalized regression approaches such as lasso ... may be applied
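The step elided in the quoted passage appears to be the standard whitening argument. A sketch, assuming the model $y \sim N(X\beta, \Sigma)$ with $\Sigma$ symmetric positive definite; the $1/(2n)$ scaling of the lasso objective is also an assumption and is not stated in the passage:

```latex
\begin{align*}
y &\sim N\!\left(X\beta,\ \Sigma\right) \\
\Sigma^{-1/2} y &\sim N\!\left(\Sigma^{-1/2}X\beta,\ \Sigma^{-1/2}\,\Sigma\,\Sigma^{-1/2}\right)
  = N\!\left((\Sigma^{-1/2}X)\beta,\ I\right) \\
\hat\beta(\lambda) &= \arg\min_{\beta}\ \frac{1}{2n}
  \left\lVert \Sigma^{-1/2}y - \Sigma^{-1/2}X\beta \right\rVert_2^2
  + \lambda \lVert \beta \rVert_1
\end{align*}
```

After the transformation the errors are uncorrelated with unit variance, which is why an ordinary penalized regression solver can be applied to the transformed data.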

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages

[1] Kuruvilla Joseph Abraham and Clara Diaz. Identifying large sets of unrelated individuals and unrelated markers. Source code for biology and medicine, 9: 1--8, 2014.

[2] Douglas Bates, Martin Maechler, and Mikael Jagan. Matrix: Sparse and Dense Matrix Classes and Methods, 2024. URL https://CRAN.R-project.org/package=Matrix. R package version 1.7-0.

[3] Terri H Beaty, Mary L Marazita, and Elizabeth J Leslie. Genetic factors influencing risk to orofacial clefts: today’s challenges and tomorrow’s opportunities. F1000Research, 5, 2016.

[4] Beben Benyamin, Peter M Visscher, and Allan F McRae. Family-based genome-wide association studies. Pharmacogenomics, 10(2): 181--190, 2009.

[5] Frederic Bertrand, Michael J. Kane, John Emerson, and Stephen Weston. 'BLAS' and 'LAPACK' Routines for Native R Matrices and 'big.matrix' Objects, 2024. URL https://fbertran.github.io/bigalgebra/. R package version 1.1.2.

[6] Sahir R Bhatnagar, Yi Yang, Tianyuan Lu, Erwin Schurr, JC Loredo-Osti, Marie Forest, Karim Oualkacha, and Celia MT Greenwood. Simultaneous snp selection and adjustment for population structure in high dimensional prediction models. PLoS genetics, 16(5): e1008766, 2020.

[7] Domagoj Ćevid, Peter Bühlmann, and Nicolai Meinshausen. Spectral deconfounding via perturbed sparse linear models. Journal of Machine Learning Research, 21(232): 1--41, 2020.

[8] Jianqing Fan and Runze Li. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456): 1348--1360, 2001.

[9] Zvi Galil. Efficient algorithms for finding maximum matching in graphs. ACM Computing Surveys (CSUR), 18(1): 23--38, 1986.

[10] E. Gorstein, R. Aghdam, and C. Solís-Lemus. HighDimMixedModels.jl: Robust High Dimensional Mixed Models across Omics Data. In preparation, 2024.

[11] Trevor Hastie, Robert Tibshirani, and Jerome H Friedman. The elements of statistical learning: data mining, inference, and prediction, volume 2. Springer, 2009.

[12] Hillary M. Heiling, Naim U. Rashid, Quefeng Li, and Joseph G. Ibrahim. glmmpen: High dimensional penalized generalized linear mixed models. The R Journal, 15: 106--128, 2024. ISSN 2073-4859. doi:10.32614/RJ-2023-086.

[13] Jinzhu Jia and Karl Rohe. Preconditioning the lasso for sign consistency. Electronic Journal of Statistics, 9(1): 1150--1172, 2015. doi:10.1214/15-EJS1029.

[14] Longda Jiang, Zhili Zheng, Ting Qi, Kathryn E Kemper, Naomi R Wray, Peter M Visscher, and Jian Yang. A resource-efficient tool for mixed model association analysis of large-scale data. Nature genetics, 51(12): 1749--1755, 2019.

[15] Michael J. Kane, John W. Emerson, and Stephen Weston. Scalable strategies for computing with massive data. Journal of Statistical Software, 55(14): 1--19, 2013. URL https://www.jstatsoft.org/article/view/v055i14.

[16] J. T. Leek and J. D. Storey. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genetics, 3(9): e161, 2007. doi:10.1371/journal.pgen.0030161.

[17] Jeffrey T. Leek, Robert B. Scharpf, Héctor Corrada Bravo, David Simcha, Benjamin Langmead, W. Evan Johnson, Donald Geman, Keith Baggerly, and Rafael A. Irizarry. Tackling the widespread and critical impact of batch effects in high-throughput data. Nature Reviews Genetics, 11(10): 733--739, October 2010. ISSN 1471-0056. doi:10.1038/nrg2825.

[18] E.J. Leslie, D.C. Koboldt, C.J. Kang, L. Ma, J.T. Hecht, G.L. Wehby, K. Christensen, A.E. Czeizel, F.W.-B. Deleyiannis, R.S. Fulton, R.K. Wilson, T.H. Beaty, B.C. Schutte, J.C. Murray, and M.L. Marazita. IRF6 mutation screening in non-syndromic orofacial clefting: analysis of 1521 families. Clinical Genetics, 90(1): 28--34, Oct 2015a. doi:10.1111/cge.12675.

[19] Elizabeth J Leslie and Mary L Marazita. Genetics of cleft lip and cleft palate. American Journal of Medical Genetics Part C: Seminars in Medical Genetics, 163(4): 246--258, 2013. doi:10.1002/ajmg.c.31381.

[20] Elizabeth J. Leslie, Margaret A. Taub, Huan Liu, Karyn Meltz Steinberg, Daniel C. Koboldt, Qunyuan Zhang, Jenna C. Carlson, Jacqueline B. Hetmanski, Hang Wang, David E. Larson, Robert S. Fulton, Youssef A. Kousa, Walid D. Fakhouri, Ali Naji, Ingo Ruczinski, Ferdouse Begum, Margaret M. Parker, Tamara Busch, Jennifer Standley, Jennifer Rigdon, Jacqueline T.... doi:10.1016/j.ajhg.2015.01.004, 2015.

[21] Xihao Li, Zilin Li, Hufeng Zhou, Sheila M Gaynor, Yaowu Liu, Han Chen, Ryan Sun, Rounak Dey, Donna K Arnett, Stella Aslibekyan, et al. Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nature genetics, 52(9): 969--983, 2020.

[22] Po-Ru Loh, George Tucker, Brendan K Bulik-Sullivan, Bjarni J Vilhjalmsson, Hilary K Finucane, Rany M Salem, Daniel I Chasman, Paul M Ridker, Benjamin M Neale, Bonnie Berger, et al. Efficient bayesian mixed-model analysis increases association power in large cohorts. Nature genetics, 47(3): 284--290, 2015.

[23] Mary Marazita and Seth Weinberg. Pittsburgh orofacial cleft studies, September 2024. URL https://www.dental.pitt.edu/research/ccdg/participate-research/pittsburgh-orofacial-cleft-studies. Center for Craniofacial and Dental Genetics, University of Pittsburgh. Website.

[24] Joelle Mbatchou, Leland Barnard, Joshua Backman, Anthony Marcketta, Jack A Kosmicki, Andrey Ziyatdinov, Christian Benner, Colm O’Dushlaine, Mathew Barber, Boris Boutkov, et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nature genetics, 53(7): 1097--1103, 2021.

[25] Melinda C Mills and Charles Rahal. The gwas diversity monitor tracks diversity by disease in real time. Nature genetics, 52(3): 242--243, 2020.

[26] Alkes L Price, Nick J Patterson, Robert M Plenge, Michael E Weinblatt, Nancy A Shadick, and David Reich. Principal components analysis corrects for stratification in genome-wide association studies. Nature genetics, 38(8): 904--909, 2006.

[27] Florian Privé, Hugues Aschard, Andrey Ziyatdinov, and Michael GB Blum. Efficient analysis of large-scale genome-wide data with two r packages: bigstatsr and bigsnpr. Bioinformatics, 34(16): 2781--2787, 2018.

[28] Shaun Purcell, Benjamin Neale, Kathe Todd-Brown, Lori Thomas, Manuel AR Ferreira, David Bender, Julian Maller, Pamela Sklar, Paul IW De Bakker, Mark J Daly, et al. Plink: a tool set for whole-genome association and population-based linkage analyses. The American journal of human genetics, 81(3): 559--575, 2007.

[29] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2024. URL https://www.R-project.org/.

[30] Assaf Rabinowicz and Saharon Rosset. Cross-validation for correlated data. Journal of the American Statistical Association, 117(538): 718--731, 2022.

[31] Barbara Rakitsch, Christoph Lippert, Oliver Stegle, and Karsten Borgwardt. A lasso multi-marker mixed model for association mapping with population structure correction. Bioinformatics, 29(2): 206--214, 2013.

[32] Muredach P Reilly, Mingyao Li, Jing He, Jane F Ferguson, Ioannis M Stylianou, Nehal N Mehta, Mary Susan Burnett, Joseph M Devaney, Christopher W Knouff, John R Thompson, et al. Identification of adamts7 as a novel locus for coronary atherosclerosis and association of abo with myocardial infarction in the presence of coronary atherosclerosis: two genome-wi... 2011.

[33] G. K. Robinson. That BLUP is a good thing: The estimation of random effects. Statistical Science, 6(1): 15--32, 1991.

[34] Julien St-Pierre, Karim Oualkacha, and Sahir Rai Bhatnagar. Efficient penalized generalized linear mixed models for variable selection and genetic risk prediction in high-dimensional data. Bioinformatics, 39(2): btad063, 2023.

[35] Jeffrey Staples, Deborah A Nickerson, and Jennifer E Below. Utilizing graph theory to select the largest set of unrelated individuals for genetic analysis. Genetic epidemiology, 37(2): 136--141, 2013.

[36] Stuart C Thomas. The estimation of genetic relationships using molecular markers and their efficiency in estimating heritability in natural populations. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1459): 1457--1467, 2005.

[37] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1): 267--288, 1996.

[38] Ismail H Toroslu and Yilmaz Arslanoglu. Genetic algorithm for the personnel assignment problem with multiple objectives. Information Sciences, 177(3): 787--803, 2007.

[39] Andrew J Wathen. Preconditioning. Acta Numerica, 24: 329--376, 2015.

[40] Yaohui Zeng and Patrick Breheny. The biglasso package: A memory- and computation-efficient solver for lasso model fitting with big data in r. R Journal, 12(2): 6--19, 2021. URL https://doi.org/10.32614/RJ-2021-001.

[41] C. H. Zhang. Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics, 38(2): 894--942, 2010.

[42] Wei Zhou, Jonas B Nielsen, Lars G Fritsche, Rounak Dey, Maiken E Gabrielsen, Brooke N Wolford, Jonathon LeFaive, Peter VandeHaar, Sarah A Gagliano, Aliya Gifford, et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nature genetics, 50(9): 1335--1341, 2018.

[1] [1]

Identifying large sets of unrelated individuals and unrelated markers

Kuruvilla Joseph Abraham and Clara Diaz. Identifying large sets of unrelated individuals and unrelated markers. Source code for biology and medicine, 9: 0 1--8, 2014

work page 2014

[2] [2]

Matrix: Sparse and Dense Matrix Classes and Methods, 2024

Douglas Bates, Martin Maechler, and Mikael Jagan. Matrix: Sparse and Dense Matrix Classes and Methods, 2024. URL https://CRAN.R-project.org/package=Matrix. R package version 1.7-0

work page 2024

[3] [3]

Genetic factors influencing risk to orofacial clefts: today’s challenges and tomorrow’s opportunities

Terri H Beaty, Mary L Marazita, and Elizabeth J Leslie. Genetic factors influencing risk to orofacial clefts: today’s challenges and tomorrow’s opportunities. F1000Research, 5, 2016

work page 2016

[4] [4]

Family-based genome-wide association studies

Beben Benyamin, Peter M Visscher, and Allan F McRae. Family-based genome-wide association studies. Pharmacogenomics, 10 0 (2): 0 181--190, 2009

work page 2009

[5] [5]

Kane, John Emerson, and Stephen Weston

Frederic Bertrand, Michael J. Kane, John Emerson, and Stephen Weston. 'BLAS' and 'LAPACK' Routines for Native R Matrices and 'big.matrix' Objects, 2024. URL https://fbertran.github.io/bigalgebra/. R package version 1.1.2

work page 2024

[6] [6]

Simultaneous snp selection and adjustment for population structure in high dimensional prediction models

Sahir R Bhatnagar, Yi Yang, Tianyuan Lu, Erwin Schurr, JC Loredo-Osti, Marie Forest, Karim Oualkacha, and Celia MT Greenwood. Simultaneous snp selection and adjustment for population structure in high dimensional prediction models. PLoS genetics, 16 0 (5): 0 e1008766, 2020

work page 2020

[7] [7]

Spectral deconfounding via perturbed sparse linear models

Domagoj \'C evid, Peter B \"u hlmann, and Nicolai Meinshausen. Spectral deconfounding via perturbed sparse linear models. Journal of Machine Learning Research, 21 0 (232): 0 1--41, 2020

work page 2020

[8] [8]

Variable selection via nonconcave penalized likelihood and its oracle properties

Jianqing Fan and Runze Li. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American statistical Association, 96 0 (456): 0 1348--1360, 2001

work page 2001

[9] [9]

Efficient algorithms for finding maximum matching in graphs

Zvi Galil. Efficient algorithms for finding maximum matching in graphs. ACM Computing Surveys (CSUR), 18 0 (1): 0 23--38, 1986

work page 1986

[10] [10]

Gorstein, R

E. Gorstein, R. Aghdam, and C. Sol' i s-Lemus. HighDimMixedModels.jl: Robust High Dimensional Mixed Models across Omics Data . In preparation, 2024

work page 2024

[11] [11]

The elements of statistical learning: data mining, inference, and prediction, volume 2

Trevor Hastie, Robert Tibshirani, Jerome H Friedman, and Jerome H Friedman. The elements of statistical learning: data mining, inference, and prediction, volume 2. Springer, 2009

work page 2009

[12] [12]

Heiling, Naim U

Hillary M. Heiling, Naim U. Rashid, Quefeng Li, and Joseph G. Ibrahim. glmmpen: High dimensional penalized generalized linear mixed models. The R Journal, 15: 0 106--128, 2024. ISSN 2073-4859. doi:10.32614/RJ-2023-086. https://doi.org/10.32614/RJ-2023-086

work page doi:10.32614/rj-2023-086 2024

[13] [13]

Preconditioning the lasso for sign consistency

Jinzhu Jia and Karl Rohe. Preconditioning the lasso for sign consistency. Electronic Journal of Statistics, 9 0 (1): 0 1150--1172, 2015. doi:10.1214/15-EJS1029

work page doi:10.1214/15-ejs1029 2015

[14] [14]

A resource-efficient tool for mixed model association analysis of large-scale data

Longda Jiang, Zhili Zheng, Ting Qi, Kathryn E Kemper, Naomi R Wray, Peter M Visscher, and Jian Yang. A resource-efficient tool for mixed model association analysis of large-scale data. Nature genetics, 51 0 (12): 0 1749--1755, 2019

work page 2019

[15] [15]

Kane, John W

Michael J. Kane, John W. Emerson, and Stephen Weston. Scalable strategies for computing with massive data. Journal of Statistical Software, 55 0 (14): 0 1--19, 2013. URL https://www.jstatsoft.org/article/view/v055i14

work page 2013

[16] [16]

J. T. Leek and J. D. Storey. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genetics, 3 0 (9): 0 e161, 2007. doi:10.1371/journal.pgen.0030161

work page doi:10.1371/journal.pgen.0030161 2007

[17] [17]

Leek, Robert B

Jeffrey T. Leek, Robert B. Scharpf, H \'e ctor Corrada Bravo, David Simcha, Benjamin Langmead, W. Evan Johnson, Donald Geman, Keith Baggerly, and Rafael A. Irizarry. Tackling the widespread and critical impact of batch effects in high-throughput data. Nature Reviews Genetics, 11 0 (10): 0 733--739, October 2010. ISSN 1471-0056. doi:10.1038/nrg2825

work page doi:10.1038/nrg2825 2010

[18] [18]

Leslie, D.C

E.J. Leslie, D.C. Koboldt, C.J. Kang, L. Ma, J.T. Hecht, G.L. Wehby, K. Christensen, A.E. Czeizel, F.W.-B. Deleyiannis, R.S. Fulton, R.K. Wilson, T.H. Beaty, B.C. Schutte, J.C. Murray, and M.L. Marazita. IRF 6mutation screening in non-syndromic orofacial clefting: analysis of 1521 families. Clinical Genetics, 90 0 (1): 0 28--34, oct 2015 a . doi:10.1111/cge.12675

work page doi:10.1111/cge.12675 2015

[19] [19]

Genetics of cleft lip and cleft palate

Elizabeth J Leslie and Mary L Marazita. Genetics of cleft lip and cleft palate. American Journal of Medical Genetics Part C: Seminars in Medical Genetics, 163 0 (4): 0 246--258, 2013. doi:https://doi.org/10.1002/ajmg.c.31381

work page doi:10.1002/ajmg.c.31381 2013

[20] [20]

Leslie, Margaret A

Elizabeth J. Leslie, Margaret A. Taub, Huan Liu, Karyn Meltz Steinberg, Daniel C. Koboldt, Qunyuan Zhang, Jenna C. Carlson, Jacqueline B. Hetmanski, Hang Wang, David E. Larson, Robert S. Fulton, Youssef A. Kousa, Walid D. Fakhouri, Ali Naji, Ingo Ruczinski, Ferdouse Begum, Margaret M. Parker, Tamara Busch, Jennifer Standley, Jennifer Rigdon, Jacqueline T....

work page doi:10.1016/j.ajhg.2015.01.004 2015

[21] [21]

Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale

Xihao Li, Zilin Li, Hufeng Zhou, Sheila M Gaynor, Yaowu Liu, Han Chen, Ryan Sun, Rounak Dey, Donna K Arnett, Stella Aslibekyan, et al. Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nature genetics, 52 0 (9): 0 969--983, 2020

work page 2020

[22] [22]

Efficient bayesian mixed-model analysis increases association power in large cohorts

Po-Ru Loh, George Tucker, Brendan K Bulik-Sullivan, Bjarni J Vilhjalmsson, Hilary K Finucane, Rany M Salem, Daniel I Chasman, Paul M Ridker, Benjamin M Neale, Bonnie Berger, et al. Efficient bayesian mixed-model analysis increases association power in large cohorts. Nature genetics, 47 0 (3): 0 284--290, 2015

work page 2015

[23] [23]

Pittsburgh orofacial cleft studies, September 2024

Mary Marazita and Seth Weinberg. Pittsburgh orofacial cleft studies, September 2024. URL https://www.dental.pitt.edu/research/ccdg/participate-research/pittsburgh-orofacial-cleft-studies. Center for Craniofacial and Dental Genetics, University of Pittsburgh. Website

work page 2024

[24] [24]

Computationally efficient whole-genome regression for quantitative and binary traits

Joelle Mbatchou, Leland Barnard, Joshua Backman, Anthony Marcketta, Jack A Kosmicki, Andrey Ziyatdinov, Christian Benner, Colm O’Dushlaine, Mathew Barber, Boris Boutkov, et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nature genetics, 53 0 (7): 0 1097--1103, 2021

work page 2021

[25] Melinda C Mills and Charles Rahal. The GWAS diversity monitor tracks diversity by disease in real time. Nature Genetics, 52(3): 242–243, 2020

[26] Alkes L Price, Nick J Patterson, Robert M Plenge, Michael E Weinblatt, Nancy A Shadick, and David Reich. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics, 38(8): 904–909, 2006

[27] Florian Privé, Hugues Aschard, Andrey Ziyatdinov, and Michael GB Blum. Efficient analysis of large-scale genome-wide data with two R packages: bigstatsr and bigsnpr. Bioinformatics, 34(16): 2781–2787, 2018

[28] Shaun Purcell, Benjamin Neale, Kathe Todd-Brown, Lori Thomas, Manuel AR Ferreira, David Bender, Julian Maller, Pamela Sklar, Paul IW De Bakker, Mark J Daly, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics, 81(3): 559–575, 2007

[29] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2024. URL https://www.R-project.org/

[30] Assaf Rabinowicz and Saharon Rosset. Cross-validation for correlated data. Journal of the American Statistical Association, 117(538): 718–731, 2022

[31] Barbara Rakitsch, Christoph Lippert, Oliver Stegle, and Karsten Borgwardt. A lasso multi-marker mixed model for association mapping with population structure correction. Bioinformatics, 29(2): 206–214, 2013

[32]

Muredach P Reilly, Mingyao Li, Jing He, Jane F Ferguson, Ioannis M Stylianou, Nehal N Mehta, Mary Susan Burnett, Joseph M Devaney, Christopher W Knouff, John R Thompson, et al. Identification of adamts7 as a novel locus for coronary atherosclerosis and association of abo with myocardial infarction in the presence of coronary atherosclerosis: two genome-wi...

work page 2011

[33] G. K. Robinson. That BLUP is a good thing: The estimation of random effects. Statistical Science, 6(1): 15–32, 1991

[34] Julien St-Pierre, Karim Oualkacha, and Sahir Rai Bhatnagar. Efficient penalized generalized linear mixed models for variable selection and genetic risk prediction in high-dimensional data. Bioinformatics, 39(2): btad063, 2023

[35] Jeffrey Staples, Deborah A Nickerson, and Jennifer E Below. Utilizing graph theory to select the largest set of unrelated individuals for genetic analysis. Genetic Epidemiology, 37(2): 136–141, 2013

[36] Stuart C Thomas. The estimation of genetic relationships using molecular markers and their efficiency in estimating heritability in natural populations. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1459): 1457–1467, 2005

[37] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1): 267–288, 1996

[38] Ismail H Toroslu and Yilmaz Arslanoglu. Genetic algorithm for the personnel assignment problem with multiple objectives. Information Sciences, 177(3): 787–803, 2007

[39] Andrew J Wathen. Preconditioning. Acta Numerica, 24: 329–376, 2015

[40] Yaohui Zeng and Patrick Breheny. The biglasso package: A memory- and computation-efficient solver for lasso model fitting with big data in R. R Journal, 12(2): 6–19, 2021. URL https://doi.org/10.32614/RJ-2021-001

[41] C. H. Zhang. Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics, 38(2): 894–942, 2010

[42] Wei Zhou, Jonas B Nielsen, Lars G Fritsche, Rounak Dey, Maiken E Gabrielsen, Brooke N Wolford, Jonathon LeFaive, Peter VandeHaar, Sarah A Gagliano, Aliya Gifford, et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nature Genetics, 50(9): 1335–1341, 2018