Tool Choice Matters: Evaluating edgeR vs. DESeq2 for Sensitivity, Robustness, and Cross-Study Performance

Mostafa Rezapour

arxiv: 2601.04122 · v2 · pith:LHMEZJBWnew · submitted 2026-01-07 · 🧬 q-bio.GN

Pith reviewed 2026-05-21 15:43 UTC · model grok-4.3

keywords: differential gene expression · edgeR · DESeq2 · RNA-Seq · cross-study validation · classification performance · robustness · tool comparison

The pith

edgeR yields more robust and generalizable gene sets than DESeq2 for cross-study RNA-Seq classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests edgeR and DESeq2 on bulk RNA-Seq datasets from viral, bacterial, and fibrotic conditions to measure how tool choice shapes downstream results. It checks sensitivity to sample size and outliers, then trains classifiers on each tool's unique genes to see how well they separate samples inside the original data. The decisive test applies those same gene sets to four separate SARS-CoV-2 studies and tracks accuracy, precision, and recall. edgeR's genes produce higher and steadier performance across the held-out datasets, while DESeq2 tends to surface more genes yet delivers less reliable classification when moved to new studies. A reader cares because the choice directly affects which genes become candidates for biomarkers or follow-up experiments and whether those findings hold up beyond one lab.

Core claim

Using real and semi-simulated data, the study finds that gene sets identified only by edgeR deliver higher AUC, precision, and recall when used to classify samples in independent SARS-CoV-2 datasets, with some test cases reaching perfect separation. DESeq2-specific genes show lower and more variable performance across the same folds. Both tools respond similarly to added outliers, yet edgeR maintains classification performance closer to optimal across a larger share of contrasts.
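To make the AUC and "perfect separation" language concrete, here is a minimal sketch of AUC computed as the fraction of (positive, negative) sample pairs that a classifier ranks correctly. The function name, labels, and scores are illustrative assumptions, not values or code from the paper.

```python
from itertools import product


def auc_from_scores(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical held-out samples: 1 = infected, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
separable_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]    # every positive outranks every negative
overlapping_scores = [0.9, 0.4, 0.7, 0.6, 0.2, 0.1]  # one positive falls below a negative

auc_separable = auc_from_scores(labels, separable_scores)
auc_overlapping = auc_from_scores(labels, overlapping_scores)
```

An AUC of 1.0 corresponds to perfect separation: every positive sample receives a higher score than every negative one. In the second toy case a single misranked pair out of nine lowers the AUC to 8/9.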

What carries the argument

Cross-study validation of tool-specific differentially expressed gene sets through supervised classification on four held-out SARS-CoV-2 datasets.
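One plausible shape for this kind of cross-study validation is a leave-one-study-out loop: train on all studies but one, then score the held-out study. The sketch below assumes that design and uses a nearest-centroid classifier as a stand-in; this page does not state the paper's actual fold structure or classifier, and the study names and expression values are hypothetical.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_nearest_centroid(X, y):
    """Return one centroid per class label (stand-in classifier)."""
    return {c: centroid([x for x, yi in zip(X, y) if yi == c]) for c in set(y)}


def predict(model, X):
    """Assign each sample to the class with the closest centroid."""
    return [min(model, key=lambda c: sq_dist(x, model[c])) for x in X]


def leave_one_study_out(studies):
    """studies: dict name -> (X, y). Returns accuracy on each held-out study."""
    accuracy = {}
    for held_out in studies:
        train_X, train_y = [], []
        for name, (X, y) in studies.items():
            if name != held_out:
                train_X.extend(X)
                train_y.extend(y)
        model = fit_nearest_centroid(train_X, train_y)
        test_X, test_y = studies[held_out]
        preds = predict(model, test_X)
        accuracy[held_out] = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
    return accuracy


# Hypothetical toy data: two "genes" per sample, 1 = infected, 0 = control.
studies = {
    "study_A": ([[5.0, 1.0], [4.8, 1.2], [1.0, 4.9], [1.1, 5.2]], [1, 1, 0, 0]),
    "study_B": ([[5.5, 0.8], [0.9, 5.1]], [1, 0]),
    "study_C": ([[4.6, 1.5], [1.3, 4.7]], [1, 0]),
    "study_D": ([[5.1, 1.1], [1.2, 5.0]], [1, 0]),
}
loso_accuracy = leave_one_study_out(studies)
```

The property that matters is that no sample from the held-out study influences the fitted model, so the score reflects transfer to a new study rather than within-study fit.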

If this is right

edgeR-specific genes produce higher F1 scores in nine of thirteen classification contrasts.
Dolan-Moré performance profiles show edgeR performance stays nearer the optimum across more datasets.
Cross-study replication of classification succeeds more consistently with edgeR-unique genes.
For both tools, Jaccard overlap between the DEG list from perturbed data and the list from the original data drops as more outliers are introduced.
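The Jaccard comparison in the last point can be sketched as follows. The gene identifiers and the outlier counts used as dictionary keys are hypothetical placeholders, chosen only to show the index falling as the perturbed DEG set drifts away from the original one.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets are treated as identical (a convention)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical DEG set from the unperturbed data.
original = {"G1", "G2", "G3", "G4", "G5"}

# Hypothetical DEG sets after injecting 1, 2, and 4 outliers.
perturbed_runs = {
    1: {"G1", "G2", "G3", "G4", "G6"},
    2: {"G1", "G2", "G3", "G7", "G8"},
    4: {"G1", "G2", "G9", "G10"},
}

similarity = {k: jaccard(original, degs) for k, degs in perturbed_runs.items()}
```

In this toy example the index falls from 4/6 to 3/7 to 2/7 as more outliers are added, which is the kind of downward trend the paper reports for both tools.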

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Preference for edgeR could reduce wasted effort on non-replicable leads in biomarker studies.
A hybrid workflow that takes the intersection or union of both tools' outputs might balance sensitivity and robustness.
The pattern may extend to other high-throughput sequencing applications where generalizability matters more than raw count of discoveries.
Repeating the cross-study design on non-viral disease cohorts would test whether the advantage is context-specific.
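The intersection and union in the hybrid-workflow idea, and the tool-specific sets the paper evaluates, are plain set operations. A minimal sketch with placeholder gene identifiers (hypothetical, not results from the paper):

```python
# Hypothetical DEG calls from each tool on the same contrast.
edger_degs = {"geneA", "geneB", "geneC", "geneD"}
deseq2_degs = {"geneB", "geneC", "geneE", "geneF", "geneG"}

# Tool-specific ("uniquely identified") sets, as used for the classifiers.
edger_only = edger_degs - deseq2_degs
deseq2_only = deseq2_degs - edger_degs

# Candidate hybrid outputs: a conservative consensus and a permissive union.
consensus = edger_degs & deseq2_degs
either_tool = edger_degs | deseq2_degs
```

The consensus set trades sensitivity for agreement between tools, while the union does the opposite. Which of the two balances sensitivity and robustness better is an open question that this page does not answer.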

Load-bearing premise

The four independent SARS-CoV-2 datasets are similar enough in experimental design and biological context for a fair head-to-head comparison of the two tools.

What would settle it

If gene sets found only by DESeq2 achieve equal or higher AUC, precision, and recall than edgeR-specific sets when classifying samples from new independent studies, the claim that edgeR produces more generalizable results would be refuted.
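For reference, precision, recall, and F1 in such a comparison follow from the confusion counts on a held-out study. The sketch below uses hypothetical labels and predictions; the function name is an illustrative assumption.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one positive class; undefined ratios are reported as 0.0."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical held-out study: 1 = infected, 0 = control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]  # one missed case, one false alarm

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Running the same function on predictions from an edgeR-specific and a DESeq2-specific gene set, for the same held-out samples, gives the head-to-head comparison that the criterion above describes.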

Figures

Figures reproduced from arXiv: 2601.04122 by Mostafa Rezapour.

**Figure 2.** Comparison of edgeR and DESeq2 across multiple biological contrasts. Panel (a) shows the log2-scaled number of uniquely identified upregulated and downregulated genes by each tool across 13 contrasts spanning viral, bacterial, and fibrotic conditions. Panel (b) displays the Jaccard index for upregulated and downregulated gene sets, indicating overlap between tools. Panel (c) shows Pearson and Spearman corr…

**Figure 3.** Classification performance of uniquely identified genes from …

**Figure 4.** Cross-study generalizability of uniquely significant genes from …

Original abstract

Differential gene expression (DGE) analysis is foundational to transcriptomic research, yet tool selection can substantially influence results. This study presents a comprehensive comparison of two widely used DGE tools, edgeR and DESeq2, using real and semi-simulated bulk RNA-Seq datasets spanning viral, bacterial, and fibrotic conditions. We evaluated tool performance across three key dimensions: (1) sensitivity to sample size and robustness to outliers; (2) classification performance of uniquely identified gene sets within the discovery dataset; and (3) generalizability of tool-specific gene sets across independent studies. First, both tools showed similar responses to simulated outliers, with Jaccard similarity between the DEG sets from perturbed and original (unperturbed) data decreasing as more outliers were added. Second, classification models trained on tool-specific genes showed that edgeR achieved higher F1 scores in 9 of 13 contrasts and more frequently reached perfect or near-perfect precision. Dolan-More performance profiles further indicated that edgeR maintained performance closer to optimal across a greater proportion of datasets. Third, in cross-study validation using four independent SARS-CoV-2 datasets, gene sets uniquely identified by edgeR yielded higher AUC, precision, and recall in classifying samples from held-out datasets. This pattern was consistent across folds, with some test cases achieving perfect separation using edgeR-specific genes. In contrast, DESeq2-specific genes showed lower and more variable performance across studies. Overall, our findings highlight that while DESeq2 may identify more DEGs even under stringent significance conditions, edgeR yields more robust and generalizable gene sets for downstream classification and cross-study replication, which underscores key trade-offs in tool selection for transcriptomic analyses.
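As an illustration of the Dolan-Moré performance profiles mentioned in the abstract, the sketch below computes, for each tool, the fraction of contrasts on which its score is within a factor tau of the best score. The original profiles are defined for costs where lower is better; using the ratio best/score for higher-is-better metrics such as F1 is an assumption made here, and the page does not say how the paper adapted the method. Tool names and scores are hypothetical.

```python
def performance_profile(scores, taus):
    """Profile for higher-is-better scores.

    scores: dict tool -> list of positive scores, one per problem, in the same order.
    Returns dict tool -> list of fractions, one per tau.
    """
    tools = list(scores)
    n = len(scores[tools[0]])
    # Best score achieved by any tool on each problem.
    best = [max(scores[t][i] for t in tools) for i in range(n)]
    profile = {}
    for t in tools:
        # Performance ratio >= 1; a zero score is treated as infinitely far from the best.
        ratios = [
            best[i] / scores[t][i] if scores[t][i] > 0 else float("inf")
            for i in range(n)
        ]
        profile[t] = [sum(r <= tau for r in ratios) / n for tau in taus]
    return profile


# Hypothetical F1 scores on four contrasts.
f1_scores = {
    "tool_A": [1.0, 0.9, 0.8, 0.6],
    "tool_B": [0.8, 0.9, 0.5, 0.75],
}
taus = [1.0, 1.3, 2.0]
profile = performance_profile(f1_scores, taus)
```

The value at tau = 1 is the share of contrasts on which a tool is best or tied for best. A curve that rises to 1 at smaller tau indicates performance that stays closer to the optimum across more problems.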

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

A private letter to a colleague.

edgeR's unique genes show better cross-study classification performance than DESeq2's in these SARS-CoV-2 datasets, but the comparability of those datasets is an unexamined assumption.

This paper's main takeaway is that edgeR produces gene sets that classify better across independent SARS-CoV-2 studies, while DESeq2 tends to return more DEGs that do not transfer as cleanly. The cross-study validation on four datasets is the piece that feels freshest here. The authors also test both tools on real data from viral, bacterial, and fibrotic conditions plus semi-simulated outliers, then evaluate within-study classification with F1 scores and Dolan-Moré performance profiles. That gives a practical view of the sensitivity-robustness trade-off instead of just counting significant genes. Credit to them for noting that DESeq2 can still call more hits under strict thresholds. The setup is straightforward and the metrics are relevant for people who actually use these tools downstream.

The soft spot is the cross-study claim. Treating the four SARS-CoV-2 datasets as interchangeable tests of generalizability only works if they are similar in tissue, platform, depth, and covariates. The abstract gives no metadata or matching details, so the performance gap could reflect tool differences in picking up study-specific signals rather than intrinsic robustness. The outlier simulation methods are also light on specifics, which makes the robustness results harder to judge.

This is the sort of paper that matters to transcriptomics groups who have to pick a DGE tool for multi-cohort work. It is not foundational, but it adds concrete numbers on a choice people make every day. I would bring it to a methods reading group. It deserves peer review because the empirical question is clear and the data are real, though reviewers will need to check the dataset similarity and methods transparency.

Referee Report

1 major / 2 minor

Summary. The manuscript compares edgeR and DESeq2 for differential gene expression analysis on real and semi-simulated bulk RNA-Seq datasets spanning viral, bacterial, and fibrotic conditions. It assesses sensitivity to sample size and outliers, classification performance of tool-specific gene sets within discovery data, and generalizability of those gene sets across four independent SARS-CoV-2 studies. The central claim is that DESeq2 identifies more DEGs but edgeR yields more robust gene sets with higher F1 scores, AUC, precision, and recall in classification and cross-study validation.

Significance. If the results hold, the work would usefully inform tool choice in transcriptomics by documenting concrete trade-offs between sensitivity and robustness/generalizability. The study gains strength from combining real and semi-simulated data, multiple performance metrics (F1, Dolan-Moré performance profiles, AUC/precision/recall), and held-out validation on independent datasets.

major comments (1)

Cross-study validation paragraph: the headline result that edgeR-unique gene sets achieve higher AUC, precision, and recall on held-out SARS-CoV-2 datasets rests on treating the four independent studies as interchangeable replicates. No metadata on tissue, sequencing depth, platform, or patient covariates, nor any matching procedure, is supplied to establish comparability; without this, performance differences could reflect dataset-specific technical or biological signals rather than intrinsic tool robustness. This assumption is load-bearing for the generalizability claim.

minor comments (2)

Abstract and methods: full details on how outliers were simulated (distribution, magnitude, number per sample) and the exact statistical tests used for robustness comparisons are not provided, making independent verification difficult.
Results section on classification: the statements that edgeR achieved higher F1 scores in 9 of 13 contrasts and 'more frequently reached perfect or near-perfect precision' would be clearer if the exact F1 and precision values and the definition of 'near-perfect' were tabulated.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive comments on our manuscript comparing edgeR and DESeq2. We address the major comment below and have made revisions to strengthen the presentation of the cross-study validation.

Referee: Cross-study validation paragraph: the headline result that edgeR-unique gene sets achieve higher AUC, precision, and recall on held-out SARS-CoV-2 datasets rests on treating the four independent studies as interchangeable replicates. No metadata on tissue, sequencing depth, platform, or patient covariates, nor any matching procedure, is supplied to establish comparability; without this, performance differences could reflect dataset-specific technical or biological signals rather than intrinsic tool robustness. This assumption is load-bearing for the generalizability claim.

Authors: We agree that additional dataset metadata would improve transparency and help readers assess comparability. In the revised manuscript we will add a supplementary table summarizing available characteristics of the four SARS-CoV-2 studies (tissue source, sequencing platform, read depth, and sample size) drawn from their original publications. We note that all four datasets involve human samples from SARS-CoV-2 infected individuals and were used as independent held-out test sets for gene sets derived from a separate discovery contrast; the observed performance advantage for edgeR-unique genes was consistent across folds. We did not apply an explicit matching procedure, because the aim was to evaluate real-world generalizability rather than idealized matched conditions. We will nonetheless add a limitations paragraph acknowledging that unmeasured technical or biological differences between studies could contribute to the results and that perfect interchangeability cannot be assumed.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical evaluation on independent held-out datasets

The manuscript is a purely empirical comparison of edgeR and DESeq2 on real and semi-simulated bulk RNA-Seq data. Tool-specific DEG sets are identified in discovery data, classifiers are trained on those sets, and performance is measured on held-out samples and four independent SARS-CoV-2 studies. No equations, fitted parameters, or derivations appear; the central claims rest on direct computation of AUC, precision, recall, and F1 scores from data partitions. Cross-study validation uses external datasets whose labels and features are not derived from the discovery analysis, so the reported superiority of edgeR-unique genes is falsifiable and not forced by construction. Minor self-citation, if present, is not load-bearing for the reported metrics.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard statistical models for RNA-Seq data and on empirical evaluation; it introduces no new entities and no free parameters.

axioms (1)

Domain assumption: the statistical assumptions underlying the negative binomial models in both edgeR and DESeq2 hold for the RNA-Seq count data used.
Both tools rely on the negative binomial distribution to model gene counts, a standard assumption that does not always fit real data perfectly.
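Written out, the shared assumption is that the count for gene i in sample j follows a negative binomial distribution with a mean and a gene-wise dispersion. The symbols below are illustrative notation chosen here, not notation taken from the paper:

```latex
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad
\operatorname{Var}(K_{ij}) = \mu_{ij} + \alpha_i \, \mu_{ij}^{2}
```

As the dispersion \alpha_i approaches zero the variance reduces to the Poisson value \mu_{ij}. Real count data can depart from this quadratic mean-variance relationship, which is what makes the assumption a domain assumption and not a guarantee.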

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel

Relation between the paper passage and the cited Recognition theorem: unclear

Paper passage: "edgeR achieved higher F1 scores in 9 of 13 contrasts... cross-study validation using four independent SARS-CoV-2 datasets"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Tracing the evolutionary pathway of SARS-CoV-2 through RNA sequencing analysis

Rezapour M, Murphy SV, Ornelles DA, McNutt PM, Atala A. Tracing the evolutionary pathway of SARS-CoV-2 through RNA sequencing analysis. Scientific Reports. 2025;15(1):23961

work page 2025

[2] [2]

Cross-modal predictive modeling of multi-omic data in 3D airway organ tissue equivalents during viral infection

Rezapour M, McNutt PM, Ornelles DA, Walker S, Murphy SV, Atala A, et al. Cross-modal predictive modeling of multi-omic data in 3D airway organ tissue equivalents during viral infection. Frontiers in Genetics. 2025;16:1658577

work page 2025

[3] [3]

Transcriptomic profiling of human endothelial cells infected with venezuelan equine encephalitis virus reveals NRF2 driven host reprogramming mediated by omaveloxolone treatment

Rezapour M, Opoku LA, Trefry SV, Alili A, Konadu M, Dionisio MG, et al. Transcriptomic profiling of human endothelial cells infected with venezuelan equine encephalitis virus reveals NRF2 driven host reprogramming mediated by omaveloxolone treatment. Frontiers in Genetics. 2025;16:1722527

work page 2025

[4] [4]

Transcriptional Consequences of MeCP2 Knockdown and Overexpression in Mouse Primary Cortical Neurons

Rezapour M, Bowser J, Richardson C, Gurcan MN. Transcriptional Consequences of MeCP2 Knockdown and Overexpression in Mouse Primary Cortical Neurons. International Journal of Molecular Sciences. 2025;26(18):9032

work page 2025

[5] [5]

Assessing concordance between RNA-Seq and NanoString technologies in Ebola-infected nonhuman primates using machine learning

Rezapour M, Narayanan A, Mowery WH, Gurcan MN. Assessing concordance between RNA-Seq and NanoString technologies in Ebola-infected nonhuman primates using machine learning. BMC genomics. 2025;26(1):358

work page 2025

[6] [6]

A comparative analysis of RNA-Seq and NanoString technologies in deciphering viral infection response in upper airway lung organoids

Rezapour M, Walker SJ, Ornelles DA, Niazi MKK, McNutt PM, Atala A, et al. A comparative analysis of RNA-Seq and NanoString technologies in deciphering viral infection response in upper airway lung organoids. Frontiers in Genetics. 2024;15:1327984

work page 2024

[7] [7]

Analysis of gene expression dynamics and differential expression in viral infections using generalized linear models and quasi-likelihood methods

Rezapour M, Walker SJ, Ornelles DA, McNutt PM, Atala A, Gurcan MN. Analysis of gene expression dynamics and differential expression in viral infections using generalized linear models and quasi-likelihood methods. Frontiers in Microbiology. 2024;15:1342328

work page 2024

[8] [8]

Identifying Key Genes Involved in Axillary Lymph Node Metastasis in Breast Cancer Using Advanced RNA-Seq Analysis: A Methodological Approach with GLMQL and MAS

Rezapour M, Wesolowski R, Gurcan MN. Identifying Key Genes Involved in Axillary Lymph Node Metastasis in Breast Cancer Using Advanced RNA-Seq Analysis: A Methodological Approach with GLMQL and MAS. International journal of molecular sciences. 2024;25(13):7306

work page 2024

[9] [9]

Machine Learning Analysis of RNA-Seq Data Identifies Key Gene Signatures and Pathways in Mpox Virus-Induced Gastrointestinal Complications Using Colon Organoid Models

Rezapour M, Narayanan A, Gurcan MN. Machine Learning Analysis of RNA-Seq Data Identifies Key Gene Signatures and Pathways in Mpox Virus-Induced Gastrointestinal Complications Using Colon Organoid Models. International Journal of Molecular Sciences. 2024;25(20):11142

work page 2024

[10] [10]

Machine learning-based analysis of Ebola virus’ impact on gene expression in nonhuman primates

Rezapour M, Niazi MKK, Lu H, Narayanan A, Gurcan MN. Machine learning-based analysis of Ebola virus’ impact on gene expression in nonhuman primates. Frontiers in Artificial Intelligence. 2024;7:1405332

work page 2024

[11] [11]

Mapping and quantifying mammalian transcriptomes by RNA-Seq

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature methods. 2008;5(7):621–628

work page 2008

[12] [12]

RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays

Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome research. 2008;18(9):1509–1517

work page 2008

[13] [13]

A survey of best practices for RNA-seq data analysis

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, et al. A survey of best practices for RNA-seq data analysis. Genome biology. 2016;17:1–19

work page 2016

[14] [14]

edgeR: a Bioconductor package for differential expression analysis of digital gene expression data

Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–140

work page 2010

[15] [15]

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology. 2014;15:1–21

work page 2014

[16] [16]

A scaling normalization method for differential expression analysis of RNA-seq data

Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome biology. 2010;11:1–9

work page 2010

[17] [17]

Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation

McCarthy DJ, Chen Y, Smyth GK. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic acids research. 2012;40(10):4288–4297

work page 2012

[18] [18]

Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates

Lund SP, Nettleton D, McCarthy DJ, Smyth GK. Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Statistical applications in genetics and molecular biology. 2012;11(5)

work page 2012

[19] [19]

Differential expression analysis for sequence count data

Anders S, Huber W. Differential expression analysis for sequence count data. Nature Precedings. 2010; p. 1–1

work page 2010

[20] [20]

Comparison of software packages for detecting differential expression in RNA-seq studies

Seyednasrollah F, Laiho A, Elo LL. Comparison of software packages for detecting differential expression in RNA-seq studies. Briefings in bioinformatics. 2015;16(1):59–70

work page 2015

[21] [21]

A comparative study of techniques for differential expression analysis on RNA-Seq data

Zhang ZH, Jhaveri DJ, Marshall VM, Bauer DC, Edson J, Narayanan RK, et al. A comparative study of techniques for differential expression analysis on RNA-Seq data. PloS one. 2014;9(8):e103207

work page 2014

[22] [22]

Robustness of differential gene expression analysis of RNA-seq

Stupnikov A, McInerney C, Savage K, McIntosh S, Emmert-Streib F, Kennedy R, et al. Robustness of differential gene expression analysis of RNA-seq. Computational and structural biotechnology journal. 2021;19:3470–3481

work page 2021

[23] [23]

Three differential expression analysis methods for RNA sequencing: limma, EdgeR, DESeq2

Liu S, Wang Z, Zhu R, Wang F, Cheng Y, Liu Y. Three differential expression analysis methods for RNA sequencing: limma, EdgeR, DESeq2. Journal of Visualized Experiments (JoVE). 2021;(175):e62528

work page 2021

[24] [24]

Exaggerated false positives by popular differential expression methods when analyzing human population samples

Li Y, Ge X, Peng F, Li W, Li JJ. Exaggerated false positives by popular differential expression methods when analyzing human population samples. Genome biology. 2022;23(1):79

work page 2022

[25] [25]

An evaluation of RNA-seq differential analysis methods

Li D, Zand MS, Dye TD, Goniewicz ML, Rahman I, Xie Z. An evaluation of RNA-seq differential analysis methods. PLoS One. 2022;17(9):e0264246

work page 2022

[26] [26]

Differential anti-viral response to respiratory syncytial virus A in preterm and term infants

Anderson J, Imran S, Ng YY, Wang T, Ashley S, Thang CM, et al. Differential anti-viral response to respiratory syncytial virus A in preterm and term infants. EBioMedicine. 2024;102

work page 2024

[27] [27]

Mpox infection protects against re-challenge in rhesus macaques

Aid M, Sciacca M, McMahan K, Hope D, Liu J, Jacob-Dolan C, et al. Mpox infection protects against re-challenge in rhesus macaques. Cell. 2023;186(21):4652–4661

work page 2023

[28] [28]

Comparative transcriptomics in Ebola Makona-infected ferrets, nonhuman primates, and humans

Cross RW, Speranza E, Borisevich V, Widen SG, Wood TG, Shim RS, et al. Comparative transcriptomics in Ebola Makona-infected ferrets, nonhuman primates, and humans. The Journal of infectious diseases. 2018;218(suppl 5):S486–S495

work page 2018

[29] [29]

Dysregulated transcriptional responses to SARS-CoV-2 in the periphery

McClain MT, Constantine FJ, Henao R, Liu Y, Tsalik EL, Burke TW, et al. Dysregulated transcriptional responses to SARS-CoV-2 in the periphery. Nature communications. 2021;12(1):1079

work page 2021

[30] [30]

RNA sequencing of transplant-stage idiopathic pulmonary fibrosis lung reveals unique pathway regulation

Sivakumar P, Thompson JR, Ammar R, Porteous M, McCoubrey C, Cantu III E, et al. RNA sequencing of transplant-stage idiopathic pulmonary fibrosis lung reveals unique pathway regulation. ERJ open research. 2019;5(3)

work page 2019

[31] [31]

Transcriptomic signature differences between SARS-CoV-2 and influenza virus infected patients

Bibert S, Guex N, Lourenco J, Brahier T, Papadimitriou-Olivgeris M, Damonti L, et al. Transcriptomic signature differences between SARS-CoV-2 and influenza virus infected patients. Frontiers in immunology. 2021;12:666163

work page 2021

[32] [32]

Systems biological assessment of immunity to mild versus severe COVID-19 infection in humans

Arunachalam PS, Wimmers F, Mok CKP, Perera RA, Scott M, Hagan T, et al. Systems biological assessment of immunity to mild versus severe COVID-19 infection in humans. Science. 2020;369(6508):1210–1220

work page 2020

[33] [33]

Dysregulated transcriptional responses to SARS-CoV-2 in the periphery support novel diagnostic approaches

McClain MT, Constantine FJ, Henao R, Liu Y, Tsalik EL, Burke TW, et al. Dysregulated transcriptional responses to SARS-CoV-2 in the periphery support novel diagnostic approaches. medRxiv. 2020

work page 2020

[34] [34]

CD177, a specific marker of neutrophil activation, is associated with coronavirus disease 2019 severity and death

Lévy Y, Wiedemann A, Hejblum BP, Durand M, Lefebvre C, Surénaud M, et al. CD177, a specific marker of neutrophil activation, is associated with coronavirus disease 2019 severity and death. iScience. 2021;24(7)

work page 2021

[35] [35]

When to use the Bonferroni correction

Armstrong RA. When to use the Bonferroni correction. Ophthalmic and physiological optics. 2014;34(5):502–508

work page 2014

[36] [36]

Comparing sets of patterns with the Jaccard index

Fletcher S, Islam MZ, et al. Comparing sets of patterns with the Jaccard index. Australasian Journal of Information Systems. 2018;22

work page 2018

[37] [37]

Pearson correlation coefficient

Benesty J, Chen J, Huang Y, Cohen I. Pearson correlation coefficient. In: Noise reduction in speech processing. 2009; p. 1–4

work page 2009

[38] [38]

Comparing the Pearson and Spearman correlation coefficients across distributions and sample sizes: A tutorial using simulations and empirical data

De Winter JC, Gosling SD, Potter J. Comparing the Pearson and Spearman correlation coefficients across distributions and sample sizes: A tutorial using simulations and empirical data. Psychological methods. 2016;21(3):273

work page 2016

[39] [39]

Principal component analysis

Abdi H, Williams LJ. Principal component analysis. Wiley interdisciplinary reviews: computational statistics. 2010;2(4):433–459

work page 2010

[40] [40]

On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes

Ng A, Jordan M. On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes. Advances in neural information processing systems. 2001;14

work page 2001

[41] [41]

Probabilistic extension of precision, recall, and f1 score for more thorough evaluation of classification models

Yacouby R, Axman D. Probabilistic extension of precision, recall, and f1 score for more thorough evaluation of classification models. In: Proceedings of the first workshop on evaluation and comparison of NLP systems; 2020. p. 79–91

work page 2020

[42] [42]

Benchmarking optimization software with performance profiles

Dolan ED, Moré JJ. Benchmarking optimization software with performance profiles. Mathematical programming. 2002;91:201–213

work page 2002

[43] [43]

What is an ROC curve?

Hoo ZH, Candlish J, Teare D. What is an ROC curve?; 2017

work page 2017