Tool Choice Matters: Evaluating edgeR vs. DESeq2 for Sensitivity, Robustness, and Cross-Study Performance
Pith reviewed 2026-05-21 15:43 UTC · model grok-4.3
The pith
edgeR yields more robust and generalizable gene sets than DESeq2 for cross-study RNA-Seq classification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using real and semi-simulated data, the study finds that gene sets identified only by edgeR deliver higher AUC, precision, and recall when used to classify samples in independent SARS-CoV-2 datasets, with some test cases reaching perfect separation. DESeq2-specific genes show lower and more variable performance across the same folds. Both tools respond similarly to added outliers, yet edgeR maintains classification performance closer to optimal across a larger share of contrasts.
What carries the argument
Cross-study validation of tool-specific differentially expressed gene sets through supervised classification on four held-out SARS-CoV-2 datasets.
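The validation design described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the gene identifiers, study labels, and the helper names `tool_specific_sets` and `leave_one_study_out` are hypothetical, and holding out one study per fold is an assumed reading of "held-out datasets".

```python
def tool_specific_sets(edger_degs, deseq2_degs):
    """Split two DEG lists into edgeR-only, DESeq2-only, and shared genes."""
    edger, deseq2 = set(edger_degs), set(deseq2_degs)
    return {
        "edgeR_only": edger - deseq2,
        "DESeq2_only": deseq2 - edger,
        "shared": edger & deseq2,
    }


def leave_one_study_out(study_ids):
    """Yield (held_out_study, remaining_studies) pairs, one fold per study."""
    studies = sorted(set(study_ids))
    for held_out in studies:
        yield held_out, [s for s in studies if s != held_out]


# Hypothetical DEG lists and study labels, for illustration only.
sets = tool_specific_sets(["G1", "G2", "G3"], ["G2", "G3", "G4", "G5"])
folds = list(leave_one_study_out(["S1", "S2", "S3", "S4"]))
```

Only the tool-specific sets (`edgeR_only`, `DESeq2_only`) would be passed to a classifier in this sketch; the shared genes are excluded so that any performance gap can be attributed to what each tool finds uniquely.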
If this is right
- edgeR-specific genes produce higher F1 scores in nine of thirteen classification contrasts.
- Dolan-Moré profiles show edgeR performance stays nearer the optimum across more datasets.
- Cross-study replication of classification succeeds more consistently with edgeR-unique genes.
- Jaccard overlap between the DEG lists from perturbed and original (unperturbed) data drops for both tools as more outliers are introduced.
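For reference, the Jaccard overlap named in the last bullet is the size of the intersection of two sets divided by the size of their union. A minimal sketch follows; the gene sets are hypothetical, and returning 1.0 for two empty sets is a convention chosen here, not something the paper specifies.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|; taken as 1.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Hypothetical DEG sets before and after adding outliers.
original = {"G1", "G2", "G3", "G4"}
perturbed = {"G2", "G3", "G4", "G5", "G6"}
score = jaccard(original, perturbed)  # 3 shared genes / 6 genes in total = 0.5
```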
Where Pith is reading between the lines
- Preference for edgeR could reduce wasted effort on non-replicable leads in biomarker studies.
- A hybrid workflow that takes the intersection or union of both tools' outputs might balance sensitivity and robustness.
- The pattern may extend to other high-throughput sequencing applications where generalizability matters more than raw count of discoveries.
- Repeating the cross-study design on non-viral disease cohorts would test whether the advantage is context-specific.
Load-bearing premise
The four independent SARS-CoV-2 datasets are similar enough in experimental design and biological context for a fair head-to-head comparison of the two tools.
What would settle it
If gene sets found only by DESeq2 achieve equal or higher AUC, precision, and recall than edgeR-specific sets when classifying samples from new independent studies, the claim that edgeR produces more generalizable results would be refuted.
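The AUC comparison this test relies on can be computed without any library, as the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one. The sketch below assumes binary labels (1 = positive, 0 = negative); the scores are hypothetical and are not taken from the paper.

```python
def auc(scores, labels):
    """AUC as P(positive score > negative score); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
auc_separated = auc([0.9, 0.8, 0.3, 0.1], labels)  # every positive outranks every negative
auc_overlap = auc([0.9, 0.2, 0.3, 0.1], labels)    # one positive falls below one negative
```

An AUC of 1.0 corresponds to the "perfect separation" mentioned in the core claim. The pairwise loop is quadratic in the number of samples, which is acceptable for small held-out cohorts but not for large ones.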
Original abstract
Differential gene expression (DGE) analysis is foundational to transcriptomic research, yet tool selection can substantially influence results. This study presents a comprehensive comparison of two widely used DGE tools, edgeR and DESeq2, using real and semi-simulated bulk RNA-Seq datasets spanning viral, bacterial, and fibrotic conditions. We evaluated tool performance across three key dimensions: (1) sensitivity to sample size and robustness to outliers; (2) classification performance of uniquely identified gene sets within the discovery dataset; and (3) generalizability of tool-specific gene sets across independent studies. First, both tools showed similar responses to simulated outliers, with Jaccard similarity between the DEG sets from perturbed and original (unperturbed) data decreasing as more outliers were added. Second, classification models trained on tool-specific genes showed that edgeR achieved higher F1 scores in 9 of 13 contrasts and more frequently reached perfect or near-perfect precision. Dolan-Moré performance profiles further indicated that edgeR maintained performance closer to optimal across a greater proportion of datasets. Third, in cross-study validation using four independent SARS-CoV-2 datasets, gene sets uniquely identified by edgeR yielded higher AUC, precision, and recall in classifying samples from held-out datasets. This pattern was consistent across folds, with some test cases achieving perfect separation using edgeR-specific genes. In contrast, DESeq2-specific genes showed lower and more variable performance across studies. Overall, our findings highlight that while DESeq2 may identify more DEGs even under stringent significance conditions, edgeR yields more robust and generalizable gene sets for downstream classification and cross-study replication, which underscores key trade-offs in tool selection for transcriptomic analyses.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript compares edgeR and DESeq2 for differential gene expression analysis on real and semi-simulated bulk RNA-Seq datasets spanning viral, bacterial, and fibrotic conditions. It assesses sensitivity to sample size and outliers, classification performance of tool-specific gene sets within discovery data, and generalizability of those gene sets across four independent SARS-CoV-2 studies. The central claim is that DESeq2 identifies more DEGs but edgeR yields more robust gene sets with higher F1 scores, AUC, precision, and recall in classification and cross-study validation.
Significance. If the results hold, the work would usefully inform tool choice in transcriptomics by documenting concrete trade-offs between sensitivity and robustness/generalizability. The study gains strength from combining real and semi-simulated data, multiple performance metrics (F1, Dolan-Moré profiles, AUC/precision/recall), and held-out cross-validation on independent datasets.
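Because Dolan-Moré performance profiles are less familiar than F1 or AUC, a short sketch may help. The original formulation is defined for costs, where lower is better. The version below adapts it to scores where higher is better by taking the ratio of the best score to each tool's score on a given contrast. That adaptation is an assumption, since the summary does not say how the paper defines the ratio, and the tool names and scores are hypothetical.

```python
import math


def performance_ratios(scores):
    """scores: {tool: [score per contrast]}, higher is better.

    Returns {tool: [best_score / score]}, so the best tool on a contrast has ratio 1.
    """
    tools = list(scores)
    n_contrasts = len(scores[tools[0]])
    ratios = {t: [] for t in tools}
    for i in range(n_contrasts):
        best = max(scores[t][i] for t in tools)
        for t in tools:
            s = scores[t][i]
            if s > 0:
                ratios[t].append(best / s)
            elif best == 0:
                ratios[t].append(1.0)  # every tool scored zero: all tie for best
            else:
                ratios[t].append(math.inf)
    return ratios


def profile(ratios, tau):
    """Fraction of contrasts on which each tool is within a factor tau of the best."""
    return {t: sum(r <= tau for r in rs) / len(rs) for t, rs in ratios.items()}


# Hypothetical F1 scores for two tools on three contrasts.
f1 = {"tool_A": [0.9, 0.8, 1.0], "tool_B": [0.9, 1.0, 0.5]}
ratios = performance_ratios(f1)
at_best = profile(ratios, 1.0)   # share of contrasts where each tool is (tied for) best
near_best = profile(ratios, 1.5)  # share of contrasts within a factor 1.5 of the best
```

Plotting `profile` over a range of `tau` values gives the curve; a tool whose curve rises to 1 at smaller `tau` stays closer to the optimum across more contrasts, which is the sense in which the profiles are cited here.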
major comments (1)
- Cross-study validation paragraph: the headline result that edgeR-unique gene sets achieve higher AUC, precision, and recall on held-out SARS-CoV-2 datasets rests on treating the four independent studies as interchangeable replicates. No metadata on tissue, sequencing depth, platform, or patient covariates, nor any matching procedure, is supplied to establish comparability; without this, performance differences could reflect dataset-specific technical or biological signals rather than intrinsic tool robustness. This assumption is load-bearing for the generalizability claim.
minor comments (2)
- Abstract and methods: full details on how outliers were simulated (distribution, magnitude, number per sample) and the exact statistical tests used for robustness comparisons are not provided, making independent verification difficult.
- Results section on classification: the statement that edgeR "more frequently reached perfect or near-perfect precision" would be clearer if the exact precision values for each of the 13 contrasts and the definition of "near-perfect" were tabulated.
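The request for tabulated values matters because precision, recall, and F1 can diverge. A minimal sketch of the three metrics from binary predictions is given below; the labels are hypothetical, and setting a metric to 0.0 when its denominator is zero is a convention chosen here.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive, 0 = negative)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: no false positives, but one positive sample is missed.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Here precision is perfect (1.0) while recall is 2/3 and F1 is 0.8, so a report of perfect precision alone does not determine the F1 score.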
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript comparing edgeR and DESeq2. We address the major comment below and describe the revisions we will make to strengthen the presentation of the cross-study validation.
- Referee: Cross-study validation paragraph: the headline result that edgeR-unique gene sets achieve higher AUC, precision, and recall on held-out SARS-CoV-2 datasets rests on treating the four independent studies as interchangeable replicates. No metadata on tissue, sequencing depth, platform, or patient covariates, nor any matching procedure, is supplied to establish comparability; without this, performance differences could reflect dataset-specific technical or biological signals rather than intrinsic tool robustness. This assumption is load-bearing for the generalizability claim.
  Authors: We agree that additional dataset metadata would improve transparency and help readers assess comparability. In the revised manuscript we will add a supplementary table summarizing available characteristics of the four SARS-CoV-2 studies (tissue source, sequencing platform, read depth, and sample size) drawn from their original publications. We note that all four datasets involve human samples from SARS-CoV-2 infected individuals and were used as independent held-out test sets for gene sets derived from a separate discovery contrast; the observed performance advantage for edgeR-unique genes was consistent across every fold and every test study. While we did not apply an explicit matching procedure (the aim was to evaluate real-world generalizability rather than idealized matched conditions), we will add a limitations paragraph acknowledging that unmeasured technical or biological differences between studies could contribute to the results and that perfect interchangeability cannot be assumed.
  Revision: yes
Circularity Check
No circularity: empirical evaluation on independent held-out datasets
The manuscript is a purely empirical comparison of edgeR and DESeq2 on real and semi-simulated bulk RNA-Seq data. Tool-specific DEG sets are identified in discovery data, classifiers are trained on those sets, and performance is measured on held-out samples and four independent SARS-CoV-2 studies. No equations, fitted parameters, or derivations appear; the central claims rest on direct computation of AUC, precision, recall, and F1 scores from data partitions. Cross-study validation uses external datasets whose labels and features are not derived from the discovery analysis, so the reported superiority of edgeR-unique genes is falsifiable and not forced by construction. Minor self-citation, if present, is not load-bearing for the reported metrics.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The statistical assumptions underlying the negative binomial models in both edgeR and DESeq2 hold for the RNA-Seq count data used.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "edgeR achieved higher F1 scores in 9 of 13 contrasts... cross-study validation using four independent SARS-CoV-2 datasets"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.