arxiv: 2601.04122 · v2 · pith:LHMEZJBWnew · submitted 2026-01-07 · 🧬 q-bio.GN

Tool Choice Matters: Evaluating edgeR vs. DESeq2 for Sensitivity, Robustness, and Cross-Study Performance

Pith reviewed 2026-05-21 15:43 UTC · model grok-4.3

classification 🧬 q-bio.GN
keywords differential gene expression · edgeR · DESeq2 · RNA-Seq · cross-study validation · classification performance · robustness · tool comparison

The pith

edgeR yields more robust and generalizable gene sets than DESeq2 for cross-study RNA-Seq classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests edgeR and DESeq2 on bulk RNA-Seq datasets from viral, bacterial, and fibrotic conditions to measure how tool choice shapes downstream results. It checks sensitivity to sample size and outliers, then trains classifiers on each tool's unique genes to see how well they separate samples inside the original data. The decisive test applies those same gene sets to four separate SARS-CoV-2 studies and tracks accuracy, precision, and recall. edgeR's genes produce higher and steadier performance across the held-out datasets, while DESeq2 tends to surface more genes yet delivers less reliable classification when moved to new studies. A reader cares because the choice directly affects which genes become candidates for biomarkers or follow-up experiments and whether those findings hold up beyond one lab.

Core claim

Using real and semi-simulated data, the study finds that gene sets identified only by edgeR deliver higher AUC, precision, and recall when used to classify samples in independent SARS-CoV-2 datasets, with some test cases reaching perfect separation. DESeq2-specific genes show lower and more variable performance across the same folds. Both tools respond similarly to added outliers, yet edgeR maintains classification performance closer to optimal across a larger share of contrasts.

What carries the argument

Cross-study validation of tool-specific differentially expressed gene sets through supervised classification on four held-out SARS-CoV-2 datasets.
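
As a rough illustration of what "tool-specific" means here, the sketch below splits two DEG lists into shared and tool-only sets. The function name and the gene symbols are hypothetical and chosen only for the example; the paper's own filtering thresholds are not reproduced.

```python
def tool_specific_sets(edger_degs, deseq2_degs):
    """Split two DEG lists into shared and tool-specific gene sets."""
    edger, deseq2 = set(edger_degs), set(deseq2_degs)
    return {
        "shared": edger & deseq2,       # called by both tools
        "edgeR_only": edger - deseq2,   # called only by edgeR
        "DESeq2_only": deseq2 - edger,  # called only by DESeq2
    }


# Hypothetical gene symbols, for illustration only.
sets = tool_specific_sets(
    ["IFI27", "ISG15", "OAS1"],
    ["ISG15", "OAS1", "MX1", "RSAD2"],
)
print(sorted(sets["edgeR_only"]))   # ['IFI27']
print(sorted(sets["DESeq2_only"]))  # ['MX1', 'RSAD2']
```

Only the two tool-only sets would feed the classifiers; the shared set is informative about agreement but cannot distinguish the tools.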

If this is right

  • edgeR-specific genes produce higher F1 scores in nine of thirteen classification contrasts.
  • Dolan-Moré profiles show edgeR performance stays nearer the optimum across more datasets.
  • Cross-study replication of classification succeeds more consistently with edgeR-unique genes.
  • Jaccard overlap between DEG lists drops for both tools as more outliers are introduced.
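
The Jaccard overlap in the last bullet is the size of the intersection divided by the size of the union of two gene sets. A minimal sketch follows; the gene labels are placeholders, and treating two empty sets as identical is a convention chosen here, not something the paper states.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| for two collections of gene IDs."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty DEG lists count as identical
    return len(a & b) / len(union)


# Placeholder gene IDs: DEGs from the original data vs. data with outliers added.
original_degs = {"A", "B", "C", "D"}
perturbed_degs = {"A", "B", "E"}
print(jaccard(original_degs, perturbed_degs))  # 0.4
```

A value near 1 means the perturbed analysis recovers almost the same DEG list; the reported decline corresponds to this value falling as outliers are added.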

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Preference for edgeR could reduce wasted effort on non-replicable leads in biomarker studies.
  • A hybrid workflow that takes the intersection or union of both tools' outputs might balance sensitivity and robustness.
  • The pattern may extend to other high-throughput sequencing applications where generalizability matters more than raw count of discoveries.
  • Repeating the cross-study design on non-viral disease cohorts would test whether the advantage is context-specific.

Load-bearing premise

The four independent SARS-CoV-2 datasets are similar enough in experimental design and biological context for a fair head-to-head comparison of the two tools.

What would settle it

If gene sets found only by DESeq2 achieve equal or higher AUC, precision, and recall than edgeR-specific sets when classifying samples from new independent studies, the claim that edgeR produces more generalizable results would be refuted.
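
The comparison this test calls for comes down to computing AUC for each tool's gene set on the same held-out samples. Below is a minimal sketch using the rank-based (Mann-Whitney) definition of AUC; the scores are made-up classifier outputs, not results from the paper, and the variable names are hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that a positive sample outscores a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count pairwise wins; ties contribute half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores on four held-out samples (1 = infected, 0 = control).
labels = [1, 1, 0, 0]
edger_scores = [0.9, 0.8, 0.2, 0.1]
deseq2_scores = [0.9, 0.4, 0.6, 0.1]
print(auc(edger_scores, labels))   # 1.0
print(auc(deseq2_scores, labels))  # 0.75
```

The claim would be undermined if, across new independent studies, the second number were consistently equal to or higher than the first.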

Figures

Figures reproduced from arXiv: 2601.04122 by Mostafa Rezapour.

Figure 1: Sensitivity of edgeR and DESeq2 to sample size and outliers.
Figure 2: Comparison of edgeR and DESeq2 across multiple biological contrasts. Panel (a) shows the log2-scaled number of uniquely identified upregulated and downregulated genes by each tool across 13 contrasts spanning viral, bacterial, and fibrotic conditions. Panel (b) displays the Jaccard index for upregulated and downregulated gene sets, indicating overlap between tools. Panel (c) shows Pearson and Spearman corr…
Figure 3: Classification performance of uniquely identified genes from …
Figure 4: Cross-study generalizability of uniquely significant genes from …
Original abstract

Differential gene expression (DGE) analysis is foundational to transcriptomic research, yet tool selection can substantially influence results. This study presents a comprehensive comparison of two widely used DGE tools, edgeR and DESeq2, using real and semi-simulated bulk RNA-Seq datasets spanning viral, bacterial, and fibrotic conditions. We evaluated tool performance across three key dimensions: (1) sensitivity to sample size and robustness to outliers; (2) classification performance of uniquely identified gene sets within the discovery dataset; and (3) generalizability of tool-specific gene sets across independent studies. First, both tools showed similar responses to simulated outliers, with Jaccard similarity between the DEG sets from perturbed and original (unperturbed) data decreasing as more outliers were added. Second, classification models trained on tool-specific genes showed that edgeR achieved higher F1 scores in 9 of 13 contrasts and more frequently reached perfect or near-perfect precision. Dolan-Moré performance profiles further indicated that edgeR maintained performance closer to optimal across a greater proportion of datasets. Third, in cross-study validation using four independent SARS-CoV-2 datasets, gene sets uniquely identified by edgeR yielded higher AUC, precision, and recall in classifying samples from held-out datasets. This pattern was consistent across folds, with some test cases achieving perfect separation using edgeR-specific genes. In contrast, DESeq2-specific genes showed lower and more variable performance across studies. Overall, our findings highlight that while DESeq2 may identify more DEGs even under stringent significance conditions, edgeR yields more robust and generalizable gene sets for downstream classification and cross-study replication, which underscores key trade-offs in tool selection for transcriptomic analyses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript compares edgeR and DESeq2 for differential gene expression analysis on real and semi-simulated bulk RNA-Seq datasets spanning viral, bacterial, and fibrotic conditions. It assesses sensitivity to sample size and outliers, classification performance of tool-specific gene sets within discovery data, and generalizability of those gene sets across four independent SARS-CoV-2 studies. The central claim is that DESeq2 identifies more DEGs but edgeR yields more robust gene sets with higher F1 scores, AUC, precision, and recall in classification and cross-study validation.

Significance. If the results hold, the work would usefully inform tool choice in transcriptomics by documenting concrete trade-offs between sensitivity and robustness/generalizability. The study gains strength from combining real and semi-simulated data, multiple performance metrics (F1, Dolan-Moré profiles, AUC/precision/recall), and validation on held-out independent datasets.
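
For readers unfamiliar with Dolan-Moré performance profiles, the sketch below shows one way to compute them for a higher-is-better metric such as F1. The ratio used here (best score on a contrast divided by the tool's score) is an assumption made for this example; the paper's exact conversion from F1 to a performance ratio is not given on this page, and the scores are made up.

```python
import math


def performance_profile(scores, taus):
    """For each tool, the fraction of contrasts whose ratio best/score is <= tau.

    scores: dict mapping tool name -> list of per-contrast scores (higher is better).
    """
    tools = list(scores)
    n = len(scores[tools[0]])
    ratios = {t: [] for t in tools}
    for p in range(n):
        best = max(scores[t][p] for t in tools)
        for t in tools:
            s = scores[t][p]
            # A score of zero is treated as a failure on that contrast.
            ratios[t].append(best / s if s > 0 else math.inf)
    return {
        t: [sum(r <= tau for r in ratios[t]) / n for tau in taus]
        for t in tools
    }


# Made-up F1 scores on four contrasts.
f1 = {"edgeR": [1.0, 0.9, 0.8, 0.6], "DESeq2": [0.8, 0.9, 0.5, 0.75]}
profile = performance_profile(f1, taus=[1.0, 1.5, 2.0])
print(profile["edgeR"])   # [0.75, 1.0, 1.0]
print(profile["DESeq2"])  # [0.5, 0.75, 1.0]
```

The value at tau = 1 is the share of contrasts on which a tool is best or tied for best; a curve that reaches 1 at smaller tau stays closer to the optimum across more contrasts.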

major comments (1)
  1. Cross-study validation paragraph: the headline result that edgeR-unique gene sets achieve higher AUC, precision, and recall on held-out SARS-CoV-2 datasets rests on treating the four independent studies as interchangeable replicates. No metadata on tissue, sequencing depth, platform, or patient covariates, nor any matching procedure, is supplied to establish comparability; without this, performance differences could reflect dataset-specific technical or biological signals rather than intrinsic tool robustness. This assumption is load-bearing for the generalizability claim.
minor comments (2)
  1. Abstract and methods: full details on how outliers were simulated (distribution, magnitude, number per sample) and the exact statistical tests used for robustness comparisons are not provided, making independent verification difficult.
  2. Results section on classification: the statement that edgeR 'more frequently reached perfect or near-perfect precision' would be clearer if the exact precision values per contrast and the definition of 'near-perfect' were tabulated.
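
The per-contrast values requested in the second minor comment follow directly from confusion-matrix counts. A minimal sketch, with made-up labels and predictions; returning 0.0 when a denominator is zero is a convention chosen here.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up example: perfect precision, but one positive sample is missed.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, round(recall, 3), round(f1, 3))  # 1.0 0.667 0.8
```

The example shows why precision alone is not enough to rank the tools: a classifier can reach perfect precision while still missing a third of the positives.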

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive comments on our manuscript comparing edgeR and DESeq2. We address the major comment below and have made revisions to strengthen the presentation of the cross-study validation.

Point-by-point responses
  1. Referee: Cross-study validation paragraph: the headline result that edgeR-unique gene sets achieve higher AUC, precision, and recall on held-out SARS-CoV-2 datasets rests on treating the four independent studies as interchangeable replicates. No metadata on tissue, sequencing depth, platform, or patient covariates, nor any matching procedure, is supplied to establish comparability; without this, performance differences could reflect dataset-specific technical or biological signals rather than intrinsic tool robustness. This assumption is load-bearing for the generalizability claim.

    Authors: We agree that additional dataset metadata would improve transparency and help readers assess comparability. In the revised manuscript we will add a supplementary table summarizing available characteristics of the four SARS-CoV-2 studies (tissue source, sequencing platform, read depth, and sample size) drawn from their original publications. We note that all four datasets involve human samples from SARS-CoV-2 infected individuals and were used as independent held-out test sets for gene sets derived from a separate discovery contrast; the observed performance advantage for edgeR-unique genes was consistent across every fold and every test study. We did not apply an explicit matching procedure, because the aim was to evaluate real-world generalizability rather than idealized matched conditions. We will nonetheless add a limitations paragraph acknowledging that unmeasured technical or biological differences between studies could contribute to the results and that perfect interchangeability cannot be assumed.
    revision: yes

Circularity Check

0 steps flagged

No circularity: empirical evaluation on independent held-out datasets

full rationale

The manuscript is a purely empirical comparison of edgeR and DESeq2 on real and semi-simulated bulk RNA-Seq data. Tool-specific DEG sets are identified in discovery data, classifiers are trained on those sets, and performance is measured on held-out samples and four independent SARS-CoV-2 studies. No equations, fitted parameters, or derivations appear; the central claims rest on direct computation of AUC, precision, recall, and F1 scores from data partitions. Cross-study validation uses external datasets whose labels and features are not derived from the discovery analysis, so the reported superiority of edgeR-unique genes is falsifiable and not forced by construction. Minor self-citation, if present, is not load-bearing for the reported metrics.
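
The separation between discovery and evaluation data described above can be sketched as a leave-one-study-out split. This assumes each sample carries a study identifier; the exact fold design used in the paper is not specified on this page, and the study labels below are hypothetical.

```python
def leave_one_study_out(study_ids):
    """Yield (held_out_study, train_indices, test_indices) for each study."""
    for held_out in sorted(set(study_ids)):
        train = [i for i, s in enumerate(study_ids) if s != held_out]
        test = [i for i, s in enumerate(study_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical study labels for six samples drawn from four studies.
study_ids = ["S1", "S1", "S2", "S3", "S3", "S4"]
folds = list(leave_one_study_out(study_ids))
for held_out, train, test in folds:
    print(held_out, train, test)
# S1 [2, 3, 4, 5] [0, 1]
# S2 [0, 1, 3, 4, 5] [2]
# S3 [0, 1, 2, 5] [3, 4]
# S4 [0, 1, 2, 3, 4] [5]
```

Because every test fold contains samples from a single study that never appears in training, a metric computed on it cannot be inflated by within-study leakage.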

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard statistical models for RNA-Seq data and on empirical evaluation; it introduces no new entities and no free parameters.

axioms (1)
  • domain assumption: Statistical assumptions underlying negative binomial models in both edgeR and DESeq2 hold for the RNA-Seq count data used.
    Both tools rely on the negative binomial distribution for modeling gene counts, which is a standard assumption but does not always fit real data perfectly.
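
For reference, the negative binomial assumption named above is usually written with the mean-variance relation below. The symbols are notation chosen here rather than taken from the paper: K is the read count, μ the mean, φ the dispersion, g indexes genes, and i indexes samples.

```latex
K_{gi} \sim \mathrm{NB}(\mu_{gi}, \phi_g), \qquad
\operatorname{Var}(K_{gi}) = \mu_{gi} + \phi_g \, \mu_{gi}^{2}
```

Setting φ_g = 0 recovers the Poisson variance, so φ_g measures how much extra variability a gene shows beyond counting noise. The axiom amounts to assuming this quadratic relation describes the datasets analyzed.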

pith-pipeline@v0.9.0 · 5846 in / 1472 out tokens · 54814 ms · 2026-05-21T15:43:13.782520+00:00 · methodology
