Detecting and Correcting Sample-by-Sample Scale Distortion in RNA Sequencing Data

Christopher Thron; Farhad Jafari

arxiv: 2605.22838 · v1 · pith:L7TJ4HHBnew · submitted 2026-05-10 · 🧬 q-bio.GN · math.OC· stat.AP

Detecting and Correcting Sample-by-Sample Scale Distortion in RNA Sequencing Data

Christopher Thron , Farhad Jafari This is my paper

Pith reviewed 2026-05-25 00:53 UTC · model grok-4.3

classification 🧬 q-bio.GN math.OCstat.AP

keywords RNA-seqbias correctionnormalizationgene correlationsubpopulation testsnonlinear transformscale distortionTCGA

0 comments

The pith

Nonlinear transforms remove sample-specific scale biases in RNA-seq data and boost the power of gene correlation and subpopulation analyses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies expression-level dependent biases that differ between samples in RNA-seq datasets and are not fixed by standard normalization. It proposes two nonlinear transforms derived from statistical considerations to correct these biases. Simulations show that the corrections reduce variance in gene-gene correlations and improve the sensitivity and specificity of tests comparing subpopulations by 3-5 percent. A sympathetic reader would care because better bias correction could lead to more reliable identification of gene relationships and disease determinants from existing large RNA-seq collections.

Core claim

Local averaging reveals per-sample, expression-level-dependent scale distortions in multiple RNA-seq datasets; two nonlinear transforms correct these distortions, thereby removing observed biases, lowering sample-to-sample variance, and sharpening gene-gene correlation distributions while increasing the sensitivity of subpopulation tests.

What carries the argument

Two nonlinear transforms based on statistical considerations that adjust for the detected expression-level-dependent biases on a per-sample basis.

If this is right

Corrected data yields gene-gene correlation distributions with improved characteristics.
Variability in two-population tests decreases after correction.
Sensitivity and specificity of subpopulation comparisons increase by roughly 3-5% in most cases.
Gene relationships become more reliably estimable from clinical RNA-seq tests.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

These corrections could be applied retroactively to existing TCGA and GTEx datasets to re-evaluate prior findings on gene correlations.
If the biases are widespread, standard RNA-seq pipelines may need to incorporate similar per-sample nonlinear adjustments.
Testable extension: apply the transforms to new datasets and check if known biological correlations become stronger or more consistent.

Load-bearing premise

The observed local-average deviations truly represent correctable scale distortions rather than biological signal or uncorrectable noise.

What would settle it

Apply the transforms to a dataset where the true expression levels are known from an orthogonal method such as qPCR or spike-in controls and check whether the corrected values match the true values more closely than the original data.

Figures

Figures reproduced from arXiv: 2605.22838 by Christopher Thron, Farhad Jafari.

**Figure 1.** Figure 1: also shows d’Agostino p-values obtained from the distribution of 16-patient averages from the preprocessed TGCA bladder cancer dataset. Most of the curve is close to the y = x line, indicating consistency with normality. Still, about 10% of genes have p-values indistinguishable from 0 on the scale of the figure. But the figure also shows that if genes with mean transformed expression level below 3 are excl… view at source ↗

**Figure 2.** Figure 2: (a) Distribution of log transformed expression level for all genes for reference dataset. (b) Distribu [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Block deviations for selected patients computed according to (4), for log transformed TPM data. [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Nonlinear scale distortion for a few patients as a function of their gene expression levels. On the [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗

**Figure 5.** Figure 5: Block averaged gene inter-patient expression level variance for the different processing methods as [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗

**Figure 6.** Figure 6: (Top) Distributions of Spearman correlations for gene-gene pairs subject to four data transforms: log transformed TPM (abbreviated as TPM), log transformed TPM with shift(abbreviated as TPM-shifted), local leveled (LL), and nonlinear rescaled (NL). The different data transforms are explained in Sections 3.1 and 3.2. For comparative purposes, the correlation distribution for data which shuffles the per-gene… view at source ↗

**Figure 7.** Figure 7: Distributions of differences between pair-by-pair correlation differences as in Figure 6( [PITH_FULL_IMAGE:figures/full_fig_p018_7.png] view at source ↗

**Figure 8.** Figure 8: Complementary CDFs (ccdfs) for 5000 simulated subpopulations of size 50 at different gene en [PITH_FULL_IMAGE:figures/full_fig_p019_8.png] view at source ↗

**Figure 9.** Figure 9: Complementary CDFs for 5000 simulated subpopulations of size 10 at different gene enhancement [PITH_FULL_IMAGE:figures/full_fig_p020_9.png] view at source ↗

**Figure 10.** Figure 10: Averaged ROC curves for 5000 simulations with randomly-chosen subpopulations of size 50 at [PITH_FULL_IMAGE:figures/full_fig_p020_10.png] view at source ↗

**Figure 11.** Figure 11: Averaged ROC curves for 5000 simulated 2-population tests of size 10 at different gene enhancement [PITH_FULL_IMAGE:figures/full_fig_p021_11.png] view at source ↗

**Figure 12.** Figure 12: Detection rate by mean expression level for gene-enhanced subpopulations of size 50. Rows corre [PITH_FULL_IMAGE:figures/full_fig_p022_12.png] view at source ↗

**Figure 13.** Figure 13: Averaged ROC curves for 5000 simulated 2-population [PITH_FULL_IMAGE:figures/full_fig_p023_13.png] view at source ↗

**Figure 14.** Figure 14: ROC curves for 5000 simulated 2-population [PITH_FULL_IMAGE:figures/full_fig_p023_14.png] view at source ↗

**Figure 15.** Figure 15: Averaged ROC curves for 5000 simulated single-patient single-gene [PITH_FULL_IMAGE:figures/full_fig_p024_15.png] view at source ↗

**Figure 16.** Figure 16: Repeat of the test shown in Figure 15, but with each gene’s data scrambled separately across [PITH_FULL_IMAGE:figures/full_fig_p024_16.png] view at source ↗

**Figure 17.** Figure 17: ROC curves for 5000 simulated single-patient tests for gene expression level differences, for 2,4,8 [PITH_FULL_IMAGE:figures/full_fig_p025_17.png] view at source ↗

read the original abstract

RNA sequencing (RNA-seq) is the conventional genome-scale approach used to capture the expression levels of all detectable genes in a biological sample. This is now regularly used for population-based studies designed to identify genetic determinants of various diseases. Naturally, the accuracy of these tests should be verified and improved if possible. In this study, we aimed to detect and correct for expression level-dependent errors which vary from sample to sample, and are not corrected by conventional normalization techniques . We examined several RNA-seq datasets from the Cancer Genome Atlas (TCGA), Stand Up 2 Cancer (SU2C), and GTEx databases with various types of preprocessing. By applying local averaging, we found sample by sample expression-level dependent biases in all datasets studied. Using simulations, we show that these biases corrupt gene-gene correlation estimations and $t$ tests between subpopulations. To mitigate these biases, we introduce two different nonlinear transforms based on statistical considerations that correct these observed biases. We demonstrate that that these transforms effectively remove the observed per-sample biases, reduce sample-to-sample variance, and improve the characteristics of gene-gene correlation distributions. Using a novel simulation methodology that creates controlled differences between subpopulations, we show that these transforms reduce variability and increase sensitivity of two population tests. The improvements in sensitivity and specificity were of the order of 3-5\% in most instances after the data was corrected for bias. Altogether, these results improve our capacity to understand gene-gene relationships, and may lead to novel ways to utilize the information derived from clinical tests.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper flags per-sample expression-level scale biases in RNA-seq that standard normalization misses and offers two nonlinear corrections, but the evidence rests on simulations with modest 3-5% gains and thin methodological detail.

read the letter

The core observation is that local averaging across TCGA, SU2C, and GTEx datasets reveals sample-specific, expression-dependent scale distortions not fixed by usual methods. They introduce two nonlinear transforms derived from statistical considerations and test them on simulated subpopulation differences, reporting reduced variance in gene correlations and 3-5% lifts in sensitivity/specificity for two population tests. Using multiple public datasets is a reasonable starting point, and the focus on downstream effects like correlation distributions and t-tests is practical for disease studies. The simulation approach that injects controlled subpopulation signals is at least an attempt to quantify impact rather than just showing bias removal on real data alone. That said, the abstract supplies no equations for the transforms, no simulation construction details, and only order-of-magnitude estimates for the gains, which makes it impossible to check whether the reported improvements are robust or whether the simulation setup implicitly favors the proposed corrections. The stress-test point about potential artifacts in how per-sample factors are generated looks like it needs direct examination in the methods. This is the kind of incremental normalization tweak that could matter for people running large RNA-seq cohorts, but only if the full paper supplies reproducible code or parameter-free derivations. I'd bring it to a reading group for the simulation design discussion but would not cite it yet. It deserves peer review to see whether the transforms hold up once the details are visible.

Referee Report

3 major / 2 minor

Summary. The manuscript claims that local averaging across TCGA, SU2C, and GTEx RNA-seq datasets reveals persistent sample-by-sample, expression-level-dependent scale biases not removed by standard normalization; two novel nonlinear transforms are introduced to correct these biases; and a novel simulation injecting controlled subpopulation differences demonstrates that the corrections reduce sample-to-sample variance, improve gene-gene correlation distributions, and yield 3-5% gains in sensitivity and specificity for two-population tests.

Significance. If the transforms prove robust and the simulation faithfully models real biological variation, the work could modestly improve the reliability of correlation and differential analyses in population-scale RNA-seq studies. The empirical detection of biases across independent public databases is a positive feature, but the absence of explicit equations, simulation specifications, and quantitative validation metrics prevents assessment of whether the reported gains are generalizable or artifactual.

major comments (3)

[Abstract] Abstract: the central performance claim states that 'the improvements in sensitivity and specificity were of the order of 3-5% in most instances' yet supplies neither tables, exact percentages, nor any quantitative validation metrics (e.g., AUC, power curves, or pre/post-correction distributions), so the magnitude and reproducibility of the claimed benefit cannot be verified.
[Abstract and Methods] Abstract and Methods: the two nonlinear transforms are described only as 'based on statistical considerations' with no equations, pseudocode, or parameter definitions provided; without these, it is impossible to determine whether the transforms are parameter-free, how they are derived from the local averages, or whether they are guaranteed to remove the observed biases rather than overfit them.
[Simulation methodology] Simulation methodology (described in the abstract and results): the novel simulation that 'creates controlled differences between subpopulations' is not specified with respect to how per-sample scale factors are generated, how they interact with the injected biological signal, or whether the resulting count marginals match real RNA-seq distributions; this detail is load-bearing because the 3-5% gains rest entirely on this simulation, and any mismatch with real data would render the sensitivity/specificity improvements non-generalizable.

minor comments (2)

[Abstract] Abstract contains a duplicated word: 'We demonstrate that that these transforms'.
The manuscript does not indicate whether code or processed data underlying the local-averaging plots and simulations will be made available, which would aid reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which identifies opportunities to strengthen the clarity and reproducibility of the manuscript. We respond to each major comment below and will make the indicated revisions.

read point-by-point responses

Referee: [Abstract] Abstract: the central performance claim states that 'the improvements in sensitivity and specificity were of the order of 3-5% in most instances' yet supplies neither tables, exact percentages, nor any quantitative validation metrics (e.g., AUC, power curves, or pre/post-correction distributions), so the magnitude and reproducibility of the claimed benefit cannot be verified.

Authors: We agree that the abstract would be improved by greater specificity. The main text already contains tables and figures reporting exact pre- and post-correction sensitivity/specificity values, full distributions, and related metrics from the simulations. In the revised manuscript we will update the abstract to cite these specific percentages and explicitly direct readers to the relevant tables and figures. revision: yes
Referee: [Abstract and Methods] Abstract and Methods: the two nonlinear transforms are described only as 'based on statistical considerations' with no equations, pseudocode, or parameter definitions provided; without these, it is impossible to determine whether the transforms are parameter-free, how they are derived from the local averages, or whether they are guaranteed to remove the observed biases rather than overfit them.

Authors: The Methods section derives the transforms directly from the local-averaging procedure using statistical considerations of the observed per-sample biases. We will revise both the abstract and Methods to include the explicit equations, confirm that the transforms are parameter-free, and provide pseudocode. The results demonstrate that the transforms remove the biases across independent datasets without evidence of overfitting. revision: yes
Referee: [Simulation methodology] Simulation methodology (described in the abstract and results): the novel simulation that 'creates controlled differences between subpopulations' is not specified with respect to how per-sample scale factors are generated, how they interact with the injected biological signal, or whether the resulting count marginals match real RNA-seq distributions; this detail is load-bearing because the 3-5% gains rest entirely on this simulation, and any mismatch with real data would render the sensitivity/specificity improvements non-generalizable.

Authors: The Methods section specifies that scale factors are sampled from the empirical distribution of observed biases, applied multiplicatively to counts before subpopulation-specific expression changes are added, and that marginal count distributions are matched to real data by using empirical distributions from the TCGA, SU2C, and GTEx cohorts. We will expand this description with additional explicit steps and any necessary pseudocode to make the procedure fully reproducible. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical observations and separate simulations are self-contained

full rationale

The paper identifies expression-level-dependent biases via local averaging on independent public datasets (TCGA, SU2C, GTEx), proposes nonlinear transforms derived from statistical considerations to correct them, and validates effects on gene correlations and subpopulation tests using a novel simulation methodology that creates controlled differences. No equations or steps are shown to reduce by construction to fitted inputs or self-citations; the simulation is presented as an external test rather than a re-derivation of the transforms. This matches the default expectation of non-circularity for papers resting on external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides insufficient detail to identify specific free parameters, axioms, or invented entities; the transforms are described only generically as based on statistical considerations without equations or parameter counts.

pith-pipeline@v0.9.0 · 5808 in / 1118 out tokens · 30063 ms · 2026-05-25T00:53:05.249453+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We introduce two different nonlinear transforms based on statistical considerations that correct these observed biases... gnm = Gnm + βm(¯gn) + ϵnm (LL model) and Gnm = αm(gnm) + ϵnm (NL model) with polynomial αm, βm estimated by block regression.
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean costAlphaLog_high_calibrated_iff unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

log2(bXj + c) ≈ log2(b) + log2(Xj) + c/(bXj) (multiplicative bias becomes additive shift after log)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages

[1]

SEQC/MAQC-III Consortium. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the sequencing quality control consortium.Nature biotechnology, 32(9):903–914, 16 Figure 6: (Top) Distributions of Spearman correlations for gene-gene pairs subject to four data transforms: log transformed TPM (abbreviated as TPM), log tran...

work page 2014
[2]

Burguillo

Luis A Corchete, Elizabeta A Rojas, Diego Alonso-López, Javier De Las Rivas, Norma C Gutiérrez, and Francisco J. Burguillo. Systematic comparison and assessment of RNA-seq procedures for gene expression quantitative analysis.Scientific Reports, 10(1):19737, 2020

work page 2020
[3]

Yance Feng and Lei M. Li. MUREN: A robust and multi-reference approach of RNA-seq transcripts normalization.BMC Bioinformatics, 22:386, 2021

work page 2021
[4]

Normalizationandvariancestabilizationofsingle-cellRNA-seq data using regularized negative binomial regression.Genome Biology, 20(1):296, 2019

ChristophHafemeisterandRahul.Satija. Normalizationandvariancestabilizationofsingle-cellRNA-seq data using regularized negative binomial regression.Genome Biology, 20(1):296, 2019

work page 2019
[5]

Montgomery

Kimberly R Kukurba and Stephen B. Montgomery. RNA sequencing and analysis.Cold Spring Harbor Protocols, 2015(11):pdb–top084970, 2015

work page 2015
[6]

Charity W Law, Yunshun Chen, Wei Shi, and Gordon K. Smyth. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts.Genome Biology, 15:1–17, 2014

work page 2014
[7]

Comprehensive comparative analysis of strand-specific RNA sequencing methods.Nat Methods, 7(9):709–15, 2010

JZ Levin, M Yassour, X Adiconis, et al. Comprehensive comparative analysis of strand-specific RNA sequencing methods.Nat Methods, 7(9):709–15, 2010

work page 2010
[8]

Normalization methods for the analysis of unbalanced transcriptome data: A review.Frontiers in Bioengineering and Biotechnology, 7, 2019

Xueyan Liu, Nan Li, Sheng Liu, Jun Wang, Ning Zhang, Xubin Zheng, Kwong-Sak Leung, and Lixin Cheng. Normalization methods for the analysis of unbalanced transcriptome data: A review.Frontiers in Bioengineering and Biotechnology, 7, 2019. 17 Figure 7: Distributions of differences between pair-by-pair correlation differences as in Figure 6(bottom), but rest...

work page 2019
[9]

Michael Love, Simon Anders, and Wolfgang. Huber. Differential analysis of count data–the DESeq2 package.Genome Biol, 15(550):10–1186, 2014

work page 2014
[10]

DJ McCarthy, KR Campbell, AT Lun, and QF. Wills. Scater: Pre-processing, quality control, normal- ization and visualization of single-cell RNA-seq data in R.Bioinformatics, 33(8):1179–86, 2017

work page 2017
[11]

MJ Nueda, S Tarazona, and A. Conesa. Next masigpro: updating masigpro bioconductor package for RNA-seq time series.Bioinformatics, 15;30(18):2598–602, 2014

work page 2014
[12]

H. Qin, H. Ni, Y. Liu, et al. RNA-binding proteins in tumor progression.J. Hematol. Oncol., 13 (90), 2020

work page 2020
[13]

A Roberts, C Trapnell, J Donaghey, JL Rinn, and L. Pachter. Improving RNA-Seq expression estimates by correcting for fragment bias.Genome Biol., 12(3), 2011

work page 2011
[14]

Rinn, and Lior

Adam Roberts, Cole Trapnell, Julie Donaghey, John L. Rinn, and Lior. Pachter. Improving rna-seq expression estimates by correcting for fragment bias.Genome Biology, 12(3):R22, 2011

work page 2011
[15]

Ascalingnormalizationmethodfordifferentialexpressionanalysis of RNA-seq data.Genome Biology, 11(3):1–9, 2010

MarkDRobinsonandAlicia.Oshlack. Ascalingnormalizationmethodfordifferentialexpressionanalysis of RNA-seq data.Genome Biology, 11(3):1–9, 2010

work page 2010
[16]

MD Robinson, DJ McCarthy, and GK. Smyth. edgeR: A bioconductor package for differential expression analysis of digital gene expression data.Bioinformatics, 26(1):139–40, 2010

work page 2010
[17]

MD Robinson and A. Oshlack. A scaling normalization method for differential expression analysis of RNA-seq data.Genome Biol., 11(3), 2010

work page 2010
[18]

How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?RNA, 22(6):839–51, 2016

NJ Schurch, P Schofield, M Gierliński, C Cole, A Sherstnev, V Singh, N Wrobel, K Gharbi, GG Simpson, T Owen-Hughes, M Blaxter, and Barton GJ. How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?RNA, 22(6):839–51, 2016

work page 2016
[19]

Robinson

C Soneson, MI Love, and MD. Robinson. Differential analyses for rna-seq: transcript-level estimates improve gene-level inferences.F1000Research, 4:1521, 2016

work page 2016
[20]

Christopher Thron, Hannah Bergom, Ella Boytim, Mienie Roberts, Justin Hwang, and Farhad. Jafari. A simple bias reduction algorithm for RNA sequencing datasets.BioRxiv, pages 202–10, 2023

work page 2023
[21]

spike levels

Z Wang, M Gerstein, and M. Snyder. RN-Seq: a revolutionary tool for transcriptomics.Nat Rev Genet., 10(1):57–63, 2009. 18 Figure 8: Complementary CDFs (ccdfs) for 5000 simulated subpopulations of size 50 at different gene en- hancement levels (a.k.a. "spike levels"). For each curve, they-axis values give probability of genet-statistics exceeding the corre...

work page 2009
[22]

Weiss, Z.Z

S. Weiss, Z.Z. Xu, S Peddada, et al. Normalization and microbial differential abundance strategies depend upon data characteristics.Microbiome, 5(27), 2017

work page 2017
[23]

Yip, Panwen Wang, Jean-Pierre A

Shun H. Yip, Panwen Wang, Jean-Pierre A. Kocher, Pak Chung Sham, and Junwen. Wang. Lin- norm: improved statistical analysis for single cell RNA-seq expression data.Nucleic Acids Research, 45(22):e179–e179, 09 2017

work page 2017
[24]

L Zhao, J Wit, N Svetec, and DJ. Begun. Parallel gene expression differences between low and high latitude populations of drosophila melanogaster and d. simulans.PLoS Genet, 11(5):e1005184

work page
[25]

spike levels

Yingdong et al. Zhao. TPM or FPKM, or normalized counts? A comparative study of quantification measures for the analysis of RNA-seq data from the NCl Patient-derived models repository.Journal of Translational Medicine, 19, 2021. 19 Figure 9: Complementary CDFs for 5000 simulated subpopulations of size 10 at different gene enhancement levels. Axes and erro...

work page 2021

[1] [1]

SEQC/MAQC-III Consortium. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the sequencing quality control consortium.Nature biotechnology, 32(9):903–914, 16 Figure 6: (Top) Distributions of Spearman correlations for gene-gene pairs subject to four data transforms: log transformed TPM (abbreviated as TPM), log tran...

work page 2014

[2] [2]

Burguillo

Luis A Corchete, Elizabeta A Rojas, Diego Alonso-López, Javier De Las Rivas, Norma C Gutiérrez, and Francisco J. Burguillo. Systematic comparison and assessment of RNA-seq procedures for gene expression quantitative analysis.Scientific Reports, 10(1):19737, 2020

work page 2020

[3] [3]

Yance Feng and Lei M. Li. MUREN: A robust and multi-reference approach of RNA-seq transcripts normalization.BMC Bioinformatics, 22:386, 2021

work page 2021

[4] [4]

Normalizationandvariancestabilizationofsingle-cellRNA-seq data using regularized negative binomial regression.Genome Biology, 20(1):296, 2019

ChristophHafemeisterandRahul.Satija. Normalizationandvariancestabilizationofsingle-cellRNA-seq data using regularized negative binomial regression.Genome Biology, 20(1):296, 2019

work page 2019

[5] [5]

Montgomery

Kimberly R Kukurba and Stephen B. Montgomery. RNA sequencing and analysis.Cold Spring Harbor Protocols, 2015(11):pdb–top084970, 2015

work page 2015

[6] [6]

Charity W Law, Yunshun Chen, Wei Shi, and Gordon K. Smyth. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts.Genome Biology, 15:1–17, 2014

work page 2014

[7] [7]

Comprehensive comparative analysis of strand-specific RNA sequencing methods.Nat Methods, 7(9):709–15, 2010

JZ Levin, M Yassour, X Adiconis, et al. Comprehensive comparative analysis of strand-specific RNA sequencing methods.Nat Methods, 7(9):709–15, 2010

work page 2010

[8] [8]

Normalization methods for the analysis of unbalanced transcriptome data: A review.Frontiers in Bioengineering and Biotechnology, 7, 2019

Xueyan Liu, Nan Li, Sheng Liu, Jun Wang, Ning Zhang, Xubin Zheng, Kwong-Sak Leung, and Lixin Cheng. Normalization methods for the analysis of unbalanced transcriptome data: A review.Frontiers in Bioengineering and Biotechnology, 7, 2019. 17 Figure 7: Distributions of differences between pair-by-pair correlation differences as in Figure 6(bottom), but rest...

work page 2019

[9] [9]

Michael Love, Simon Anders, and Wolfgang. Huber. Differential analysis of count data–the DESeq2 package.Genome Biol, 15(550):10–1186, 2014

work page 2014

[10] [10]

DJ McCarthy, KR Campbell, AT Lun, and QF. Wills. Scater: Pre-processing, quality control, normal- ization and visualization of single-cell RNA-seq data in R.Bioinformatics, 33(8):1179–86, 2017

work page 2017

[11] [11]

MJ Nueda, S Tarazona, and A. Conesa. Next masigpro: updating masigpro bioconductor package for RNA-seq time series.Bioinformatics, 15;30(18):2598–602, 2014

work page 2014

[12] [12]

H. Qin, H. Ni, Y. Liu, et al. RNA-binding proteins in tumor progression.J. Hematol. Oncol., 13 (90), 2020

work page 2020

[13] [13]

A Roberts, C Trapnell, J Donaghey, JL Rinn, and L. Pachter. Improving RNA-Seq expression estimates by correcting for fragment bias.Genome Biol., 12(3), 2011

work page 2011

[14] [14]

Rinn, and Lior

Adam Roberts, Cole Trapnell, Julie Donaghey, John L. Rinn, and Lior. Pachter. Improving rna-seq expression estimates by correcting for fragment bias.Genome Biology, 12(3):R22, 2011

work page 2011

[15] [15]

Ascalingnormalizationmethodfordifferentialexpressionanalysis of RNA-seq data.Genome Biology, 11(3):1–9, 2010

MarkDRobinsonandAlicia.Oshlack. Ascalingnormalizationmethodfordifferentialexpressionanalysis of RNA-seq data.Genome Biology, 11(3):1–9, 2010

work page 2010

[16] [16]

MD Robinson, DJ McCarthy, and GK. Smyth. edgeR: A bioconductor package for differential expression analysis of digital gene expression data.Bioinformatics, 26(1):139–40, 2010

work page 2010

[17] [17]

MD Robinson and A. Oshlack. A scaling normalization method for differential expression analysis of RNA-seq data.Genome Biol., 11(3), 2010

work page 2010

[18] [18]

How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?RNA, 22(6):839–51, 2016

NJ Schurch, P Schofield, M Gierliński, C Cole, A Sherstnev, V Singh, N Wrobel, K Gharbi, GG Simpson, T Owen-Hughes, M Blaxter, and Barton GJ. How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?RNA, 22(6):839–51, 2016

work page 2016

[19] [19]

Robinson

C Soneson, MI Love, and MD. Robinson. Differential analyses for rna-seq: transcript-level estimates improve gene-level inferences.F1000Research, 4:1521, 2016

work page 2016

[20] [20]

Christopher Thron, Hannah Bergom, Ella Boytim, Mienie Roberts, Justin Hwang, and Farhad. Jafari. A simple bias reduction algorithm for RNA sequencing datasets.BioRxiv, pages 202–10, 2023

work page 2023

[21] [21]

spike levels

Z Wang, M Gerstein, and M. Snyder. RN-Seq: a revolutionary tool for transcriptomics.Nat Rev Genet., 10(1):57–63, 2009. 18 Figure 8: Complementary CDFs (ccdfs) for 5000 simulated subpopulations of size 50 at different gene en- hancement levels (a.k.a. "spike levels"). For each curve, they-axis values give probability of genet-statistics exceeding the corre...

work page 2009

[22] [22]

Weiss, Z.Z

S. Weiss, Z.Z. Xu, S Peddada, et al. Normalization and microbial differential abundance strategies depend upon data characteristics.Microbiome, 5(27), 2017

work page 2017

[23] [23]

Yip, Panwen Wang, Jean-Pierre A

Shun H. Yip, Panwen Wang, Jean-Pierre A. Kocher, Pak Chung Sham, and Junwen. Wang. Lin- norm: improved statistical analysis for single cell RNA-seq expression data.Nucleic Acids Research, 45(22):e179–e179, 09 2017

work page 2017

[24] [24]

L Zhao, J Wit, N Svetec, and DJ. Begun. Parallel gene expression differences between low and high latitude populations of drosophila melanogaster and d. simulans.PLoS Genet, 11(5):e1005184

work page

[25] [25]

spike levels

Yingdong et al. Zhao. TPM or FPKM, or normalized counts? A comparative study of quantification measures for the analysis of RNA-seq data from the NCl Patient-derived models repository.Journal of Translational Medicine, 19, 2021. 19 Figure 9: Complementary CDFs for 5000 simulated subpopulations of size 10 at different gene enhancement levels. Axes and erro...

work page 2021