Predicting Gene Expression Between Species with Neural Networks

Peter Eastman; Vijay S. Pande

arxiv: 1907.03041 · v1 · pith:MJR7JCJ2new · submitted 2019-07-05 · 🧬 q-bio.GN

Pith reviewed 2026-05-25 01:51 UTC · model grok-4.3

keywords: neural network · gene expression · cross-species prediction · rat human · TG-GATES · machine learning · differential expression · toxicology

The pith

A neural network can translate rat gene expression to human gene expression for new compounds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors train a neural network on paired rat and human gene expression measurements from the same compounds at matching doses. Tested on compounds withheld from training, the model produces human expression values that yield lists of differentially expressed genes matching those from real human experiments in most cases. This indicates the network has extracted a transferable mapping between the two species instead of fitting compound-specific patterns. The result matters because it shows machine learning can bridge species differences in transcriptomic response data without requiring new human samples for every test chemical.

Core claim

We train a neural network to predict human gene expression levels based on experimental data for rat cells. The network is trained with paired human/rat samples from the Open TG-GATES database, where paired samples were treated with the same compound at the same dose. When evaluated on a test set of held out compounds, the network successfully predicts human expression levels. On the majority of the test compounds, the list of differentially expressed genes determined from predicted expression levels agrees well with the list of differentially expressed genes determined from actual human experimental data.
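
The held-out evaluation described above depends on splitting by compound rather than by sample, so that every dose of a test compound is unseen during training. A minimal sketch of such a split follows. The record layout, the compound names, and the `split_by_compound` helper are hypothetical and are not taken from the paper or from the Open TG-GATES schema.

```python
import random
from collections import defaultdict


def split_by_compound(samples, test_fraction=0.2, seed=0):
    """Split paired samples so that no compound appears in both train and test.

    `samples` is a list of dicts with a "compound" key (hypothetical layout).
    """
    by_compound = defaultdict(list)
    for sample in samples:
        by_compound[sample["compound"]].append(sample)

    # Shuffle compound names, not individual samples, then carve off the test set.
    compounds = sorted(by_compound)
    random.Random(seed).shuffle(compounds)
    n_test = max(1, round(len(compounds) * test_fraction))
    held_out = set(compounds[:n_test])

    train = [s for c in compounds if c not in held_out for s in by_compound[c]]
    test = [s for c in compounds if c in held_out for s in by_compound[c]]
    return train, test, held_out


# Toy paired records: one rat and one human expression vector per compound and dose.
samples = [
    {"compound": c, "dose": d, "rat": [0.0], "human": [0.0]}
    for c in ["A", "B", "C", "D", "E"]      # placeholder compound names
    for d in ["low", "middle", "high"]      # placeholder dose labels
]
train, test, held_out = split_by_compound(samples)
```

Splitting on the compound name is what makes the test set a check on transfer to new chemicals; a per-sample random split would leak other doses of the same compound into training.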

What carries the argument

Neural network trained on paired rat-human samples to learn a cross-species mapping of gene expression levels.

If this is right

The network produces usable human expression predictions for compounds never seen in training.
Differentially expressed gene lists derived from the predictions align with experimental human lists on most test compounds.
The learned mapping generalizes beyond the specific training compounds in the database.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same paired-sample approach could be applied to other species pairs or additional omics layers such as proteomics.
If the mapping proves stable across doses and cell types, it might support virtual screening of large chemical libraries for human-relevant effects.
Retraining the network periodically on expanding databases could improve accuracy without changing the core architecture.

Load-bearing premise

The paired rat-human samples contain enough shared information for the network to learn a general, compound-independent mapping rather than memorizing training-specific patterns.

What would settle it

Evaluating the network on a fresh collection of held-out compounds and finding that the predicted differentially expressed gene lists match actual human data for fewer than half the compounds would falsify the central claim.
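
The falsification test above can be made concrete once "match" is given a definition. The sketch below assumes one possible definition: a gene is differentially expressed when its absolute log2 fold change exceeds a threshold, and two lists match when their Jaccard index reaches a cutoff. Both the threshold and the cutoff are illustrative assumptions; the paper's own criteria are not stated in the text above.

```python
def de_genes(log2_fold_changes, threshold=1.0):
    """Return the set of genes whose |log2 fold change| exceeds `threshold`."""
    return {g for g, lfc in log2_fold_changes.items() if abs(lfc) > threshold}


def jaccard(a, b):
    """Jaccard index of two sets; defined here as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fraction_agreeing(predicted, measured, min_overlap=0.5):
    """Fraction of compounds whose predicted and measured DE lists overlap enough."""
    hits = sum(
        jaccard(de_genes(predicted[c]), de_genes(measured[c])) >= min_overlap
        for c in measured
    )
    return hits / len(measured)


# Toy log2 fold changes per compound and gene (made-up numbers).
predicted = {
    "cmpd1": {"g1": 2.0, "g2": -1.5, "g3": 0.1},
    "cmpd2": {"g1": 0.2, "g2": 1.2, "g3": 0.0},
}
measured = {
    "cmpd1": {"g1": 1.8, "g2": -1.1, "g3": 0.3},
    "cmpd2": {"g1": 1.4, "g2": 0.1, "g3": 0.0},
}
print(fraction_agreeing(predicted, measured))  # 0.5: cmpd1 matches, cmpd2 does not
```

Under this definition the central claim would be in trouble if `fraction_agreeing` fell below 0.5 on a fresh set of held-out compounds.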

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies a neural net to paired TG-GATES rat-human data and claims it predicts human expression on held-out compounds, but the abstract supplies zero numbers or details so the result cannot be judged.

The main takeaway is that they train a neural network on paired rat and human samples from the same compounds and doses in Open TG-GATES, then test on held-out compounds and say the predicted human expression produces DE gene lists that agree with real human data for most test cases. That setup uses the right kind of paired data and the right kind of test split for checking whether a mapping transfers to new chemicals.

Referee Report

3 major / 0 minor

Summary. The manuscript trains a neural network on paired rat-human gene expression samples from the Open TG-GATES database (same compounds and doses) to predict human expression levels from rat data. It reports that, on a test set of held-out compounds, the network successfully predicts human expression and that the resulting differentially expressed gene lists agree well with those derived from actual human experimental data.

Significance. If the central empirical claim holds after supplying quantitative metrics, architecture details, and controls for chemical similarity, the work would demonstrate a transferable cross-species mapping learned from paired toxicogenomics data. This could reduce reliance on human cell experiments in toxicology, but the current lack of evaluable numbers and controls makes the practical significance impossible to gauge from the provided text.

major comments (3)

[Abstract] Abstract: the central claim that the network 'successfully predicts' human expression levels and that DE gene lists 'agree well' supplies no quantitative metrics (Pearson r, RMSE, precision-recall on DE calls, or p-values), no error bars, and no baseline comparisons. Without these, the result cannot be assessed and the claim remains unevaluable.
[Abstract] Abstract / Methods (data split): no information is given on chemical structure similarity (Tanimoto coefficients, scaffold overlap, or clustering) between the training compounds and the held-out test compounds. If test compounds are structurally related to the training set, performance may reflect local interpolation rather than a compound-independent rat-to-human mapping, directly undermining the generalizability asserted in the abstract.
[Abstract] Abstract: the manuscript provides no description of network architecture, loss function, training procedure, regularization, or hyperparameter selection. These details are load-bearing for determining whether the model learned a general mapping or simply memorized compound-specific patterns.
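
Two of the metrics requested in the first comment, Pearson r and RMSE, are simple to compute per sample, and the same functions can score a baseline. The sketch below uses the rat values directly as a baseline "prediction", which assumes the rat and human vectors are already aligned gene by gene; that alignment and all numbers are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rmse(x, y):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Toy expression values for one sample across four aligned genes.
measured_human = [1.0, 2.0, 3.0, 4.0]
predicted_human = [1.1, 1.9, 3.2, 3.8]
rat_baseline = [4.0, 3.0, 2.0, 1.0]   # rat values used directly as the prediction

model_scores = (pearson_r(measured_human, predicted_human),
                rmse(measured_human, predicted_human))
baseline_scores = (pearson_r(measured_human, rat_baseline),
                   rmse(measured_human, rat_baseline))
```

Reporting the model and baseline scores side by side, with their spread across repeated training runs, is the kind of evidence the comment asks for.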

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the manuscript to strengthen the presentation of results and methods.

Referee: [Abstract] Abstract: the central claim that the network 'successfully predicts' human expression levels and that DE gene lists 'agree well' supplies no quantitative metrics (Pearson r, RMSE, precision-recall on DE calls, or p-values), no error bars, and no baseline comparisons. Without these, the result cannot be assessed and the claim remains unevaluable.

Authors: We agree that the abstract (and manuscript) would be strengthened by explicit quantitative metrics. We will revise to report key statistics including average Pearson correlation and RMSE between predicted and measured human expression, overlap or precision-recall metrics for differentially expressed gene lists, and comparisons against baselines such as direct use of rat data or shuffled mappings, along with error bars from repeated training runs. revision: yes

Referee: [Abstract] Abstract / Methods (data split): no information is given on chemical structure similarity (Tanimoto coefficients, scaffold overlap, or clustering) between the training compounds and the held-out test compounds. If test compounds are structurally related to the training set, performance may reflect local interpolation rather than a compound-independent rat-to-human mapping, directly undermining the generalizability asserted in the abstract.

Authors: This point is well taken and directly relevant to claims of generalizability. In the revision we will compute and report chemical similarity measures (Tanimoto coefficients on Morgan fingerprints, scaffold overlap, and clustering) between the held-out test compounds and the training set, and discuss whether performance reflects interpolation or a broader mapping. revision: yes

Referee: [Abstract] Abstract: the manuscript provides no description of network architecture, loss function, training procedure, regularization, or hyperparameter selection. These details are load-bearing for determining whether the model learned a general mapping or simply memorized compound-specific patterns.

Authors: We will expand the Methods section to provide a complete description of the network architecture (layers, widths, activations), loss function, optimizer and training schedule, regularization techniques, and the hyperparameter selection procedure (including any cross-validation used). revision: yes
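
The similarity check promised in the second response reduces to a nearest-neighbour Tanimoto calculation. The sketch below assumes each fingerprint is already available as a set of on-bit indices; in practice these would come from a cheminformatics toolkit, which is not shown here. The compound names and bit sets are made up.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_training_similarity(test_fps, train_fps):
    """For each test compound, the highest Tanimoto similarity to any training compound."""
    return {
        name: max(tanimoto(fp, train_fp) for train_fp in train_fps.values())
        for name, fp in test_fps.items()
    }


# Toy fingerprints (hypothetical on-bit indices, not real Morgan fingerprints).
train_fps = {"train_a": {1, 4, 9, 12}, "train_b": {2, 4, 7}}
test_fps = {"test_x": {1, 4, 9}, "test_y": {20, 21}}

nearest = nearest_training_similarity(test_fps, train_fps)
print(nearest)  # {'test_x': 0.75, 'test_y': 0.0}
```

If prediction quality tracks these nearest-neighbour values, with good results only for compounds like `test_x`, that would point to interpolation rather than a compound-independent mapping.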

Circularity Check

0 steps flagged

No circularity: standard supervised ML on held-out compounds

The paper trains a neural network on paired rat-human samples from Open TG-GATES for a set of compounds and reports performance on a disjoint test set of held-out compounds. The central claim (agreement between predicted and measured differentially expressed genes) is an empirical comparison against external human experimental data on those held-out compounds; it does not reduce by any equation or definition in the paper to a fitted parameter, self-citation chain, or input quantity. No self-definitional steps, fitted-input-as-prediction, or load-bearing self-citations are present. The result is therefore self-contained as a conventional train/test evaluation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the empirical performance of a neural network whose weights are fitted to the training pairs; no additional free parameters beyond standard NN training are named. No new entities are postulated.

free parameters (1)

neural network weights
All model parameters are fitted to the paired training samples; their values are not reported.

axioms (1)

Domain assumption: The training and test compounds are drawn from the same underlying distribution so that generalization to held-out compounds is meaningful.
Implicit in any held-out test evaluation on the TG-GATES database.

pith-pipeline@v0.9.0 · 5603 in / 1254 out tokens · 24300 ms · 2026-05-25T01:51:02.990550+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation to the paper passage: unclear

We train a neural network to predict human gene expression levels based on experimental data for rat cells... fully connected neural network with one hidden layer of width 20,000 and rectified linear unit activation... 50% dropout... Adam optimizer

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat_equivNat · relation to the paper passage: unclear

On the majority of the test compounds, the list of differentially expressed genes determined from predicted expression levels agrees well with the list... correlation coefficient of 0.697
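
The first quoted passage names the architecture: one fully connected hidden layer with rectified linear unit activation and 50% dropout. A minimal forward-pass sketch of that shape follows, in plain Python with toy sizes. The input and output dimensions, the weight initialisation, and the choice to apply dropout to the hidden activations are assumptions, since the quote elides those details. Training with the Adam optimizer is not shown.

```python
import random


def relu(v):
    """Rectified linear unit applied elementwise."""
    return [max(0.0, x) for x in v]


def dense(x, weights, bias):
    """Affine layer y = W x + b, with `weights` stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def dropout(v, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in v]


def init_layer(n_out, n_in, rng):
    """Random Gaussian weights and zero biases (initialisation scheme is an assumption)."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(rat_expr, params, rng=None, dropout_rate=0.5):
    """Map a rat expression vector to a predicted human expression vector."""
    (w1, b1), (w2, b2) = params
    hidden = relu(dense(rat_expr, w1, b1))
    if rng is not None:  # dropout is active only when an rng is passed (training mode)
        hidden = dropout(hidden, dropout_rate, rng)
    return dense(hidden, w2, b2)


rng = random.Random(0)
N_RAT, N_HIDDEN, N_HUMAN = 6, 8, 5  # toy sizes; the quoted hidden width is 20,000
params = (init_layer(N_HIDDEN, N_RAT, rng), init_layer(N_HUMAN, N_HIDDEN, rng))
rat_sample = [rng.gauss(0.0, 1.0) for _ in range(N_RAT)]
human_pred = forward(rat_sample, params)  # inference mode, no dropout
```

With dropout disabled at inference the output is deterministic for a given input, which is what a held-out evaluation would use.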

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.