
arxiv: 1907.00118 · v1 · pith:3X3MWVAJnew · submitted 2019-06-28 · 🧬 q-bio.QM · cs.LG

Cellular State Transformations using Generative Adversarial Networks

Pith reviewed 2026-05-25 12:33 UTC · model grok-4.3

classification 🧬 q-bio.QM cs.LG
keywords generative adversarial networks · transcriptome · gene expression · cellular state transition · TSPG · perturbations · deep learning

The pith

A conditioned GAN generator can perturb gene expression profiles to simulate realistic transitions between cellular RNA states.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that generative adversarial networks can be trained to take any input gene expression profile and produce a perturbed version that transitions toward a target state. These generated profiles match the statistical distribution of real samples in the dataset. The approach identifies the genes changed most by the generator and shows that the biological functions enriched among those genes align with expected patterns for the states involved. The resulting framework is called the Transcriptome State Perturbation Generator.

Core claim

A generator conditioned to perturb any input gene expression profile simulates a realistic transition between source and target RNA expression states. The perturbed samples follow a distribution similar to that of the original samples in the dataset, which suggests that the perturbations are biologically meaningful. The genes most positively and most negatively perturbed by the generator can be identified, and the biological functions enriched among those genes are realistic.

What carries the argument

The conditioned generator within the Transcriptome State Perturbation Generator (TSPG) GAN framework, which produces output profiles from input profiles and target state information.

If this is right

  • Key genes driving the transition can be extracted directly from the generator's output.
  • Enriched functions among the most perturbed genes match known biology for the states.
  • The method can reveal condition-defining gene expression patterns without requiring paired experimental data for every transition.
  • Perturbations remain within the distribution of the original dataset rather than producing arbitrary values.
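The first bullet can be illustrated with a few lines of Python. This is a minimal sketch, not TSPG's own code: it assumes the generator's perturbation is available as a samples-by-genes NumPy array and that genes are ranked by their mean shift across samples. The function name `rank_perturbed_genes`, the gene labels, and the cutoff `k` are hypothetical.

```python
import numpy as np

def rank_perturbed_genes(perturbation, gene_names, k=2):
    """Return the k most positively and k most negatively perturbed genes.

    perturbation: array of shape (n_samples, n_genes), values in [-1, 1].
    gene_names:   sequence of n_genes labels (hypothetical here).
    Ranking by the per-gene mean is an assumption of this sketch.
    """
    mean_shift = perturbation.mean(axis=0)   # average shift per gene
    order = np.argsort(mean_shift)           # ascending: most negative first
    most_negative = [gene_names[i] for i in order[:k]]
    most_positive = [gene_names[i] for i in order[::-1][:k]]
    return most_positive, most_negative

# Toy perturbation matrix: 2 samples, 4 genes.
P = np.array([[0.5, -0.2, 0.0, -0.6],
              [0.7, -0.4, 0.1, -0.8]])
genes = ["G1", "G2", "G3", "G4"]
up, down = rank_perturbed_genes(P, genes, k=2)
print(up, down)
```

The two returned lists are the kind of input that would then be passed to a functional enrichment tool.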

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same conditioning approach could be tested on other high-dimensional biological measurements such as proteomics or metabolomics.
  • If the generator preserves distribution, it may serve as a tool for in silico hypothesis generation about how cells respond to specific inputs.
  • Direct comparison of generator outputs against time-series expression data from real transitions would provide a stronger test than distribution matching alone.

Load-bearing premise

Similarity in statistical distribution between generated and real samples is sufficient evidence that the perturbations are biologically meaningful and that the enriched functions of the most perturbed genes are realistic.

What would settle it

The claim would be falsified if the genes identified as most perturbed by the generator showed no corresponding change, or showed inconsistent functional enrichment, when the same source-to-target transition is measured in actual biological samples.

Figures

Figures reproduced from arXiv: 1907.00118 by Benjamin T. Shealy, Colin Targonski, F. Alex Feltus, Melissa C. Smith.

Figure 1: Architecture of transcriptome state perturbation generator (TSPG).
Figure 2: Adversarial generation for Nerve-Tibial target using the Hallmark Hedgehog Signaling gene set. t-SNE plot of original and perturbed samples using the Hallmark Hedgehog Signaling gene set (left). Heatmap of cellular transformation from Brain-Hippocampus to Nerve-Tibial (right). Perturbation (P) ranges from [−1, 1], which is added to original sample (X), then adversarial example (xadv) is clipped to [0, 1]. …
Figure 3: Adversarial generation for Heart-Left Ventricle target using all Hallmark genes as the input gene set. t-SNE plot of original and perturbed samples using all Hallmark genes (left). Heatmap of cellular transformations from Brain-Amygdala, Esophagus-Mucosa, Pancreas, and Thyroid to Heart-Left Ventricle (right). Perturbations (P) range from [−1, 1], which is added to original sample (x), then adversari…
Figure 4: Adversarial generation for Muscle-Skeletal target using all Hallmark genes as the input gene set. t-SNE plot of original and perturbed samples using all Hallmark genes (left). Heatmap of cellular transformations from Brain-Cerebellum, Breast-Mammary, Liver, and Spleen to Muscle-Skeletal (right). Perturbations (P) range from [−1, 1], which is added to original sample (x), then adversarial example (xa…
Figure 5: Adversarial generation for three subtypes of Kidney cancer using all Hallmark genes as the input gene set. t-SNE plot and corresponding heatmap of cellular transformation from healthy to KICH (A), healthy to KIRC (B), and healthy to KIRP (C). Perturbations (P) range from [−1, 1], which is added to original sample (X), then adversarial example (Xadv) is clipped to [0, 1]. The mean expression vector (µT) of…
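The captions describe the same arithmetic in each figure: a perturbation P in [−1, 1] is added to the original sample and the sum is clipped to [0, 1]. The sketch below shows that step alone. It assumes expression values are already scaled to [0, 1], which the clipping range implies but the captions do not state; the function name `apply_perturbation` and the toy values are hypothetical.

```python
import numpy as np

def apply_perturbation(x, p):
    """Compute x_adv = clip(x + p, 0, 1), as described in the figure captions."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    if np.any(np.abs(p) > 1.0):
        raise ValueError("perturbation values must lie in [-1, 1]")
    return np.clip(x + p, 0.0, 1.0)

# Toy sample with three genes (values are illustrative only).
x = np.array([0.2, 0.9, 0.5])
p = np.array([-0.5, 0.4, 0.25])
x_adv = apply_perturbation(x, p)
print(x_adv)  # [0.   1.   0.75]
```

The first two genes are pushed outside [0, 1] by the raw sum and are pulled back to the boundary by the clip, which is why perturbed profiles cannot take arbitrary values.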
Original abstract

We introduce a novel method to unite deep learning with biology by which generative adversarial networks (GANs) generate transcriptome perturbations and reveal condition-defining gene expression patterns. We find that a generator conditioned to perturb any input gene expression profile simulates a realistic transition between source and target RNA expression states. The perturbed samples follow a similar distribution to original samples from the dataset, also suggesting these are biologically meaningful perturbations. Finally, we show that it is possible to identify the genes most positively and negatively perturbed by the generator and that the enriched biological function of the perturbed genes are realistic. We call the framework the Transcriptome State Perturbation Generator (TSPG), which is open source software available at https://github.com/ctargon/TSPG.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces the Transcriptome State Perturbation Generator (TSPG), a conditional GAN framework that takes source gene expression profiles and generates perturbations to simulate transitions to target cellular states. It asserts that the generated profiles match the statistical distribution of real samples from the dataset and that the genes most positively or negatively perturbed by the generator exhibit biologically realistic functional enrichments via GO analysis.

Significance. If the central claims hold after proper validation, TSPG could offer a generative tool for in silico exploration of transcriptomic state changes and hypothesis generation in systems biology. The open-source release of the software is a strength that supports potential reproducibility and extension by the community.

major comments (3)
  1. [Results] Results section (distribution similarity claim): the assertion that perturbed samples follow a similar distribution to original samples provides no quantitative metrics (e.g., MMD, Wasserstein distance), statistical tests, error bars, or baseline comparisons against alternative models or random perturbations, leaving the central claim of realistic transitions unverifiable from the presented evidence.
  2. [Results] Results section (GO enrichment): the functional enrichment of the most perturbed genes is presented without controls for dataset-specific co-expression modules or orthogonal validation such as overlap with experimentally measured differentially expressed genes for matched source-target pairs, so it does not establish that the perturbations capture condition-defining mechanisms rather than marginal distribution matching.
  3. [Methods] Methods section: insufficient detail is given on data splits, training/validation procedures, and hyperparameter choices, which is required to evaluate whether the supervised GAN respects regulatory structure or simply reproduces training-set statistics.
minor comments (1)
  1. [Abstract] Abstract: the description of the generator conditioning could be clarified with a brief reference to the specific conditioning mechanism used.
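Major comment 1 names MMD as one metric that could quantify distribution similarity. The following is a minimal sketch of a biased squared-MMD estimate with an RBF kernel, run on synthetic Gaussian data rather than expression profiles. The function name `rbf_mmd2`, the bandwidth `gamma`, and the toy distributions are assumptions made for illustration; nothing here comes from the paper.

```python
import numpy as np

def rbf_mmd2(x, y, gamma=1.0):
    """Biased estimate of squared MMD with kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    def kernel(a, b):
        # Pairwise squared Euclidean distances between rows of a and rows of b.
        sq = (np.sum(a ** 2, axis=1)[:, None]
              + np.sum(b ** 2, axis=1)[None, :]
              - 2.0 * a @ b.T)
        return np.exp(-gamma * np.maximum(sq, 0.0))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 5))      # stand-in for real samples
close = rng.normal(0.0, 1.0, size=(200, 5))     # same distribution
shifted = rng.normal(2.0, 1.0, size=(200, 5))   # shifted distribution

mmd_close = rbf_mmd2(real, close, gamma=0.1)
mmd_shifted = rbf_mmd2(real, shifted, gamma=0.1)
print(mmd_close, mmd_shifted)
```

A value near zero indicates the two samples are hard to tell apart under the chosen kernel. A baseline such as randomly perturbed samples, as the referee requests, would give the number a scale to be judged against.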

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight important areas for strengthening the validation and reproducibility of TSPG. We address each point below and have revised the manuscript to include the requested quantitative metrics, additional controls, and expanded methodological details.

Point-by-point responses
  1. Referee: [Results] Results section (distribution similarity claim): the assertion that perturbed samples follow a similar distribution to original samples provides no quantitative metrics (e.g., MMD, Wasserstein distance), statistical tests, error bars, or baseline comparisons against alternative models or random perturbations, leaving the central claim of realistic transitions unverifiable from the presented evidence.

    Authors: We agree that quantitative metrics are necessary to support the distribution similarity claim. In the revised manuscript we now report MMD and Wasserstein distances between generated and real target distributions, together with statistical tests, error bars from repeated runs, and explicit comparisons against random perturbations and a simple baseline model. These results are added to the Results section and supplementary figures. revision: yes

  2. Referee: [Results] Results section (GO enrichment): the functional enrichment of the most perturbed genes is presented without controls for dataset-specific co-expression modules or orthogonal validation such as overlap with experimentally measured differentially expressed genes for matched source-target pairs, so it does not establish that the perturbations capture condition-defining mechanisms rather than marginal distribution matching.

    Authors: The referee correctly notes the absence of controls. We have added (i) enrichment comparisons against co-expression modules derived from the same dataset and (ii) overlap analysis with published differentially expressed gene lists for the relevant source-to-target transitions. These controls are now presented in the revised Results section to better distinguish condition-specific signals from marginal matching. revision: yes

  3. Referee: [Methods] Methods section: insufficient detail is given on data splits, training/validation procedures, and hyperparameter choices, which is required to evaluate whether the supervised GAN respects regulatory structure or simply reproduces training-set statistics.

    Authors: We have substantially expanded the Methods section to specify the exact train/validation/test splits (including how samples were partitioned by condition or cell type), the training protocol with validation-based early stopping, the hyperparameter search procedure, and regularization choices intended to encourage learning of regulatory patterns rather than memorization of training statistics. revision: yes
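Response 2 mentions an overlap analysis against published differentially expressed gene lists without specifying the test. One common choice is a one-sided hypergeometric test, sketched below with the standard library only. The function name `overlap_p_value` and all counts are hypothetical, and this is not presented as the procedure the authors used.

```python
from math import comb

def overlap_p_value(n_universe, n_reference, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_universe:  number of genes measured in total.
    n_reference: size of the published differentially expressed list.
    n_selected:  number of top perturbed genes drawn from the generator.
    n_overlap:   genes common to both lists.
    """
    upper = min(n_reference, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_reference, i) * comb(n_universe - n_reference, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total

# Toy counts: 20 genes measured, 5 in the reference list, 5 selected.
p_none = overlap_p_value(20, 5, 5, 0)   # any overlap at all: probability 1
p_full = overlap_p_value(20, 5, 5, 5)   # complete overlap by chance
print(p_none, p_full)
```

A small p-value says the overlap is larger than random gene selection would produce. It does not by itself rule out the co-expression confound raised in the same referee comment.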

Circularity Check

0 steps flagged

No significant circularity; standard conditional GAN application on external data

full rationale

The paper applies a conditional GAN (TSPG) trained on transcriptomic datasets to generate perturbations that match source-to-target state transitions. The claim that generated samples follow a similar distribution is the explicit training objective of the adversarial setup and is presented as empirical validation rather than a first-principles derivation. No equations, self-citations, or ansatzes reduce any result to a definitionally equivalent input. The biological interpretation step is interpretive and does not create a self-definitional or fitted-input-called-prediction loop. The derivation chain remains independent of the target claims.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that statistical distribution matching between generated and real transcriptomes implies biological validity; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: GAN training converges to a generator whose output distribution matches the real data distribution sufficiently for downstream biological interpretation.
    Invoked when the abstract states that similar distributions suggest biologically meaningful perturbations.

pith-pipeline@v0.9.0 · 5657 in / 1218 out tokens · 29266 ms · 2026-05-25T12:33:31.087834+00:00 · methodology

