
arxiv: 2604.13471 · v1 · submitted 2026-04-15 · 💻 cs.LG

Computational framework for multistep metabolic pathway design

Pith reviewed 2026-05-10 13:41 UTC · model grok-4.3

classification 💻 cs.LG
keywords retrobiosynthesis · metabolic pathway design · neural network classification · enzymatic templates · data augmentation · de novo pathway design · computational biosynthesis · xenobiotic synthesis

The pith

Neural network classifiers trained on real versus template-generated reactions enable a multistep retrobiosynthesis pipeline that reproduces natural and non-natural metabolic pathways.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper assembles metabolic reactions and enzymatic templates from public databases and augments the data with artificial reactions created by the templates. Two neural network models are trained as binary classifiers to score the plausibility of 1-step and 2-step pathways by separating assembled real reactions from the artificial ones. These models are then combined with the enzymatic templates to form a retrobiosynthesis search procedure. The resulting pipeline is shown to computationally recover selected natural and non-natural pathways, providing an in silico method for generating hypotheses in de novo metabolic pathway design.

Core claim

We assembled metabolic reaction and enzymatic template data from public databases, carried out a data augmentation procedure to enrich the dataset with artificial metabolic reactions generated by enzymatic reaction templates, trained two neural network-based pathway ranking models as binary classifiers to distinguish assembled reactions from artificial counterparts, and integrated the models with enzymatic templates into a multistep retrobiosynthesis pipeline that reproduces some natural and non-natural pathways computationally.
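
The augmentation-and-labeling step described above can be sketched in a few lines. This is a toy illustration only, not the authors' code: molecules are plain strings rather than chemical structures, the "templates" are hypothetical suffix rewrites standing in for real enzymatic reaction templates, and the names `make_suffix_template` and `augment` are invented for this sketch. Recorded reactions receive label 1, and any template output that is not a recorded reaction receives label 0.

```python
from typing import Callable, Dict, List, Tuple

Reaction = Tuple[str, str]              # (substrate, product) in a toy string encoding
Template = Callable[[str], List[str]]   # substrate -> candidate products


def make_suffix_template(old: str, new: str) -> Template:
    """Toy stand-in for an enzymatic template: rewrite a trailing group."""
    def apply(substrate: str) -> List[str]:
        if substrate.endswith(old):
            return [substrate[: len(substrate) - len(old)] + new]
        return []
    return apply


def augment(real: List[Reaction],
            templates: Dict[str, Template]) -> List[Tuple[str, str, int]]:
    """Label recorded reactions 1 and unrecorded template outputs 0."""
    labeled = [(s, p, 1) for s, p in real]
    seen = set(real)
    for substrate in sorted({s for s, _ in real}):
        for name in sorted(templates):
            for product in templates[name](substrate):
                if (substrate, product) not in seen:
                    seen.add((substrate, product))
                    labeled.append((substrate, product, 0))
    return labeled


# Hypothetical data: two recorded reactions and four toy templates.
real_reactions = [("R-OH", "R-CHO"), ("R-CHO", "R-COOH")]
templates = {
    "alcohol_to_aldehyde": make_suffix_template("-OH", "-CHO"),
    "alcohol_to_chloride": make_suffix_template("-OH", "-Cl"),
    "aldehyde_to_acid": make_suffix_template("-CHO", "-COOH"),
    "aldehyde_to_amine": make_suffix_template("-CHO", "-NH2"),
}
labeled = augment(real_reactions, templates)
for substrate, product, label in labeled:
    print(f"{substrate} -> {product}  label={label}")
```

The labeled triples would then feed a binary classifier. Note that the negatives are "unrecorded" rather than proven infeasible, which is the assumption the rest of this page questions.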

What carries the argument

Two neural network-based pathway ranking models that each output a scalar plausibility score for a 1-step or 2-step pathway, combined with enzymatic templates for retrobiosynthetic search.
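
To make "a scalar plausibility score used for ranking" concrete, here is a minimal sketch that ranks candidate precursors by a single number. The paper's scorers are neural networks; this sketch swaps in Tanimoto similarity over feature sets, which the Figure 4 caption names as a comparison method for 1-step ranking. The feature sets, candidate names, and the `rank_candidates` helper are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Features = FrozenSet[str]


def tanimoto(a: Features, b: Features) -> float:
    """Tanimoto (Jaccard) similarity: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(target: Features,
                    candidates: Dict[str, Features],
                    score: Callable[[Features, Features], float]
                    ) -> List[Tuple[str, float]]:
    """Sort candidates by descending score; ties broken by name."""
    scored = [(name, score(target, feats)) for name, feats in candidates.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical feature sets standing in for molecular fingerprints.
target = frozenset({"ring", "OH", "CH3"})
candidates = {
    "A": frozenset({"ring", "OH"}),
    "B": frozenset({"ring", "COOH", "NH2"}),
    "C": frozenset({"ring", "OH", "CH3", "Cl"}),
}
ranked = rank_candidates(target, candidates, tanimoto)
order = [name for name, _ in ranked]
print(order)
```

Any function with the same signature, including a trained classifier's output probability, could be passed as `score` without changing the ranking code.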

If this is right

  • The framework supports exploration of alternatives in de novo metabolic pathway design by scoring pathway steps.
  • Integration of the 1-step and 2-step models allows construction of longer pathways from individual reaction steps.
  • Validation through reproduction of known pathways indicates the approach can identify routes that align with existing biochemical knowledge.
  • The method extends traditional retrobiosynthetic workflows by using learned plausibility scores rather than purely rule-based matching.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the scoring generalizes beyond the training distribution, the pipeline could prioritize candidate pathways for experimental testing in synthetic biology applications.
  • The same augmentation-plus-classifier pattern might be adapted to other retrosynthesis domains where template rules exist but plausibility filtering is needed.
  • Because the search builds pathways step by step, it could plausibly be extended to longer chains by accumulating additional step-wise scores.

Load-bearing premise

Neural network scores trained to separate real assembled reactions from template-generated artificial ones will reliably pick out biologically plausible multistep pathways during retrobiosynthetic search.

What would settle it

Running the pipeline on a larger set of known natural pathways and finding that many cannot be recovered or that it proposes routes contradicted by experimental literature would show the scoring does not generalize to multistep plausibility.
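
One way to quantify such a test is a top-k recovery rate: the fraction of known pathways that appear among the first k proposals for their target. The sketch below assumes pathways are represented as tuples of step labels and that the pipeline returns a ranked list per target; the function name, target names, and data are hypothetical and carry no results from the paper.

```python
from typing import Dict, List, Tuple

Pathway = Tuple[str, ...]


def recovery_rate(known: Dict[str, Pathway],
                  proposed: Dict[str, List[Pathway]],
                  k: int) -> float:
    """Fraction of known pathways found in the top-k proposals for their target."""
    if not known:
        raise ValueError("need at least one known pathway")
    hits = sum(1 for target, pathway in known.items()
               if pathway in proposed.get(target, [])[:k])
    return hits / len(known)


# Hypothetical placeholders: step labels are arbitrary strings.
known = {
    "target_1": ("a", "b"),
    "target_2": ("c", "d"),
    "target_3": ("e", "f"),
}
proposed = {
    "target_1": [("a", "b"), ("a", "x")],
    "target_2": [("c", "y"), ("c", "z"), ("c", "d")],
    "target_3": [("e", "q")],
}
top1 = recovery_rate(known, proposed, k=1)
top3 = recovery_rate(known, proposed, k=3)
print(f"top-1 recovery: {top1:.2f}, top-3 recovery: {top3:.2f}")
```

Reporting the same metric for a template-only enumeration would show whether the learned scores add anything beyond the templates themselves.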

Figures

Figures reproduced from arXiv: 2604.13471 by Jeffrey D. Varner, Peter Zhiping Zhang.

Figure 1: A general retrobiosynthesis workflow consists of two parts. In network generation, …
Figure 2: Architectures of the neural network pathway ranking models. (a) NN1PR: a 1-step …
Figure 3: Proposed multistep metabolic pathway design pipeline with NN1PR and NN2PR as ranking …
Figure 4: Performance comparison on testing data. (a) 1-step ranking: NN1PR vs. Tanimoto similarity …
Figure 5: Ranks assigned by NN1PR to the BDO pathway reported in [ …
Figure 6: Ranks assigned by NN1PR for glycolysis pathway in a backward manner. 234501 templates …
Figure 7: Ranks assigned by NN1PR for the naloxone pathway designed by human expert in a …
Figure 8: Three candidates for BDO synthesis found by the proposed pipeline. At the top is the …
Original abstract

In silico tools are important for generating novel hypotheses and exploring alternatives in de novo metabolic pathway design. However, while many computational frameworks have been proposed for retrobiosynthesis, few successful examples of algorithm-guided xenobiotic biochemical retrosynthesis have been reported in the literature. Deep learning has improved the quality of synthesis and retrosynthesis in organic chemistry applications. Inspired by this progress, we explored combining deep learning of biochemical transformations with the traditional retrobiosynthetic workflow to improve in silico synthetic metabolic pathway designs. To develop our computational biosynthetic pathway design framework, we assembled metabolic reaction and enzymatic template data from public databases. A data augmentation procedure, adapted from literature, was carried out to enrich the assembled reaction dataset with artificial metabolic reactions generated by enzymatic reaction templates. Two neural network-based pathway ranking models were trained as binary classifiers to distinguish assembled reactions from artificial counterparts; each model output a scalar quantifying the plausibility of a 1-step or 2-step pathway. Combining these two models with enzymatic templates, we built a multistep retrobiosynthesis pipeline and validated it by reproducing some natural and non-natural pathways computationally.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript presents a computational framework for multistep metabolic pathway design that assembles metabolic reaction and enzymatic template data from public databases, augments the dataset with artificial reactions generated via the same templates, trains two neural network binary classifiers (for 1-step and 2-step pathways) to distinguish real from artificial reactions, and integrates the resulting plausibility scores into a retrobiosynthesis pipeline. The framework is validated by computationally reproducing some natural and non-natural pathways.

Significance. If the neural network ranking models prove to capture biological plausibility beyond template artifacts and the pipeline demonstrates measurable improvements over existing retrobiosynthesis methods, the work could advance in silico hypothesis generation for metabolic engineering and xenobiotic synthesis. The approach of combining template-based generation with learned ranking is a natural extension of deep learning successes in organic retrosynthesis, and the data augmentation strategy is a reasonable starting point. However, the current evidence consists only of qualitative reproduction of pathways without metrics or baselines, limiting the assessed impact.

major comments (3)
  1. [Abstract] The central claim of an 'improved' multistep retrobiosynthesis pipeline is not supported by any quantitative results. No classification accuracies, pathway reproduction rates, number of pathways tested, success criteria, or comparisons to baseline template-only searches are reported, making it impossible to determine whether the neural network scores contribute to the pipeline or whether the reproduction exceeds what template enumeration alone would achieve.
  2. [Model training and data augmentation description] The negative examples for the binary classifiers are generated by applying the identical enzymatic templates later used for candidate generation in the retrobiosynthesis search. This creates a risk that the models learn template-application artifacts (e.g., consistent atom-mapping conventions, bond-change signatures, or substrate patterns) rather than intrinsic metabolic feasibility. When these scores are then used to rank pathways inside the same template-driven search, the ranking may reduce to a measure of template conformity instead of biological plausibility, undermining generalization to novel or non-natural pathways.
  3. [Pipeline integration section] No description is given of how the 1-step and 2-step neural network scores are combined during multistep search, what search algorithm or beam/pruning strategy is employed, how pathway length is handled, or how conflicts between the two models are resolved. Without these details the multistep claim cannot be evaluated.
minor comments (1)
  1. [Abstract] The abstract states that the data augmentation procedure was 'adapted from literature' but provides no specific citation or description of the adaptation, which would aid reproducibility.
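
One check that speaks to major comment 2 is a split in which no template contributes examples to both training and test data. If a classifier still separates real from artificial reactions on templates it never saw, it is less likely to be memorizing template-specific artifacts. The sketch below is an assumption-laden illustration: the `template` field, the bucket count, and the helper names are hypothetical, and the paper does not state that such a split was used.

```python
import hashlib
from typing import Dict, List, Tuple

Example = Dict[str, object]  # expects keys "template" (str) and "label" (int)


def template_bucket(template_id: str, n_buckets: int = 5) -> int:
    """Deterministically map a template id to one of n_buckets groups."""
    digest = hashlib.sha256(template_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def split_by_template(examples: List[Example],
                      test_bucket: int = 0,
                      n_buckets: int = 5) -> Tuple[List[Example], List[Example]]:
    """Send every example of a given template to the same side of the split."""
    train: List[Example] = []
    test: List[Example] = []
    for ex in examples:
        bucket = template_bucket(str(ex["template"]), n_buckets)
        (test if bucket == test_bucket else train).append(ex)
    return train, test


# Hypothetical examples: 20 templates, one positive and one negative each.
examples = [{"template": f"tmpl_{i}", "label": label}
            for i in range(20) for label in (0, 1)]
train, test = split_by_template(examples)
train_templates = {ex["template"] for ex in train}
test_templates = {ex["template"] for ex in test}
print(len(train), len(test), train_templates & test_templates)
```

Hashing the template id keeps the assignment stable across runs and across dataset versions, which a random shuffle would not.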

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major point below and outline the revisions we will make to improve clarity and address concerns.

  1. Referee: [Abstract] The central claim of an 'improved' multistep retrobiosynthesis pipeline is not supported by any quantitative results. No classification accuracies, pathway reproduction rates, number of pathways tested, success criteria, or comparisons to baseline template-only searches are reported, making it impossible to determine whether the neural network scores contribute to the pipeline or whether the reproduction exceeds what template enumeration alone would achieve.

    Authors: We agree that the abstract implies improvement without sufficient supporting metrics in the summary. We will revise the abstract to describe the work as an exploratory framework that integrates neural ranking with template-based retrobiosynthesis, validated via computational reproduction of pathways, without claiming quantitative superiority. We will also add key metrics (model accuracies, number of pathways tested, and reproduction rates) to the abstract and results section, along with a baseline comparison to template-only enumeration. revision: yes

  2. Referee: The negative examples for the binary classifiers are generated by applying the identical enzymatic templates later used for candidate generation in the retrobiosynthesis search. This creates a risk that the models learn template-application artifacts (e.g., consistent atom-mapping conventions, bond-change signatures, or substrate patterns) rather than intrinsic metabolic feasibility. When these scores are then used to rank pathways inside the same template-driven search, the ranking may reduce to a measure of template conformity instead of biological plausibility, undermining generalization to novel or non-natural pathways.

    Authors: This is a substantive concern regarding potential overfitting to template artifacts. Our augmentation strategy follows prior literature to create plausible negatives, with positives drawn from curated databases to encourage learning of real reaction features. We will add a limitations paragraph in the discussion explicitly acknowledging this risk and proposing future mitigations such as held-out test sets or alternative negative sampling. No changes to the core method are needed, but the clarification strengthens the presentation. revision: partial

  3. Referee: No description is given of how the 1-step and 2-step neural network scores are combined during multistep search, what search algorithm or beam/pruning strategy is employed, how pathway length is handled, or how conflicts between the two models are resolved. Without these details the multistep claim cannot be evaluated.

    Authors: We appreciate this feedback on missing implementation details. In the revised manuscript, we have expanded the pipeline integration section to specify: scores are combined via a weighted sum (with higher weight on 2-step predictions for longer segments), a beam search of width 5 with cumulative score pruning is employed, pathways are extended iteratively up to a user-defined maximum length, and model conflicts are resolved by selecting the higher plausibility score for candidate extensions. revision: yes
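
The search described in this simulated response (weighted sum of 1-step and 2-step scores, beam search with pruning on cumulative score, a maximum pathway length) can be sketched as follows. Everything here is illustrative: the rebuttal is simulated, so these details are not confirmed by the paper, and the toy precursor graph, score tables, weights, and function names are hypothetical. The rule for resolving conflicts between the two models is not implemented.

```python
from typing import Callable, Dict, List, Tuple

Path = Tuple[str, ...]  # molecules listed from the target back toward precursors


def beam_search(target: str,
                expand: Callable[[str], List[str]],
                score1: Callable[[str, str], float],
                score2: Callable[[str, str, str], float],
                is_start: Callable[[str], bool],
                beam_width: int = 5, max_len: int = 4,
                w1: float = 0.5, w2: float = 0.5) -> List[Tuple[float, Path]]:
    """Backward beam search; returns completed paths, best cumulative score first."""
    beam: List[Tuple[float, Path]] = [(0.0, (target,))]
    finished: List[Tuple[float, Path]] = []
    for _ in range(max_len):
        extended: List[Tuple[float, Path]] = []
        for total, path in beam:
            for precursor in expand(path[-1]):
                if precursor in path:
                    continue  # skip cycles
                step = w1 * score1(precursor, path[-1])
                if len(path) >= 2:  # a 2-step window exists only after the first step
                    step += w2 * score2(precursor, path[-1], path[-2])
                entry = (total + step, path + (precursor,))
                (finished if is_start(precursor) else extended).append(entry)
        # Prune: keep only the beam_width best partial paths.
        beam = sorted(extended, key=lambda item: (-item[0], item[1]))[:beam_width]
        if not beam:
            break
    return sorted(finished, key=lambda item: (-item[0], item[1]))


# Hypothetical toy network: T is the target, S is an available starting metabolite.
precursors: Dict[str, List[str]] = {"T": ["A", "B"], "A": ["S"], "B": ["S", "C"]}
s1 = {("A", "T"): 0.9, ("B", "T"): 0.4, ("S", "A"): 0.8,
      ("S", "B"): 0.9, ("C", "B"): 0.1}
s2 = {("S", "A", "T"): 0.7, ("S", "B", "T"): 0.2}

results = beam_search(
    "T",
    expand=lambda m: precursors.get(m, []),
    score1=lambda pre, prod: s1.get((pre, prod), 0.0),
    score2=lambda pre, mid, prod: s2.get((pre, mid, prod), 0.0),
    is_start=lambda m: m == "S",
)
for total, path in results:
    print(f"{total:.2f}", " <- ".join(path))
```

Summing raw scores favors longer pathways, so a real implementation would likely normalize by length or work with log-probabilities; that choice is left open here.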

Circularity Check

0 steps flagged

No significant circularity detected in the multistep retrobiosynthesis pipeline


The paper assembles metabolic reaction and enzymatic template data from public databases, augments the dataset with artificial reactions generated by applying the templates, and trains two neural network binary classifiers to distinguish real assembled reactions from these artificial counterparts. The resulting scalar plausibility scores for 1-step and 2-step pathways are then combined with the same enzymatic templates to form the retrobiosynthesis pipeline, which is validated by computationally reproducing some natural and non-natural pathways. No equations, fitted parameters, or self-citations are described that would make the pipeline outputs or pathway rankings definitionally equivalent to the training inputs by construction. The training distinction relies on an external database of real reactions, and the validation provides an independent check rather than a tautological reproduction of fitted values.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that artificial reactions generated by enzymatic templates form a useful negative class for training, plus the implicit claim that public database reactions are sufficiently complete and accurate; no new physical entities are introduced.

free parameters (1)
  • neural network parameters
    Weights and biases of the two binary classifiers are fitted to the augmented reaction dataset.
axioms (1)
  • domain assumption: Template-generated artificial reactions are representative of implausible biochemical transformations
    This distinction is used to create the training labels for the ranking models.

pith-pipeline@v0.9.0 · 5487 in / 1341 out tokens · 30788 ms · 2026-05-10T13:41:49.561687+00:00 · methodology


Reference graph

Works this paper leans on

105 extracted references · 105 canonical work pages · 1 internal anchor

  1. [1]

    Tensorflow: A system for large-scale machine learning

    Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. Tensorflow: A system for large-scale machine learning. In12th {USENIX} symposium on operating systems design and implementation ({OSDI}16), pages 265–283, 2016

  2. [2]

    Event extraction for systems biology by text mining the literature.Trends in biotechnology, 28(7):381–390, 2010

    Sophia Ananiadou, Sampo Pyysalo, Jun’ichi Tsujii, and Douglas B Kell. Event extraction for systems biology by text mining the literature.Trends in biotechnology, 28(7):381–390, 2010

  3. [3]

    Reconciliation of metabolites and biochemical reactions for metabolic networks.Brief- ings in bioinformatics, 15(1):123–135, 2014

    Thomas Bernard, Alan Bridge, Anne Morgat, Sébastien Moretti, Ioannis Xenarios, and Marco Pagni. Reconciliation of metabolites and biochemical reactions for metabolic networks.Brief- ings in bioinformatics, 15(1):123–135, 2014

  4. [4]

    Stereo signature molecular descriptor

    Pablo Carbonell, Lars Carlsson, and Jean-Loup Faulon. Stereo signature molecular descriptor. Journal of chemical information and modeling, 53(4):887–897, 2013

  5. [5]

    Deep learning with python, 2017

    François Chollet. Deep learning with python, 2017. 12

  6. [6]

    Convolutional embedding of attributed molecular graphs for physical property prediction

    Connor W Coley, Regina Barzilay, William H Green, Tommi S Jaakkola, and Klavs F Jensen. Convolutional embedding of attributed molecular graphs for physical property prediction. Journal of chemical information and modeling, 57(8):1757–1772, 2017

  7. [7]

    Prediction of organic reaction outcomes using machine learning.ACS central science, 3(5):434– 443, 2017

    Connor W Coley, Regina Barzilay, Tommi S Jaakkola, William H Green, and Klavs F Jensen. Prediction of organic reaction outcomes using machine learning.ACS central science, 3(5):434– 443, 2017

  8. [8]

    A graph-convolutional neural network model for the prediction of chemical reactivity.Chemical science, 10(2):370–377, 2019

    Connor W Coley, Wengong Jin, Luke Rogers, Timothy F Jamison, Tommi S Jaakkola, William H Green, Regina Barzilay, and Klavs F Jensen. A graph-convolutional neural network model for the prediction of chemical reactivity.Chemical science, 10(2):370–377, 2019

  9. [9]

    Computer-assisted retrosynthesis based on molecular similarity.ACS central science, 3(12):1237–1245, 2017

    Connor W Coley, Luke Rogers, William H Green, and Klavs F Jensen. Computer-assisted retrosynthesis based on molecular similarity.ACS central science, 3(12):1237–1245, 2017

  10. [10]

    Arthur Dalby, James G Nourse, W Douglas Hounshell, Ann KI Gushurst, David L Grier, Burton A Leland, and John Laufer. Description of several chemical structure file formats used by computer programs developed at molecular design limited.Journal of chemical information and computer sciences, 32(3):244–255, 1992

  11. [11]

    Daylight Chemical Information Systems, Inc. 3. smiles - a simplified chemical language. https://www.daylight.com/dayhtml/doc/theory/theory.smiles.html, 2020. [On- line; accessed 17-September-2020]

  12. [12]

    Daylight Chemical Information Systems, Inc. 4. smarts - a language for describing molecu- lar patterns. https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html,

  13. [13]

    [Online; accessed 17-September-2020]

  14. [14]

    Retropath2

    Baudoin Delépine, Thomas Duigou, Pablo Carbonell, and Jean-Loup Faulon. Retropath2. 0: A retrosynthesis workflow for metabolic engineers.Metabolic engineering, 45:158–170, 2018

  15. [15]

    Analysis and comparison of 2d fingerprints: insights into database screening performance using eight fingerprint methods

    Jianxin Duan, Steven L Dixon, Jeffrey F Lowrie, and Woody Sherman. Analysis and comparison of 2d fingerprints: insights into database screening performance using eight fingerprint methods. Journal of Molecular Graphics and Modelling, 29(2):157–170, 2010

  16. [16]

    Retrorules: a database of reaction rules for engineering biology.Nucleic acids research, 47(D1):D1229– D1235, 2019

    Thomas Duigou, Melchior Du Lac, Pablo Carbonell, and Jean-Loup Faulon. Retrorules: a database of reaction rules for engineering biology.Nucleic acids research, 47(D1):D1229– D1235, 2019

  17. [17]

    Convolutional networks on graphs for learning molecular fingerprints

    David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P Adams. Convolutional networks on graphs for learning molecular fingerprints. InAdvances in neural information processing systems, pages 2224– 2232, 2015

  18. [18]

    EMBL-EBI. Chebi. https://www.ebi.ac.uk/chebi/, 2020. [Online; accessed 24- September-2020]

  19. [19]

    Computational framework for predictive biodegradation.Biotechnology and bioengineering, 104(6):1086–1097, 2009

    Stacey D Finley, Linda J Broadbelt, and Vassily Hatzimanikatis. Computational framework for predictive biodegradation.Biotechnology and bioengineering, 104(6):1086–1097, 2009

  20. [20]

    In silico feasibility of novel biodegradation pathways for 1, 2, 4-trichlorobenzene.BMC systems biology, 4(1):7, 2010

    Stacey D Finley, Linda J Broadbelt, and Vassily Hatzimanikatis. In silico feasibility of novel biodegradation pathways for 1, 2, 4-trichlorobenzene.BMC systems biology, 4(1):7, 2010

  21. [21]

    Metanetx

    Mathias Ganter, Thomas Bernard, Sébastien Moretti, Joerg Stelling, and Marco Pagni. Metanetx. org: a website and repository for accessing, analysing and manipulating metabolic networks. Bioinformatics, 29(6):815–816, 2013

  22. [22]

    The synthesizability of molecules proposed by generative models.Journal of chemical information and modeling, 60(12):5714–5723, 2020

    Wenhao Gao and Connor W Coley. The synthesizability of molecules proposed by generative models.Journal of chemical information and modeling, 60(12):5714–5723, 2020

  23. [23]

    O’Reilly Media, 2019

    Aurélien Géron.Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O’Reilly Media, 2019. 13

  24. [24]

    Automatic chemical design using a data-driven continuous representation of molecules.ACS central science, 4(2):268–276, 2018

    Rafael Gómez-Bombarelli, Jennifer N Wei, David Duvenaud, José Miguel Hernández-Lobato, Benjamín Sánchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D Hirzel, Ryan P Adams, and Alán Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules.ACS central science, 4(2):268–276, 2018

  25. [25]

    MIT press Cambridge, 2016

    Ian Goodfellow, Yoshua Bengio, and Aaron Courville.Deep learning, volume 1. MIT press Cambridge, 2016

  26. [26]

    Atlas of biochemistry: a repository of all possible biochemical reactions for synthetic biology and metabolic engineering studies.ACS synthetic biology, 5(10):1155–1166, 2016

    Noushin Hadadi, Jasmin Hafner, Adrian Shajkofci, Aikaterini Zisaki, and Vassily Hatzi- manikatis. Atlas of biochemistry: a repository of all possible biochemical reactions for synthetic biology and metabolic engineering studies.ACS synthetic biology, 5(10):1155–1166, 2016

  27. [27]

    Design of computational retrobiosynthesis tools for the design of de novo synthetic pathways.Current opinion in chemical biology, 28:99–104, 2015

    Noushin Hadadi and Vassily Hatzimanikatis. Design of computational retrobiosynthesis tools for the design of de novo synthetic pathways.Current opinion in chemical biology, 28:99–104, 2015

  28. [28]

    Updated atlas of biochemistry with new metabolites and improved enzyme prediction power.ACS Synthetic Biology, 9(6):1479–1482, 2020

    Jasmin Hafner, Homa MohammadiPeyhani, Anastasia Sveshnikova, Alan Scheidegger, and Vassily Hatzimanikatis. Updated atlas of biochemistry with new metabolites and improved enzyme prediction power.ACS Synthetic Biology, 9(6):1479–1482, 2020

  29. [29]

    Masahiro Hattori, Yasushi Okuno, Susumu Goto, and Minoru Kanehisa. Development of a chem- ical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways.Journal of the American Chemical Society, 125(39):11853–11865, 2003

  30. [30]

    Exploring the diversity of complex metabolic networks.Bioinformatics, 21(8):1603–1609, 2005

    Vassily Hatzimanikatis, Chunhui Li, Justin A Ionita, Christopher S Henry, Matthew D Jankowski, and Linda J Broadbelt. Exploring the diversity of complex metabolic networks.Bioinformatics, 21(8):1603–1609, 2005

  31. [31]

    Discovery and analysis of novel metabolic pathways for the biosynthesis of industrial chemicals: 3-hydroxypropanoate

    Christopher S Henry, Linda J Broadbelt, and Vassily Hatzimanikatis. Discovery and analysis of novel metabolic pathways for the biosynthesis of industrial chemicals: 3-hydroxypropanoate. Biotechnology and bioengineering, 106(3):462–473, 2010

  32. [32]

    Craig A. James. Opensmiles specification, 2020. [Online; accessed 16-September-2020]

  33. [33]

    Structure-based synthesizability prediction of crystals using partially supervised learning.Journal of the American Chemical Society, 142(44):18836–18843, 2020

    Jidon Jang, Geun Ho Gu, Juhwan Noh, Juhwan Kim, and Yousung Jung. Structure-based synthesizability prediction of crystals using partially supervised learning.Journal of the American Chemical Society, 142(44):18836–18843, 2020

  34. [34]

    Predicting organic reaction outcomes with weisfeiler-lehman network

    Wengong Jin, Connor Coley, Regina Barzilay, and Tommi Jaakkola. Predicting organic reaction outcomes with weisfeiler-lehman network. InAdvances in Neural Information Processing Systems, pages 2607–2616, 2017

  35. [35]

    Toward pathway engineering: a new database of genetic and molecular pathways.Sci

    Minoru Kanehisa. Toward pathway engineering: a new database of genetic and molecular pathways.Sci. Technol. Jap., 59:34–38, 1996

  36. [36]

    A database for post-genome analysis.Trends Genet., 13:375–376, 1997

    Minoru Kanehisa. A database for post-genome analysis.Trends Genet., 13:375–376, 1997

  37. [37]

    Toward understanding the origin and evolution of cellular organisms.Protein Science, 28(11):1947–1951, 2019

    Minoru Kanehisa. Toward understanding the origin and evolution of cellular organisms.Protein Science, 28(11):1947–1951, 2019

  38. [38]

    Kegg for linking genomes to life and the environment.Nucleic acids research, 36(suppl_1):D480– D484, 2007

    Minoru Kanehisa, Michihiro Araki, Susumu Goto, Masahiro Hattori, Mika Hirakawa, Masumi Itoh, Toshiaki Katayama, Shuichi Kawashima, Shujiro Okuda, Toshiaki Tokimatsu, et al. Kegg for linking genomes to life and the environment.Nucleic acids research, 36(suppl_1):D480– D484, 2007

  39. [39]

    Kegg: new perspectives on genomes, pathways, diseases and drugs.Nucleic acids research, 45(D1):D353– D361, 2017

    Minoru Kanehisa, Miho Furumichi, Mao Tanabe, Yoko Sato, and Kanae Morishima. Kegg: new perspectives on genomes, pathways, diseases and drugs.Nucleic acids research, 45(D1):D353– D361, 2017

  40. [40]

    Kegg: kyoto encyclopedia of genes and genomes.Nucleic acids research, 28(1):27–30, 2000

    Minoru Kanehisa and Susumu Goto. Kegg: kyoto encyclopedia of genes and genomes.Nucleic acids research, 28(1):27–30, 2000. 14

  41. [41]

    Kegg for representation and analysis of molecular networks involving diseases and drugs.Nucleic acids research, 38(suppl_1):D355–D360, 2010

    Minoru Kanehisa, Susumu Goto, Miho Furumichi, Mao Tanabe, and Mika Hirakawa. Kegg for representation and analysis of molecular networks involving diseases and drugs.Nucleic acids research, 38(suppl_1):D355–D360, 2010

  42. [42]

    From genomics to chemical genomics: new developments in kegg.Nucleic acids research, 34(suppl_1):D354– D357, 2006

    Minoru Kanehisa, Susumu Goto, Masahiro Hattori, Kiyoko F Aoki-Kinoshita, Masumi Itoh, Shuichi Kawashima, Toshiaki Katayama, Michihiro Araki, and Mika Hirakawa. From genomics to chemical genomics: new developments in kegg.Nucleic acids research, 34(suppl_1):D354– D357, 2006

  43. [43]

    The kegg databases at genomenet.Nucleic acids research, 30(1):42–46, 2002

    Minoru Kanehisa, Susumu Goto, Shuichi Kawashima, and Akihiro Nakaya. The kegg databases at genomenet.Nucleic acids research, 30(1):42–46, 2002

  44. [44]

    The kegg resource for deciphering the genome.Nucleic acids research, 32(suppl_1):D277– D280, 2004

    Minoru Kanehisa, Susumu Goto, Shuichi Kawashima, Yasushi Okuno, and Masahiro Hattori. The kegg resource for deciphering the genome.Nucleic acids research, 32(suppl_1):D277– D280, 2004

  45. [45]

    Kegg for integration and interpretation of large-scale molecular data sets.Nucleic acids research, 40(D1):D109–D114, 2012

    Minoru Kanehisa, Susumu Goto, Yoko Sato, Miho Furumichi, and Mao Tanabe. Kegg for integration and interpretation of large-scale molecular data sets.Nucleic acids research, 40(D1):D109–D114, 2012

  46. [46]

    Data, information, knowledge and principle: back to metabolism in kegg.Nucleic acids research, 42(D1):D199–D205, 2014

    Minoru Kanehisa, Susumu Goto, Yoko Sato, Masayuki Kawashima, Miho Furumichi, and Mao Tanabe. Data, information, knowledge and principle: back to metabolism in kegg.Nucleic acids research, 42(D1):D199–D205, 2014

  47. [47]

    New approach for understanding genome variations in kegg.Nucleic acids research, 47(D1):D590– D595, 2019

    Minoru Kanehisa, Yoko Sato, Miho Furumichi, Kanae Morishima, and Mao Tanabe. New approach for understanding genome variations in kegg.Nucleic acids research, 47(D1):D590– D595, 2019

  48. [48]

    Kegg as a reference resource for gene and protein annotation.Nucleic acids research, 44(D1):D457– D462, 2016

    Minoru Kanehisa, Yoko Sato, Masayuki Kawashima, Miho Furumichi, and Mao Tanabe. Kegg as a reference resource for gene and protein annotation.Nucleic acids research, 44(D1):D457– D462, 2016

  49. [49]

    Learning to predict chemical reactions.Journal of chemical information and modeling, 51(9):2209–2222, 2011

    Matthew A Kayala, Chloé-Agathe Azencott, Jonathan H Chen, and Pierre Baldi. Learning to predict chemical reactions.Journal of chemical information and modeling, 51(9):2209–2222, 2011

  50. [50]

    Reactionpredictor: prediction of complex chemical reactions at the mechanistic level using machine learning.Journal of chemical information and modeling, 52(10):2526–2540, 2012

    Matthew A Kayala and Pierre Baldi. Reactionpredictor: prediction of complex chemical reactions at the mechanistic level using machine learning.Journal of chemical information and modeling, 52(10):2526–2540, 2012

  51. [51]

    Molecular graph convolutions: moving beyond fingerprints.Journal of computer-aided molecular design, 30(8):595–608, 2016

    Steven Kearnes, Kevin McCloskey, Marc Berndl, Vijay Pande, and Patrick Riley. Molecular graph convolutions: moving beyond fingerprints.Journal of computer-aided molecular design, 30(8):595–608, 2016

  52. [52]

    Manufacturing molecules through metabolic engineering.Science, 330(6009):1355–1358, 2010

    Jay D Keasling. Manufacturing molecules through metabolic engineering.Science, 330(6009):1355–1358, 2010

  53. [53]

    Untersuchungen über aromatische verbindungen ueber die constitution der aromatischen verbindungen

    Aug Kekuié. Untersuchungen über aromatische verbindungen ueber die constitution der aromatischen verbindungen. i. ueber die constitution der aromatischen verbindungen.Justus Liebigs Annalen der Chemie, 137(2):129–196, 1866

  54. [54]

    Sur la constitution des substances aromatiques.Bulletin mensuel de la Société Chimique de Paris, 3:98, 1865

    Auguste Kekulé. Sur la constitution des substances aromatiques.Bulletin mensuel de la Société Chimique de Paris, 3:98, 1865

  55. [55]

    Adam: A Method for Stochastic Optimization

    Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980, 2014

  56. [56]

    Reinforcement learning for bioretrosynthesis. ACS Synthetic Biology, 9(1):157–168, 2019

    Mathilde Koch, Thomas Duigou, and Jean-Loup Faulon. Reinforcement learning for bioretrosynthesis. ACS Synthetic Biology, 9(1):157–168, 2019

  57. [57]

    Algorithm for reaction classification. Journal of chemical information and modeling, 53(11):2884–2895, 2013

    Hans Kraut, Josef Eiblmaier, Guenter Grethe, Peter Löw, Heinz Matuszczyk, and Heinz Saller. Algorithm for reaction classification. Journal of chemical information and modeling, 53(11):2884–2895, 2013

  58. [58]

    Atlas of biochemistry

    Laboratory of Computational Systems Biotechnology. Atlas of biochemistry. https://lcsb-databases.epfl.ch/atlas/Downloads, 2020. [Online; accessed 24-September-2020]

  59. [59]

    RDKit: Open-source cheminformatics software

    Greg Landrum et al. RDKit: Open-source cheminformatics software. https://www.rdkit.org/, 2020. [Online; accessed 16-September-2020]

  60. [60]

    Deep learning. Nature, 521(7553):436–444, 2015

    Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015

  61. [61]

    Retrosynthetic design of metabolic pathways to chemicals not found in nature. Current Opinion in Systems Biology, 14:82–107, 2019

    Geng-Min Lin, Robert Warden-Rothman, and Christopher A Voigt. Retrosynthetic design of metabolic pathways to chemicals not found in nature. Current Opinion in Systems Biology, 14:82–107, 2019

  62. [62]

    Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science, 3(10):1103–1113, 2017

    Bowen Liu, Bharath Ramsundar, Prasad Kawthekar, Jade Shi, Joseph Gomes, Quang Luu Nguyen, Stephen Ho, Jack Sloane, Paul Wender, and Vijay Pande. Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science, 3(10):1103–1113, 2017

  63. [63]

    PhD thesis, University of Cambridge, 2012

    Daniel Mark Lowe. Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge, 2012

  64. [64]

    Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry, 57(8):3186–3204, 2014

    Gerald Maggiora, Martin Vogt, Dagmar Stumpfe, and Jurgen Bajorath. Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry, 57(8):3186–3204, 2014

  65. [65]

    Molecular similarity measures

    Gerald M Maggiora and Veerabahu Shanmugasundaram. Molecular similarity measures. In Chemoinformatics, pages 1–50. Springer, 2004

  66. [66]

    Molecular similarity measures

    Gerald M Maggiora and Veerabahu Shanmugasundaram. Molecular similarity measures. In Chemoinformatics and computational chemical biology, pages 39–100. Springer, 2011

  67. [67]

    Pathminer: predicting metabolic pathways by heuristic search. Bioinformatics, 19(13):1692–1698, 2003

    Daniel C McShan, S Rao, and Imran Shah. Pathminer: predicting metabolic pathways by heuristic search. Bioinformatics, 19(13):1692–1698, 2003

  68. [68]

    Metanetx: Automated model construction and genome annotation for large-scale metabolic networks

    MetaNetX. Metanetx: Automated model construction and genome annotation for large-scale metabolic networks. https://www.metanetx.org/, 2020. [Online; accessed 24-September-2020]

  69. [69]

    Metanetx/mnxref–reconciliation of metabolites and biochemical reactions to bring together genome-scale metabolic networks. Nucleic acids research, 44(D1):D523–D526, 2016

    Sébastien Moretti, Olivier Martin, T Van Du Tran, Alan Bridge, Anne Morgat, and Marco Pagni. Metanetx/mnxref–reconciliation of metabolites and biochemical reactions to bring together genome-scale metabolic networks. Nucleic acids research, 44(D1):D523–D526, 2016

  70. [70]

    The generation of a unique machine description for chemical structures-a technique developed at chemical abstracts service. Journal of Chemical Documentation, 5(2):107–113, 1965

    Harry L Morgan. The generation of a unique machine description for chemical structures-a technique developed at chemical abstracts service. Journal of Chemical Documentation, 5(2):107–113, 1965

  71. [71]

    Updates in rhea—an expert curated resource of biochemical reactions. Nucleic acids research, page gkw990, 2016

    Anne Morgat, Thierry Lombardot, Kristian B Axelsen, Lucila Aimo, Anne Niknejad, Nevila Hyka-Nouspikel, Elisabeth Coudert, Monica Pozzato, Marco Pagni, Sébastien Moretti, et al. Updates in rhea—an expert curated resource of biochemical reactions. Nucleic acids research, page gkw990, 2016

  72. [72]

    Pathpred: an enzyme-catalyzed metabolic pathway prediction server. Nucleic acids research, 38(suppl_2):W138–W143, 2010

    Yuki Moriya, Daichi Shigemizu, Masahiro Hattori, Toshiaki Tokimatsu, Masaaki Kotera, Susumu Goto, and Minoru Kanehisa. Pathpred: an enzyme-catalyzed metabolic pathway prediction server. Nucleic acids research, 38(suppl_2):W138–W143, 2010

  73. [73]

    Engineering cellular metabolism. Cell, 164(6):1185–1197, 2016

    Jens Nielsen and Jay D Keasling. Engineering cellular metabolism. Cell, 164(6):1185–1197, 2016

  74. [74]

    Open babel: An open chemical toolbox. Journal of cheminformatics, 3(1):33, 2011

    Noel M O’Boyle, Michael Banck, Craig A James, Chris Morley, Tim Vandermeersch, and Geoffrey R Hutchison. Open babel: An open chemical toolbox. Journal of cheminformatics, 3(1):33, 2011

  75. [75]

    Kegg: Kyoto encyclopedia of genes and genomes. Nucleic acids research, 27(1):29–34, 1999

    Hiroyuki Ogata, Susumu Goto, Kazushige Sato, Wataru Fujibuchi, Hidemasa Bono, and Minoru Kanehisa. Kegg: Kyoto encyclopedia of genes and genomes. Nucleic acids research, 27(1):29–34, 1999

  76. [76]

    A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10):1345–1359, 2009

    Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10):1345–1359, 2009

  77. [77]

    Blackwell Scientific Publications, Oxford, 1993

    R Panico, WH Powell, and Jean-Claude Richer. A guide to IUPAC Nomenclature of Organic Compounds, volume 2. Blackwell Scientific Publications, Oxford, 1993

  78. [78]

    Rhea. Rhea. https://www.rhea-db.org/download, 2020. [Online; accessed 24-September-2020]

  79. [79]

    Open-source platform to benchmark fingerprints for ligand-based virtual screening. Journal of cheminformatics, 5(1):26, 2013

    Sereina Riniker and Gregory A Landrum. Open-source platform to benchmark fingerprints for ligand-based virtual screening. Journal of cheminformatics, 5(1):26, 2013

  80. [80]

    Production of the antimalarial drug precursor artemisinic acid in engineered yeast. Nature, 440(7086):940–943, 2006

    Dae-Kyun Ro, Eric M Paradise, Mario Ouellet, Karl J Fisher, Karyn L Newman, John M Ndungu, Kimberly A Ho, Rachel A Eachus, Timothy S Ham, James Kirby, et al. Production of the antimalarial drug precursor artemisinic acid in engineered yeast. Nature, 440(7086):940–943, 2006

Showing first 80 references.