Computational framework for multistep metabolic pathway design

Jeffrey D. Varner; Peter Zhiping Zhang

arxiv: 2604.13471 · v1 · submitted 2026-04-15 · 💻 cs.LG

Pith reviewed 2026-05-10 13:41 UTC · model grok-4.3

classification 💻 cs.LG

keywords retrobiosynthesis · metabolic pathway design · neural network classification · enzymatic templates · data augmentation · de novo pathway design · computational biosynthesis · xenobiotic synthesis

The pith

Neural network classifiers trained on real versus template-generated reactions enable a multistep retrobiosynthesis pipeline that reproduces natural and non-natural metabolic pathways.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper assembles metabolic reactions and enzymatic templates from public databases and augments the data with artificial reactions created by the templates. Two neural network models are trained as binary classifiers to score the plausibility of 1-step and 2-step pathways by separating assembled real reactions from the artificial ones. These models are then combined with the enzymatic templates to form a retrobiosynthesis search procedure. The resulting pipeline is shown to computationally recover selected natural and non-natural pathways, providing an in silico method for generating hypotheses in de novo metabolic pathway design.

Core claim

We assembled metabolic reaction and enzymatic template data from public databases, carried out a data augmentation procedure to enrich the dataset with artificial metabolic reactions generated by enzymatic reaction templates, trained two neural network-based pathway ranking models as binary classifiers to distinguish assembled reactions from artificial counterparts, and integrated the models with enzymatic templates into a multistep retrobiosynthesis pipeline that reproduces some natural and non-natural pathways computationally.
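The labeling step in this claim can be illustrated with a minimal sketch. It assumes, hypothetically, that a template is a function from a substrate string to a product string (or `None` when it does not apply), and that a template product counts as artificial only when the resulting reaction is absent from the assembled set. The toy templates `hydroxylate` and `aminate`, the helper `build_labeled_set`, and the string data are illustrative and are not taken from the paper; real enzymatic templates would be reaction patterns applied with a cheminformatics toolkit.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stand-in for an enzymatic template: substrate -> product or None.
Template = Callable[[str], Optional[str]]
Reaction = Tuple[str, str]           # (substrate, product)
Example = Tuple[str, str, int]       # (substrate, product, label)


def hydroxylate(substrate: str) -> Optional[str]:
    """Toy rule: append O when the string ends in a carbon."""
    return substrate + "O" if substrate.endswith("C") else None


def aminate(substrate: str) -> Optional[str]:
    """Toy rule: append N when the string ends in a carbon."""
    return substrate + "N" if substrate.endswith("C") else None


def build_labeled_set(real: Sequence[Reaction],
                      templates: Sequence[Template]) -> List[Example]:
    """Label assembled reactions 1 and template-generated, unrecorded ones 0."""
    real_set = set(real)
    examples: List[Example] = [(s, p, 1) for s, p in sorted(real_set)]
    seen = set()
    for substrate in sorted({s for s, _ in real_set}):
        for template in templates:
            product = template(substrate)
            if product is None:
                continue
            pair = (substrate, product)
            # Assumption: a generated reaction that matches a recorded one
            # is not used as a negative example.
            if pair in real_set or pair in seen:
                continue
            seen.add(pair)
            examples.append((substrate, product, 0))
    return examples


real_reactions = [("CC", "CCO"), ("CCC", "CCCN")]
dataset = build_labeled_set(real_reactions, [hydroxylate, aminate])
```

In this toy setup each substrate gains one artificial counterpart, so the classes are balanced; whether the paper balances classes in this way is not stated in the abstract.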

What carries the argument

Two neural network-based pathway ranking models that each output a scalar plausibility score for a 1-step or 2-step pathway, combined with enzymatic templates for retrobiosynthetic search.

If this is right

The framework supports exploration of alternatives in de novo metabolic pathway design by scoring pathway steps.
Integration of the 1-step and 2-step models allows construction of longer pathways from individual reaction steps.
Validation through reproduction of known pathways indicates the approach can identify routes that align with existing biochemical knowledge.
The method extends traditional retrobiosynthetic workflows by using learned plausibility scores rather than purely rule-based matching.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the scoring generalizes beyond the training distribution, the pipeline could prioritize candidate pathways for experimental testing in synthetic biology applications.
The same augmentation-plus-classifier pattern might be adapted to other retrosynthesis domains where template rules exist but plausibility filtering is needed.
Because the search builds pathways step by step, it could in principle be extended to longer chains by accumulating additional step-wise scores.

Load-bearing premise

Neural network scores trained to separate real assembled reactions from template-generated artificial ones will reliably pick out biologically plausible multistep pathways during retrobiosynthetic search.

What would settle it

Running the pipeline on a larger set of known natural pathways and finding that many cannot be recovered or that it proposes routes contradicted by experimental literature would show the scoring does not generalize to multistep plausibility.
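One way to make this test concrete is a top-k recovery rate over a benchmark of known pathways. The sketch below is an assumption about how such a metric could be computed, not a metric reported by the paper; the function `recovery_rate`, the pathway encoding as tuples of metabolite names, and the placeholder identifiers (`T1`, `A`, `S`, and so on) are hypothetical.

```python
from typing import Dict, List, Tuple

Pathway = Tuple[str, ...]   # metabolite names, target first


def recovery_rate(known: Dict[str, Pathway],
                  proposed: Dict[str, List[Pathway]],
                  k: int) -> float:
    """Fraction of targets whose known pathway is among the top-k proposals.

    `proposed[target]` is assumed to be sorted best-first by the pipeline.
    """
    if not known:
        raise ValueError("known must contain at least one pathway")
    hits = sum(
        1
        for target, pathway in known.items()
        if pathway in proposed.get(target, [])[:k]
    )
    return hits / len(known)


# Placeholder data, not real pathways.
known = {"T1": ("T1", "A", "S"), "T2": ("T2", "B", "S")}
proposed = {
    "T1": [("T1", "C", "S"), ("T1", "A", "S")],
    "T2": [("T2", "D", "S")],
}
rate_top1 = recovery_rate(known, proposed, k=1)
rate_top2 = recovery_rate(known, proposed, k=2)
```

A low rate on a large benchmark, or a rate no better than template enumeration without learned scores, would be the kind of evidence described above.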

Figures

Figures reproduced from arXiv: 2604.13471 by Jeffrey D. Varner, Peter Zhiping Zhang.

**Figure 2.** Architectures of the neural network pathway ranking models. (a) NN1PR: a 1-step …

**Figure 3.** Proposed multistep metabolic pathway design pipeline with NN1PR and NN2PR as ranking …

**Figure 4.** Performance comparison on testing data. (a) 1-step ranking: NN1PR vs. Tanimoto similarity …

**Figure 5.** Ranks assigned by NN1PR to the BDO pathway reported in [ …

**Figure 6.** Ranks assigned by NN1PR for glycolysis pathway in a backward manner. 234501 templates …

**Figure 7.** Ranks assigned by NN1PR for the naloxone pathway designed by human expert in a …

**Figure 8.** Three candidates for BDO synthesis found by the proposed pipeline. At the top is the …
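The Figure 4 caption names Tanimoto similarity as the comparison for 1-step ranking. For readers unfamiliar with it, the sketch below computes the Tanimoto coefficient on fingerprints represented as sets of "on" bit indices and ranks candidates by similarity to a target. How the paper applies the baseline is not visible in the truncated caption, so the helper `rank_by_similarity`, the fingerprint values, and the choice to return 1.0 for two empty fingerprints are assumptions.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: |a & b| / |a | b| on sets of on-bit indices."""
    if not a and not b:
        return 1.0  # assumed convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(target: Set[int],
                       candidates: Dict[str, Set[int]]) -> List[str]:
    """Return candidate names, most similar to the target first."""
    return sorted(candidates, key=lambda name: (-tanimoto(target, candidates[name]), name))


# Made-up fingerprints for illustration only.
target_fp = {1, 2, 3, 4}
candidate_fps = {"p1": {1, 2, 3}, "p2": {3, 4, 5}, "p3": {7}}
ranking = rank_by_similarity(target_fp, candidate_fps)
```

Here `p1` shares three of four bits with the target (0.75), `p2` shares two of five (0.4), and `p3` shares none.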

Original abstract

In silico tools are important for generating novel hypotheses and exploring alternatives in de novo metabolic pathway design. However, while many computational frameworks have been proposed for retrobiosynthesis, few successful examples of algorithm-guided xenobiotic biochemical retrosynthesis have been reported in the literature. Deep learning has improved the quality of synthesis and retrosynthesis in organic chemistry applications. Inspired by this progress, we explored combining deep learning of biochemical transformations with the traditional retrobiosynthetic workflow to improve in silico synthetic metabolic pathway designs. To develop our computational biosynthetic pathway design framework, we assembled metabolic reaction and enzymatic template data from public databases. A data augmentation procedure, adapted from literature, was carried out to enrich the assembled reaction dataset with artificial metabolic reactions generated by enzymatic reaction templates. Two neural network-based pathway ranking models were trained as binary classifiers to distinguish assembled reactions from artificial counterparts; each model output a scalar quantifying the plausibility of a 1-step or 2-step pathway. Combining these two models with enzymatic templates, we built a multistep retrobiosynthesis pipeline and validated it by reproducing some natural and non-natural pathways computationally.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This extends template-based retrobiosynthesis with two separate neural classifiers for 1-step and 2-step ranking, but the abstract supplies no accuracy numbers or baselines, so the practical gain is unclear.

The paper assembles metabolic reactions and enzymatic templates from public sources, augments the set with artificial reactions generated by those same templates, and trains two neural networks as binary classifiers. One scores single reactions and the other scores pairs; the scores then guide a multistep search. They report that the pipeline recovers some known natural and non-natural pathways in silico.

The two-model split for different step depths is a reasonable engineering choice that keeps the ranking tractable as pathway length grows. Data handling follows patterns already used in retrobiosynthesis work, so the implementation itself looks straightforward to reproduce if the code is released.

The main limitation is the missing quantitative evidence. No classification accuracies, no comparison against simpler ranking rules or single-model baselines, and no count of how many pathways were tested or how many succeeded appear in the abstract. Without those numbers it is difficult to judge whether the learned scores improve design success rates or merely reproduce the pathways that are already easy to find with the templates. The training setup also carries a built-in risk: because negative examples are produced by the identical template rules used at search time, the networks could learn surface features of template application rather than deeper metabolic constraints. If the full manuscript contains cross-validation results, held-out pathway tests, or external experimental checks, that would directly address the gap.

The work is aimed at metabolic engineers and synthetic biologists who already use computational retrosynthesis tools and want a drop-in ranking layer. A reader building similar pipelines would get a clear description of the data pipeline and model architecture even if the performance claims need strengthening.

I would send it to peer review; the methods are concrete enough that referees can ask for the missing metrics and any additional validation experiments.

Referee Report

3 major / 1 minor

Summary. The manuscript presents a computational framework for multistep metabolic pathway design that assembles metabolic reaction and enzymatic template data from public databases, augments the dataset with artificial reactions generated via the same templates, trains two neural network binary classifiers (for 1-step and 2-step pathways) to distinguish real from artificial reactions, and integrates the resulting plausibility scores into a retrobiosynthesis pipeline. The framework is validated by computationally reproducing some natural and non-natural pathways.

Significance. If the neural network ranking models prove to capture biological plausibility beyond template artifacts and the pipeline demonstrates measurable improvements over existing retrobiosynthesis methods, the work could advance in silico hypothesis generation for metabolic engineering and xenobiotic synthesis. The approach of combining template-based generation with learned ranking is a natural extension of deep learning successes in organic retrosynthesis, and the data augmentation strategy is a reasonable starting point. However, the current evidence consists only of qualitative reproduction of pathways without metrics or baselines, limiting the assessed impact.

major comments (3)

[Abstract] The central claim of an 'improved' multistep retrobiosynthesis pipeline is not supported by any quantitative results. No classification accuracies, pathway reproduction rates, number of pathways tested, success criteria, or comparisons to baseline template-only searches are reported, making it impossible to determine whether the neural network scores contribute to the pipeline or whether the reproduction exceeds what template enumeration alone would achieve.
[Model training and data augmentation description] The negative examples for the binary classifiers are generated by applying the identical enzymatic templates later used for candidate generation in the retrobiosynthesis search. This creates a risk that the models learn template-application artifacts (e.g., consistent atom-mapping conventions, bond-change signatures, or substrate patterns) rather than intrinsic metabolic feasibility. When these scores are then used to rank pathways inside the same template-driven search, the ranking may reduce to a measure of template conformity instead of biological plausibility, undermining generalization to novel or non-natural pathways.
[Pipeline integration section] No description is given of how the 1-step and 2-step neural network scores are combined during multistep search, what search algorithm or beam/pruning strategy is employed, how pathway length is handled, or how conflicts between the two models are resolved. Without these details the multistep claim cannot be evaluated.
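The template-artifact concern in the second comment could be probed with a split that keeps whole templates out of training, so that test reactions come from templates the classifier has never seen. The sketch below is one possible design and is not described in the paper; it assumes each example carries a hypothetical `template_id` annotation, and `split_by_template` is an illustrative helper.

```python
import random
from typing import List, Tuple

Example = Tuple[str, str, int]   # (reaction_id, template_id, label)


def split_by_template(examples: List[Example],
                      test_fraction: float = 0.25,
                      seed: int = 0) -> Tuple[List[Example], List[Example]]:
    """Split so that no template contributes to both train and test."""
    templates = sorted({template_id for _, template_id, _ in examples})
    rng = random.Random(seed)        # seeded for a reproducible split
    rng.shuffle(templates)
    n_test = max(1, round(len(templates) * test_fraction))
    test_templates = set(templates[:n_test])
    train = [e for e in examples if e[1] not in test_templates]
    test = [e for e in examples if e[1] in test_templates]
    return train, test


# Placeholder examples: two reactions per template, one real and one artificial.
examples = [
    (f"r{i}", f"tmpl{i // 2}", i % 2)
    for i in range(8)
]
train, test = split_by_template(examples)
train_templates = {e[1] for e in train}
test_templates = {e[1] for e in test}
```

If accuracy drops sharply on the held-out templates, that would suggest the scores track template conformity rather than metabolic feasibility.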

minor comments (1)

[Abstract] The abstract states that the data augmentation procedure was 'adapted from literature' but provides no specific citation or description of the adaptation, which would aid reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major point below and outline the revisions we will make to improve clarity and address concerns.

Referee: [Abstract] The central claim of an 'improved' multistep retrobiosynthesis pipeline is not supported by any quantitative results. No classification accuracies, pathway reproduction rates, number of pathways tested, success criteria, or comparisons to baseline template-only searches are reported, making it impossible to determine whether the neural network scores contribute to the pipeline or whether the reproduction exceeds what template enumeration alone would achieve.

Authors: We agree that the abstract implies improvement without sufficient supporting metrics in the summary. We will revise the abstract to describe the work as an exploratory framework that integrates neural ranking with template-based retrobiosynthesis, validated via computational reproduction of pathways, without claiming quantitative superiority. We will also add key metrics (model accuracies, number of pathways tested, and reproduction rates) to the abstract and results section, along with a baseline comparison to template-only enumeration.

revision: yes

Referee: [Model training and data augmentation description] The negative examples for the binary classifiers are generated by applying the identical enzymatic templates later used for candidate generation in the retrobiosynthesis search. This creates a risk that the models learn template-application artifacts (e.g., consistent atom-mapping conventions, bond-change signatures, or substrate patterns) rather than intrinsic metabolic feasibility. When these scores are then used to rank pathways inside the same template-driven search, the ranking may reduce to a measure of template conformity instead of biological plausibility, undermining generalization to novel or non-natural pathways.

Authors: This is a substantive concern regarding potential overfitting to template artifacts. Our augmentation strategy follows prior literature to create plausible negatives, with positives drawn from curated databases to encourage learning of real reaction features. We will add a limitations paragraph in the discussion explicitly acknowledging this risk and proposing future mitigations such as held-out test sets or alternative negative sampling. No changes to the core method are needed, but the clarification strengthens the presentation.

revision: partial

Referee: [Pipeline integration section] No description is given of how the 1-step and 2-step neural network scores are combined during multistep search, what search algorithm or beam/pruning strategy is employed, how pathway length is handled, or how conflicts between the two models are resolved. Without these details the multistep claim cannot be evaluated.

Authors: We appreciate this feedback on missing implementation details. In the revised manuscript, we have expanded the pipeline integration section to specify: scores are combined via a weighted sum (with higher weight on 2-step predictions for longer segments), a beam search of width 5 with cumulative score pruning is employed, pathways are extended iteratively up to a user-defined maximum length, and model conflicts are resolved by selecting the higher plausibility score for candidate extensions.

revision: yes
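The search described in this last simulated response can be sketched as a small beam search. This is an illustration under stated assumptions and not the authors' implementation: the function `beam_search`, its callback signatures, the weight `w2 = 0.6`, and the toy graph and scores are all hypothetical. Only the beam width of 5, the weighted sum, the cumulative-score pruning, and the maximum length come from the response; the conflict rule (taking the higher score) is not implemented here.

```python
from typing import Callable, Iterable, List, Tuple

Path = Tuple[str, ...]          # target first, then successive precursors
Scored = Tuple[float, Path]     # (cumulative score, path)


def beam_search(
    target: str,
    expand: Callable[[str], Iterable[str]],      # template step: product -> precursors
    score1: Callable[[str, str], float],         # 1-step score (precursor, product)
    score2: Callable[[str, str, str], float],    # 2-step score (precursor, middle, product)
    is_start: Callable[[str], bool],             # True for acceptable starting metabolites
    beam_width: int = 5,
    max_steps: int = 4,
    w2: float = 0.6,                             # assumed weight on the 2-step score
) -> List[Scored]:
    beam: List[Scored] = [(0.0, (target,))]
    finished: List[Scored] = []
    for _ in range(max_steps):
        candidates: List[Scored] = []
        for total, path in beam:
            for precursor in expand(path[-1]):
                if precursor in path:            # skip cycles
                    continue
                step = score1(precursor, path[-1])
                if len(path) >= 2:               # a 2-step window exists
                    step = (1 - w2) * step + w2 * score2(precursor, path[-1], path[-2])
                entry = (total + step, path + (precursor,))
                (finished if is_start(precursor) else candidates).append(entry)
        # Cumulative-score pruning: keep only the best `beam_width` partial paths.
        beam = sorted(candidates, key=lambda c: (-c[0], c[1]))[:beam_width]
        if not beam:
            break
    return sorted(finished, key=lambda c: (-c[0], c[1]))


# Toy network with made-up scores: T <- A <- S and T <- B <- S.
graph = {"T": ["A", "B"], "A": ["S"], "B": ["S"], "S": []}
s1 = {("A", "T"): 0.9, ("B", "T"): 0.4, ("S", "A"): 0.8, ("S", "B"): 0.9}
s2 = {("S", "A", "T"): 0.7, ("S", "B", "T"): 0.2}
results = beam_search(
    "T",
    expand=lambda m: graph[m],
    score1=lambda p, q: s1[(p, q)],
    score2=lambda p, m, q: s2[(p, m, q)],
    is_start=lambda m: m == "S",
)
best_score, best_path = results[0]
```

A plain cumulative sum favors longer pathways, since every extra step adds a non-negative score; a real implementation would likely need a length normalization or log-probability scores, which the response does not specify.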

Circularity Check

0 steps flagged

No significant circularity detected in the multistep retrobiosynthesis pipeline

The paper assembles metabolic reaction and enzymatic template data from public databases, augments the dataset with artificial reactions generated by applying the templates, and trains two neural network binary classifiers to distinguish real assembled reactions from these artificial counterparts. The resulting scalar plausibility scores for 1-step and 2-step pathways are then combined with the same enzymatic templates to form the retrobiosynthesis pipeline, which is validated by computationally reproducing some natural and non-natural pathways. No equations, fitted parameters, or self-citations are described that would make the pipeline outputs or pathway rankings definitionally equivalent to the training inputs by construction. The training distinction relies on an external database of real reactions, and the validation provides an independent check rather than a tautological reproduction of fitted values.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that artificial reactions generated by enzymatic templates form a useful negative class for training, plus the implicit claim that public database reactions are sufficiently complete and accurate; no new physical entities are introduced.

free parameters (1)

neural network parameters
Weights and biases of the two binary classifiers are fitted to the augmented reaction dataset.

axioms (1)

domain assumption: Template-generated artificial reactions are representative of implausible biochemical transformations
This distinction is used to create the training labels for the ranking models.


Reference graph

Works this paper leans on

105 extracted references · 105 canonical work pages · 1 internal anchor

[1]

Tensorflow: A system for large-scale machine learning

Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. Tensorflow: A system for large-scale machine learning. In12th {USENIX} symposium on operating systems design and implementation ({OSDI}16), pages 265–283, 2016

work page 2016
[2]

Event extraction for systems biology by text mining the literature.Trends in biotechnology, 28(7):381–390, 2010

Sophia Ananiadou, Sampo Pyysalo, Jun’ichi Tsujii, and Douglas B Kell. Event extraction for systems biology by text mining the literature.Trends in biotechnology, 28(7):381–390, 2010

work page 2010
[3]

Reconciliation of metabolites and biochemical reactions for metabolic networks.Brief- ings in bioinformatics, 15(1):123–135, 2014

Thomas Bernard, Alan Bridge, Anne Morgat, Sébastien Moretti, Ioannis Xenarios, and Marco Pagni. Reconciliation of metabolites and biochemical reactions for metabolic networks.Brief- ings in bioinformatics, 15(1):123–135, 2014

work page 2014
[4]

Stereo signature molecular descriptor

Pablo Carbonell, Lars Carlsson, and Jean-Loup Faulon. Stereo signature molecular descriptor. Journal of chemical information and modeling, 53(4):887–897, 2013

work page 2013
[5]

Deep learning with python, 2017

François Chollet. Deep learning with python, 2017. 12

work page 2017
[6]

Convolutional embedding of attributed molecular graphs for physical property prediction

Connor W Coley, Regina Barzilay, William H Green, Tommi S Jaakkola, and Klavs F Jensen. Convolutional embedding of attributed molecular graphs for physical property prediction. Journal of chemical information and modeling, 57(8):1757–1772, 2017

work page 2017
[7]

Prediction of organic reaction outcomes using machine learning.ACS central science, 3(5):434– 443, 2017

Connor W Coley, Regina Barzilay, Tommi S Jaakkola, William H Green, and Klavs F Jensen. Prediction of organic reaction outcomes using machine learning.ACS central science, 3(5):434– 443, 2017

work page 2017
[8]

A graph-convolutional neural network model for the prediction of chemical reactivity.Chemical science, 10(2):370–377, 2019

Connor W Coley, Wengong Jin, Luke Rogers, Timothy F Jamison, Tommi S Jaakkola, William H Green, Regina Barzilay, and Klavs F Jensen. A graph-convolutional neural network model for the prediction of chemical reactivity.Chemical science, 10(2):370–377, 2019

work page 2019
[9]

Computer-assisted retrosynthesis based on molecular similarity.ACS central science, 3(12):1237–1245, 2017

Connor W Coley, Luke Rogers, William H Green, and Klavs F Jensen. Computer-assisted retrosynthesis based on molecular similarity.ACS central science, 3(12):1237–1245, 2017

work page 2017
[10]

Arthur Dalby, James G Nourse, W Douglas Hounshell, Ann KI Gushurst, David L Grier, Burton A Leland, and John Laufer. Description of several chemical structure file formats used by computer programs developed at molecular design limited.Journal of chemical information and computer sciences, 32(3):244–255, 1992

work page 1992
[11]

Daylight Chemical Information Systems, Inc. 3. smiles - a simplified chemical language. https://www.daylight.com/dayhtml/doc/theory/theory.smiles.html, 2020. [On- line; accessed 17-September-2020]

work page 2020
[12]

Daylight Chemical Information Systems, Inc. 4. smarts - a language for describing molecu- lar patterns. https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html,

work page
[13]

[Online; accessed 17-September-2020]

work page 2020
[14]

Retropath2

Baudoin Delépine, Thomas Duigou, Pablo Carbonell, and Jean-Loup Faulon. Retropath2. 0: A retrosynthesis workflow for metabolic engineers.Metabolic engineering, 45:158–170, 2018

work page 2018
[15]

Analysis and comparison of 2d fingerprints: insights into database screening performance using eight fingerprint methods

Jianxin Duan, Steven L Dixon, Jeffrey F Lowrie, and Woody Sherman. Analysis and comparison of 2d fingerprints: insights into database screening performance using eight fingerprint methods. Journal of Molecular Graphics and Modelling, 29(2):157–170, 2010

work page 2010
[16]

Retrorules: a database of reaction rules for engineering biology.Nucleic acids research, 47(D1):D1229– D1235, 2019

Thomas Duigou, Melchior Du Lac, Pablo Carbonell, and Jean-Loup Faulon. Retrorules: a database of reaction rules for engineering biology.Nucleic acids research, 47(D1):D1229– D1235, 2019

work page 2019
[17]

Convolutional networks on graphs for learning molecular fingerprints

David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P Adams. Convolutional networks on graphs for learning molecular fingerprints. InAdvances in neural information processing systems, pages 2224– 2232, 2015

work page 2015
[18]

EMBL-EBI. Chebi. https://www.ebi.ac.uk/chebi/, 2020. [Online; accessed 24- September-2020]

work page 2020
[19]

Computational framework for predictive biodegradation.Biotechnology and bioengineering, 104(6):1086–1097, 2009

Stacey D Finley, Linda J Broadbelt, and Vassily Hatzimanikatis. Computational framework for predictive biodegradation.Biotechnology and bioengineering, 104(6):1086–1097, 2009

work page 2009
[20]

In silico feasibility of novel biodegradation pathways for 1, 2, 4-trichlorobenzene.BMC systems biology, 4(1):7, 2010

Stacey D Finley, Linda J Broadbelt, and Vassily Hatzimanikatis. In silico feasibility of novel biodegradation pathways for 1, 2, 4-trichlorobenzene.BMC systems biology, 4(1):7, 2010

work page 2010
[21]

Metanetx

Mathias Ganter, Thomas Bernard, Sébastien Moretti, Joerg Stelling, and Marco Pagni. Metanetx. org: a website and repository for accessing, analysing and manipulating metabolic networks. Bioinformatics, 29(6):815–816, 2013

work page 2013
[22]

The synthesizability of molecules proposed by generative models.Journal of chemical information and modeling, 60(12):5714–5723, 2020

Wenhao Gao and Connor W Coley. The synthesizability of molecules proposed by generative models.Journal of chemical information and modeling, 60(12):5714–5723, 2020

work page 2020
[23]

O’Reilly Media, 2019

Aurélien Géron.Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O’Reilly Media, 2019. 13

work page 2019
[24]

Automatic chemical design using a data-driven continuous representation of molecules.ACS central science, 4(2):268–276, 2018

Rafael Gómez-Bombarelli, Jennifer N Wei, David Duvenaud, José Miguel Hernández-Lobato, Benjamín Sánchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D Hirzel, Ryan P Adams, and Alán Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules.ACS central science, 4(2):268–276, 2018

work page 2018
[25]

MIT press Cambridge, 2016

Ian Goodfellow, Yoshua Bengio, and Aaron Courville.Deep learning, volume 1. MIT press Cambridge, 2016

work page 2016
[26]

Atlas of biochemistry: a repository of all possible biochemical reactions for synthetic biology and metabolic engineering studies.ACS synthetic biology, 5(10):1155–1166, 2016

Noushin Hadadi, Jasmin Hafner, Adrian Shajkofci, Aikaterini Zisaki, and Vassily Hatzi- manikatis. Atlas of biochemistry: a repository of all possible biochemical reactions for synthetic biology and metabolic engineering studies.ACS synthetic biology, 5(10):1155–1166, 2016

work page 2016
[27]

Design of computational retrobiosynthesis tools for the design of de novo synthetic pathways.Current opinion in chemical biology, 28:99–104, 2015

Noushin Hadadi and Vassily Hatzimanikatis. Design of computational retrobiosynthesis tools for the design of de novo synthetic pathways.Current opinion in chemical biology, 28:99–104, 2015

work page 2015
[28]

Updated atlas of biochemistry with new metabolites and improved enzyme prediction power.ACS Synthetic Biology, 9(6):1479–1482, 2020

Jasmin Hafner, Homa MohammadiPeyhani, Anastasia Sveshnikova, Alan Scheidegger, and Vassily Hatzimanikatis. Updated atlas of biochemistry with new metabolites and improved enzyme prediction power.ACS Synthetic Biology, 9(6):1479–1482, 2020

work page 2020
[29]

Masahiro Hattori, Yasushi Okuno, Susumu Goto, and Minoru Kanehisa. Development of a chem- ical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways.Journal of the American Chemical Society, 125(39):11853–11865, 2003

work page 2003
[30]

Exploring the diversity of complex metabolic networks.Bioinformatics, 21(8):1603–1609, 2005

Vassily Hatzimanikatis, Chunhui Li, Justin A Ionita, Christopher S Henry, Matthew D Jankowski, and Linda J Broadbelt. Exploring the diversity of complex metabolic networks.Bioinformatics, 21(8):1603–1609, 2005

work page 2005
[31]

Discovery and analysis of novel metabolic pathways for the biosynthesis of industrial chemicals: 3-hydroxypropanoate

Christopher S Henry, Linda J Broadbelt, and Vassily Hatzimanikatis. Discovery and analysis of novel metabolic pathways for the biosynthesis of industrial chemicals: 3-hydroxypropanoate. Biotechnology and bioengineering, 106(3):462–473, 2010

work page 2010
[32]

Craig A. James. Opensmiles specification, 2020. [Online; accessed 16-September-2020]

work page 2020
[33]

Structure-based synthesizability prediction of crystals using partially supervised learning.Journal of the American Chemical Society, 142(44):18836–18843, 2020

Jidon Jang, Geun Ho Gu, Juhwan Noh, Juhwan Kim, and Yousung Jung. Structure-based synthesizability prediction of crystals using partially supervised learning.Journal of the American Chemical Society, 142(44):18836–18843, 2020

work page 2020
[34]

Predicting organic reaction outcomes with weisfeiler-lehman network

Wengong Jin, Connor Coley, Regina Barzilay, and Tommi Jaakkola. Predicting organic reaction outcomes with weisfeiler-lehman network. InAdvances in Neural Information Processing Systems, pages 2607–2616, 2017

work page 2017
[35]

Toward pathway engineering: a new database of genetic and molecular pathways.Sci

Minoru Kanehisa. Toward pathway engineering: a new database of genetic and molecular pathways.Sci. Technol. Jap., 59:34–38, 1996

work page 1996
[36]

A database for post-genome analysis.Trends Genet., 13:375–376, 1997

Minoru Kanehisa. A database for post-genome analysis.Trends Genet., 13:375–376, 1997

work page 1997
[37]

Toward understanding the origin and evolution of cellular organisms.Protein Science, 28(11):1947–1951, 2019

Minoru Kanehisa. Toward understanding the origin and evolution of cellular organisms.Protein Science, 28(11):1947–1951, 2019

work page 1947
[38]

Kegg for linking genomes to life and the environment.Nucleic acids research, 36(suppl_1):D480– D484, 2007

Minoru Kanehisa, Michihiro Araki, Susumu Goto, Masahiro Hattori, Mika Hirakawa, Masumi Itoh, Toshiaki Katayama, Shuichi Kawashima, Shujiro Okuda, Toshiaki Tokimatsu, et al. Kegg for linking genomes to life and the environment.Nucleic acids research, 36(suppl_1):D480– D484, 2007

work page 2007
[39]

Kegg: new perspectives on genomes, pathways, diseases and drugs.Nucleic acids research, 45(D1):D353– D361, 2017

Minoru Kanehisa, Miho Furumichi, Mao Tanabe, Yoko Sato, and Kanae Morishima. Kegg: new perspectives on genomes, pathways, diseases and drugs.Nucleic acids research, 45(D1):D353– D361, 2017

work page 2017
[40]

Kegg: kyoto encyclopedia of genes and genomes.Nucleic acids research, 28(1):27–30, 2000

Minoru Kanehisa and Susumu Goto. Kegg: kyoto encyclopedia of genes and genomes.Nucleic acids research, 28(1):27–30, 2000. 14

work page 2000
[41]

Kegg for representation and analysis of molecular networks involving diseases and drugs.Nucleic acids research, 38(suppl_1):D355–D360, 2010

Minoru Kanehisa, Susumu Goto, Miho Furumichi, Mao Tanabe, and Mika Hirakawa. Kegg for representation and analysis of molecular networks involving diseases and drugs.Nucleic acids research, 38(suppl_1):D355–D360, 2010

work page 2010
[42]

From genomics to chemical genomics: new developments in kegg.Nucleic acids research, 34(suppl_1):D354– D357, 2006

Minoru Kanehisa, Susumu Goto, Masahiro Hattori, Kiyoko F Aoki-Kinoshita, Masumi Itoh, Shuichi Kawashima, Toshiaki Katayama, Michihiro Araki, and Mika Hirakawa. From genomics to chemical genomics: new developments in kegg.Nucleic acids research, 34(suppl_1):D354– D357, 2006

work page 2006
[43]

The kegg databases at genomenet.Nucleic acids research, 30(1):42–46, 2002

Minoru Kanehisa, Susumu Goto, Shuichi Kawashima, and Akihiro Nakaya. The kegg databases at genomenet.Nucleic acids research, 30(1):42–46, 2002

work page 2002
[44]

The kegg resource for deciphering the genome.Nucleic acids research, 32(suppl_1):D277– D280, 2004

Minoru Kanehisa, Susumu Goto, Shuichi Kawashima, Yasushi Okuno, and Masahiro Hattori. The kegg resource for deciphering the genome.Nucleic acids research, 32(suppl_1):D277– D280, 2004

work page 2004
[45]

Kegg for integration and interpretation of large-scale molecular data sets.Nucleic acids research, 40(D1):D109–D114, 2012

Minoru Kanehisa, Susumu Goto, Yoko Sato, Miho Furumichi, and Mao Tanabe. Kegg for integration and interpretation of large-scale molecular data sets.Nucleic acids research, 40(D1):D109–D114, 2012

work page 2012
[46]

Data, information, knowledge and principle: back to metabolism in kegg.Nucleic acids research, 42(D1):D199–D205, 2014

Minoru Kanehisa, Susumu Goto, Yoko Sato, Masayuki Kawashima, Miho Furumichi, and Mao Tanabe. Data, information, knowledge and principle: back to metabolism in kegg.Nucleic acids research, 42(D1):D199–D205, 2014

work page 2014
[47]

New approach for understanding genome variations in kegg.Nucleic acids research, 47(D1):D590– D595, 2019

Minoru Kanehisa, Yoko Sato, Miho Furumichi, Kanae Morishima, and Mao Tanabe. New approach for understanding genome variations in kegg.Nucleic acids research, 47(D1):D590– D595, 2019

work page 2019
[48]

Minoru Kanehisa, Yoko Sato, Masayuki Kawashima, Miho Furumichi, and Mao Tanabe. KEGG as a reference resource for gene and protein annotation. Nucleic acids research, 44(D1):D457–D462, 2016

[49]

Matthew A Kayala, Chloé-Agathe Azencott, Jonathan H Chen, and Pierre Baldi. Learning to predict chemical reactions. Journal of chemical information and modeling, 51(9):2209–2222, 2011

[50]

Matthew A Kayala and Pierre Baldi. ReactionPredictor: prediction of complex chemical reactions at the mechanistic level using machine learning. Journal of chemical information and modeling, 52(10):2526–2540, 2012

[51]

Steven Kearnes, Kevin McCloskey, Marc Berndl, Vijay Pande, and Patrick Riley. Molecular graph convolutions: moving beyond fingerprints. Journal of computer-aided molecular design, 30(8):595–608, 2016

[52]

Jay D Keasling. Manufacturing molecules through metabolic engineering. Science, 330(6009):1355–1358, 2010

[53]

Aug Kekulé. Untersuchungen über aromatische Verbindungen. Ueber die Constitution der aromatischen Verbindungen. I. Ueber die Constitution der aromatischen Verbindungen. Justus Liebigs Annalen der Chemie, 137(2):129–196, 1866

[54]

Auguste Kekulé. Sur la constitution des substances aromatiques. Bulletin mensuel de la Société Chimique de Paris, 3:98, 1865

[55]

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

[56]

Mathilde Koch, Thomas Duigou, and Jean-Loup Faulon. Reinforcement learning for bioretrosynthesis. ACS Synthetic Biology, 9(1):157–168, 2019

[57]

Hans Kraut, Josef Eiblmaier, Guenter Grethe, Peter Löw, Heinz Matuszczyk, and Heinz Saller. Algorithm for reaction classification. Journal of chemical information and modeling, 53(11):2884–2895, 2013

[58]

Laboratory of Computational Systems Biotechnology. Atlas of biochemistry. https://lcsb-databases.epfl.ch/atlas/Downloads, 2020. [Online; accessed 24-September-2020]

[59]

Greg Landrum et al. RDKit: Open-source cheminformatics software. https://www.rdkit.org/, 2020. [Online; accessed 16-September-2020]

[60]

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015

[61]

Geng-Min Lin, Robert Warden-Rothman, and Christopher A Voigt. Retrosynthetic design of metabolic pathways to chemicals not found in nature. Current Opinion in Systems Biology, 14:82–107, 2019

[62]

Bowen Liu, Bharath Ramsundar, Prasad Kawthekar, Jade Shi, Joseph Gomes, Quang Luu Nguyen, Stephen Ho, Jack Sloane, Paul Wender, and Vijay Pande. Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science, 3(10):1103–1113, 2017

[63]

Daniel Mark Lowe. Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge, 2012

[64]

Gerald Maggiora, Martin Vogt, Dagmar Stumpfe, and Jurgen Bajorath. Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry, 57(8):3186–3204, 2014

[65]

Gerald M Maggiora and Veerabahu Shanmugasundaram. Molecular similarity measures. In Chemoinformatics, pages 1–50. Springer, 2004

[66]

Gerald M Maggiora and Veerabahu Shanmugasundaram. Molecular similarity measures. In Chemoinformatics and computational chemical biology, pages 39–100. Springer, 2011

[67]

Daniel C McShan, S Rao, and Imran Shah. PathMiner: predicting metabolic pathways by heuristic search. Bioinformatics, 19(13):1692–1698, 2003

[68]

MetaNetX. MetaNetX: Automated model construction and genome annotation for large-scale metabolic networks. https://www.metanetx.org/, 2020. [Online; accessed 24-September-2020]

[69]

Sébastien Moretti, Olivier Martin, T Van Du Tran, Alan Bridge, Anne Morgat, and Marco Pagni. MetaNetX/MNXref – reconciliation of metabolites and biochemical reactions to bring together genome-scale metabolic networks. Nucleic acids research, 44(D1):D523–D526, 2016

[70]

Harry L Morgan. The generation of a unique machine description for chemical structures - a technique developed at Chemical Abstracts Service. Journal of Chemical Documentation, 5(2):107–113, 1965

[71]

Anne Morgat, Thierry Lombardot, Kristian B Axelsen, Lucila Aimo, Anne Niknejad, Nevila Hyka-Nouspikel, Elisabeth Coudert, Monica Pozzato, Marco Pagni, Sébastien Moretti, et al. Updates in Rhea – an expert curated resource of biochemical reactions. Nucleic acids research, page gkw990, 2016

[72]

Yuki Moriya, Daichi Shigemizu, Masahiro Hattori, Toshiaki Tokimatsu, Masaaki Kotera, Susumu Goto, and Minoru Kanehisa. PathPred: an enzyme-catalyzed metabolic pathway prediction server. Nucleic acids research, 38(suppl_2):W138–W143, 2010

[73]

Jens Nielsen and Jay D Keasling. Engineering cellular metabolism. Cell, 164(6):1185–1197, 2016

[74]

Noel M O’Boyle, Michael Banck, Craig A James, Chris Morley, Tim Vandermeersch, and Geoffrey R Hutchison. Open Babel: An open chemical toolbox. Journal of cheminformatics, 3(1):33, 2011

[75]

Hiroyuki Ogata, Susumu Goto, Kazushige Sato, Wataru Fujibuchi, Hidemasa Bono, and Minoru Kanehisa. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic acids research, 27(1):29–34, 1999

[76]

Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10):1345–1359, 2009

[77]

R Panico, WH Powell, and Jean-Claude Richer. A guide to IUPAC Nomenclature of Organic Compounds, volume 2. Blackwell Scientific Publications, Oxford, 1993

[78]

Rhea. Rhea. https://www.rhea-db.org/download, 2020. [Online; accessed 24-September-2020]

[79]

Sereina Riniker and Gregory A Landrum. Open-source platform to benchmark fingerprints for ligand-based virtual screening. Journal of cheminformatics, 5(1):26, 2013

[80]

Dae-Kyun Ro, Eric M Paradise, Mario Ouellet, Karl J Fisher, Karyn L Newman, John M Ndungu, Kimberly A Ho, Rachel A Eachus, Timothy S Ham, James Kirby, et al. Production of the antimalarial drug precursor artemisinic acid in engineered yeast. Nature, 440(7086):940–943, 2006
