
arxiv: 1705.04612 · v2 · pith:DGXE3SDPnew · submitted 2017-05-12 · 💻 cs.LG · q-bio.BM

Molecular Generation with Recurrent Neural Networks (RNNs)

classification 💻 cs.LG q-bio.BM
keywords: molecules, chemical, compounds, drug, generate, like, number, training

The number of potential drug-like small molecules is estimated at between 10^23 and 10^60, while current databases of known compounds are orders of magnitude smaller, at approximately 10^8 compounds. This discrepancy has led to interest in generating virtual libraries using hand-crafted chemical rules and fragment-based methods, both to cover a larger area of chemical space and to supply chemical libraries for in silico drug discovery. This work explores to what extent a recurrent neural network with long short-term memory cells can learn sensible chemical rules and generate synthesizable molecules when trained on existing compounds encoded as SMILES. The networks can, to a high extent, generate novel but chemically sensible molecules. The properties of the generated molecules are tuned by training on two different datasets, one of fragment-like molecules and one of drug-like molecules. The generated molecules and their respective training sets have very similar distributions of molar weight, predicted logP, number of hydrogen bond acceptors and donors, number of rotatable bonds, and topological polar surface area. In most cases the compounds are synthesizable, as assessed with SA score and Wiley ChemPlanner.
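To make "trained on existing compounds encoded as SMILES" concrete, the following is a minimal sketch of how SMILES strings might be turned into integer sequences and next-character training pairs for a recurrent model. The abstract does not describe the actual encoding, so everything here is an assumption for illustration: the function names `build_vocab`, `encode`, `decode`, and `training_pair`, the reserved start and end ids, and the choice of character-level tokens are all hypothetical.

```python
# Hypothetical sketch: character-level encoding of SMILES for a sequence model.
# The reserved ids and the character-level tokenization are assumptions,
# not details taken from the paper.

START_ID = 0  # assumed marker for the beginning of a molecule
END_ID = 1    # assumed marker for the end of a molecule


def build_vocab(smiles_list):
    """Map every character seen in the SMILES strings to an integer id >= 2."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(smiles, vocab):
    """Turn one SMILES string into a list of ids wrapped in start/end markers.

    Note: splitting by character breaks two-letter atoms such as "Cl" or "Br"
    into two tokens; a real tokenizer may prefer to keep them together.
    """
    return [START_ID] + [vocab[ch] for ch in smiles] + [END_ID]


def decode(ids, vocab):
    """Invert encode(), dropping the start/end markers."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse[i] for i in ids if i not in (START_ID, END_ID))


def training_pair(ids):
    """Next-character prediction: the input is the sequence shifted by one."""
    return ids[:-1], ids[1:]


if __name__ == "__main__":
    smiles = ["CCO", "c1ccccc1"]  # ethanol and benzene
    vocab = build_vocab(smiles)
    ids = encode("CCO", vocab)
    inputs, targets = training_pair(ids)
    print(ids, inputs, targets, decode(ids, vocab))
```

A model trained on such pairs learns to predict each character from the ones before it; sampling from it one character at a time until the end marker appears yields a new candidate SMILES string, which would still need to be checked for chemical validity.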



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ParetoPilot: Zero-Surrogate Offline Multi-Objective Optimization via Infer-Perturb-Guide Diffusion

    cs.LG 2026-06 unverdicted novelty 6.0

    ParetoPilot is a zero-surrogate diffusion framework for offline MOO that uses an IPG engine to steer generation via inferred objective directions and orthogonal perturbations, outperforming 14 baselines on 51 tasks.

  2. Demystifying Multimodal Biomolecular Co-design With Intrinsic Geodesic Coupling

    q-bio.BM 2026-06 unverdicted novelty 6.0

    GeoCoupling optimizes temporal couplings between modalities in biomolecular generative models and outperforms synchronous baselines on drug design and protein design tasks.