SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties

Abhinav Vishnu; Charles Siegel; Garrett B. Goh; Nathan O. Hodas

arxiv: 1712.02034 · v2 · pith:6PW4FQDNnew · submitted 2017-12-06 · stat.ML · cs.AI · cs.CL · cs.LG

classification stat.ML cs.AI cs.CL cs.LG

keywords chemical neural properties deep network networks smiles smiles2vec

Chemical databases store information in text representations, and the SMILES format is a universal standard used in many cheminformatics software packages. Each SMILES string encodes structural information that can be used to predict complex chemical properties. In this work, we develop SMILES2vec, a deep RNN that automatically learns features from SMILES to predict chemical properties, without the need for additional explicit feature engineering. Using Bayesian optimization methods to tune the network architecture, we show that an optimized SMILES2vec model can serve as a general-purpose neural network for predicting distinct chemical properties, including toxicity, activity, solubility and solvation energy, while also outperforming contemporary MLP neural networks that use engineered features. Furthermore, we demonstrate a proof of concept of interpretability by developing an explanation mask that localizes on the most important characters used in making a prediction. When tested on the solubility dataset, the mask identified specific parts of a chemical that are consistent with established first-principles knowledge, with an accuracy of 88%. Our work demonstrates that neural networks can learn technically accurate chemical concepts and provide state-of-the-art accuracy, making interpretable deep neural networks a useful tool of relevance to the chemical industry.
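
To make the idea of learning directly from SMILES text concrete, the following is a minimal sketch of how SMILES strings could be turned into fixed-length integer sequences, the kind of input a character-level RNN typically consumes. The abstract does not describe the preprocessing, so everything here is an assumption for illustration: the helper names `build_vocab` and `encode`, the padding index `PAD`, and the example molecules are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: character-level encoding of SMILES strings.
# build_vocab, encode and PAD are hypothetical names, not the paper's code.

PAD = 0  # index reserved for padding positions


def build_vocab(smiles_list):
    """Map every distinct character in the corpus to a positive integer."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(smiles, vocab, max_len):
    """Turn one SMILES string into a fixed-length list of indices."""
    ids = [vocab[ch] for ch in smiles[:max_len]]  # truncate if too long
    return ids + [PAD] * (max_len - len(ids))     # right-pad if too short


# Example molecules: ethanol, benzene, acetic acid.
smiles_list = ["CCO", "c1ccccc1", "CC(=O)O"]
vocab = build_vocab(smiles_list)
encoded = [encode(s, vocab, max_len=8) for s in smiles_list]
```

Splitting on single characters is the simplest choice and matches the abstract's description of an explanation mask over characters, but it breaks two-letter atoms such as `Cl` and `Br` into separate tokens; a real pipeline may handle these differently.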

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

IRNet: A General Purpose Deep Residual Regression Framework for Materials Discovery
physics.comp-ph · 2019-07 · unverdicted · novelty 7.0

IRNet uses per-layer residual shortcuts in fully connected networks to achieve better prediction accuracy and training convergence than prior ML methods on OQMD and Materials Project datasets for material properties.