Molecule Attention Transformer

Jacek Tabor; Krzysztof Rataj; Łukasz Maziarka; Sławomir Mucha; Stanisław Jastrzębski; Tomasz Danel

arXiv: 2002.08264 · v1 · pith:CYNLGDADnew · submitted 2020-02-19 · cs.LG · physics.comp-ph · stat.ML

Łukasz Maziarka, Tomasz Danel, Sławomir Mucha, Krzysztof Rataj, Jacek Tabor, Stanisław Jastrzębski

classification: cs.LG, physics.comp-ph, stat.ML

keywords: attention, molecule, tasks, transformer, competitively, molecular, performs, prediction


Designing a single neural network architecture that performs competitively across a range of molecule property prediction tasks remains largely an open challenge, and its solution may unlock widespread use of deep learning in the drug discovery industry. To move towards this goal, we propose Molecule Attention Transformer (MAT). Our key innovation is to augment the attention mechanism in the Transformer using inter-atomic distances and the molecular graph structure. Experiments show that MAT performs competitively on a diverse set of molecular prediction tasks. Most importantly, with simple self-supervised pretraining, MAT requires tuning of only a few hyperparameter values to achieve state-of-the-art performance on downstream tasks. Finally, we show that attention weights learned by MAT are interpretable from a chemical point of view.
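
The abstract does not spell out how attention is augmented. One plausible reading is sketched below: the usual scaled dot-product attention matrix is mixed with a term derived from inter-atomic distances and with the adjacency matrix of the molecular graph. The function name `molecule_attention`, the mixing weights `lam_att`, `lam_dist`, `lam_adj`, and the choice of a softmax over negative distances are all assumptions made for illustration, not details taken from this page.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def molecule_attention(q, k, v, dist, adj,
                       lam_att=0.5, lam_dist=0.25, lam_adj=0.25):
    """Hypothetical single-head attention mixed with molecular structure.

    q, k, v : (n_atoms, d) query, key, and value matrices
    dist    : (n_atoms, n_atoms) inter-atomic distances
    adj     : (n_atoms, n_atoms) adjacency matrix of the molecular graph
    lam_*   : mixing weights (assumed values, to be tuned)
    """
    d_k = q.shape[-1]
    # Standard scaled dot-product attention weights.
    attn = softmax(q @ k.T / np.sqrt(d_k), axis=-1)
    # Assumed distance kernel: nearer atoms receive larger weight.
    dist_term = softmax(-dist, axis=-1)
    # Weighted sum of the three sources of information.
    weights = lam_att * attn + lam_dist * dist_term + lam_adj * adj
    return weights @ v, weights

# Toy example: a 3-atom chain with 4-dimensional atom features.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(3, 4)) for _ in range(3))
dist = np.array([[0.0, 1.5, 2.5],
                 [1.5, 0.0, 1.5],
                 [2.5, 1.5, 0.0]])
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
out, weights = molecule_attention(q, k, v, dist, adj)
print(out.shape)  # (3, 4)
```

Setting `lam_dist` and `lam_adj` to zero and `lam_att` to one recovers plain self-attention, which makes it easy to compare the structural terms against a baseline in this sketch.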


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Chem-GMNet: A Sphere-Native Geometric Transformer for Molecular Property Prediction
cs.LG · 2026-05 · unverdicted · novelty 7.0

Chem-GMNet uses sphere-native embeddings, DualSKA attention, and SH-FFN layers to match or beat ChemBERTa-2 on MoleculeNet tasks with fewer parameters and sometimes no pretraining.

TIGER: Text-Informed Generalized Enzyme-Reaction Retrieval
cs.AI · 2026-05 · unverdicted · novelty 5.0

TIGER is a text-informed ML framework that improves bidirectional enzyme-reaction retrieval by distilling semantic knowledge from sequences and aligning representations across tasks and distributions.

A Systematic Survey and Benchmark of Deep Learning for Molecular Property Prediction in the Foundation Model Era
cs.LG · 2026-04 · accept · novelty 5.0

A systematic survey and benchmark of four deep learning paradigms for molecular property prediction that organizes the field, critiques current data practices, and outlines three future directions.