
arxiv: 2502.01184 · v2 · pith:FUEQA3NZnew · submitted 2025-02-03 · 💻 cs.LG · cs.AI · physics.chem-ph · q-bio.QM

FragmentNet: Adaptive Graph Fragmentation for Graph-to-Sequence Molecular Representation Learning

classification 💻 cs.LG · cs.AI · physics.chem-ph · q-bio.QM
keywords molecular · chemically · granularity · learning · representation · adaptive · atoms · fragment

Molecular representation learning methods typically tokenize molecules as individual atoms or use rigid, rule-based fragment decompositions, limiting their ability to capture meaningful chemical substructure context. We introduce FragmentNet, a graph-to-sequence model built around a novel adaptive, learned tokenizer that decomposes molecular graphs into chemically valid fragments of adjustable granularity, complemented by chemically aware spatial positional encodings that preserve molecular topology in the resulting sequence. Extending masked pre-training strategies from natural language processing to the molecular domain, we mask and reconstruct molecules at the level of chemically meaningful fragments rather than individual atoms. Evaluating across multiple property prediction benchmarks, we find that pre-training at fragment granularity leads to improved downstream performance on the majority of tasks, demonstrating that tokenization granularity is an important design choice for molecular representation learning.
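
To make the fragment-level masking idea in the abstract concrete, here is a minimal sketch in Python. It is not FragmentNet's implementation: the function name `mask_fragments`, the `[MASK]` token, the masking ratio, and the hand-written fragment list are all assumptions chosen for illustration, and the learned tokenizer and positional encodings are not modeled.

```python
import random

MASK = "[MASK]"  # hypothetical mask token, not taken from the paper


def mask_fragments(fragment_tokens, mask_ratio=0.15, seed=0):
    """Mask whole fragment tokens instead of individual atoms.

    Returns (masked_sequence, targets), where targets maps each masked
    position to the original fragment the model would have to reconstruct.
    """
    rng = random.Random(seed)
    n = len(fragment_tokens)
    # Assumption: mask at least one fragment per non-empty molecule.
    k = max(1, round(n * mask_ratio)) if n else 0
    positions = sorted(rng.sample(range(n), k))

    masked = list(fragment_tokens)
    targets = {}
    for p in positions:
        targets[p] = masked[p]
        masked[p] = MASK
    return masked, targets


# Hand-written, illustrative fragments (SMILES-like strings) for an
# aspirin-like molecule; a learned tokenizer would produce its own split.
fragments = ["c1ccccc1", "C(=O)O", "OC(C)=O"]
masked, targets = mask_fragments(fragments, mask_ratio=0.34)
print(masked)
print(targets)
```

The point of the sketch is the unit of masking: each hidden item is an entire substructure, so the reconstruction target carries more chemical context than a single atom would.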



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. BiScale-GTR: Fragment-Aware Graph Transformers for Multi-Scale Molecular Representation Learning

    cs.LG 2026-04 unverdicted novelty 6.0

    BiScale-GTR achieves claimed state-of-the-art results on MoleculeNet, PharmaBench and LRGB by combining improved fragment tokenization with a parallel GNN-Transformer architecture that operates at both atom and fragme...