GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation
Predicting molecular conformations from molecular graphs is a fundamental problem in cheminformatics and drug discovery. Recently, significant progress has been achieved with machine learning approaches, especially with deep generative models. Inspired by the diffusion process in classical non-equilibrium thermodynamics, in which heated particles diffuse from their original states to a noise distribution, we propose a novel generative model named GeoDiff for molecular conformation prediction. GeoDiff treats each atom as a particle and learns to directly reverse the diffusion process (i.e., transforming from a noise distribution to stable conformations) as a Markov chain. Modeling such a generation process is, however, very challenging, as the likelihood of conformations should be roto-translationally invariant. We theoretically show that Markov chains evolving with equivariant Markov kernels can induce an invariant distribution by design, and further propose building blocks for the Markov kernels to preserve the desirable equivariance property. The whole framework can be efficiently trained in an end-to-end fashion by optimizing a weighted variational lower bound to the (conditional) likelihood. Experiments on multiple benchmarks show that GeoDiff is superior or comparable to existing state-of-the-art approaches, especially on large molecules.
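The equivariance property mentioned in the abstract can be illustrated with a small numerical check. The sketch below is not GeoDiff's architecture: `toy_equivariant_mean` and `random_rotation` are hypothetical helpers written for this illustration, and the exponential distance weighting is an arbitrary choice. It only shows that a kernel mean built from relative position vectors, weighted by functions of pairwise distances, commutes with rotations and translations of the atom coordinates.

```python
import numpy as np

def random_rotation(rng):
    # QR decomposition of a Gaussian matrix gives an orthogonal matrix.
    # Fix column signs, then flip one column if needed so that det = +1.
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def toy_equivariant_mean(x, step=0.1):
    # Hypothetical stand-in for a learned kernel mean (assumption, not the
    # paper's network): each atom moves along relative position vectors,
    # weighted by a scalar that depends only on pairwise distance.
    diff = x[:, None, :] - x[None, :, :]                  # (n, n, 3): x_i - x_j
    dist = np.linalg.norm(diff, axis=-1, keepdims=True)   # (n, n, 1)
    weight = np.exp(-dist)                                # rotation-invariant weights
    return x + step * (weight * diff).sum(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))      # 5 atoms with 3D coordinates
R = random_rotation(rng)         # random proper rotation
t = rng.normal(size=(1, 3))      # random translation

# Equivariance: transforming the input transforms the output the same way.
rot_err = np.abs(toy_equivariant_mean(x @ R.T) - toy_equivariant_mean(x) @ R.T).max()
trans_err = np.abs(toy_equivariant_mean(x + t) - (toy_equivariant_mean(x) + t)).max()
print(f"rotation error: {rot_err:.2e}, translation error: {trans_err:.2e}")
```

Both errors should sit at floating-point precision. Per the abstract, it is this kind of equivariance in every Markov kernel that lets the chain induce an invariant distribution over conformations.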
Forward citations
Cited by 21 Pith papers
-
Generative Modeling with Flux Matching
Flux Matching generalizes score-based generative modeling by using a weaker objective that admits infinitely many non-conservative vector fields with the data as stationary distribution, enabling new design choices be...
-
World Models as Group Actions
Formalizes video world models as group actions on states and uses latent regularization with synthesized supervision to enforce consistency, introducing GAC and GAR metrics that improve structural correctness in SOTA models.
-
Training-Free Generative Sampling via Moment-Matched Score Smoothing
MM-SOLD is a training-free particle sampler whose large-particle limit converges to a moment-matched Gibbs distribution obtained by exponentially tilting a score-smoothed target.
-
From Holo Pockets to Electron Density: GPT-style Drug Design with Density
EDMolGPT generates drug-like molecules from low-resolution electron density point clouds of holo binding pockets and shows effectiveness across 101 biological targets.
-
PerFlow: Physics-Embedded Rectified Flow for Efficient Reconstruction and Uncertainty Quantification of Spatiotemporal Dynamics
PerFlow embeds physics constraints into rectified flow sampling through guidance-free conditioning and constraint-preserving projections, achieving efficient sparse reconstruction and uncertainty quantification for sp...
-
h-MINT: Modeling Pocket-Ligand Binding with Hierarchical Molecular Interaction Network
h-MINT improves ligand-protein binding affinity prediction by 2-4% and virtual screening metrics by 1-3% via overlapping fragment tokenization and hierarchical modeling.
-
How Creative Are Large Language Models in Generating Molecules?
Large language models exhibit distinct creative patterns in molecule generation, including higher constraint satisfaction when more constraints are added, and this is the first work to reframe molecule generation abil...
-
Time-Aware Diffusion based on Preference Disentanglement for Generative Recommendation
TDPM is a diffusion-based generative recommender that disentangles user preferences into period and point components to enable time-aware diffusion on semantic indices, reporting up to 29% gains on HR@20 and NDCG@20 o...
-
Latent Diffusion Pretraining for Crystal Property Prediction
CrysLDNet combines VAE and latent diffusion pretraining on unlabeled crystals to improve graph encoder performance on property prediction by about 4-5% on JARVIS and MP datasets.
-
DiffATS: Diffusion in Aligned Tensor Space
DiffATS trains diffusion models directly on aligned Tucker tensor primitives that are proven to be homeomorphisms, delivering efficient unconditional and conditional generation across images, videos, and PDE data with...
-
From Holo Pockets to Electron Density: GPT-style Drug Design with Density
EDMolGPT generates molecules from low-resolution electron density for de novo structure-based drug design, claiming better performance than pocket-based methods on 101 targets.
-
Toward Better Geometric Representations for Molecule Generative Models
LENSEs improves representation-conditioned molecule generation by jointly training a multi-level representation head, perceptual loss, and REPA alignment on pretrained encoders, yielding 97.28% validity and 98.51% sta...
-
FlashMol: High-Quality Molecule Generation in as Few as Four Steps
FlashMol produces chemically valid 3D molecules in 4 steps via distribution matching distillation with respaced timesteps and Jensen-Shannon regularization, matching or exceeding 1000-step teacher performance on QM9 a...
-
SymDrift: One-Shot Generative Modeling under Symmetries
SymDrift makes drifting models produce symmetry-invariant samples in one step via symmetrized coordinate drifts or G-invariant embeddings, outperforming prior one-shot baselines on molecular benchmarks and cutting com...
-
Interests Burn-down Diffusion Process for Personalized Collaborative Filtering
A new interests burn-down diffusion process models decaying user interests for personalized collaborative filtering and outperforms prior generative methods in the StageCF implementation.
-
PerFlow: Physics-Embedded Rectified Flow for Efficient Reconstruction and Uncertainty Quantification of Spatiotemporal Dynamics
PerFlow decouples observation conditioning from physics enforcement in rectified flows using constraint-preserving projections and invariance guarantees for fast, physics-consistent reconstruction of spatiotemporal dynamics.
-
LEGO-MOF: Equivariant Latent Manipulation for Editable, Generative, and Optimizable MOF Design
LEGO-MOF maps MOF linkers to an equivariant latent space for continuous editing and uses test-time optimization to achieve a 147.5% average boost in pure CO2 uptake while preserving structural validity.
-
MolDA: Molecular Understanding and Generation via Large Language Diffusion Model
MolDA is a multimodal molecular model that uses a discrete large language diffusion backbone plus a hybrid graph encoder to achieve better global coherence and validity than autoregressive approaches.
-
Energy-Guided Generative Modeling for Low-Energy Molecular Structure Discovery
EnFlow integrates flow-based conformer generation with energy landscape modeling to enable joint ensemble generation and ground-state identification using only 1-2 ODE steps.
-
DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models
DPM-Solver++ enables high-quality guided sampling of diffusion models in 15-20 steps via data-prediction ODE solving and multistep stabilization.
-
On the Limits of Latent Reuse in Diffusion Models
Reusing source latent spaces in diffusion models under distribution shift produces target score error set by principal-angle misalignment and diffusion-time-amplified ambient noise.