FlashMol: High-Quality Molecule Generation in as Few as Four Steps
Pith reviewed 2026-05-11 00:57 UTC · model grok-4.3
The pith
FlashMol generates high-quality 3D molecular conformations in only four diffusion steps by distilling a 1000-step teacher model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FlashMol produces chemically valid 3D molecular conformations in as few as four steps. It adapts distribution matching distillation to minimize reverse KL divergence in the molecular domain, respaces the generation timesteps for better initialization, and regularizes the objective with a Jensen-Shannon divergence term to balance mode-seeking and mean-seeking behavior. On QM9 and GEOM-DRUG the resulting model matches or surpasses the 1000-step GeoLDM teacher while achieving up to 250 times faster sampling.
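The divergence structure in the claim can be made concrete with a small sketch. The snippet below computes, for discrete toy distributions, the reverse KL term that DMD minimizes and a Jensen-Shannon regularizer; `regularized_dmd_objective` and the weight `lam` are hypothetical names for illustration, since the paper's exact formulation and weighting are not given in this review.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    # Forward KL divergence KL(p || q) between two discrete distributions
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))

def js(p, q):
    # Jensen-Shannon divergence: average KL to the mixture; symmetric and bounded
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def regularized_dmd_objective(p_teacher, q_student, lam=0.5):
    # Reverse KL KL(q || p) is mode-seeking; the JS term reintroduces the
    # mean-seeking pull of forward KL. `lam` is a hypothetical weight.
    return kl(q_student, p_teacher) + lam * js(p_teacher, q_student)
```

The asymmetry matters: reverse KL heavily penalizes student mass where the teacher has none (collapsing onto modes), while the symmetric JS term keeps the student from ignoring teacher modes entirely.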
What carries the argument
Distribution matching distillation adapted with timestep respacing and Jensen-Shannon regularization, which distills a slow diffusion teacher into a fast generator while preserving stability and diversity of 3D molecular conformations.
If this is right
- Large-scale in silico screening for drug discovery becomes feasible because generation time drops by up to 250 times.
- The distilled four-step model matches or exceeds the 1000-step teacher on standard quality metrics for 3D conformations.
- Timestep respacing supplies a stronger initialization that makes the local minimization of distribution matching distillation effective.
- Jensen-Shannon regularization counters the mode-seeking tendency of reverse KL and restores sample diversity.
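The respacing idea in the bullets above can be sketched as follows. This is a minimal DDIM-style uniform subsampling of the teacher's timestep grid; FlashMol's actual schedule (and whether it is uniform) is not specified in this review.

```python
def respace_timesteps(num_train_steps=1000, num_sample_steps=4):
    # Evenly spaced, descending subsequence of the teacher's timestep grid,
    # giving the few-step generator a small set of well-separated times.
    stride = num_train_steps // num_sample_steps
    return [num_train_steps - 1 - i * stride for i in range(num_sample_steps)]
```

For 1000 training steps and 4 sampling steps this yields [999, 749, 499, 249], i.e. the generator starts from pure noise and denoises at a handful of coarse times rather than all thousand.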
Where Pith is reading between the lines
- The same distillation recipe could be tested on diffusion models for protein structure or crystal generation to check whether four-step sampling generalizes.
- If the regularization term proves robust across domains, similar few-step techniques might shorten inference in image or point-cloud diffusion models.
- Running the model on larger, more diverse molecular libraries would test whether the speed-quality trade-off holds outside the QM9 and GEOM-DRUG regimes.
Load-bearing premise
Adapting distribution matching distillation with timestep respacing and Jensen-Shannon regularization will preserve sample stability and diversity when applied to 3D molecular conformations.
What would settle it
If the four-step model produces molecules with substantially lower validity rates or higher average strain energies than the 1000-step teacher on the QM9 test set, the claim of maintained quality collapses.
Original abstract
Generating chemically valid 3D molecular conformations is critical for computational drug discovery. Classical diffusion-based models like GeoLDM perform well but require hundreds of steps, making large-scale in silico screening impractical. Recent efforts on few-step molecular generation have accelerated this process to 12-50 steps, but they often largely sacrifice sample stability. In this work, we present FlashMol, an ultra-fast molecule generative model producing high-quality molecular conformations in as few as 4 steps. To achieve this, we adapt distribution matching distillation (DMD) - a reverse KL-divergence minimization objective - to the molecular domain for effective distillation. Considering the local minimization behavior of DMD, we respace the molecule generation timesteps, providing the generator with much better initialization and enables effective distillation. Additionally, to mitigate the mode-seeking behavior of DMD and improve diversity, we further regularize it with a Jensen-Shannon divergence term, which incorporates the mean-seeking behavior of the forward KL divergence. Extensive experiments on QM9 and GEOM-DRUG datasets demonstrate that FlashMol matches and even surpasses the original 1000-step teacher, achieving up to 250× acceleration in sampling speed while maintaining high molecular quality.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents FlashMol, an adaptation of distribution matching distillation (DMD) for 3D molecular conformation generation. By combining DMD (reverse KL minimization) with timestep respacing for better initialization and a Jensen-Shannon divergence term to mitigate mode-seeking collapse, the method claims to produce high-quality samples in as few as 4 steps. Experiments on QM9 and GEOM-DRUG report that FlashMol matches or exceeds the 1000-step GeoLDM teacher on stability, validity, and diversity metrics while delivering up to 250× sampling acceleration.
Significance. If the results prove robust, the work would be significant for computational drug discovery by removing the computational barrier of hundreds of diffusion steps in large-scale in silico screening. The targeted use of respacing and JS regularization to stabilize few-step distillation on constrained 3D molecular manifolds addresses a practical bottleneck in the field.
major comments (2)
- [Experiments] Experiments section: the central claim that the 4-step model matches or surpasses the 1000-step teacher rests on the joint effect of DMD, timestep respacing, and the JS regularization term, yet no component-wise ablations are provided (e.g., performance with DMD+respacing alone or with altered JS coefficient). Because the weighted objective is domain-specific for bond-length/angle constraints and conformer energies, the reported metrics could depend on hyperparameter choices tuned to the test sets rather than emerging from the method itself.
- [Method] Method section: the adaptation of DMD to molecular data, including the precise loss formulation after timestep respacing and the weighting of the JS term, is described at a high level. Without the explicit equations or pseudocode for the combined objective and the respacing schedule, it is difficult to verify that the 4-step results are stable and do not rely on post-hoc adjustments that affect the performance claims.
minor comments (2)
- [Abstract] Abstract: the claim of 'up to 250× acceleration' should specify the exact teacher sampling steps, hardware, and batch settings used for the timing comparison.
- Ensure all reported metrics (stability, validity, diversity) include explicit definitions or citations to the standard molecular-generation literature (e.g., how validity is assessed for 3D conformations).
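The minor comment about the timing claim points at a concrete protocol. A minimal sketch of the kind of harness a fair comparison needs (warmup, fixed batch, median over repeats), with `ideal_speedup` showing the arithmetic behind the headline number; both function names are illustrative, not from the paper.

```python
import time

def ideal_speedup(teacher_steps=1000, student_steps=4):
    # Upper bound on acceleration when per-step cost is identical: 1000 / 4 = 250
    return teacher_steps / student_steps

def time_sampler(sample_fn, n_runs=5):
    # Minimal wall-clock harness: warm up once, then report the median of
    # several repeats. A fair comparison also fixes hardware and batch size.
    sample_fn()  # warmup (JIT compilation / kernel-launch costs)
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        sample_fn()
        times.append(time.perf_counter() - t0)
    return sorted(times)[len(times) // 2]
```

Note that the ideal 250× assumes identical per-step cost for teacher and student; measured speedups can deviate in either direction, which is exactly why the benchmark settings should be reported.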
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive feedback on our manuscript. The comments highlight important aspects of experimental validation and methodological clarity that we will address in the revision to strengthen the presentation of FlashMol. We respond to each major comment below.
Point-by-point responses
-
Referee: [Experiments] Experiments section: the central claim that the 4-step model matches or surpasses the 1000-step teacher rests on the joint effect of DMD, timestep respacing, and the JS regularization term, yet no component-wise ablations are provided (e.g., performance with DMD+respacing alone or with altered JS coefficient). Because the weighted objective is domain-specific for bond-length/angle constraints and conformer energies, the reported metrics could depend on hyperparameter choices tuned to the test sets rather than emerging from the method itself.
Authors: We agree that component-wise ablations would better isolate the contributions of each element and address potential concerns about hyperparameter sensitivity. The manuscript focuses on the combined objective because individual components (DMD alone or respacing without JS) do not achieve the target 4-step performance on their own, as motivated by the mode-seeking behavior of reverse KL and the need for better initialization on the molecular manifold. However, to strengthen the claims, we will add ablation tables in the revised Experiments section showing results for DMD+respacing (without JS), JS with different coefficients, and variations in the weighting for bond/angle constraints. Hyperparameters were tuned on a validation split separate from the test sets used for final reporting, following standard practice; we will explicitly state this and include sensitivity analysis to confirm robustness. revision: yes
-
Referee: [Method] Method section: the adaptation of DMD to molecular data, including the precise loss formulation after timestep respacing and the weighting of the JS term, is described at a high level. Without the explicit equations or pseudocode for the combined objective and the respacing schedule, it is difficult to verify that the 4-step results are stable and do not rely on post-hoc adjustments that affect the performance claims.
Authors: We acknowledge that the Method section presents the adaptations at a conceptual level to maintain readability, but we agree that explicit formulations are necessary for full reproducibility and verification. In the revised manuscript, we will expand the Method section to include the precise combined loss equation (reverse KL from DMD plus weighted JS term), the mathematical definition of the respaced timestep schedule (including how it provides improved initialization for the generator), and pseudocode for the distillation training procedure. This will clarify that the 4-step results arise directly from the described objective without undisclosed post-hoc tuning. revision: yes
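The pseudocode the authors promise would presumably look something like the following numpy toy, which sketches the standard DMD update direction (difference of "fake" and "real" scores at a noised student sample). All names here are hypothetical stand-ins; the paper's actual loss, schedule, and network interfaces are not reproduced in this review.

```python
import numpy as np

rng = np.random.default_rng(0)

def dmd_generator_gradient(generator, real_score, fake_score, z, t=0.5):
    # DMD-style update direction for the generator (toy sketch):
    # 1) draw a student sample, 2) diffuse it to a random-ish time t,
    # 3) take the gap between the score of the student ("fake") distribution
    #    and the score of the teacher ("real") distribution at that point.
    x = generator(z)                                   # student sample
    noise = rng.standard_normal(x.shape)
    x_t = np.sqrt(1.0 - t) * x + np.sqrt(t) * noise    # diffused sample at time t
    return fake_score(x_t, t) - real_score(x_t, t)     # reverse-KL gradient direction
```

In the full method this direction is backpropagated through the generator, and the JS regularizer and respaced schedule modify which times t and which samples enter the update.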
Circularity Check
No circularity: empirical adaptation validated against external teacher
Full rationale
The paper adapts DMD (reverse KL minimization), timestep respacing, and a JS regularization term to distill a 1000-step GeoLDM teacher into a 4-step generator for 3D molecular conformations. All central claims are supported by direct empirical comparisons on QM9 and GEOM-DRUG using standard metrics (stability, validity, diversity) against the independent teacher model. No equation, objective, or performance result is shown to reduce by construction to fitted parameters, self-citations, or renamed inputs; the method description and results remain externally falsifiable and do not rely on internal self-reference for their validity.
Axiom & Free-Parameter Ledger
free parameters (1)
- number of sampling steps
axioms (1)
- Domain assumption: the DMD objective can be adapted to 3D molecular conformations without loss of chemical validity.
Reference graph
Works this paper leans on
- [1] Simon Axelrod and Rafael Gómez-Bombarelli. GEOM, energy-annotated molecular conformations for property prediction and molecular generation. Scientific Data, 2022.
- [2] Lichen Bai, Zikai Zhou, Shitong Shao, Wenliang Zhong, Shuo Yang, Shuo Chen, Bojun Chen, and Zeke Xie. Optimizing few-step generation with adaptive matching distillation. arXiv preprint arXiv:2602.07345, 2026.
- [3] Nicholas M. Boffi, Michael S. Albergo, and Eric Vanden-Eijnden. How to build a consistency model: Learning flow maps via self-distillation. arXiv preprint arXiv:2505.18825, 2025.
- [4] Michal Brylinski and Grover Waldrop. Computational redesign of bacterial biotin carboxylase inhibitors using structure-based virtual screening of combinatorial libraries. Molecules, 2014.
- [5] Ian Dunn and David R. Koes. FlowMol3: Flow matching for 3D de novo small-molecule generation. Digital Discovery, 2026.
- [6] Zhengyang Geng, Mingyang Deng, Xingjian Bai, Jeremy Z. Kolter, and Kaiming He. Mean flows for one-step generative modeling. arXiv preprint arXiv:2505.13447, 2025.
- [7] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in Neural Information Processing Systems, 2014.
- [8] Rebecca L. Greenaway and Kim E. Jelfs. Integrating computational and experimental workflows for accelerated organic materials discovery. Advanced Materials, 2021.
- [9] Majdi Hassan, Nikhil Shenoy, Jungyoon Lee, Hannes Stark, Stephan Thaler, and Dominique Beaini. Equivariant flow matching for molecular conformer generation. ICML 2024 Workshop, 2024.
- [10] Haokai Hong, Wanyu Lin, and Kay Chen Tan. Accelerating 3D molecule generation via jointly geometric optimal transport. arXiv preprint arXiv:2405.15252, 2024.
- [11] Emiel Hoogeboom, Victor Garcia Satorras, Clément Vignac, and Max Welling. Equivariant diffusion for molecule generation in 3D. Proceedings of the 39th International Conference on Machine Learning, 2022.
- [12] Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, and Eli Shechtman. Self Forcing: Bridging the train-test gap in autoregressive video diffusion. arXiv preprint arXiv:2506.08009, 2025.
- [13] Ross Irwin, Alessandro Tibo, Jon Paul Janet, and Simon Olsson. SemlaFlow: Efficient 3D molecular generation with latent attention and equivariant flow matching. arXiv preprint arXiv:2406.07266, 2024.
- [14] Yunhui Jang, Dongwoo Kim, and Sungsoo Ahn. Hierarchical graph generation with K2-trees. ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling, 2023.
- [15] Dengyang Jiang, Dongyang Liu, Zanyi Wang, Qilong Wu, Liuzhuozheng Li, Hengzhuang Li, Xin Jin, David Liu, Changsheng Lu, Zhen Li, Bo Zhang, Mengmeng Wang, Steven Hoi, Peng Gao, and Harry Yang. Distribution matching distillation meets reinforcement learning. arXiv preprint arXiv:2511.13649, 2025.
- [16] Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 2022.
- [17] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- [18] Xiangzhe Kong, Wenbing Huang, Zhixing Tan, and Yang Liu. Molecule generation by principal subgraph mining and assembling. Advances in Neural Information Processing Systems, 2022.
- [19] Romain Lacombe and Neal Vaidya. Accelerating the generation of molecular conformations with progressive distillation of equivariant latent diffusion models. arXiv preprint arXiv:2404.13491, 2024.
- [20] Zian Li, Cai Zhou, Xiyuan Wang, Xingang Peng, and Muhan Zhang. Geometric representation condition improves equivariant molecule generation. arXiv preprint arXiv:2410.03655, 2024.
- [21]
- [22] Jianhua Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 2002.
- [23] Cheng Lu and Yang Song. Simplifying, stabilizing and scaling continuous-time consistency models. arXiv preprint arXiv:2410.11081, 2024.
- [24] Lars Mescheder, Andreas Geiger, and Sebastian Nowozin. Which training methods for GANs do actually converge? Proceedings of the 35th International Conference on Machine Learning, 2018.
- [25] Yuyan Ni, Shikun Feng, Haohan Chi, Bowen Zheng, Huan-ang Gao, Wei-Ying Ma, Zhi-Ming Ma, and Yanyan Lan. Straight-line diffusion model for efficient 3D molecular generation. arXiv preprint arXiv:2503.02918, 2025.
- [26] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. NeurIPS 2017 Workshop on Autodiff, 2017.
- [27] Yiming Qin, Manuel Madeira, Dorina Thanou, and Pascal Frossard. DeFoG: Discrete flow matching for graph generation. arXiv preprint arXiv:2410.04263, 2024.
- [28] Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, and Anatole von Lilienfeld. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 2014.
- [29] Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512, 2022.
- [30] Víctor Garcia Satorras, Emiel Hoogeboom, and Max Welling. E(n) equivariant graph neural networks. International Conference on Machine Learning, pages 9323–9332, PMLR, 2021.
- [31] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020.
- [32] Yang Song and Prafulla Dhariwal. Improved techniques for training consistency models. arXiv preprint arXiv:2310.14189, 2023.
- [33] Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. Proceedings of the 40th International Conference on Machine Learning, 2023.
- [34] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.
- [35] Yuxuan Song, Jingjing Gong, Yanru Qu, Hao Zhou, Mingyue Zheng, Jingjing Liu, and Wei-Ying Ma. Unified generative modeling of 3D molecules with Bayesian flow networks. The Twelfth International Conference on Learning Representations, 2024.
- [36] Yuxuan Song, Jingjing Gong, Minkai Xu, Ziyao Cao, Yanyan Lan, Stefano Ermon, Hao Zhou, and Wei-Ying Ma. Equivariant flow matching with hybrid probability transport for 3D molecule generation. Advances in Neural Information Processing Systems, 2023.
- [37] Shangyuan Tong, Nanye Ma, Saining Xie, and Tommi Jaakkola. Flow map distillation without data. arXiv preprint arXiv:2511.19428, 2025.
- [38] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 2017.
- [39] Clement Vignac, Igor Krawczuk, Antoine Siraudin, Bohan Wang, Volkan Cevher, and Pascal Frossard. DiGress: Discrete denoising diffusion for graph generation. arXiv preprint arXiv:2209.14734, 2022.
- [40] Chenyu Wang, Cai Zhou, Sharut Gupta, Zongyu Lin, Stefanie Jegelka, Stephen Bates, and Tommi Jaakkola. Learning diffusion models with flexible representation guidance. arXiv preprint arXiv:2507.08980, 2025.
- [41] Wendy A. Warr, Marc C. Nicklaus, Christos A. Nicolaou, and Matthias Rarey. Exploration of ultralarge compound collections for drug discovery. Journal of Chemical Information and Modeling, 2022.
- [42] Lemeng Wu, Chengyue Gong, Xingchao Liu, Mao Ye, and Qiang Liu. Diffusion-based molecule generation with informative prior bridges. Advances in Neural Information Processing Systems, 2022.
- [43] Minkai Xu, Alexander Powers, Ron Dror, Stefano Ermon, and Jure Leskovec. Geometric latent diffusion models for 3D molecule generation. Proceedings of the 40th International Conference on Machine Learning, 2023.
- [44] Minkai Xu, Lantao Yu, Yang Song, Chence Shi, Stefano Ermon, and Jian Tang. GeoDiff: A geometric diffusion model for molecular conformation generation. arXiv preprint arXiv:2203.02923, 2022.
- [45] Yilun Xu, Weili Nie, and Arash Vahdat. One-step diffusion models with f-divergence distribution matching. arXiv preprint arXiv:2502.15681, 2025.
- [46] Zehra Yildirim, Kyle Swanson, Xuekun Wu, James Zou, and Joseph Wu. Next-gen therapeutics: Pioneering drug discovery with iPSCs, genomics, AI, and clinical trials in a dish. Annual Review of Pharmacology and Toxicology, 2025.
- [47] Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, and William T. Freeman. Improved distribution matching distillation for fast image synthesis. Advances in Neural Information Processing Systems, 2024.
- [48] Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Frédo Durand, William T. Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.
- [49] Tianwei Yin, Qiang Zhang, Richard Zhang, William T. Freeman, Frédo Durand, Eli Shechtman, and Xun Huang. From slow bidirectional to fast autoregressive video diffusion models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025.
- [50] Zhilong Zhang, Yuxuan Song, Yichun Wang, Jingjing Gong, Hanlin Wu, Dongzhan Zhou, Hao Zhou, and Wei-Ying Ma. Accelerating 3D molecule generative models with trajectory diagnosis. The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
- [51] Cai Zhou, Xiyuan Wang, and Muhan Zhang. Unifying generation and prediction on graphs with latent graph diffusion. Advances in Neural Information Processing Systems, 2024.
- [52] Linqi Zhou, Mathias Parger, Ayaan Haque, and Jiaming Song. Terminal velocity matching. arXiv preprint arXiv:2511.19797, 2025.

Appendix A excerpt (A.1 Molecule Diffusion Models): We provide additional details on the molecule diffusion model summarized in Section 3. Following GeoLDM [43], a molecule with N atoms is represented as G = ⟨x, h⟩, ...
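The representation G = ⟨x, h⟩ excerpted above (coordinates plus per-atom features) can be sketched as a small data structure. Field names and the zero-centering helper are illustrative, assuming the GeoLDM-style convention of projecting coordinates to the zero center-of-mass subspace for translation invariance; the paper's exact tensor layout is not reproduced here.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Molecule:
    # G = <x, h>: 3D coordinates plus per-atom features
    x: np.ndarray  # (N, 3) atom positions
    h: np.ndarray  # (N, d) atom features, e.g. one-hot element types and charge

    def zero_center(self) -> "Molecule":
        # Remove the center of mass: the standard trick that makes
        # equivariant diffusion on molecules translation-invariant.
        return Molecule(self.x - self.x.mean(axis=0, keepdims=True), self.h)
```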