Generative Molecular Morphing for Flexible-Size Design via Unbalanced Optimal Transport
Pith reviewed 2026-06-27 22:51 UTC · model grok-4.3
The pith
Morph generates 3D molecules of variable atom count by applying unbalanced optimal transport to geometric graphs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Morph is a flexible-size generative model for conditional and unconditional 3D molecular design based on geometric graphs. By dynamically adapting the number of atoms through unbalanced optimal transport, it integrates existing structural priors and enhances property steering without post-hoc adjustments, matching fixed-size state-of-the-art performance while enabling sampling in regimes where prior models fail.
What carries the argument
Unbalanced optimal transport on geometric graphs, which performs dynamic size adaptation while preserving structural information.
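The review does not say which unbalanced OT variant Morph uses. As a generic illustration only, the sketch below assumes the common entropic formulation with KL-relaxed marginals (see Séjourné et al. [11]). It couples a 5-atom source with a 3-atom target. Because the marginals are only softly enforced, a source row of the plan that carries little mass can be read as an atom that fades out rather than one forced onto a partner. The function name and the parameters `eps` and `rho` are hypothetical.

```python
import numpy as np

def unbalanced_sinkhorn(a, b, C, eps=0.1, rho=1.0, n_iter=500):
    """Entropic optimal transport with KL-relaxed marginals (generic UOT sketch).

    a: (n,) source masses, b: (m,) target masses, C: (n, m) cost matrix.
    eps: entropic regularization strength.
    rho: marginal-relaxation strength; rho -> inf recovers balanced Sinkhorn.
    """
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a, dtype=float)
    v = np.ones_like(b, dtype=float)
    fi = rho / (rho + eps)                # exponent induced by the KL marginal penalty
    for _ in range(n_iter):
        u = (a / (K @ v)) ** fi
        v = (b / (K.T @ u)) ** fi
    return u[:, None] * K * v[None, :]    # transport plan P, shape (n, m)

# Toy example: 5 source atoms, 3 target atoms, one unit of mass per atom.
rng = np.random.default_rng(0)
x_src, x_tgt = rng.random((5, 3)), rng.random((3, 3))
C = ((x_src[:, None, :] - x_tgt[None, :, :]) ** 2).sum(-1)   # squared distances
P = unbalanced_sinkhorn(np.ones(5), np.ones(3), C)
print(P.sum(axis=1))   # mass retained per source atom; not forced to equal 1
```

Lowering `rho` makes creating or destroying mass cheaper. Raising it pushes the solution toward balanced OT, which can only pair 5 atoms with 3 by splitting mass.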
If this is right
- Molecular generation can now target properties whose optimum depends on atom count without separate size-selection steps.
- Scaffolds and other partial structures can be used directly as conditioning input rather than requiring post-processing.
- Steering toward high-reward molecules becomes stronger because size and property distributions are modeled jointly.
- Generation succeeds in property-size regimes outside the training distribution where fixed-size models produce invalid outputs.
Where Pith is reading between the lines
- The same transport mechanism could be tested on other variable-length structured objects such as proteins or materials lattices.
- Pairing Morph with reinforcement learning loops might further improve property optimization by exploiting its flexibility in atom count.
- Efficiency at very large atom counts is untested; such a test would show whether the transport computation scales without additional approximations.
Load-bearing premise
Unbalanced optimal transport on geometric graphs can adapt molecular size dynamically without lowering generation quality or losing the ability to use structural priors.
What would settle it
A side-by-side evaluation on standard fixed-size molecular benchmarks in which Morph produces lower validity, novelty, or property scores than current diffusion or flow models.
Original abstract
The success of generative molecular design hinges on a model's steerability toward high-reward samples. Because many molecular properties are intrinsically linked to molecular size, accurately capturing the joint distribution of properties and the number of atoms is essential. However, current diffusion and flow-based models fix the number of atoms, which ultimately limits their ability to navigate this complex relationship. To address this, we introduce Morph, a flexible-size generative model for conditional and unconditional 3D molecular design based on geometric graphs. By dynamically adapting size, Morph can seamlessly integrate existing structural priors, like scaffolds, and significantly enhances property steering. We show that Morph matches current fixed-size state-of-the-art models while offering the benefit of unparalleled sampling flexibility. We demonstrate out-of-distribution generation in regimes where previous models fail, paving the way for enhanced generative modeling for molecular design.
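The abstract's point about fixed atom counts can be made concrete. Fixed-size models such as EDM [38] typically draw the atom count N from the training set's empirical size distribution, conditioned on a property bin when steering, and then denoise exactly N atoms. The toy sketch below uses made-up data and hypothetical helper names. It shows the consequence: a size never seen together with a given property bin has zero probability, and an unseen bin cannot be sampled at all.

```python
import numpy as np
from collections import Counter

def fit_size_histogram(sizes, prop_bins):
    """Count atom numbers N per property bin: an empirical p(N | bin)."""
    table = {}
    for n, b in zip(sizes, prop_bins):
        table.setdefault(b, Counter())[n] += 1
    return table

def sample_size(table, prop_bin, rng):
    """Draw N for a fixed-size sampler; raises KeyError for unseen bins."""
    counts = table[prop_bin]
    ns = sorted(counts)
    p = np.array([counts[n] for n in ns], dtype=float)
    return int(rng.choice(ns, p=p / p.sum()))

# Toy training set (made-up numbers): atom counts and a binned target property.
sizes = [9, 9, 12, 12, 15, 18]
bins = ["low", "low", "low", "mid", "mid", "high"]
table = fit_size_histogram(sizes, bins)
rng = np.random.default_rng(0)
print({sample_size(table, "low", rng) for _ in range(100)})  # only sizes seen with "low"
```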
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Morph, a generative model for conditional and unconditional 3D molecular design that operates on geometric graphs via unbalanced optimal transport. Unlike prior diffusion and flow-based approaches that fix the number of atoms, Morph dynamically adapts molecular size. The central claims are that it matches the generation quality of fixed-size state-of-the-art models, seamlessly incorporates structural priors such as scaffolds, improves property steering, and enables out-of-distribution generation in regimes where previous models fail.
Significance. If the empirical claims are substantiated, the work would represent a meaningful advance in generative molecular design. Allowing variable atom counts directly addresses the coupling between molecular size and many target properties, potentially improving steerability and practical applicability without post-hoc fixes. The use of unbalanced OT on geometric graphs to achieve this flexibility while preserving structural priors is a technically interesting direction.
minor comments (2)
- [Abstract / Results] The abstract states that Morph 'matches current fixed-size state-of-the-art models' on generation quality; the results section should include direct quantitative comparisons (e.g., validity, uniqueness, property optimization metrics) against the specific baselines referenced.
- [Method] Clarify how the unbalanced OT formulation is implemented to enforce size adaptation without introducing additional hyperparameters that could affect the 'parameter-free' aspects of the structural priors.
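The second comment asks how size adaptation is enforced. One hard-assignment way to let a coupling change atom count comes from bipartite graph-edit-distance approximation (Riesen and Bunke [10], solvable with Hungarian-type assignment [12, 13]). The idea is to pad the cost matrix so each atom can be matched, deleted, or inserted. The sketch below is an assumption about how such a step could look, not a description of Morph's method. The function name is hypothetical, and `delete_cost` and `insert_cost` are exactly the kind of extra hyperparameters the comment is concerned with.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_with_edits(x_src, x_tgt, delete_cost=1.0, insert_cost=1.0):
    """Match, delete, or insert atoms between point sets of different sizes.

    Builds the (n + m) x (n + m) bipartite cost matrix of Riesen and Bunke:
    substitutions (squared distances), a diagonal deletion block, a diagonal
    insertion block, and a zero block for dummy-to-dummy pairs.
    """
    n, m = len(x_src), len(x_tgt)
    forbidden = 1e9  # rules out off-diagonal deletion/insertion entries
    sub = ((x_src[:, None, :] - x_tgt[None, :, :]) ** 2).sum(-1)   # (n, m)
    dele = np.full((n, n), forbidden)
    np.fill_diagonal(dele, delete_cost)
    ins = np.full((m, m), forbidden)
    np.fill_diagonal(ins, insert_cost)
    cost = np.block([[sub, dele], [ins, np.zeros((m, n))]])
    # Optimal assignment (SciPy uses a Jonker-Volgenant variant, not Kuhn's original method).
    rows, cols = linear_sum_assignment(cost)
    matches = [(int(i), int(j)) for i, j in zip(rows, cols) if i < n and j < m]
    deleted = [int(i) for i, j in zip(rows, cols) if i < n and j >= m]
    inserted = [int(j) for i, j in zip(rows, cols) if i >= n and j < m]
    return matches, deleted, inserted
```

With a far-away third source atom, the cheapest edit path matches the two nearby pairs and deletes the outlier rather than stretching a match to it.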
Simulated Author's Rebuttal
We thank the referee for their positive evaluation of the manuscript and for recommending minor revision. We are encouraged by the recognition that addressing variable molecular size via unbalanced optimal transport on geometric graphs represents a meaningful technical direction with potential practical benefits for steerability and out-of-distribution generation.
Circularity Check
No significant circularity
full rationale
The paper introduces Morph as a new generative model based on unbalanced optimal transport applied to geometric graphs, claiming it enables flexible-size sampling while matching fixed-size SOTA performance and enabling OOD generation. The abstract and provided claims present this as an empirical method with stated benefits, without any equations, derivations, or performance assertions that reduce by construction to fitted inputs, self-definitions, or self-citation chains. No load-bearing step is shown to rename a known result, smuggle an ansatz, or import uniqueness from prior author work as an external theorem. The central claims rest on the described OT formulation and experimental demonstrations rather than tautological reductions, making the derivation self-contained.
Reference graph
Works this paper leans on
- [1] Yuanqi Du, Arian R Jamasb, Jeff Guo, Tianfan Fu, Charles Harris, Yingheng Wang, Chenru Duan, Pietro Liò, Philippe Schwaller, and Tom L Blundell. Machine learning-aided generative molecular design. Nature Machine Intelligence, 6(6):589–604, 2024.
- [2] Clement Vignac, Nagham Osman, Laura Toni, and Pascal Frossard. MiDi: Mixed graph and 3D denoising diffusion for molecule generation. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 560–576. Springer, 2023.
- [3] Tuan Le, Julian Cremer, Frank Noé, Djork-Arné Clevert, and Kristof Schütt. Navigating the design space of equivariant diffusion-based generative models for de novo 3D molecule generation. arXiv preprint arXiv:2309.17296, 2023.
- [4] Ian Dunn and David R Koes. FlowMol3: flow matching for 3D de novo small-molecule generation. Digital Discovery, 5(5):2052–2066, 2026.
- [5] Ross Irwin, Alessandro Tibo, Jon Paul Janet, and Simon Olsson. SemlaFlow – efficient 3D molecular generation with latent attention and equivariant flow matching. arXiv preprint arXiv:2406.07266, 2025.
- [6] Marton Havasi, Brian Karrer, Itai Gat, and Ricky T. Q. Chen. Edit flows: Flow matching with edit operations. arXiv preprint arXiv:2506.09018, 2025.
- [7] Peter Holderrieth, Marton Havasi, Jason Yim, Neta Shaul, Itai Gat, Tommi Jaakkola, Brian Karrer, Ricky T. Q. Chen, and Yaron Lipman. Generator matching: Generative modeling with arbitrary Markov processes. arXiv preprint arXiv:2410.20587, 2025.
- [8] David Applebaum. Lévy processes and stochastic calculus. Cambridge University Press, 2009.
- [9] Stewart N Ethier and Thomas G Kurtz. Markov processes: characterization and convergence. John Wiley & Sons, 2009.
- [10] Kaspar Riesen and Horst Bunke. Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27(7):950–959, 2009.
- [11] Thibault Séjourné, Gabriel Peyré, and François-Xavier Vialard. Unbalanced optimal transport, from theory to numerics. Handbook of Numerical Analysis, 24:407–471, 2023.
- [12] Harold W Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2):83–97, 1955.
- [13] James Munkres. Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics, 5(1):32–38, 1957.
- [14] Wolfgang Kabsch. A solution for the best rotation to relate two sets of vectors. Foundations of Crystallography, 32(5):922–923, 1976.
- [15] Leon Klein, Andreas Krämer, and Frank Noé. Equivariant flow matching. Advances in Neural Information Processing Systems, 36:59886–59910, 2023.
- [16] Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky T. Q. Chen, Gabriel Synnaeve, Yossi Adi, and Yaron Lipman. Discrete flow matching. arXiv preprint arXiv:2407.15595, 2024.
- [17] Andrew Campbell, Jason Yim, Regina Barzilay, Tom Rainforth, and Tommi Jaakkola. Generative flows on discrete state-spaces: Enabling multimodal flows with applications to protein co-design. arXiv preprint arXiv:2402.04997, 2024.
- [18] Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747, 2022.
- [19] John Nguyen, Marton Havasi, Tariq Berrada, Luke Zettlemoyer, and Ricky T. Q. Chen. OneFlow: Concurrent mixed-modal and interleaved generation with edit flows. arXiv preprint arXiv:2510.03506, 2025.
- [20] Lars Ruddigkeit, Ruud Van Deursen, Lorenz C Blum, and Jean-Louis Reymond. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. Journal of Chemical Information and Modeling, 52(11):2864–2875, 2012.
- [21] Raghunathan Ramakrishnan, Pavlo O Dral, Matthias Rupp, and O Anatole von Lilienfeld. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 1, 2014.
- [22] Simon Axelrod and Rafael Gomez-Bombarelli. GEOM, energy-annotated molecular conformations for property prediction and molecular generation. Scientific Data, 9(1):185, 2022.
- [23] Martin Buttenschoen, Garrett M. Morris, and Charlotte M. Deane. PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences. Chemical Science, 15:3130–3139, 2024. doi: 10.1039/D3SC04185A. URL http://dx.doi.org/10.1039/D3SC04185A.
- [24] Cheng Zeng, Jirui Jin, Connor Ambrose, George Karypis, Mark Transtrum, Ellad B. Tadmor, Richard G. Hennig, Adrian Roitberg, Stefano Martiniani, and Mingjie Liu. PropMolFlow: property-guided molecule generation with geometry-complete flow matching. Nature Computational Science, pages 1–10, January 2026. ISSN 2662-8457. doi: 10.1038/s43588-025-00946-y. URL...
- [25] Greg Landrum et al. RDKit: Open-source cheminformatics, 2026.
- [26] Vendy Fialková, Jiaxi Zhao, Kostas Papadopoulos, Ola Engkvist, Esben Jannik Bjerrum, Thierry Kogej, and Atanas Patronov. LibINVENT: Reaction-based generative scaffold decoration for in silico library design. Journal of Chemical Information and Modeling, 62(9):2046–2063, 2022. doi: 10.1021/acs.jcim.1c00469. URL https://doi.org/10.1021/acs.jcim.1c00469. PM...
- [27] Arne Schneuing, Charles Harris, Yuanqi Du, Kieran Didi, Arian Jamasb, Ilia Igashov, Weitao Du, Carla Gomes, Tom L Blundell, Pietro Lio, Max Welling, Michael Bronstein, and Bruno Correia. Structure-based drug design with equivariant diffusion models. Nature Computational Science, 4(12):899–909, 2024.
- [28] Jiatao Gu, Changhan Wang, and Jake Zhao. Levenshtein transformer. arXiv preprint arXiv:1905.11006, 2019.
- [29] Laura Ruis, Mitchell Stern, Julia Proskurnia, and William Chan. Insertion-deletion transformer. arXiv preprint arXiv:2001.05540, 2020.
- [30] Daniel D. Johnson, Jacob Austin, Rianne van den Berg, and Daniel Tarlow. Beyond in-place corruption: Insertion and deletion in denoising probabilistic models. arXiv preprint arXiv:2107.07675, 2021.
- [31] Jaeyeon Kim, Lee Cheuk-Kit, Carles Domingo-Enrich, Yilun Du, Sham Kakade, Timothy Ngotiaoco, Sitan Chen, and Michael Albergo. Any-order flexible length masked diffusion. arXiv preprint arXiv:2509.01025, 2025.
- [32] Dhruvesh Patel, Aishwarya Sahoo, Avinash Amballa, Tahira Naseem, Tim GJ Rudner, and Andrew McCallum. Insertion language models: Sequence generation with arbitrary-position insertions. arXiv preprint arXiv:2505.05755, 2025.
- [33] Jiaxuan You, Rex Ying, Xiang Ren, William Hamilton, and Jure Leskovec. GraphRNN: Generating realistic graphs with deep auto-regressive models. In International Conference on Machine Learning, pages 5708–5717. PMLR, 2018.
- [34] Renjie Liao, Yujia Li, Yang Song, Shenlong Wang, Will Hamilton, David K Duvenaud, Raquel Urtasun, and Richard Zemel. Efficient graph generation with graph recurrent attention networks. Advances in Neural Information Processing Systems, 32, 2019.
- [35] Clement Vignac, Igor Krawczuk, Antoine Siraudin, Bohan Wang, Volkan Cevher, and Pascal Frossard. DiGress: Discrete denoising diffusion for graph generation. arXiv preprint arXiv:2209.14734, 2022.
- [36] Jaehyeong Jo, Seul Lee, and Sung Ju Hwang. Score-based generative modeling of graphs via the system of stochastic differential equations. In International Conference on Machine Learning, pages 10362–10383. PMLR, 2022.
- [37] Matteo Ninniri, Marco Podda, and Davide Bacciu. Graph diffusion that can insert and delete. arXiv preprint arXiv:2506.15725, 2025.
- [38] Emiel Hoogeboom, Víctor Garcia Satorras, Clément Vignac, and Max Welling. Equivariant diffusion for molecule generation in 3D. International Conference on Machine Learning, pages 8867–8887, 2022.
- [39] Chaitanya K. Joshi, Xiang Fu, Yi-Lun Liao, Vahe Gharakhanyan, Benjamin Kurt Miller, Anuroop Sriram, and Zachary W. Ulissi. All-atom diffusion transformers: Unified generative modelling of molecules and materials. arXiv preprint arXiv:2503.03965, 2025.
- [40] Alex Morehead, Miruna Cretu, Antonia Panescu, Rishabh Anand, Maurice Weiler, Tynan Perez, Samuel Blau, Steven Farrell, Wahid Bhimji, Anubhav Jain, Hrushikesh Sahasrabuddhe, Pietro Lio, Tommi Jaakkola, Rafael Gomez-Bombarelli, Rex Ying, N. Benjamin Erichson, and Michael W. Mahoney. Zatom-1: A multimodal flow foundation model for 3D molecules and materials....
- [41] Junjie Xie, Sheng Chen, Jinping Lei, and Yuedong Yang. DiffDec: Structure-aware scaffold decoration with an end-to-end diffusion model. Journal of Chemical Information and Modeling, 64(7):2554–2564, 2024. doi: 10.1021/acs.jcim.3c01466. URL https://doi.org/10.1021/acs.jcim.3c01466. PMID: 38267393.
- [42] Carlos Vonessen, Charles Harris, Miruna Cretu, and Pietro Liò. TABASCO: A fast, simplified model for molecular generation with improved physical quality. arXiv preprint arXiv:2507.00899, 2025.
- [43] Danny Reidenbach, Filipp Nikitin, Olexandr Isayev, and Saee Gopal Paliwal. Applications of modular co-design for de novo 3D molecule generation. Digital Discovery, 5(2):754–768, 2026.
- [44] Ilia Igashov, Hannes Stärk, Clément Vignac, Arne Schneuing, Victor Garcia Satorras, Pascal Frossard, Max Welling, Michael Bronstein, and Bruno Correia. Equivariant 3D-conditional diffusion model for molecular linker design. Nature Machine Intelligence, 6(4):417–427, 2024.
- [45] Jingyuan Zhou, Hao Qian, Shikui Tu, and Lei Xu. Prior-guided flow matching for target-aware molecule design with learnable atom number. arXiv preprint arXiv:2509.01486, 2025.
- [46] Andrew Campbell, William Harvey, Christian Weilbach, Valentin De Bortoli, Tom Rainforth, and Arnaud Doucet. Trans-dimensional generative modeling via jump diffusion models. arXiv preprint arXiv:2305.16261, 2023.
- [47] Austin H. Cheng, Chong Sun, and Alán Aspuru-Guzik. Scalable autoregressive 3D molecule generation. arXiv preprint arXiv:2505.13791, 2025.
- [48] Rajendra P. Joshi, Niklas W. A. Gebauer, Mridula Bontha, Mercedeh Khazaieli, Rhema M. James, James B. Brown, and Neeraj Kumar. 3D-Scaffold: A deep learning framework to generate 3D coordinates of drug-like molecules with desired scaffolds. The Journal of Physical Chemistry B, 125(44):12166–12176, 2021. doi: 10.1021/acs.jpcb.1c06437. URL https://doi.org/10...
- [49] Ameya Daigavane, Song Eun Kim, Mario Geiger, and Tess Smidt. Symphony: Symmetry-equivariant point-centered spherical harmonics for 3D molecule generation. In International Conference on Learning Representations, volume 2024, pages 33975–34002, 2024.
- [50] Daniel Rose, Roxane Axel Jacob, Johannes Kirchmair, and Thierry Langer. Neat: Neighborhood-guided, efficient, autoregressive set transformer for 3D molecular generation. arXiv preprint arXiv:2512.05844, 2025.
- [51] Lukas Billera, Hedwig Nora Nordlinder, Jack Collier Ryder, Anton Oresten, Aron Stålmarck, Theodor Mosetti Björk, and Ben Murrell. Branching flows: Discrete, continuous, and manifold flow matching with splits and deletions. arXiv preprint arXiv:2511.09465, 2025.
- [52] Roman Poletukhin, Marcel Kollovieh, Eike Eberhard, and Stephan Günnemann. 3D molecule generation from rigid motifs via SE(3) flows. arXiv preprint arXiv:2601.16955, 2026.
- [53] Pedro O. Pinheiro, Arian Jamasb, Omar Mahmood, Vishnu Sresht, and Saeed Saremi. Structure-based drug design by denoising voxel grids. arXiv preprint arXiv:2405.03961, 2024.
- [54] Pedro O. Pinheiro, Joshua Rackers, Joseph Kleinhenz, Michael Maser, Omar Mahmood, Andrew Martin Watkins, Stephen Ra, Vishnu Sresht, and Saeed Saremi. 3D molecule generation by denoising voxel grids. arXiv preprint arXiv:2306.07473, 2024.
- [55] Felix Faltings, Hannes Stark, Regina Barzilay, and Tommi Jaakkola. ProxelGen: Generating proteins as 3D densities. arXiv preprint arXiv:2506.19820, 2025.
- [56] Tin Hadži Veljković, Erik Bekkers, Michael Tiemann, and Jan-Willem van de Meent. CORDS: Continuous representations of discrete structures. arXiv preprint arXiv:2601.21583, 2026.
A Technical Appendices and Supplementary Material
A.1 Data details
QM9. The QM9 dataset [21] contains around 134k small organic molecules with up to 9 heavy atoms of types {H, C, ...
Invariant Scalar Predictions. For each node $i$, the invariant features $h_i \in \mathbb{R}^d$ are mapped to the scalar Gaussian mixture parameters via a multi-layer perceptron (MLP):
$$[\pi_i, \sigma_i, a_i, c_i] = \mathrm{Split}\big(\mathrm{MLP}_{\text{scalar}}(h_i)\big) \tag{26}$$
Appropriate activations are applied to the partitions: Softmax to obtain the $K$ mixture weights $\pi_i$, atom type probabilities $A_i$, and charge type prob...
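Eq. (26) describes a per-node head whose output is split into mixture weights, scales, atom-type logits, and charge-type logits. The sketch below illustrates that split under several assumptions the truncated excerpt does not state. A single linear layer replaces $\mathrm{MLP}_{\text{scalar}}$, there is one scale per mixture component, and a softplus keeps $\sigma$ positive. All function names and sizes are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scalar_head(h, W, b, K, n_atom_types):
    """Map invariant node features h (N, d) to per-node mixture and type parameters.

    A single linear layer stands in for MLP_scalar (assumption); its output of width
    2K + n_atom_types + n_charge_types is split as in Eq. (26).
    """
    out = h @ W + b
    pi_logits, sigma_raw, a_logits, c_logits = np.split(
        out, [K, 2 * K, 2 * K + n_atom_types], axis=-1)
    pi = softmax(pi_logits)                  # K mixture weights per node
    sigma = np.logaddexp(0.0, sigma_raw)     # softplus: positive scales (assumed)
    A = softmax(a_logits)                    # atom-type probabilities
    C = softmax(c_logits)                    # charge-type probabilities
    return pi, sigma, A, C
```

Splitting one output vector keeps every per-node scalar head in a single matrix multiply. Because the inputs $h_i$ are invariant, every output is invariant too.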
Equivariant Mean Predictions. To compute the equivariant GMM means $\mu_{i,k} \in \mathbb{R}^3$, we predict spatial weights $w_{ij} \in \mathbb{R}^K$ for each neighbor $j \in \mathcal{N}(i)$. This is done using the source and target invariant features, along with a radial basis function (RBF) embedding of the Euclidean distance $d_{ij} = \lVert x_i - x_j \rVert_2$:
$$w_{ij} = \gamma \tanh\big(\mathrm{MLP}_{\text{coord}}(h_i \,\Vert\, h_j \,\Vert\, \mathrm{RBF}(d_{ij}))\big) \tag{27}$$
where $\Vert$ denotes ...
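Eq. (27) can be checked numerically. The weights see coordinates only through $d_{ij}$, so they should not change when the molecule is rotated, reflected, or translated. The sketch below is a hypothetical instance built on several assumptions: Gaussian RBF centers, a two-layer ReLU MLP for $\mathrm{MLP}_{\text{coord}}$, a scalar $\gamma$, and a neighborhood of all $j \neq i$. It does not compute $\mu_{i,k}$, because the excerpt is cut off before that formula.

```python
import numpy as np

def rbf(d, n_centers=16, d_max=5.0, width=10.0):
    """Gaussian radial basis expansion of distances, shape (..., n_centers)."""
    centers = np.linspace(0.0, d_max, n_centers)
    return np.exp(-width * (d[..., None] - centers) ** 2)

def spatial_weights(h, x, W1, b1, W2, b2, gamma=1.0):
    """w_ij = gamma * tanh(MLP_coord(h_i || h_j || RBF(d_ij))), shape (N, N, K)."""
    n, f = h.shape
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)    # pairwise distances
    z = np.concatenate([
        np.broadcast_to(h[:, None, :], (n, n, f)),                 # h_i
        np.broadcast_to(h[None, :, :], (n, n, f)),                 # h_j
        rbf(d),                                                    # RBF(d_ij)
    ], axis=-1)
    hidden = np.maximum(z @ W1 + b1, 0.0)                          # ReLU hidden layer (assumed)
    w = gamma * np.tanh(hidden @ W2 + b2)                          # bounded in [-gamma, gamma]
    mask = ~np.eye(n, dtype=bool)                                  # neighbors: all j != i (assumed)
    return w * mask[..., None]
```

Since $w_{ij}$ is invariant, the equivariance of $\mu_{i,k}$ has to come from how these weights are combined with coordinate vectors. The truncated text does not show that step.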