Fast Organic Crystal Structure Prediction with Unit Cell Flow Matching
Pith reviewed 2026-06-28 11:24 UTC · model grok-4.3
The pith
Clari generates organic crystal structures in seconds via unit cell flow matching and exceeds prior solve rates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Clari is a large-scale flow matching model that produces stable organic crystal structures by generating explicit unit cells with pure pair-bias attention; it requires only atom types and bonds, models explicit hydrogens, and achieves solve rates above those of OXtal together with 15-30x speedups on the OXtal test sets and further gains from direct energy ranking without relaxation or decoration.
What carries the argument
Unit cell flow matching with pair-bias attention, which directly parametrizes the lattice and samples structures without triangle layers or post-processing steps.
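As a rough illustration of what "pair-bias attention" usually means, the sketch below shifts ordinary attention logits by a scalar bias projected from per-pair features. It is a minimal single-head NumPy version under stated assumptions, not Clari's implementation: the function name `pair_bias_attention`, the tensor shapes, and the projection `w_b` are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def pair_bias_attention(single, pair, w_q, w_k, w_v, w_b):
    """Single-head attention whose logits are shifted by a pair bias.

    single: (n, d) per-atom features (hypothetical layout)
    pair:   (n, n, c) per-atom-pair features, e.g. bond information
    w_q, w_k, w_v: (d, d_head) projection matrices
    w_b:    (c,) projection from pair features to one scalar per pair
    """
    q = single @ w_q
    k = single @ w_k
    v = single @ w_v
    logits = q @ k.T / np.sqrt(q.shape[-1])  # (n, n) scaled dot products
    bias = pair @ w_b                        # (n, n) bias from pair features
    weights = softmax(logits + bias, axis=-1)
    return weights @ v, weights
```

The pair features only enter as an additive bias on the logits, so the cost stays quadratic in the number of atoms, in contrast to triangle layers, which are commonly cubic.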
If this is right
- Crystal structure prediction for a single molecule drops from minutes to seconds, making screening of thousands of candidates routine.
- The method applies to fullerenes, metal complexes, and atom clusters because it does not require RDKit-sanitizable inputs.
- Explicit hydrogens allow inference-time energy ranking to improve accuracy without separate decoration or relaxation stages.
- The CSD Teaching Subset provides a new benchmark of diverse, complex molecules for comparing future CSP methods.
Where Pith is reading between the lines
- The same unit-cell flow matching approach could be retrained on inorganic or hybrid materials to test whether the speedup transfers beyond organics.
- Integration with existing energy models might allow end-to-end differentiable crystal design loops that optimize both structure and property simultaneously.
- The reported scaling behavior suggests that generating even larger batches and ranking by more accurate energies could push solve rates higher while still remaining faster than previous generative baselines.
Load-bearing premise
The flow matching model trained on CSD data accurately samples the distribution of experimentally stable crystal structures for molecules not seen during training.
What would settle it
Running Clari on a held-out collection of molecules absent from the CSD training data and checking whether the fraction of structures that match experimental forms drops sharply below the reported solve rates.
Original abstract
Organic crystal structure prediction (CSP) is a requirement for computational modelling of organic solids, but traditionally costs several CPU-years per molecule. Generative models such as OXtal dramatically reduce this cost by sampling stable organic crystal structures directly. However, OXtal forgoes explicit lattice parametrization in favour of modelling large crops of the bulk material with expensive triangle layers, which can incur a computational cost of minutes per molecule. In this paper, we reduce this to seconds with Clari, a large-scale flow matching model that generates redundancy-free unit cells and replaces triangle layers with pure pair-bias attention. Clari requires only atom types and bonds as input and does not need an RDKit-sanitizable input molecule, which expands its applicability to challenging chemistries such as fullerenes, metal complexes, and atom clusters. We further ablate key design choices such as auxiliary losses, timestep distributions, noise priors, and self-conditioning. On OXtal's test sets, we surpass OXtal's solve rate while obtaining a speedup of $15$-$30\times$. Because Clari also models explicit hydrogens, it supports inference-time scaling via direct energy ranking, without any decoration or relaxation step. When generating 150 crystals and selecting the top-30 by energy, we further improve solve rate while maintaining a speedup of $5$-$8\times$. We also introduce the CSD Teaching Subset as a new test split of diverse and complex molecules for future benchmarking. Our contributions enable CSP within seconds, making large-scale virtual screening of organic solids practical. Code is available at https://github.com/aspuru-guzik-group/clari.
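The inference-time scaling step in the abstract (generate 150 crystals, keep the 30 lowest in energy) can be sketched as follows. This is a minimal illustration, assuming each sample can be scored by some energy function; `rank_by_energy` and `energy_fn` are hypothetical names, and the abstract does not say which energy model is used.

```python
from typing import Callable, List, Sequence, TypeVar

Sample = TypeVar("Sample")


def rank_by_energy(
    samples: Sequence[Sample],
    energy_fn: Callable[[Sample], float],
    keep: int = 30,
) -> List[Sample]:
    """Return the `keep` lowest-energy samples, lowest energy first.

    `energy_fn` stands in for whatever energy model scores a generated
    crystal directly; no relaxation or decoration is applied here.
    """
    # sorted() evaluates energy_fn once per sample and is stable on ties.
    return sorted(samples, key=energy_fn)[:keep]
```

With 150 generated structures and `keep=30`, the call returns the 30 candidates that would then be compared against the experimental form.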
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Clari, a flow-matching model for organic crystal structure prediction that generates redundancy-free unit cells via pair-bias attention, taking only atom types and bonds as input. It claims to surpass OXtal's solve rate on OXtal's test sets at 15-30× speedup, with further solve-rate gains (while retaining 5-8× speedup) obtained by generating 150 samples per molecule and ranking the top-30 by energy without relaxation or decoration. The work ablates auxiliary losses, timestep distributions, noise priors, and self-conditioning; introduces the CSD Teaching Subset as a new benchmark; and releases code.
Significance. If the central claims hold, the work would make large-scale CSP practical for virtual screening by reducing per-molecule cost from minutes to seconds. Explicit credit is due for releasing code at the cited GitHub repository and for introducing the CSD Teaching Subset, both of which directly support reproducibility and future benchmarking.
major comments (2)
- [Experimental Results] Experimental Results (and abstract): the claim that direct energy ranking of 150 samples improves solve rate without relaxation or decoration rests on the unvalidated assumption that the CSD-trained flow model samples near energy minima for unseen molecules (including fullerenes and the CSD Teaching Subset). The provided ablations cover auxiliary losses, timestep distributions, noise priors, and self-conditioning but supply no independent metrics (e.g., energy histograms, RMSD distributions to experimental structures, or fraction of samples within 0.1 eV of minima) that would confirm the distributional assumption for OOD cases.
- [Results on OXtal test sets] Results on OXtal test sets: the reported solve-rate improvements and speedups (15-30× baseline, 5-8× with ranking) are load-bearing for the central contribution, yet the manuscript supplies insufficient detail on exact data splits, the precise definition and computation of 'solve rate,' error bars, and validation procedures, consistent with the low soundness rating of the experimental evidence.
minor comments (2)
- [Abstract] Abstract: the performance claims are stated without reference to the specific tables or figures that contain the supporting numbers; cross-references should be added.
- [Methods] Notation: the distinction between 'unit cell' generation and 'redundancy-free' sampling is introduced without an explicit equation or diagram in the methods; a short clarifying equation would improve readability.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the experimental claims. We address each major point below with clarifications and commit to revisions that strengthen the manuscript without altering its core contributions.
Point-by-point responses
- Referee: [Experimental Results] Experimental Results (and abstract): the claim that direct energy ranking of 150 samples improves solve rate without relaxation or decoration rests on the unvalidated assumption that the CSD-trained flow model samples near energy minima for unseen molecules (including fullerenes and the CSD Teaching Subset). The provided ablations cover auxiliary losses, timestep distributions, noise priors, and self-conditioning but supply no independent metrics (e.g., energy histograms, RMSD distributions to experimental structures, or fraction of samples within 0.1 eV of minima) that would confirm the distributional assumption for OOD cases.
  Authors: We acknowledge that direct validation metrics would strengthen the distributional claim. The observed solve-rate gains from energy ranking provide supporting empirical evidence, but we agree this is indirect. In revision we will add energy histograms for generated samples versus known minima on the OXtal test sets and CSD Teaching Subset, plus RMSD distributions to experimental structures for a subset of cases. These additions will address OOD behavior for fullerenes and other challenging chemistries.
  Revision: yes
- Referee: [Results on OXtal test sets] Results on OXtal test sets: the reported solve-rate improvements and speedups (15-30× baseline, 5-8× with ranking) are load-bearing for the central contribution, yet the manuscript supplies insufficient detail on exact data splits, the precise definition and computation of 'solve rate,' error bars, and validation procedures, consistent with the low soundness rating of the experimental evidence.
  Authors: We agree that additional methodological detail is warranted. The splits follow the exact train/test partitions released with OXtal; solve rate is the fraction of molecules for which at least one generated unit cell lies within 0.5 Å RMSD of the experimental structure after lattice alignment and hydrogen placement. In the revised manuscript we will expand the experimental section with the precise definition, computation code, standard-error bars from five independent sampling runs, and full validation protocol.
  Revision: yes
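The solve-rate definition given in the rebuttal, together with the promised standard-error bars, can be sketched as below. This is a minimal illustration under assumptions: RMSDs to the experimental structure are taken as precomputed inputs, "within 0.5 Å" is read as less than or equal to the threshold, and the function names are hypothetical rather than taken from the Clari code.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def solve_rate(
    rmsds_per_molecule: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> float:
    """Fraction of molecules with at least one sample within `threshold` Å.

    rmsds_per_molecule[i] holds the RMSDs (in Å) of every generated unit
    cell for molecule i, measured against the experimental structure.
    """
    solved = [min(rmsds) <= threshold for rmsds in rmsds_per_molecule]
    return sum(solved) / len(solved)


def mean_and_standard_error(rates: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of solve rates across independent runs."""
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    return mean(rates), stdev(rates) / math.sqrt(len(rates))
```

Calling `solve_rate` once per sampling run and passing the five resulting rates to `mean_and_standard_error` would give the error bars described in the response.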
Circularity Check
Minor self-citation to OXtal; central claims remain empirically grounded on external data
Full rationale
The paper trains Clari on the external CSD dataset and reports empirical solve rates/speedups on held-out test sets (including the new CSD Teaching Subset). No derivation step reduces a claimed prediction to a fitted parameter or self-citation by construction; the flow-matching architecture, ablations, and direct energy ranking are presented as independent modeling choices evaluated against external benchmarks. The single reference to OXtal is comparative and does not bear the load of the core distributional assumption or performance claims.
Axiom & Free-Parameter Ledger
free parameters (3)
- timestep distribution
- noise prior
- auxiliary loss weights
axioms (1)
- domain assumption: The distribution of stable organic crystal structures is learnable from CSD examples via flow matching.
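For orientation, the kind of objective this assumption refers to can be written in its generic textbook form. The block below is the standard conditional flow matching loss with a linear interpolant, not necessarily the loss Clari uses. The symbols are assumptions: $x_0$ is a sample from the noise prior, $x_1$ a training unit cell, $c$ the conditioning (atom types and bonds), and $v_\theta$ the learned velocity field.

```latex
% Generic conditional flow matching with a linear interpolant (illustrative only).
% x_0: prior sample, x_1: training unit cell, c: atom types and bonds.
\begin{align}
x_t &= (1 - t)\,x_0 + t\,x_1, \qquad t \in [0, 1], \\
\mathcal{L}(\theta) &= \mathbb{E}_{t,\,x_0,\,x_1}
  \left\lVert v_\theta(x_t, t, c) - (x_1 - x_0) \right\rVert^2 .
\end{align}
```

In this form, the timestep distribution and the noise prior listed as free parameters above correspond to how $t$ and $x_0$ are drawn. Sampling integrates $\mathrm{d}x/\mathrm{d}t = v_\theta(x, t, c)$ from $t = 0$ to $t = 1$.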