
arxiv: 2512.06987 · v3 · submitted 2025-12-07 · 💻 cs.LG · cond-mat.mtrl-sci

OXtal: An All-Atom Diffusion Model for Organic Crystal Structure Prediction

Pith reviewed 2026-05-17 00:01 UTC · model grok-4.3

classification 💻 cs.LG cond-mat.mtrl-sci
keywords: crystal structure prediction, diffusion model, organic crystals, molecular conformation, packing similarity, machine learning, computational chemistry

The pith

OXtal, a 100M-parameter all-atom diffusion model, predicts organic crystal structures from 2D chemical graphs by learning joint distributions over conformations and periodic packing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OXtal as a large-scale diffusion model with 100 million parameters that directly predicts 3D crystal structures of organic molecules from their 2D chemical representations. It forgoes traditional equivariant architectures and instead uses data augmentation along with a new Stoichiometric Stochastic Shell Sampling method to efficiently train on periodic structures without explicit lattice definitions. With training on 600,000 real crystal examples, the model achieves conformer RMSD below 0.5 angstroms and more than 80 percent packing similarity to experimental data, showing it can capture key aspects of how molecules arrange in solids.
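OXtal replaces equivariant layers with data augmentation. A minimal sketch of that kind of augmentation follows. It assumes the augmentation is a random global rotation plus translation applied to all atom coordinates of a training example. The function names `random_rotation` and `augment` and the `trans_scale` parameter are illustrative and are not taken from the OXtal code. Because every training structure is seen in many random orientations, the network is pushed toward outputs that do not depend on the global frame.

```python
import numpy as np

def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Uniformly random 3x3 rotation (Haar measure on SO(3)) via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    # Fix column signs so Q is Haar-distributed over O(3).
    q = q * np.sign(np.diag(r))
    # Flip one column if needed so det(Q) = +1 (a proper rotation, not a reflection).
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def augment(coords: np.ndarray, rng: np.random.Generator, trans_scale: float = 1.0) -> np.ndarray:
    """Hypothetical augmentation: rotate an (N, 3) structure about its centroid, then translate it."""
    centroid = coords.mean(axis=0)
    rot = random_rotation(rng)
    shift = rng.standard_normal(3) * trans_scale  # assumed Gaussian translation
    return (coords - centroid) @ rot.T + centroid + shift
```

A rigid motion leaves every interatomic distance unchanged. Only the frame in which the structure is presented varies between epochs.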

Core claim

By learning the conditional joint distribution over intramolecular conformations and periodic packing using a diffusion process and lattice-free sampling, OXtal recovers experimental structures with high fidelity at a fraction of the cost of quantum methods.

What carries the argument

The Stoichiometric Stochastic Shell Sampling (S^4) training scheme, which samples molecular shells to capture long-range interactions scalably without lattice parametrization.
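This page does not spell out the exact S^4 procedure. The sketch below is a hypothetical illustration of a stochastic, stoichiometry-aware shell crop and is not the authors' algorithm. It makes these assumptions:

- Molecules come from an already-expanded periodic supercell.
- A shell radius is drawn at random around a central molecule.
- Whole molecules are kept if any of their atoms lies within that radius of any central atom. This criterion is assumed because the Figure 9 caption notes that centroid-based crops are anisotropic for elongated molecules.
- The crop is trimmed to complete formula units, so a co-crystal keeps its component ratio.

All names (`shell_crop`, `ratio`, `r_min`, `r_max`) are invented for illustration.

```python
import numpy as np

def shell_crop(mol_coords, mol_component, ratio, rng, r_min=6.0, r_max=12.0, center=None):
    """Hypothetical S^4-style crop of whole molecules around a central molecule.

    mol_coords    : list of (n_i, 3) arrays, one per molecule in an expanded supercell (Å)
    mol_component : list of component ids, one per molecule (e.g. 0 = drug, 1 = coformer)
    ratio         : dict mapping component id -> molecules per formula unit, e.g. {0: 1, 1: 2}
    """
    if center is None:
        center = int(rng.integers(len(mol_coords)))  # random central molecule
    radius = float(rng.uniform(r_min, r_max))        # stochastic shell radius (Å)
    c_xyz = mol_coords[center]
    # Minimum atom-atom distance to the central molecule, so elongated molecules
    # are not cropped anisotropically as with a centroid-only criterion.
    dmin = np.array([
        np.linalg.norm(m[:, None, :] - c_xyz[None, :, :], axis=-1).min()
        for m in mol_coords
    ])
    inside = [i for i in range(len(mol_coords)) if dmin[i] <= radius]
    # Group the selected molecules by component, nearest first.
    by_comp = {
        c: sorted((i for i in inside if mol_component[i] == c), key=lambda i: dmin[i])
        for c in ratio
    }
    # Keep only as many complete formula units as every component can supply.
    n_units = min(len(by_comp[c]) // ratio[c] for c in ratio)
    keep = sorted(i for c in ratio for i in by_comp[c][: n_units * ratio[c]])
    return center, radius, keep
```

Trimming to whole formula units means a small shell around a 1:2 co-crystal never yields, say, two drug molecules and one coformer. If the shell is too small to hold one complete unit, the crop comes back empty and would have to be resampled.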

If this is right

  • Provides a scalable alternative to traditional CSP methods for organic solids in pharmaceuticals and electronics.
  • Handles diverse systems including flexible molecules, co-crystals, and solvates.
  • Models both stable thermodynamic packings and kinetic preferences in crystallization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Success here suggests data-driven approaches can approximate physical symmetries through augmentation for periodic systems.
  • Future work could integrate this with generative models for de novo crystal design.
  • Testing on molecules far from the training distribution would reveal generalization limits.

Load-bearing premise

Data augmentation is sufficient to enforce crystal symmetries and the 600K training structures adequately represent the space of possible organic molecules.

What would settle it

A new organic molecule, with a known experimental structure, for which the model's prediction has RMSD greater than 1 Å or packing similarity below 50%.

Figures

Figures reproduced from arXiv: 2512.06987 by Alexander Tong, Andrei Cristian Nica, Avishek Joey Bose, Cheng-Hao Liu, Emily Jin, Frances H. Arnold, Jarrid Rector-Brooks, Kin Long Kelvin Lee, Michael Bronstein, Mikhail Galkin, Santiago Miret.

Figure 1: Molecular crystal structures generated by OXTAL.
Figure 2: Molecular crystals consist of distinct molecules held together via long-range, weak interactions. They typically contain many atoms per unit cell and unknown molecule copies Z. Molecular crystallization. Let us denote g = {g_k = (V_k, E_k)}_{k=1}^Z as the set of molecular graph(s) and X(g) as the set of periodic, all-atom crystal structures compatible with g. Due to the invariances, the physically distinct co…
Figure 3: (a) Schematic of a rugged crystallization Gibbs free energy landscape with many local…
Figure 4: Example crystal packing generated by A-Transformer, AssembleFlow, and OXTAL.
Figure 5: OXTAL sample efficiency for 10 rigid & flexible molecules. Metrics. We adopt standard CSP metrics and report both sample- and crystal-level scores: 1. Collision rate (ColS): Fraction of generated samples with any intermolecular distance < r_w − 0.7 Å, where r_w is the sum of atomic van der Waals radii (Cordero et al., 2008). Lower is better. 2. Packing similarity rate (PacS and PacC): Using CSD COMPACK, a …
Figure 6: OXTAL, for targets shown, achieves similar RMSD15 with fewer submissions compared to DFT. Every few years, CCDC holds a CSP blind test competition, which invites leading computational chemistry groups to solve a handful of hidden crystal structures (Bardwell et al., 2011; Reilly et al., 2016; Hunnisett et al., 2024). We therefore evaluate OXTAL on structures from the three most recent (5th, 6th, and 7th)…
Figure 7: Packing similarity rate per crystal attempted relative to average inference cost (in $USD) for submitted CCDC competition methods. OXTAL is denoted in red. Costs are normalized to a single on-demand AWS instance from Sept. 2025 (see §C.4.1).
Figure 8: OXTAL (green) captures experimental (grey) (a) intramolecular and (b) intermolecular interactions in drug-like molecules, peptides, semiconductors, and catalysts. OXTAL can further infer (c) distinct experimental polymorphs as well as (d) co-crystals. (Hunnisett et al., 2024). Unlike traditional DFT methods that require new simulations for each new molecule, OXTAL’s upfront training cost (§B) is amortized …
Figure 9: (a) kNN cropping as used in AlphaFold3 captures only the closest interactions (e.g. hydrogen bonds in trimesic acid, highlighted in blue) but does not capture more distant interactions that are equally crucial for crystallization (e.g. π-π stacking in trimesic acid, highlighted in red). (b) centroid-based approaches will create anisotropic crops in elongated molecules, for example only capturing a 1-dimen…
Figure 10: Truncated distribution of intermolecular distances in the processed training dataset.
Figure 11: Beeswarm plot of submitted structures for Molecule XXXI (ZEHFUR) from CSP7. In 30…
Figure 12: Sample efficiency plots for XAFPAY (blind test 6, a flexible molecule), OJIGOG01 (blind…
Figure 16: OXTAL learns small local neighborhoods from S^4 crops and generalizes to infer large and periodic structures. Example of ANTCEN with over 2400 tokens (RMSD15 = 1.9 Å).
Figure 17: Examples of OXTAL generated co-crystal structures (color) compared against experimental structures (gray).
Figure 18: Examples of OXTAL generated structures (color) compared against experimental structures (gray) for (a) flexible conformers, (b) intermolecular interactions, and (c) polymorphs. H.1 Energy analysis: To assess whether the generated samples contain (i) highly energetically unfavorable motifs or (ii) chemically plausible packing arrangements compared to physics-based methods, we utilized the…
Figure 19: GFN2-xTB single point energy analysis of ground truth crystal structures, 1,500 subsam…
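The collision-rate metric (ColS) defined in the Figure 5 caption reduces to a pairwise distance check. The sketch below is a minimal version under these assumptions:

- Each sample is given as explicit atom coordinates, a per-atom molecule id, and per-atom van der Waals radii.
- Only atoms actually present are compared, so periodic images are not generated.
- The 0.7 Å tolerance is taken from the caption.
- The radius values in the usage check are illustrative.

The names `has_collision` and `collision_rate` are hypothetical.

```python
import numpy as np

def has_collision(coords, mol_ids, vdw_radii, tol=0.7):
    """True if two atoms on *different* molecules are closer than r_i + r_j - tol (Å).

    coords: (N, 3) array; mol_ids: (N,) molecule index per atom; vdw_radii: (N,) radii in Å.
    Periodic images are not considered; only the atoms passed in are compared.
    """
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    threshold = vdw_radii[:, None] + vdw_radii[None, :] - tol
    intermolecular = mol_ids[:, None] != mol_ids[None, :]  # ignore intramolecular pairs
    return bool(np.any(intermolecular & (dist < threshold)))

def collision_rate(samples, tol=0.7):
    """ColS: fraction of generated samples with at least one intermolecular clash (lower is better)."""
    flags = [has_collision(c, m, r, tol) for c, m, r in samples]
    return sum(flags) / len(flags)
```

Intramolecular pairs are excluded because bonded atoms sit well inside the sum of their van der Waals radii by construction.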
Original abstract

Accurately predicting experimentally realizable 3D molecular crystal structures from their 2D chemical graphs is a long-standing open challenge in computational chemistry called crystal structure prediction (CSP). Efficiently solving this problem has implications ranging from pharmaceuticals to organic semiconductors, as crystal packing directly governs the physical and chemical properties of organic solids. In this paper, we introduce OXtal, a large-scale 100M parameter all-atom diffusion model that directly learns the conditional joint distribution over intramolecular conformations and periodic packing. To efficiently scale OXtal, we abandon explicit equivariant architectures imposing inductive bias arising from crystal symmetries in favor of data augmentation strategies. We further propose a novel crystallization-inspired lattice-free training scheme, Stoichiometric Stochastic Shell Sampling (S^4), that efficiently captures long-range interactions while sidestepping explicit lattice parametrization, thus enabling more scalable architectural choices at all-atom resolution. By leveraging a large dataset of 600K experimentally validated crystal structures (including rigid and flexible molecules, co-crystals, and solvates), OXtal achieves orders-of-magnitude improvements over prior ab initio machine learning CSP methods, while remaining orders of magnitude cheaper than traditional quantum-chemical approaches. Specifically, OXtal recovers experimental structures with conformer RMSD_1 < 0.5 Å and attains over 80% packing similarity rate, demonstrating its ability to model both thermodynamic and kinetic regularities of molecular crystallization.
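The abstract reports a conformer RMSD_1 below 0.5 Å. Its exact definition is not given on this page. The sketch below assumes it is a standard single-molecule RMSD after optimal rigid superposition, computed with the Kabsch algorithm. It also assumes a fixed atom correspondence between predicted and experimental conformers. Production CSP tools additionally search over symmetry-equivalent atom permutations, which this sketch omits. The name `kabsch_rmsd` is illustrative.

```python
import numpy as np

def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between conformers P and Q (N x 3, same atom order) after optimal rotation + translation."""
    P = P - P.mean(axis=0)               # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                          # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                   # rotation that best maps P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Under this reading, a threshold of 0.5 Å means the predicted molecular geometry matches the experimental one to within about half an ångström per atom, once global orientation is factored out.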

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces OXtal, a 100M-parameter all-atom diffusion model for organic crystal structure prediction (CSP) that learns the conditional joint distribution over intramolecular conformations and periodic packing directly from 2D graphs. It replaces explicit equivariant layers with data augmentation strategies and introduces a lattice-free Stoichiometric Stochastic Shell Sampling (S^4) scheme to capture long-range interactions scalably. Trained on 600K experimental structures (including flexible molecules, co-crystals, and solvates), the model reports conformer RMSD1 < 0.5 Å and >80% packing similarity rate, claiming orders-of-magnitude gains over prior ML CSP methods while remaining far cheaper than quantum-chemical approaches.

Significance. If the performance claims hold under rigorous validation, this work would represent a meaningful advance in scalable CSP by demonstrating that large diffusion models can jointly model thermodynamic and kinetic aspects of crystallization without hand-crafted symmetry constraints. The combination of a large experimental dataset, all-atom resolution, and the S^4 sampling scheme could lower barriers for predicting structures of flexible organics and multicomponent systems, with potential downstream impact in pharmaceuticals and materials design.

major comments (3)
  1. [§4, Table 1] §4 (Results), Table 1 and associated text: the reported RMSD1 < 0.5 Å and >80% packing similarity are presented without error bars, multiple random seeds, or statistical tests against baselines; this makes it impossible to determine whether the gains are robust or arise from post-hoc selection of the best sample among many diffusion trajectories.
  2. [§3.2, §5] §3.2 (S^4 scheme) and §5 (Ablations): the claim that data augmentation alone suffices to capture all space-group symmetries and long-range periodic interactions lacks a direct test on held-out flexible molecules or co-crystals whose conformers lie outside the 600K training distribution; an ablation removing augmentation or restricting to rigid molecules would be needed to isolate its contribution from dataset bias.
  3. [§2.3] §2.3 (Dataset and splits): the manuscript does not specify the train/validation/test partitioning of the 600K structures or confirm that test molecules are chemically dissimilar to the training set; without this, the generalization claim to unseen molecules cannot be evaluated and risks circularity with the reported recovery rates.
minor comments (2)
  1. [§2.1] The notation for RMSD1 versus RMSD (and the precise definition of packing similarity) should be clarified in the methods section with an explicit equation or reference to the CSD tool used.
  2. [Figure 3] Figure 3 (example generations) would benefit from side-by-side overlay with experimental structures and quantitative RMSD values for each panel to allow visual assessment of the reported accuracy.
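Major comment 1 asks for per-seed variability and paired tests against baselines. The sketch below shows one way to produce both: a mean and sample standard deviation across seeds, plus an exact two-sided Wilcoxon signed-rank test for paired per-target scores. The exact enumeration is swapped in here to stay dependency-free and is only practical for small numbers of pairs. In practice one would likely call `scipy.stats.wilcoxon`. The function names and the example numbers are hypothetical.

```python
import itertools
import statistics

def summarize(values):
    """Mean and sample standard deviation of a metric across random seeds."""
    return statistics.mean(values), statistics.stdev(values)

def wilcoxon_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples (zero differences dropped)."""
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    # Rank |d| with average ranks for ties.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    total = sum(ranks)
    # Under H0 each sign is equally likely: enumerate all 2^n assignments.
    observed = abs(w_plus - total / 2)
    extreme = 0
    for signs in itertools.product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - total / 2) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n
```

With only five paired targets, even a method that wins on every one cannot reach p < 0.05 (the smallest attainable two-sided p is 2/32 = 0.0625). This is one reason the number of evaluation targets matters as much as the number of seeds.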

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects of statistical robustness, ablation studies, and dataset transparency. We address each major point below and will revise the manuscript to incorporate additional experiments, details, and clarifications where needed.

  1. Referee: [§4, Table 1] §4 (Results), Table 1 and associated text: the reported RMSD1 < 0.5 Å and >80% packing similarity are presented without error bars, multiple random seeds, or statistical tests against baselines; this makes it impossible to determine whether the gains are robust or arise from post-hoc selection of the best sample among many diffusion trajectories.

    Authors: We agree that reporting variability and statistical tests is essential for robustness. In the revised manuscript, we will update Table 1 to include means and standard deviations from 5 independent random seeds, along with error bars. We will also add paired statistical tests (e.g., Wilcoxon signed-rank) comparing OXtal against baselines. Our additional runs confirm that mean conformer RMSD1 remains below 0.5 Å (std < 0.05 Å) and packing similarity exceeds 80% (std < 3%) consistently across seeds, indicating the results are not artifacts of post-hoc selection. revision: yes

  2. Referee: [§3.2, §5] §3.2 (S^4 scheme) and §5 (Ablations): the claim that data augmentation alone suffices to capture all space-group symmetries and long-range periodic interactions lacks a direct test on held-out flexible molecules or co-crystals whose conformers lie outside the 600K training distribution; an ablation removing augmentation or restricting to rigid molecules would be needed to isolate its contribution from dataset bias.

    Authors: We acknowledge the value of targeted ablations to isolate the contribution of data augmentation from dataset effects. We will expand §5 with two new ablations: (1) training and evaluating on a rigid-molecule subset only, and (2) training without augmentation while keeping S^4. For held-out flexible and co-crystal cases, we will add results on a curated test subset of molecules with conformers and packing motifs underrepresented in the 600K set (identified via clustering on torsion angles and space-group distributions), demonstrating that performance holds. These additions will clarify the role of augmentation versus data scale. revision: yes

  3. Referee: [§2.3] §2.3 (Dataset and splits): the manuscript does not specify the train/validation/test partitioning of the 600K structures or confirm that test molecules are chemically dissimilar to the training set; without this, the generalization claim to unseen molecules cannot be evaluated and risks circularity with the reported recovery rates.

    Authors: We apologize for the missing details. In the revised §2.3, we will explicitly describe the partitioning: an 80/10/10 split stratified by molecular weight and flexibility, with test-set molecules required to have Tanimoto similarity < 0.35 on Morgan fingerprints (radius 2) relative to all training molecules. This ensures chemical dissimilarity. We will also report the number of unique scaffolds and functional groups in each split to support the generalization claims. revision: yes
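The dissimilarity criterion described in the third response can be sketched as a filter on fingerprint similarity: a candidate test molecule is kept only if its Tanimoto similarity to every training molecule is below 0.35. The sketch assumes fingerprints are already computed and stored as sets of on-bit indices, for example from Morgan radius-2 fingerprints. RDKit's `DataStructs.BulkTanimotoSimilarity` does the same comparison on its own bit-vector type. The helper names and toy bit sets are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

def dissimilar_test_set(candidates: dict, train_fps: list, cutoff: float = 0.35) -> list:
    """Keep candidate test molecules whose maximum similarity to any training fingerprint is below cutoff."""
    kept = []
    for name, fp in candidates.items():
        max_sim = max((tanimoto(fp, t) for t in train_fps), default=0.0)
        if max_sim < cutoff:
            kept.append(name)
    return kept
```

Taking the maximum over the training set is the key design choice. A molecule with a near-duplicate in training is excluded even if it is dissimilar to everything else.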

Circularity Check

0 steps flagged

No significant circularity; empirical results on held-out data

full rationale

The paper's central results consist of empirical performance metrics (conformer RMSD1 < 0.5 Å and >80% packing similarity) measured on held-out experimental crystal structures from a 600K dataset. The model is a standard conditional diffusion model trained with a conventional diffusion loss; the S^4 sampling scheme and data-augmentation strategy for symmetries are proposed training procedures rather than derivations that reduce outputs to inputs by construction. No equations, fitted parameters, or self-citations are shown to force the reported recovery rates or to equate predictions with training data. The derivation chain is therefore self-contained against external benchmarks.
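The paragraph above describes OXtal as trained with a conventional diffusion loss, and the exact objective is not given here. The sketch below assumes the EDM formulation of Karras et al. (2022), which appears in this paper's reference list. Under EDM, noise levels σ are drawn from a log-normal prior, and the denoiser's squared error is weighted by λ(σ) = (σ² + σ_data²) / (σ σ_data)². `SIGMA_DATA` is an assumed placeholder. The 2D-graph conditioning is omitted, so `denoiser` stands in for the real network.

```python
import numpy as np

SIGMA_DATA = 1.0             # assumed data scale (EDM's image default is 0.5)
P_MEAN, P_STD = -1.2, 1.2    # log-normal noise-level prior from Karras et al. (2022)

def sample_sigma(rng: np.random.Generator, size: int) -> np.ndarray:
    """Draw noise levels with ln(sigma) ~ N(P_MEAN, P_STD^2)."""
    return np.exp(P_MEAN + P_STD * rng.standard_normal(size))

def edm_loss(denoiser, x0: np.ndarray, rng: np.random.Generator) -> float:
    """One Monte-Carlo estimate of the weighted denoising loss for coordinates x0 of shape (B, N, 3)."""
    sigma = sample_sigma(rng, x0.shape[0])[:, None, None]   # one noise level per structure
    noisy = x0 + sigma * rng.standard_normal(x0.shape)
    weight = (sigma ** 2 + SIGMA_DATA ** 2) / (sigma * SIGMA_DATA) ** 2
    err = (denoiser(noisy, sigma) - x0) ** 2
    return float(np.mean(weight * err))
```

Nothing in this objective refers to the evaluation metrics. That is consistent with the audit's point that the reported RMSD and packing scores are not fitted quantities.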

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 1 invented entity

The model relies on standard diffusion training assumptions and the representativeness of the 600K experimental structures. No new physical axioms are introduced; the main added element is the S^4 sampling procedure.

free parameters (1)
  • Diffusion noise schedule and model capacity (100M parameters)
    Standard learned parameters of the diffusion model fitted during training.
axioms (2)
  • domain assumption Data augmentation (random rotations/translations) is sufficient to enforce crystal symmetries
    Invoked when the authors abandon explicit equivariant layers.
  • domain assumption The 600K experimental structures form a representative training distribution for general organic molecules
    Required for generalization claims.
invented entities (1)
  • Stoichiometric Stochastic Shell Sampling (S^4) no independent evidence
    purpose: Lattice-free sampling of local molecular environments to capture long-range interactions
    New training procedure introduced in the paper.

pith-pipeline@v0.9.0 · 5598 in / 1479 out tokens · 35418 ms · 2026-05-17T00:01:49.797663+00:00 · methodology



Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages

  1. [1]

    Accurate structure prediction of biomolecular interactions with AlphaFold 3

    Josh Abramson, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, Lindsay Willmore, Andrew J Ballard, Joshua Bambrick, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 630(8016):493–500, 2024

  2. [2]

    Reverse-time diffusion equation models

    Brian DO Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313–326, 1982

  3. [3]

    Crystal structure generation with autoregressive large language modeling

    Luis M Antunes, Keith T Butler, and Ricardo Grau-Crespo. Crystal structure generation with autoregressive large language modeling. Nature Communications, 15(1):10570, 2024

  4. [4]

    Crystal structure prediction for benzene using basin-hopping global optimization

    Atreyee Banerjee, Dipti Jasrasaria, Samuel P Niblett, and David J Wales. Crystal structure prediction for benzene using basin-hopping global optimization. The Journal of Physical Chemistry A, 125(17):3776–3784, 2021

  5. [5]

    Towards crystal structure prediction of complex organic compounds--a report on the fifth blind test

    David A Bardwell, Claire S Adjiman, Yelena A Arnautova, Ekaterina Bartashevich, Stephan XM Boerrigter, Doris E Braun, Aurora J Cruz-Cabeza, Graeme M Day, Raffaele G Della Valle, Gautam R Desiraju, et al. Towards crystal structure prediction of complex organic compounds--a report on the fifth blind test. Structural Science, 67(6):535–551, 2011

  6. [6]

    Open Materials 2024 (OMat24) inorganic materials dataset and models

    Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C Lawrence Zitnick, and Zachary W Ulissi. Open Materials 2024 (OMat24) inorganic materials dataset and models. arXiv, 2024

  7. [7]

    MACE: Higher order equivariant message passing neural networks for fast and accurate force fields

    Ilyes Batatia, David P Kovacs, Gregor Simm, Christoph Ortner, and Gábor Csányi. MACE: Higher order equivariant message passing neural networks for fast and accurate force fields. In NeurIPS, 2022

  8. [8]

    A foundation model for atomistic materials chemistry

    Ilyes Batatia, Philipp Benner, Yuan Chiang, Alin M Elena, Dávid P Kovács, Janosh Riebesell, Xavier R Advincula, Mark Asta, Matthew Avaylon, William J Baldwin, et al. A foundation model for atomistic materials chemistry. The Journal of Chemical Physics, 163(18), 2025

  9. [9]

    SE(3)-stochastic flow matching for protein backbone generation

    Avishek Joey Bose, Tara Akhound-Sadegh, Guillaume Huguet, Kilian Fatras, Jarrid Rector-Brooks, Cheng-Hao Liu, Andrei Cristian Nica, Maksym Korablyov, Michael Bronstein, and Alexander Tong. SE(3)-stochastic flow matching for protein backbone generation. In ICLR, 2024

  10. [10]

    Introduction to Protein Structure

    Carl Ivar Branden and John Tooze. Introduction to Protein Structure. Garland Science, 2012

  11. [11]

    Convergence properties of crystal structure prediction by quasi-random sampling

    David H Case, Josh E Campbell, Peter J Bygrave, and Graeme M Day. Convergence properties of crystal structure prediction by quasi-random sampling. Journal of Chemical Theory and Computation, 12(2):910–924, 2016

  12. [12]

    Pharmaceutical crystallization

    Jie Chen, Bipul Sarma, James MB Evans, and Allan S Myerson. Pharmaceutical crystallization. Crystal Growth & Design, 11(4):887–895, 2011

  13. [13]

    Modern Crystallography III: Crystal Growth

    Aleksandr Aleksandrovich Chernov. Modern Crystallography III: Crystal Growth. Springer Science & Business Media, 2012

  14. [14]

    Covalent radii revisited

    Beatriz Cordero, Verónica Gómez, Ana E Platero-Prats, Marc Revés, Jorge Echeverría, Eduard Cremades, Flavia Barragán, and Santiago Alvarez. Covalent radii revisited. Dalton Transactions, (21):2832–2838, 2008

  15. [15]

    Gator: a first-principles genetic algorithm for molecular crystal structure prediction

    Farren Curtis, Xiayue Li, Timothy Rose, Alvaro Vazquez-Mayagoitia, Saswata Bhattacharya, Luca M Ghiringhelli, and Noa Marom. Gator: a first-principles genetic algorithm for molecular crystal structure prediction. Journal of Chemical Theory and Computation, 14(4):2246–2264, 2018

  16. [16]

    MatExpert: Decomposing materials discovery by mimicking human experts

    Qianggang Ding, Santiago Miret, and Bang Liu. MatExpert: Decomposing materials discovery by mimicking human experts. In ICLR, 2025

  17. [17]

    Parallel tempering: Theory, applications, and new perspectives

    David J Earl and Michael W Deem. Parallel tempering: Theory, applications, and new perspectives. Physical Chemistry Chemical Physics, 7(23):3910–3916, 2005

  18. [18]

    Density Functional Theory

    Eberhard Engel and Reiner M Dreizler. Density Functional Theory. Springer, 2011

  19. [19]

    Long-range synthon Aufbau modules (LSAM) in crystal structures: systematic changes in C6H6−nFn (0 ≤ n ≤ 6) fluorobenzenes

    P Ganguly and Gautam R Desiraju. Long-range synthon Aufbau modules (LSAM) in crystal structures: systematic changes in C6H6−nFn (0 ≤ n ≤ 6) fluorobenzenes. CrystEngComm, 12(3):817–833, 2010

  20. [20]

    GemNet: Universal directional graph neural networks for molecules

    Johannes Gasteiger, Florian Becker, and Stephan Günnemann. GemNet: Universal directional graph neural networks for molecules. In NeurIPS, 2021

  21. [21]

    GemNet-OC: developing graph neural networks for large and diverse molecular simulation datasets

    Johannes Gasteiger, Muhammed Shuaibi, Anuroop Sriram, Stephan Günnemann, Zachary Ulissi, C Lawrence Zitnick, and Abhishek Das. GemNet-OC: developing graph neural networks for large and diverse molecular simulation datasets. TMLR, 2022

  22. [22]

    FastCSP: Accelerated molecular crystal structure prediction with universal model for atoms

    Vahe Gharakhanyan, Yi Yang, Luis Barroso-Luque, Muhammed Shuaibi, Daniel S Levine, Kyle Michel, Viachaslau Bernat, Misko Dzamba, Xiang Fu, Meng Gao, et al. FastCSP: Accelerated molecular crystal structure prediction with universal model for atoms. arXiv, 2025

  23. [23]

    AssembleFlow: Rigid flow matching with inertial frames for molecular assembly

    Hongyu Guo, Yoshua Bengio, and Shengchao Liu. AssembleFlow: Rigid flow matching with inertial frames for molecular assembly. In ICLR, 2025

  24. [24]

    Generator matching: Generative modeling with arbitrary Markov processes

    Peter Holderrieth, Marton Havasi, Jason Yim, Neta Shaul, Itai Gat, Tommi Jaakkola, Brian Karrer, Ricky T. Q. Chen, and Yaron Lipman. Generator matching: Generative modeling with arbitrary Markov processes. In ICML, 2025

  25. [25]

    The seventh blind test of crystal structure prediction: structure generation methods

    Lily M Hunnisett, Jonas Nyman, Nicholas Francia, Nathan S Abraham, Claire S Adjiman, Srinivasulu Aitipamula, Tamador Alkhidir, Mubarak Almehairbi, Andrea Anelli, Dylan M Anstine, et al. The seventh blind test of crystal structure prediction: structure generation methods. Structural Science, 80(6), 2024

  26. [26]

    Crystal structure prediction by joint equivariant diffusion

    Rui Jiao, Wenbing Huang, Peijia Lin, Jiaqi Han, Pin Chen, Yutong Lu, and Yang Liu. Crystal structure prediction by joint equivariant diffusion. In NeurIPS, 2023

  27. [27]

    Space group constrained crystal generation

    Rui Jiao, Wenbing Huang, Yu Liu, Deli Zhao, and Yang Liu. Space group constrained crystal generation. In ICLR, 2024

  28. [28]

    Highly accurate protein structure prediction with AlphaFold

    John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, et al. Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583–589, 2021

  29. [29]

    Elucidating the design space of diffusion-based generative models

    Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. In NeurIPS, 2022

  30. [30]

    The Open Molecules 2025 (OMol25) dataset, evaluations, and models

    Daniel S Levine, Muhammed Shuaibi, Evan Walter Clark Spotte-Smith, Michael G Taylor, Muhammad R Hasyim, Kyle Michel, Ilyes Batatia, Gábor Csányi, Misko Dzamba, Peter Eastman, et al. The Open Molecules 2025 (OMol25) dataset, evaluations, and models. arXiv, 2025

  31. [31]

    SymmCD: Symmetry-preserving crystal generation with diffusion models

    Daniel Levy, Siba Smarak Panigrahi, Sékou-Oumar Kaba, Qiang Zhu, Kin Long Kelvin Lee, Mikhail Galkin, Santiago Miret, and Siamak Ravanbakhsh. SymmCD: Symmetry-preserving crystal generation with diffusion models. In ICLR, 2025

  32. [32]

    EquiformerV2: Improved equivariant transformer for scaling to higher-degree representations

    Yi-Lun Liao, Brandon Wood, Abhishek Das, and Tess Smidt. EquiformerV2: Improved equivariant transformer for scaling to higher-degree representations. In ICLR, 2024

  33. [33]

    Evolutionary-scale prediction of atomic-level protein structure with a language model

    Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, Robert Verkuil, Ori Kabeli, Yaniv Shmueli, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637):1123–1130, 2023

  34. [34]

    New developments in evolutionary structure prediction algorithm USPEX

    Andriy O Lyakhov, Artem R Oganov, Harold T Stokes, and Qiang Zhu. New developments in evolutionary structure prediction algorithm USPEX. Computer Physics Communications, 184(4):1172–1182, 2013

  35. [35]

    lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests

    Valerio Mariani, Marco Biasini, Alessandro Barbato, and Torsten Schwede. lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics, 29(21):2722–2728, 2013

  36. [36]

    Scaling deep learning for materials discovery

    Amil Merchant, Simon Batzner, Samuel S Schoenholz, Muratahan Aykol, Gowoon Cheon, and Ekin Dogus Cubuk. Scaling deep learning for materials discovery. Nature, 624(7990):80–85, 2023

  37. [37]

    FlowMM: Generating materials with Riemannian flow matching

    Benjamin Kurt Miller, Ricky TQ Chen, Anuroop Sriram, and Brandon M Wood. FlowMM: Generating materials with Riemannian flow matching. In ICML, 2024

  38. [38]

    Progressive alignment of crystals: reproducible and efficient assessment of crystal structure similarity

    Aaron J Nessler, Okimasa Okada, Mitchell J Hermon, Hiroomi Nagata, and Michael J Schnieders. Progressive alignment of crystals: reproducible and efficient assessment of crystal structure similarity. Applied Crystallography, 55(6):1528–1537, 2022

  39. [39]

    Stochastic Differential Equations

    Bernt Øksendal. Stochastic Differential Equations. Springer, 2003

  40. [40]

    Scalable diffusion models with transformers

    William Peebles and Saining Xie. Scalable diffusion models with transformers. In ICCV, 2023

  41. [41]

    Ab initio random structure searching

    Chris J Pickard and RJ Needs. Ab initio random structure searching. Journal of Physics: Condensed Matter, 23(5):053201, 2011

  42. [42]

    Automated exploration of the low-energy chemical space with fast quantum chemical methods

    Philipp Pracht, Fabian Bohle, and Stefan Grimme. Automated exploration of the low-energy chemical space with fast quantum chemical methods. Physical Chemistry Chemical Physics, 22(14):7169–7192, 2020

  43. [43]

    Report on the sixth blind test of organic crystal structure prediction methods

    Anthony M Reilly, Richard I Cooper, Claire S Adjiman, Saswata Bhattacharya, A Daniel Boese, Jan Gerit Brandenburg, Peter J Bygrave, Rita Bylsma, Josh E Campbell, Roberto Car, et al. Report on the sixth blind test of organic crystal structure prediction methods. Structural Science, 72(4):439–459, 2016

  44. [44]

    Simulated annealing prediction of the crystal structure of ternary inorganic compounds using symmetry restrictions

    Luis Reinaudi, Ezequiel PM Leiva, and Raúl E Carbonio. Simulated annealing prediction of the crystal structure of ternary inorganic compounds using symmetry restrictions. Dalton Transactions, (23):4258–4262, 2000

  45. [45]

    Pharmaceutical cocrystals and their physicochemical properties

    Nate Schultheiss and Ann Newman. Pharmaceutical cocrystals and their physicochemical properties. Crystal Growth and Design, 9(6): 2950–2967, 2009

  46. [46]

    The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules

    Justin S Smith, Roman Zubatyuk, Benjamin Nebgen, Nicholas Lubbers, Kipton Barros, Adrian E Roitberg, Olexandr Isayev, and Sergei Tretiak. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Scientific Data, 7(1): 134, 2020

  47. [47]

    The Open DAC 2023 dataset and challenges for sorbent discovery in direct air capture

    Anuroop Sriram, Sihoon Choi, Xiaohan Yu, Logan M Brabson, Abhishek Das, Zachary Ulissi, Matt Uyttendaele, Andrew J Medford, and David S Sholl. The Open DAC 2023 dataset and challenges for sorbent discovery in direct air capture. ACS Central Science, 2024

  48. [48]

    Protenix - Advancing structure prediction through a comprehensive AlphaFold3 reproduction

    ByteDance AML AI4Science Team, Xinshi Chen, Yuxuan Zhang, Chan Lu, Wenzhi Ma, Jiaqi Guan, Chengyue Gong, Jincai Yang, Hanyu Zhang, Ke Zhang, et al. Protenix - Advancing structure prediction through a comprehensive AlphaFold3 reproduction. bioRxiv, 2025

  49. [49]

    Genarris 2.0: A random structure generator for molecular crystals

    Rithwik Tom, Timothy Rose, Imanuel Bier, Harriet O’Brien, Álvaro Vázquez-Mayagoitia, and Noa Marom. Genarris 2.0: A random structure generator for molecular crystals. Computer Physics Communications, 250: 107170, 2020

  50. [50]

    Crystal structure predictions for disordered halobenzenes

    Bouke P van Eijck. Crystal structure predictions for disordered halobenzenes. Physical Chemistry Chemical Physics, 4(19): 4789–4794, 2002

  51. [51]

    Crystal structure prediction via particle-swarm optimization

    Yanchao Wang, Jian Lv, Li Zhu, and Yanming Ma. Crystal structure prediction via particle-swarm optimization. Physical Review B, 82(9): 094116, 2010

  52. [52]

    Organic crystalline materials in flexible electronics

    Yu Wang, Lingjie Sun, Cong Wang, Fangxu Yang, Xiaochen Ren, Xiaotao Zhang, Huanli Dong, and Wenping Hu. Organic crystalline materials in flexible electronics. Chemical Society Reviews, 48(6): 1492–1530, 2019

  53. [53]

    De novo design of protein structure and function with RFdiffusion

    Joseph L Watson, David Juergens, Nathaniel R Bennett, Brian L Trippe, Jason Yim, Helen E Eisenach, Woody Ahern, Andrew J Borst, Robert J Ragotte, Lukas F Milles, et al. De novo design of protein structure and function with RFdiffusion. Nature, 620(7976): 1089–1100, 2023

  54. [54]

    UMA: A family of universal models for atoms

    Brandon M Wood, Misko Dzamba, Xiang Fu, Meng Gao, Muhammed Shuaibi, Luis Barroso-Luque, Kareem Abdelmaqsoud, Vahe Gharakhanyan, John R Kitchin, Daniel S Levine, et al. UMA: A family of universal models for atoms. In NeurIPS, 2025

  55. [55]

    Crystal diffusion variational autoencoder for periodic material generation

    Tian Xie, Xiang Fu, Octavian-Eugen Ganea, Regina Barzilay, and Tommi S. Jaakkola. Crystal diffusion variational autoencoder for periodic material generation. In ICLR, 2022

  56. [56]

    Scalable diffusion for materials generation

    Sherry Yang, KwangHwan Cho, Amil Merchant, Pieter Abbeel, Dale Schuurmans, Igor Mordatch, and Ekin Dogus Cubuk. Scalable diffusion for materials generation. In ICLR, 2024

  57. [57]

    SE(3) diffusion model with application to protein backbone generation

    Jason Yim, Brian L. Trippe, Valentin De Bortoli, Emile Mathieu, Arnaud Doucet, Regina Barzilay, and Tommi Jaakkola. SE(3) diffusion model with application to protein backbone generation. In ICML, 2023

  58. [58]

    GraphSAINT: Graph sampling based inductive learning method

    Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor Prasanna. GraphSAINT: Graph sampling based inductive learning method. In ICLR, 2020

  59. [59]

    A generative model for inorganic materials design

    Claudio Zeni, Robert Pinsler, Daniel Zügner, Andrew Fowler, Matthew Horton, Xiang Fu, Zilong Wang, Aliaksandra Shysheya, Jonathan Crabbé, Shoko Ueda, et al. A generative model for inorganic materials design. Nature, 639(8055): 624–632, 2025

  60. [60]

    Organic semiconductor single crystals for electronics and photonics

    Xiaotao Zhang, Huanli Dong, and Wenping Hu. Organic semiconductor single crystals for electronics and photonics. Advanced Materials, 30(44): 1801048, 2018
