CrystalX: High-accuracy Crystal Structure Analysis Using Deep Learning
Pith reviewed 2026-05-23 18:35 UTC · model grok-4.3
The pith
CrystalX uses deep learning to automate full-atom crystal structure analysis from X-ray diffraction data and outperforms prior automated methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CrystalX, a deep learning model, performs high-accuracy automated structure analysis of crystalline materials from X-ray diffraction data at the full-atom level. With training and test sets separated by publication time, the model surpasses automated baselines, deciphers intricate patterns, and rectifies expert errors in published structures that evade CheckCIF A/B alerts, supporting its deployment for human-free analysis of new compounds.
What carries the argument
The CrystalX deep learning model trained on X-ray diffraction measurements to predict full atomic crystal structures.
If this is right
- Routine crystal structure analysis can proceed without human intervention for compounds discovered after the training period.
- Errors in published structures that escape automated validation alerts can be identified and corrected by the model.
- Self-driving laboratories gain the capacity for end-to-end automated structural characterization.
- The approach scales to daily processing of new experimental diffraction measurements.
Where Pith is reading between the lines
- Integration with robotic synthesis systems could shorten the cycle from material discovery to confirmed structure.
- Similar models might be trained on other diffraction or spectroscopic modalities for broader material identification tasks.
- Widespread adoption could shift the bottleneck in crystallography from data interpretation to data collection.
- The method provides a benchmark for measuring how much expert judgment remains necessary after automated analysis.
Load-bearing premise
The temporal split of data by publication date suffices to show that the model will correctly analyze structures of compounds never seen during training.
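To make the premise concrete, the sketch below shows one way a publication-date split could be implemented, together with a simple check of how many held-out records share a category with the training set. The record schema (`published`, `space_group`), the cutoff date, and the four toy records are hypothetical and are not taken from the paper.

```python
from datetime import date


def temporal_split(records, cutoff):
    """Split records into (train, test) by publication date.

    Records published before `cutoff` go to train; records published on or
    after it go to test. The 'published' key is a hypothetical schema.
    """
    train = [r for r in records if r["published"] < cutoff]
    test = [r for r in records if r["published"] >= cutoff]
    return train, test


def shared_fraction(train, test, key):
    """Fraction of test records whose `key` value also occurs in train."""
    seen = {r[key] for r in train}
    if not test:
        return 0.0
    return sum(r[key] in seen for r in test) / len(test)


# Toy records, invented for illustration only.
records = [
    {"id": "a", "published": date(2019, 5, 1), "space_group": "P2_1/c"},
    {"id": "b", "published": date(2021, 3, 9), "space_group": "P-1"},
    {"id": "c", "published": date(2023, 7, 2), "space_group": "P2_1/c"},
    {"id": "d", "published": date(2024, 1, 15), "space_group": "C2/c"},
]

train, test = temporal_split(records, date(2023, 1, 1))
overlap = shared_fraction(train, test, "space_group")
```

In this toy case half of the held-out records fall in a space group already present in training, which is the kind of overlap a date-based split does not rule out.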
What would settle it
Apply CrystalX to a collection of crystal structures published after the training cutoff and compare its output atomic positions and space groups against independent expert manual determinations on the same data.
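One hedged way to score such a comparison is to match each expert-assigned atom to the nearest predicted atom and count how often the element labels agree. The sketch assumes both structures are already expressed in the same Cartesian frame in ångströms, and it ignores symmetry equivalence, origin shifts, and periodic images, all of which a real comparison would need to handle. The function name, the tolerance, and the coordinates are illustrative, not from the paper.

```python
import math


def element_agreement(predicted, reference, tol=0.5):
    """Greedily match reference atoms to predicted atoms and compare elements.

    predicted, reference: lists of (element, (x, y, z)) tuples in the same
    Cartesian frame. Returns (matched, correct), where `matched` counts
    reference atoms with a predicted atom within `tol`, and `correct` counts
    matched atoms whose element labels agree.
    """
    unused = list(predicted)
    matched = 0
    correct = 0
    for ref_element, ref_xyz in reference:
        if not unused:
            break
        nearest = min(unused, key=lambda atom: math.dist(atom[1], ref_xyz))
        if math.dist(nearest[1], ref_xyz) <= tol:
            unused.remove(nearest)  # each predicted atom is used at most once
            matched += 1
            correct += nearest[0] == ref_element
    return matched, correct


# Illustrative coordinates only.
reference = [("C", (0.0, 0.0, 0.0)), ("N", (1.4, 0.0, 0.0)), ("O", (0.0, 1.2, 0.0))]
predicted = [("C", (0.05, 0.0, 0.0)), ("C", (1.38, 0.02, 0.0)), ("O", (0.0, 1.25, 0.0))]

matched, correct = element_agreement(predicted, reference)
```

Here all three reference atoms find a nearby predicted atom, but one of them (the nitrogen) is labeled as carbon, so two of three element assignments agree.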
Original abstract
Atomic structure analysis of crystalline materials is a paramount endeavor in both chemical and material sciences. This sophisticated technique necessitates not only a solid foundation in crystallography but also a profound comprehension of the intricacies of the accompanying software, posing a significant challenge in meeting the rigorous daily demands. For the first time, we confront this challenge head-on by harnessing the power of deep learning for fully automated routine structure analysis at the full-atom level. To validate the performance of the model, named CrystalX, we employed a dataset comprising over 50,000 X-ray diffraction measurements derived from authentic experiments. Under a strict temporal validation scheme that separates training and test data by publication time, CrystalX substantially outperformed the automated baseline and adept at deciphering intricate geometric patterns. Remarkably, CrystalX revealed that even peer-reviewed publications harbor expert interpretation errors that can evade stringent CheckCIF A/B-level alerts, yet CrystalX adeptly rectifies them. It has already been successfully applied in our day-to-day pipeline, enabling fully automated, human-free structure analysis for newly discovered compounds. Overall, CrystalX marks the beginning of a new era in automating routine structural analysis within self-driving laboratories.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces CrystalX, a deep learning model for fully automated, full-atom crystal structure analysis from X-ray diffraction measurements. It reports results on a dataset of over 50,000 experimental structures using a strict temporal split (training and test separated by publication date), claims substantial outperformance over automated baselines, the ability to detect and correct expert interpretation errors missed by CheckCIF A/B alerts, and successful deployment in a day-to-day pipeline for newly discovered compounds.
Significance. If the performance and generalization claims hold with rigorous supporting evidence, the work could enable substantial automation of routine crystallography, reducing expert time in self-driving laboratories and potentially improving data quality by catching subtle errors. The temporal-split validation approach is a positive step toward realistic evaluation, though its sufficiency for true novelty remains to be demonstrated.
major comments (2)
- [Abstract] Abstract: The central claim that CrystalX enables reliable human-free analysis for newly discovered compounds rests on the temporal validation demonstrating out-of-distribution performance. However, no analysis is provided showing that test-set structures differ from the training distribution in chemical composition, space-group statistics, or geometric motifs; publication-date separation alone permits substantial overlap, so outperformance on the held-out set does not establish generalization to genuinely novel compounds.
- [Abstract] Abstract: No model architecture, training procedure, quantitative metrics (e.g., R-factors, success rates, error distributions), or error analysis are reported, preventing evaluation of whether the stated performance advantage over the automated baseline is statistically or practically meaningful.
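As an illustration of the kind of evidence this comment asks for, a paired comparison on a shared test set can be checked with an exact McNemar test. The counts below are invented for illustration and are not results from the paper; `b` counts structures solved by the model but not the baseline, and `c` counts the reverse.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) probability of a split at least as lopsided as observed.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical discordant counts, for illustration only.
p_value = mcnemar_exact(b=9, c=1)
```

Only the discordant pairs enter the test; structures that both methods solve, or both fail on, carry no information about which method is better.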
minor comments (1)
- [Abstract] Abstract: The sentence 'CrystalX substantially outperformed the automated baseline and adept at deciphering intricate geometric patterns' is grammatically incomplete.
Simulated Author's Rebuttal
We thank the referee for their comments. We address each major comment below, indicating planned revisions where appropriate.
Point-by-point responses
- Referee: [Abstract] Abstract: The central claim that CrystalX enables reliable human-free analysis for newly discovered compounds rests on the temporal validation demonstrating out-of-distribution performance. However, no analysis is provided showing that test-set structures differ from the training distribution in chemical composition, space-group statistics, or geometric motifs; publication-date separation alone permits substantial overlap, so outperformance on the held-out set does not establish generalization to genuinely novel compounds.
  Authors: We agree that publication-date separation alone does not guarantee distributional shift across all relevant features. In the revised manuscript we will add a direct comparison of the train and test sets on chemical composition, space-group frequencies, and representative geometric motifs to better substantiate the out-of-distribution character of the test data.
  Revision: yes
- Referee: [Abstract] Abstract: No model architecture, training procedure, quantitative metrics (e.g., R-factors, success rates, error distributions), or error analysis are reported, preventing evaluation of whether the stated performance advantage over the automated baseline is statistically or practically meaningful.
  Authors: The abstract is intentionally concise. The full manuscript contains the model architecture, training details, quantitative metrics (including success rates and R-factor comparisons), and error analysis. To make the abstract more self-contained we will insert the principal quantitative performance figures into the revised abstract.
  Revision: yes
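The train/test comparison promised in the first response could take many forms. A minimal sketch, assuming space-group labels are available as plain strings, is the total variation distance between the two label frequency distributions (0 means identical distributions, 1 means disjoint support). The label lists are made up for illustration and do not come from the paper's dataset.

```python
from collections import Counter


def frequencies(labels):
    """Map each label to its relative frequency in `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Illustrative labels only.
train_space_groups = ["P2_1/c", "P2_1/c", "P-1", "C2/c"]
test_space_groups = ["P2_1/c", "P-1", "P-1", "Pbca"]

tv = total_variation(frequencies(train_space_groups), frequencies(test_space_groups))
```

The same two functions apply unchanged to other categorical features, such as the set of elements present in each structure, as long as each record is reduced to a hashable label.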
Circularity Check
No circularity; empirical ML evaluation on external temporal data split
full rationale
The paper presents an applied deep-learning system for crystal structure determination with no equations, derivations, or first-principles claims. Performance is assessed via a temporal train/test split on >50k real experimental X-ray datasets, which is an independent external benchmark rather than any quantity fitted from or defined by the target result. No self-citations, ansatzes, or uniqueness theorems are invoked to justify the core method; the reported outperformance and error-correction examples rest on direct comparison to baselines and CheckCIF on held-out data. This is the standard non-circular pattern for supervised ML papers whose central claim is empirical generalization on real measurements.
Axiom & Free-Parameter Ledger
free parameters (1)
- Neural network parameters
axioms (1)
- domain assumption: Publication-date temporal split prevents information leakage and tests generalization to newly discovered compounds
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: We harness the power of an advanced Equivariant Transformer model, TorchMD-NET, to decode geometric interaction patterns from electron density peaks
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: CrystalX achieves a 99.80% accuracy in determining non-hydrogen atoms
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Data and code availability: All data required to validate the conclusions of this paper are provided in the manuscript, the supplementary materials, and a data archive containing all relevant data, code, and model parameters, accessible at https://doi.org/10.5281/zenodo.13820303. A web application is available at https://crystalx.intern-ai.org.cn/...