SWORD: Symmetry and Wyckoff-sequence of Ordered and Disordered crystals
Pith reviewed 2026-05-10 04:18 UTC · model grok-4.3
The pith
SWORD introduces a Wyckoff-sequence string that standardizes symmetry-equivalent descriptions of ordered and disordered crystals.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SWORD is a symmetry-aware, Wyckoff-based string representation compatible with both ordered and disordered crystals. It standardizes symmetry-equivalent structural descriptions into a consistent label, explicitly represents co-occupying species on partially occupied sites, and quantifies complex disorder through a degree of mixing descriptor that captures continuous variation in site stoichiometry. These features enable efficient structure grouping, duplicate identification, and finer refinement of disordered structures, with demonstrated invariance under identity-preserving transformations and competitive performance in linking unrelaxed configurations to their relaxed states.
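The page does not give the mathematical form of the degree-of-mixing descriptor. As a rough illustration only, the sketch below assumes a normalized Shannon entropy (an evenness index) over the species fractions on one site. The function name `mixing_degree` and the choice of normalization are assumptions, not the paper's definition.

```python
import math

def mixing_degree(occupancy):
    """Normalized Shannon entropy of site fractions (assumed form, not the paper's).

    Returns 0.0 for a site holding a single species and 1.0 for an equimolar mix.
    """
    total = sum(occupancy.values())
    # Renormalize so fractions sum to 1; zero-occupancy species are ignored.
    fractions = [v / total for v in occupancy.values() if v > 0]
    if len(fractions) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in fractions)
    return entropy / math.log(len(fractions))

ordered = mixing_degree({"Na": 1.0})               # fully ordered site
equimolar = mixing_degree({"Cl": 0.5, "Br": 0.5})  # maximally mixed site
dilute = mixing_degree({"Cl": 0.9, "Br": 0.1})     # lightly substituted site
```

Under this assumed form the value varies continuously with site stoichiometry, which is the property the core claim attributes to the descriptor. How vacancies should enter is left open here.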
What carries the argument
The SWORD string: a Wyckoff-sequence encoding that incorporates site occupancy details and a continuous mixing-degree metric to produce a standardized, interpretable label for any crystal.
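To make the idea concrete, here is a minimal sketch of how a Wyckoff-sequence label carrying occupancy information could be assembled. The string layout, the `Site` record, the `sword_like_label` name, the rounding precision, and the sort order are all hypothetical; the actual SWORD format is not specified on this page.

```python
from collections import namedtuple

# Hypothetical site record: Wyckoff multiplicity, Wyckoff letter,
# and a {species: occupancy} mapping for the species sharing the site.
Site = namedtuple("Site", ["multiplicity", "letter", "occupancy"])

def sword_like_label(space_group, sites, ndigits=2):
    """Build an order-independent label; the layout is illustrative only."""
    tokens = []
    for site in sites:
        # Sort co-occupying species so their input order cannot change the label.
        species = ",".join(
            f"{el}{round(occ, ndigits):.{ndigits}f}"
            for el, occ in sorted(site.occupancy.items())
        )
        tokens.append(f"{site.multiplicity}{site.letter}[{species}]")
    # Sort site tokens so the order in which sites are listed cannot matter.
    return f"{space_group}:" + "_".join(sorted(tokens))

sites = [
    Site(4, "a", {"Na": 1.0}),
    Site(4, "b", {"Cl": 0.5, "Br": 0.5}),
]
label = sword_like_label(225, sites)
shuffled_label = sword_like_label(225, list(reversed(sites)))
```

This sketch only removes dependence on listing order and small occupancy noise. A real implementation would also need to handle origin shifts and alternative but equivalent Wyckoff assignments, which are not attempted here.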
If this is right
- Structures receive the same label regardless of how symmetry or disorder is described in the input file.
- Duplicate entries can be detected and removed even when sites are fractionally occupied.
- Novelty checks become feasible on partially relaxed or unrelaxed candidate structures.
- Large-scale curation of the ICSD becomes practical, producing a cleaner base for data-driven materials design.
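The deduplication consequence above amounts to grouping entries by their label. The following sketch assumes labels have already been computed; the entry ids and label strings are placeholders, not real database records or real SWORD strings.

```python
from collections import defaultdict

def group_by_label(entries):
    """Map each label to the list of entry ids that share it."""
    groups = defaultdict(list)
    for entry_id, label in entries:
        groups[label].append(entry_id)
    return dict(groups)

def duplicates(groups):
    """Keep only labels shared by more than one entry."""
    return {label: ids for label, ids in groups.items() if len(ids) > 1}

# Hypothetical (entry id, precomputed label) pairs.
entries = [
    ("entry-1", "225:4a[Na1.00]_4b[Cl1.00]"),
    ("entry-2", "225:4a[Na1.00]_4b[Br0.50,Cl0.50]"),
    ("entry-3", "225:4a[Na1.00]_4b[Cl1.00]"),
]
dupes = duplicates(group_by_label(entries))
```

Because grouping is a single pass over hashed strings, its cost grows linearly with the number of entries, which is what would make database-scale curation practical if the labels are reliable.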
Where Pith is reading between the lines
- Generative models could use the string as a filter to avoid proposing duplicate or near-duplicate candidates during high-throughput screening.
- The mixing descriptor might serve as a continuous order parameter for monitoring disorder evolution along molecular-dynamics or relaxation paths.
- Similar Wyckoff-sequence encodings could be tested on defect-containing or surface structures to extend the same deduplication logic beyond bulk crystals.
Load-bearing premise
The Wyckoff-based string stays the same under any symmetry-preserving re-description of a structure yet changes when the underlying atomic arrangement actually differs, and this behavior holds without hidden errors at database scale.
What would settle it
Finding two symmetry-equivalent but differently written descriptions of the same disordered crystal that receive different SWORD strings, or two genuinely distinct structures that receive identical strings.
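The two failure modes named above can be searched for mechanically. The harness below is a sketch under stated assumptions: `find_counterexamples` and `toy_label` are hypothetical names, and the toy label function stands in for a real SWORD implementation.

```python
def find_counterexamples(label_fn, equivalent_sets, distinct_pairs):
    """Return (splits, merges) that would falsify the invariance claim.

    splits: sets of equivalent descriptions that received more than one label.
    merges: pairs of distinct structures that received the same label.
    """
    splits = [s for s in equivalent_sets if len({label_fn(x) for x in s}) > 1]
    merges = [(a, b) for a, b in distinct_pairs if label_fn(a) == label_fn(b)]
    return splits, merges

def toy_label(structure):
    """Stand-in label: sorted 'wyckoff:species' tokens (not the SWORD format)."""
    return "|".join(sorted(f"{w}:{s}" for w, s in structure))

# Two listings of the same toy structure, and one genuinely different pair.
same = [[("4a", "Na"), ("4b", "Cl")], [("4b", "Cl"), ("4a", "Na")]]
different = ([("4a", "Na"), ("4b", "Cl")], [("4a", "Na"), ("4b", "Br")])

splits, merges = find_counterexamples(toy_label, [same], [different])
```

An empty result on a test set does not prove invariance; it only means no counterexample was found among the cases supplied.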
Original abstract
Novelty in materials discovery requires candidates to be distinct, non-redundant, and thermodynamically plausible. As crystallographic databases continue to expand in both size and complexity, efficient and reliable novelty assessment has become increasingly difficult. The difficulty is particularly acute when crystallographic disorder is involved, as partial occupancies greatly enlarge the structure-composition space and obscure the identification of genuinely distinct structures. Here, we introduce SWORD, a symmetry-aware, Wyckoff-based string representation compatible with both ordered and disordered crystals. SWORD (i) standardizes symmetry-equivalent structural descriptions into a consistent label, (ii) explicitly represents co-occupying species on partially occupied sites, and (iii) quantifies complex disorder through a degree-of-mixing descriptor that captures continuous variation in site stoichiometry. These features enable efficient structure grouping, duplicate identification, and finer refinement of disordered structures. Benchmarking against existing fingerprint and structure-matching methods shows that SWORD remains invariant under identity-preserving transformations while retaining interpretable sensitivity to structural perturbations. In addition, SWORD shows competitive performance in associating unrelaxed and intermediate configurations with their final relaxed states along relaxation trajectories. This feature could enable more reliable novelty assessment directly from partially relaxed or even unrelaxed generated structures. Finally, we use SWORD to demonstrate disorder-aware, database-scale deduplication and curation of the Inorganic Crystal Structure Database (ICSD). The curated ICSD would serve as a basis for materials informatics and data-driven materials design in the era of artificial intelligence.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces SWORD, a symmetry-aware Wyckoff-sequence string representation for both ordered and disordered crystals. It claims three main capabilities: (i) standardization of symmetry-equivalent structural descriptions into a consistent label, (ii) explicit representation of co-occupying species on partially occupied sites, and (iii) a degree-of-mixing descriptor that quantifies continuous variation in site stoichiometry. The work further asserts that SWORD is invariant under identity-preserving transformations while remaining sensitive to structural changes, shows competitive performance in associating unrelaxed and intermediate structures with relaxed endpoints along relaxation trajectories, and enables disorder-aware deduplication and curation of the full ICSD.
Significance. If the invariance, sensitivity, and curation claims hold with reproducible implementation, SWORD would provide a practical advance for materials databases and informatics by addressing the long-standing difficulty of handling partial occupancies and disorder in structure matching and novelty assessment. The representation is built from standard crystallographic inputs (space-group operations and Wyckoff positions) without circular dependence on the paper's own outputs, and the explicit treatment of mixed-site stoichiometry is a timely contribution for AI-driven materials design.
major comments (3)
- [Abstract / Results] Abstract and Results section: Benchmarking is described only qualitatively (invariance under identity-preserving transformations, competitive performance on relaxation trajectories, successful ICSD curation) with no quantitative metrics, error bars, success rates, false-positive/false-negative rates, exclusion criteria, or comparison tables, leaving the central performance claims without verifiable numerical support.
- [Methods] Methods section: The precise canonicalization rules required to construct the Wyckoff-sequence string (site sorting order, species ordering on mixed-occupancy sites, numerical tolerance on occupancy floats, handling of origin shifts, atom relabeling, and incommensurate or modulated disorder) are not supplied in a form that permits independent re-implementation, which is load-bearing for confirming the claimed invariance under all identity-preserving transformations.
- [Results] Results section (ICSD curation): No details are given on how false merges or splits were detected or avoided during database-scale deduplication, nor on the size of the curated set or any validation against known duplicates, undermining the reliability claim for the final curated ICSD.
minor comments (2)
- [Methods] Notation for the degree-of-mixing descriptor should be defined explicitly with its mathematical form and parameter values (if any) in the main text rather than left to supplementary material.
- [Figures] Figure captions for any trajectory or invariance plots should include the exact number of structures tested and the definition of 'competitive performance' relative to the baselines used.
Simulated Authors' Rebuttal
We thank the referee for the constructive review and positive assessment of SWORD's potential impact. We address each major comment below with specific plans for revision to improve clarity, reproducibility, and quantitative support.
Point-by-point responses
- Referee: [Abstract / Results] Abstract and Results section: Benchmarking is described only qualitatively (invariance under identity-preserving transformations, competitive performance on relaxation trajectories, successful ICSD curation) with no quantitative metrics, error bars, success rates, false-positive/false-negative rates, exclusion criteria, or comparison tables, leaving the central performance claims without verifiable numerical support.
  Authors: We agree that the current presentation of benchmarking is primarily qualitative. In the revised manuscript we will expand the Results section to include quantitative metrics: success rates for invariance under identity-preserving transformations (tested on a large set of structures), performance statistics (e.g., association accuracy) on relaxation trajectories with direct comparisons to existing methods, and tables reporting these values. Where appropriate, error bars from repeated sampling or bootstrapping will be added, along with explicit false-positive/false-negative rates and exclusion criteria used in the tests. These additions will provide the verifiable numerical support requested.
  Revision: yes
- Referee: [Methods] Methods section: The precise canonicalization rules required to construct the Wyckoff-sequence string (site sorting order, species ordering on mixed-occupancy sites, numerical tolerance on occupancy floats, handling of origin shifts, atom relabeling, and incommensurate or modulated disorder) are not supplied in a form that permits independent re-implementation, which is load-bearing for confirming the claimed invariance under all identity-preserving transformations.
  Authors: We acknowledge that the Methods section lacks the level of detail needed for full reproducibility. The revised version will include a dedicated subsection with explicit, step-by-step canonicalization rules: the precise site-sorting order, lexicographic ordering of species on mixed-occupancy sites, numerical tolerances applied to occupancy values, procedures for origin shifts and atom relabeling, and handling of incommensurate or modulated structures. Pseudocode or a clear algorithmic outline will be added so that the invariance properties can be independently verified.
  Revision: yes
- Referee: [Results] Results section (ICSD curation): No details are given on how false merges or splits were detected or avoided during database-scale deduplication, nor on the size of the curated set or any validation against known duplicates, undermining the reliability claim for the final curated ICSD.
  Authors: We agree that additional transparency is required for the ICSD curation claim. In the revised Results section we will describe the deduplication procedure in detail, including the criteria and secondary checks used to detect and avoid false merges or splits, the exact size of the final curated ICSD, and validation steps performed against known duplicate sets or through sampling and manual review. These additions will substantiate the reliability of the curation process.
  Revision: yes
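The false merges and false splits discussed in the last exchange can be counted against a trusted reference grouping. The sketch below assumes such a reference exists and uses a simple pairwise comparison; the function name, entry ids, and group ids are hypothetical, and this is not the validation procedure the authors describe.

```python
from itertools import combinations

def pairwise_errors(predicted, reference):
    """Count (false_merges, false_splits) over all pairs of entries.

    predicted, reference: dicts mapping entry id -> group id.
    A false merge is a pair grouped together by the method but not by the reference;
    a false split is a pair grouped together by the reference but not by the method.
    """
    false_merges = false_splits = 0
    for a, b in combinations(sorted(predicted), 2):
        same_pred = predicted[a] == predicted[b]
        same_ref = reference[a] == reference[b]
        if same_pred and not same_ref:
            false_merges += 1
        elif same_ref and not same_pred:
            false_splits += 1
    return false_merges, false_splits

# Hypothetical groupings of four entries.
predicted = {"e1": "A", "e2": "A", "e3": "B", "e4": "B"}
reference = {"e1": "x", "e2": "x", "e3": "x", "e4": "y"}
errors = pairwise_errors(predicted, reference)
```

The pairwise loop is quadratic in the number of entries, so at database scale it would more plausibly be run on a sampled subset, in line with the sampling and manual review mentioned in the response.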
Circularity Check
No circularity detected; SWORD is a direct construction from independent crystallographic inputs
full rationale
The paper defines SWORD explicitly from standard, externally supplied crystallographic primitives (space-group symmetry operations, Wyckoff positions, and site occupancies) without any self-referential equations, fitted parameters renamed as predictions, or load-bearing self-citations. Invariance and deduplication claims are supported by benchmarking against external methods rather than by internal reduction. No derivation step reduces to its own output by construction, satisfying the default expectation of a non-circular representation tool.
Axiom & Free-Parameter Ledger
free parameters (1)
- degree of mixing descriptor parameters
axioms (2)
- [standard math] Symmetry-equivalent descriptions represent the same physical structure
- [domain assumption] Crystal structures are fully described by space group and Wyckoff site occupancies