
arxiv: 2604.17994 · v1 · submitted 2026-04-20 · ❄️ cond-mat.mtrl-sci

SWORD: Symmetry and Wyckoff-sequence of Ordered and Disordered crystals

Pith reviewed 2026-05-10 04:18 UTC · model grok-4.3

classification ❄️ cond-mat.mtrl-sci
keywords Wyckoff positions · crystal structure representation · disordered crystals · symmetry standardization · database deduplication · materials informatics · partial occupancy

The pith

SWORD introduces a Wyckoff-sequence string that standardizes symmetry-equivalent descriptions of ordered and disordered crystals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SWORD as a symmetry-aware string representation built from Wyckoff positions that creates a single consistent label for any crystal structure. The string explicitly records co-occupying atoms on partially filled sites and adds a numerical degree-of-mixing value that tracks continuous changes in site stoichiometry. If the method works as claimed, large crystallographic databases can group equivalent entries, remove duplicates, and judge whether newly proposed structures are genuinely distinct even when they contain disorder or are only partially relaxed.

Core claim

SWORD is a symmetry-aware, Wyckoff-based string representation compatible with both ordered and disordered crystals. It standardizes symmetry-equivalent structural descriptions into a consistent label, explicitly represents co-occupying species on partially occupied sites, and quantifies complex disorder through a degree of mixing descriptor that captures continuous variation in site stoichiometry. These features enable efficient structure grouping, duplicate identification, and finer refinement of disordered structures, with demonstrated invariance under identity-preserving transformations and competitive performance in linking unrelaxed configurations to their relaxed states.

What carries the argument

The SWORD string: a Wyckoff-sequence encoding that incorporates site occupancy details and a continuous mixing-degree metric to produce a standardized, interpretable label for any crystal.
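
As a rough illustration of the idea, and not the paper's actual encoding, a label of this kind can be sketched by sorting the sites, and the species within each site, before joining them. The function name `sword_like_label`, the separator characters, the field layout, and the two-decimal occupancy rounding are all assumptions made for this sketch; the abstract does not specify any of them.

```python
from typing import Dict, List, Tuple

# A site is (Wyckoff label such as "4a", {species: occupancy}).
Site = Tuple[str, Dict[str, float]]


def sword_like_label(space_group: int, sites: List[Site], ndigits: int = 2) -> str:
    """Build a hypothetical canonical label; NOT the published SWORD format."""
    parts = []
    for wyckoff, occupancy in sites:
        # Sort co-occupying species alphabetically so input order is irrelevant.
        species = "+".join(
            f"{el}{round(occ, ndigits):.{ndigits}f}"
            for el, occ in sorted(occupancy.items())
        )
        parts.append(f"{wyckoff}[{species}]")
    # Sort the sites themselves so site order in the input file is irrelevant.
    return f"{space_group}:" + "_".join(sorted(parts))


# Illustrative input: a rock-salt-type cell with Na and K sharing one site.
rock_salt_alloy = [("4b", {"Cl": 1.0}), ("4a", {"Na": 0.5, "K": 0.5})]
print(sword_like_label(225, rock_salt_alloy))
```

Sorting removes the dependence on listing order only. The sketch does nothing about origin shifts that exchange equivalent Wyckoff letters (in space group 225, for example, shifting the origin by (1/2, 1/2, 1/2) swaps 4a and 4b), which a real canonicalization would have to resolve.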

If this is right

  • Structures receive the same label regardless of how symmetry or disorder is described in the input file.
  • Duplicate entries can be detected and removed even when sites are fractionally occupied.
  • Novelty checks become feasible on partially relaxed or unrelaxed candidate structures.
  • Large-scale curation of the ICSD becomes practical, producing a cleaner base for data-driven materials design.
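
If every entry carries such a label, grouping and duplicate detection reduce to a dictionary lookup. The following is a minimal sketch under that assumption; the entry IDs and label strings are placeholders invented for illustration, not real database records or real SWORD output.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_label(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each label to the IDs of the entries that share it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry_id, label in entries:
        groups[label].append(entry_id)
    return dict(groups)


def duplicates(groups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only the labels shared by more than one entry."""
    return {label: ids for label, ids in groups.items() if len(ids) > 1}


# Hypothetical (entry ID, label) pairs; the labels are placeholders.
entries = [
    ("entry-001", "225:4a[Na1.00]_4b[Cl1.00]"),
    ("entry-002", "225:4a[K0.50+Na0.50]_4b[Cl1.00]"),
    ("entry-003", "225:4a[Na1.00]_4b[Cl1.00]"),
]
print(duplicates(group_by_label(entries)))
```

Exact string matching makes one pass over the database sufficient, but it also means any flaw in the label (a false merge or a false split) propagates directly into the curated set.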

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Generative models could use the string as a filter to avoid proposing duplicate or near-duplicate candidates during high-throughput screening.
  • The mixing descriptor might serve as a continuous order parameter for monitoring disorder evolution along molecular-dynamics or relaxation paths.
  • Similar Wyckoff-sequence encodings could be tested on defect-containing or surface structures to extend the same deduplication logic beyond bulk crystals.
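
The abstract does not give the functional form of the mixing descriptor. One plausible stand-in, used here purely for illustration, is a normalized Shannon entropy of the species fractions on a site, which runs from 0 (a single species) to 1 (an even mix). The name `mixing_degree` and the choice to ignore vacancies are assumptions of this sketch.

```python
import math
from typing import Dict


def mixing_degree(occupancy: Dict[str, float]) -> float:
    """Normalized Shannon entropy of site fractions (hypothetical stand-in).

    Returns 0.0 for a single species and 1.0 for an even mix. Occupancies are
    renormalized to sum to 1, so vacancies are ignored in this sketch.
    """
    total = sum(occupancy.values())
    if total <= 0:
        return 0.0
    fractions = [occ / total for occ in occupancy.values() if occ > 0]
    if len(fractions) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in fractions)
    return entropy / math.log(len(fractions))


# Track the value along a hypothetical Na(1-x)K(x) substitution series.
for x in (0.0, 0.1, 0.3, 0.5):
    print(x, round(mixing_degree({"Na": 1.0 - x, "K": x}), 3))
```

For two species the value rises continuously from 0 to 1 as the composition approaches 50:50. Normalizing by the number of species present has a cost: on a site with three or more species the value jumps when one of them vanishes, so a descriptor meant to vary continuously would need a different normalization.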

Load-bearing premise

The Wyckoff-based string stays the same under any symmetry-preserving re-description of a structure yet changes when the underlying atomic arrangement actually differs, and this behavior holds without hidden errors at database scale.

What would settle it

Finding two symmetry-equivalent but differently written descriptions of the same disordered crystal that receive different SWORD strings, or two genuinely distinct structures that receive identical strings.
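
A search for such counterexamples can be automated. The sketch below covers only the simplest class of re-descriptions, permuting the order of sites and of species within a site, and uses a hypothetical `toy_label` function in place of SWORD. A real test would also need origin shifts, alternative cell choices, and atom relabeling, none of which are modeled here.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Site = Tuple[str, Dict[str, float]]  # (Wyckoff label, {species: occupancy})


def toy_label(sites: List[Site]) -> str:
    """Hypothetical order-independent label, used only to exercise the check."""
    parts = []
    for wyckoff, occ_map in sites:
        species = "+".join(f"{el}{occ:.2f}" for el, occ in sorted(occ_map.items()))
        parts.append(f"{wyckoff}[{species}]")
    return "_".join(sorted(parts))


def redescribe(sites: List[Site], rng: random.Random) -> List[Site]:
    """Write the same structure differently: shuffle site and species order."""
    variant = []
    for wyckoff, occ_map in sites:
        items = list(occ_map.items())
        rng.shuffle(items)
        variant.append((wyckoff, dict(items)))
    rng.shuffle(variant)
    return variant


def find_counterexample(
    label: Callable[[List[Site]], str],
    sites: List[Site],
    trials: int = 100,
    seed: int = 0,
) -> Optional[List[Site]]:
    """Return a re-description whose label differs from the original, if any."""
    rng = random.Random(seed)
    reference = label(sites)
    for _ in range(trials):
        variant = redescribe(sites, rng)
        if label(variant) != reference:
            return variant
    return None


# Illustrative structures, invented for this sketch.
disordered = [("4a", {"Na": 0.5, "K": 0.5}), ("4b", {"Cl": 1.0}), ("8c", {"Li": 0.25})]
ordered = [("4a", {"Na": 1.0}), ("4b", {"Cl": 1.0}), ("8c", {"Li": 0.25})]

print(find_counterexample(toy_label, disordered))   # None means no counterexample found
print(toy_label(disordered) != toy_label(ordered))  # distinct structures should differ
```

Failing to find a counterexample in a randomized search is evidence for invariance, not a proof; it only shows that none of the sampled re-descriptions broke the label.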

Original abstract

Novelty in materials discovery requires candidates to be distinct, non-redundant, and thermodynamically plausible. As crystallographic databases continue to expand in both size and complexity, efficient and reliable novelty assessment has become increasingly difficult. The difficulty is particularly acute when crystallographic disorder is involved, as partial occupancies greatly enlarge the structure-composition space and obscure the identification of genuinely distinct structures. Here, we introduce SWORD, a symmetry-aware, Wyckoff-based string representation compatible with both ordered and disordered crystals. SWORD (i) standardizes symmetry-equivalent structural descriptions into a consistent label, (ii) explicitly represents co-occupying species on partially occupied sites, and (iii) quantifies complex disorder through a degree-of-mixing descriptor that captures continuous variation in site stoichiometry. These features enable efficient structure grouping, duplicate identification, and finer refinement of disordered structures. Benchmarking against existing fingerprint and structure-matching methods shows that SWORD remains invariant under identity-preserving transformations while retaining interpretable sensitivity to structural perturbations. In addition, SWORD shows competitive performance in associating unrelaxed and intermediate configurations with their final relaxed states along relaxation trajectories. This feature could enable more reliable novelty assessment directly from partially relaxed or even unrelaxed generated structures. Finally, we use SWORD to demonstrate disorder-aware, database-scale deduplication and curation of the Inorganic Crystal Structure Database (ICSD). The curated ICSD would serve as a basis for materials informatics and data-driven materials design in the era of artificial intelligence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces SWORD, a symmetry-aware Wyckoff-sequence string representation for both ordered and disordered crystals. It claims three main capabilities: (i) standardization of symmetry-equivalent structural descriptions into a consistent label, (ii) explicit representation of co-occupying species on partially occupied sites, and (iii) a degree-of-mixing descriptor that quantifies continuous variation in site stoichiometry. The work further asserts that SWORD is invariant under identity-preserving transformations while remaining sensitive to structural changes, shows competitive performance in associating unrelaxed and intermediate structures with relaxed endpoints along relaxation trajectories, and enables disorder-aware deduplication and curation of the full ICSD.

Significance. If the invariance, sensitivity, and curation claims hold with reproducible implementation, SWORD would provide a practical advance for materials databases and informatics by addressing the long-standing difficulty of handling partial occupancies and disorder in structure matching and novelty assessment. The representation is built from standard crystallographic inputs (space-group operations and Wyckoff positions) without circular dependence on the paper's own outputs, and the explicit treatment of mixed-site stoichiometry is a timely contribution for AI-driven materials design.

major comments (3)
  1. [Abstract / Results] Abstract and Results section: Benchmarking is described only qualitatively (invariance under identity-preserving transformations, competitive performance on relaxation trajectories, successful ICSD curation) with no quantitative metrics, error bars, success rates, false-positive/false-negative rates, exclusion criteria, or comparison tables, leaving the central performance claims without verifiable numerical support.
  2. [Methods] Methods section: The precise canonicalization rules required to construct the Wyckoff-sequence string (site sorting order, species ordering on mixed-occupancy sites, numerical tolerance on occupancy floats, handling of origin shifts, atom relabeling, and incommensurate or modulated disorder) are not supplied in a form that permits independent re-implementation, which is load-bearing for confirming the claimed invariance under all identity-preserving transformations.
  3. [Results] Results section (ICSD curation): No details are given on how false merges or splits were detected or avoided during database-scale deduplication, nor on the size of the curated set or any validation against known duplicates, undermining the reliability claim for the final curated ICSD.
minor comments (2)
  1. [Methods] Notation for the degree-of-mixing descriptor should be defined explicitly with its mathematical form and parameter values (if any) in the main text rather than left to supplementary material.
  2. [Figures] Figure captions for any trajectory or invariance plots should include the exact number of structures tested and the definition of 'competitive performance' relative to the baselines used.
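
The false-positive and false-negative rates requested in major comment 1 could be reported pairwise against a hand-validated reference set. The sketch below assumes such a ground-truth grouping exists; the function name, the two rate definitions, and the toy data are illustrative assumptions, not anything taken from the manuscript.

```python
from itertools import combinations
from typing import Dict, Tuple


def pairwise_errors(predicted: Dict[str, str], truth: Dict[str, str]) -> Tuple[float, float]:
    """Return (false-merge rate, false-split rate) over all pairs of entries.

    Both arguments map entry ID -> group key. The false-merge rate is the
    share of predicted-duplicate pairs that are not true duplicates; the
    false-split rate is the share of true-duplicate pairs that were separated.
    """
    false_merge = false_split = merged_pairs = duplicate_pairs = 0
    for a, b in combinations(sorted(truth), 2):
        merged = predicted[a] == predicted[b]
        duplicate = truth[a] == truth[b]
        merged_pairs += merged
        duplicate_pairs += duplicate
        false_merge += merged and not duplicate
        false_split += duplicate and not merged
    merge_rate = false_merge / merged_pairs if merged_pairs else 0.0
    split_rate = false_split / duplicate_pairs if duplicate_pairs else 0.0
    return merge_rate, split_rate


# Toy data: e1/e2 are true duplicates, as are e3/e4.
truth = {"e1": "A", "e2": "A", "e3": "B", "e4": "B"}
predicted = {"e1": "x", "e2": "x", "e3": "x", "e4": "y"}
print(pairwise_errors(predicted, truth))
```

The loop visits every pair, so it is quadratic in the number of entries. That is acceptable for a validation sample but would need blocking (for example, comparing only within a composition or space group) at full database scale.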

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive review and positive assessment of SWORD's potential impact. We address each major comment below with specific plans for revision to improve clarity, reproducibility, and quantitative support.

  1. Referee: [Abstract / Results] Abstract and Results section: Benchmarking is described only qualitatively (invariance under identity-preserving transformations, competitive performance on relaxation trajectories, successful ICSD curation) with no quantitative metrics, error bars, success rates, false-positive/false-negative rates, exclusion criteria, or comparison tables, leaving the central performance claims without verifiable numerical support.

    Authors: We agree that the current presentation of benchmarking is primarily qualitative. In the revised manuscript we will expand the Results section to include quantitative metrics: success rates for invariance under identity-preserving transformations (tested on a large set of structures), performance statistics (e.g., association accuracy) on relaxation trajectories with direct comparisons to existing methods, and tables reporting these values. Where appropriate, error bars from repeated sampling or bootstrapping will be added, along with explicit false-positive/false-negative rates and exclusion criteria used in the tests. These additions will provide the verifiable numerical support requested.

    Revision: yes

  2. Referee: [Methods] Methods section: The precise canonicalization rules required to construct the Wyckoff-sequence string (site sorting order, species ordering on mixed-occupancy sites, numerical tolerance on occupancy floats, handling of origin shifts, atom relabeling, and incommensurate or modulated disorder) are not supplied in a form that permits independent re-implementation, which is load-bearing for confirming the claimed invariance under all identity-preserving transformations.

    Authors: We acknowledge that the Methods section lacks the level of detail needed for full reproducibility. The revised version will include a dedicated subsection with explicit, step-by-step canonicalization rules: the precise site-sorting order, lexicographic ordering of species on mixed-occupancy sites, numerical tolerances applied to occupancy values, procedures for origin shifts and atom relabeling, and handling of incommensurate or modulated structures. Pseudocode or a clear algorithmic outline will be added so that the invariance properties can be independently verified.

    Revision: yes

  3. Referee: [Results] Results section (ICSD curation): No details are given on how false merges or splits were detected or avoided during database-scale deduplication, nor on the size of the curated set or any validation against known duplicates, undermining the reliability claim for the final curated ICSD.

    Authors: We agree that additional transparency is required for the ICSD curation claim. In the revised Results section we will describe the deduplication procedure in detail, including the criteria and secondary checks used to detect and avoid false merges or splits, the exact size of the final curated ICSD, and validation steps performed against known duplicate sets or through sampling and manual review. These additions will substantiate the reliability of the curation process.

    Revision: yes
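
The numerical tolerance on occupancy values, raised in comment 2, matters because any label built from floats must first map them to discrete tokens. The sketch below assumes the simplest possible scheme, a fixed grid of width `step`, to show both what it absorbs and where it fails. The function names and the 0.01 grid are hypothetical, not the authors' rules.

```python
from typing import Dict, Tuple


def quantize(occ: float, step: float = 0.01) -> int:
    """Map an occupancy to the nearest integer bin on a grid of width `step`."""
    return int(round(occ / step))


def occupancy_key(occ_map: Dict[str, float], step: float = 0.01) -> Tuple[Tuple[str, int], ...]:
    """Hashable, order-independent key for one mixed site (hypothetical scheme)."""
    return tuple(sorted((el, quantize(occ, step)) for el, occ in occ_map.items()))


# Refinement noise well inside a bin is absorbed ...
print(occupancy_key({"Na": 0.5004, "K": 0.4996}) == occupancy_key({"K": 0.5, "Na": 0.5}))
# ... but two values only 0.0002 apart can straddle a bin edge and split.
print(quantize(0.5049), quantize(0.5051))
```

Any fixed grid has such edges, so a string-equality test on quantized occupancies cannot by itself guarantee that nearly identical refinements are merged. This is one reason the exact tolerance rule is needed for independent re-implementation.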

Circularity Check

0 steps flagged

No circularity detected; SWORD is a direct construction from independent crystallographic inputs


The paper defines SWORD explicitly from standard, externally supplied crystallographic primitives (space-group symmetry operations, Wyckoff positions, and site occupancies) without any self-referential equations, fitted parameters renamed as predictions, or load-bearing self-citations. Invariance and deduplication claims are supported by benchmarking against external methods rather than by internal reduction. No derivation step reduces to its own output by construction, satisfying the default expectation of a non-circular representation tool.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The method rests on standard crystallographic descriptions of symmetry and Wyckoff positions without introducing new physical entities or heavily fitted parameters beyond the mixing descriptor definition.

free parameters (1)
  • degree of mixing descriptor parameters
    The continuous descriptor for quantifying site stoichiometry variation is introduced but its exact functional form or tuning is not detailed in the abstract.
axioms (2)
  • [standard math] Symmetry-equivalent descriptions represent the same physical structure
    Invoked to justify standardization of labels under identity-preserving transformations.
  • [domain assumption] Crystal structures are fully described by space group and Wyckoff site occupancies
    Core premise enabling the string representation for both ordered and disordered cases.

pith-pipeline@v0.9.0 · 5598 in / 1533 out tokens · 52578 ms · 2026-05-10T04:18:32.746644+00:00 · methodology

