Hierarchical Crystal Structure Prediction of Zeolitic Imidazolate Frameworks Using DFT and Machine-Learned Interatomic Potentials

arxiv: 2601.05097 · v3 · pith:56RP77WWnew · submitted 2026-01-08 · ❄️ cond-mat.mtrl-sci · cond-mat.other

Yizhi Xu (1, 2), Jordan Dorrell (3, 4), Katarina Lisac (2), Ivana Brekalo (2), James P. Darby (5), Andrew J. Morris (3)

Mihails Arhangelskis (1) ((1) Faculty of Chemistry, University of Warsaw; (2) Division of Physical Chemistry, Ruđer Bošković Institute; (3) School of Metallurgy and Materials, University of Birmingham; (4) University of Southampton; (5) School of Engineering, University of Cambridge)

Pith reviewed 2026-05-16 15:54 UTC · model grok-4.3

classification ❄️ cond-mat.mtrl-sci cond-mat.other

keywords: crystal structure prediction, zeolitic imidazolate frameworks, machine-learned interatomic potentials, zinc imidazolate, polymorphism, network topologies, metal-organic frameworks, high-throughput screening

The pith

Machine-learned potentials allow exhaustive sampling of ZnIm2 crystal packings to recover nearly all known structures and reveal 855 new topologies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies crystal structure prediction to zinc imidazolate, the most polymorphic member of the ZIF family. Custom machine-learned interatomic potentials trained on DFT data evaluate more than three million randomly generated crystal arrangements. This search locates 9609 energy minima that span 1484 network topologies, of which 855 had not been reported previously. The workflow recovers all but one of the experimentally known ZnIm2 structures that fall inside the chosen search limits. The authors also use density, porosity, and topological analysis to flag promising targets for synthesis and demonstrate how the predicted structures can match experimental powder patterns to identify mechanochemically made phases.

Core claim

By training machine-learned interatomic potentials on DFT calculations and deploying them to rank more than three million randomly generated ZnIm2 crystal packings, the authors locate 9609 distinct energy minima that correspond to 1484 network topologies, 855 of them previously unknown, while recovering all but one of the experimentally reported structures that satisfy the search boundaries.

What carries the argument

Custom-trained machine-learned interatomic potentials that rapidly approximate DFT energies for ranking millions of candidate ZnIm2 crystal packings generated by random arrangement of molecular units.
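The loop described above (generate random candidates, relax each with a cheap energy model, keep the distinct minima) can be sketched on a toy problem. This is a minimal illustration only: the one-dimensional double-well `energy` function is a hypothetical stand-in for an MLIP, a single coordinate stands in for a whole crystal packing, and rounding the relaxed coordinate stands in for the structure comparison a real CSP code would use to remove duplicates. None of these names or choices come from the paper.

```python
import random


def energy(x):
    # Hypothetical surrogate energy with two minima, at x = -1 and x = +1.
    return (x * x - 1.0) ** 2


def gradient(x):
    # Analytical derivative of the surrogate energy.
    return 4.0 * x * (x * x - 1.0)


def relax(x, step=0.05, n_steps=500):
    # Plain gradient descent, standing in for a geometry optimisation.
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x


def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    minima = {}
    for _ in range(n_trials):
        start = rng.uniform(-2.0, 2.0)   # random initial "packing"
        relaxed = relax(start)
        key = round(relaxed, 3)          # crude duplicate removal
        minima[key] = energy(relaxed)
    # Rank the distinct minima by energy, lowest first.
    return sorted(minima.items(), key=lambda item: item[1])


minima = random_search(1000)
print(minima)
```

A thousand random starts collapse onto two distinct minima here. The same funnel, at far larger scale, is what turns more than three million generated packings into 9609 minima in the paper.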

If this is right

Hundreds of unreported topologies are predicted to be local energy minima and therefore represent realistic targets for experimental synthesis.
Simulated powder diffraction patterns from the predicted structures can be matched against experimental data to assign unknown phases formed by mechanochemistry.
High-throughput CSP becomes practical for mapping polymorphism in other ZIFs once reliable MLIPs are available.
Structures filtered by density and porosity metrics supply a shortlist of candidates that can be prioritized for property calculations or synthesis trials.
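The powder-pattern matching mentioned in this list can be illustrated with a small sketch. It assumes each pattern is available as a list of (2θ, intensity) peaks, broadens the peaks with Gaussians on a common grid, and ranks candidates by cosine similarity to the experimental profile. The similarity measure, the peak width, the function names, and all peak values are assumptions made for illustration; the paper's actual matching procedure is not described in the text above.

```python
import math


def broaden(peaks, grid, width=0.1):
    # Sum a Gaussian profile for every (two_theta, intensity) peak.
    return [
        sum(i * math.exp(-0.5 * ((x - t) / width) ** 2) for t, i in peaks)
        for x in grid
    ]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(experimental, candidates, grid):
    # Score every predicted structure against the experimental profile.
    exp_profile = broaden(experimental, grid)
    scores = {
        name: cosine_similarity(exp_profile, broaden(peaks, grid))
        for name, peaks in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Grid from 5 to 30 degrees 2-theta in 0.02 degree steps.
grid = [5.0 + 0.02 * k for k in range(1251)]

# Illustrative peak lists (hypothetical values, not data from the paper).
experimental = [(7.3, 100.0), (10.4, 40.0), (12.7, 65.0)]
candidates = {
    "candidate_A": [(7.32, 95.0), (10.38, 45.0), (12.72, 60.0)],
    "candidate_B": [(6.1, 100.0), (9.0, 30.0), (14.9, 80.0)],
}

ranking = rank_candidates(experimental, candidates, grid)
print(ranking[0][0])
```

A plain cosine score is sensitive to peak shifts caused by small lattice-parameter errors, so a practical workflow would likely need a measure that tolerates such shifts or a refinement of the cell before comparison.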

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same MLIP-accelerated random-sampling strategy could be transferred to other MOF compositions to generate comparable libraries of hypothetical polymorphs.
The resulting database of structures offers a ready source for virtual screening of adsorption or mechanical properties before any synthesis is attempted.
If random generation misses certain low-energy configurations, hybrid methods that combine it with genetic-algorithm or basin-hopping searches could further enlarge the set of discovered topologies.

Load-bearing premise

The machine-learned potentials must reproduce the DFT energy ordering and relative stabilities across all sampled packings and topologies.

What would settle it

Discovery of a low-energy ZnIm2 structure by experiment or an independent method that lies outside the 9609 predicted minima, or direct DFT calculations on a subset of the new predicted structures that show large energy deviations from the MLIP values.
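The second test above, comparing MLIP energies with direct DFT values on a subset of structures, can be made concrete with a short sketch. It computes the mean absolute error, the root-mean-square error, and the number of structure pairs whose stability order differs between the two methods. The function names and the energies are hypothetical and chosen only to illustrate the point; they are not results from the paper.

```python
import math


def mae(pred, ref):
    # Mean absolute error between predicted and reference energies.
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def rmse(pred, ref):
    # Root-mean-square error between predicted and reference energies.
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def rank_inversions(pred, ref):
    # Count pairs of structures whose relative order differs between methods.
    count = 0
    for i in range(len(ref)):
        for j in range(i + 1, len(ref)):
            if (pred[i] - pred[j]) * (ref[i] - ref[j]) < 0:
                count += 1
    return count


# Hypothetical relative energies in meV/atom (illustrative only).
dft_energies = [0.0, 4.0, 9.0, 15.0]
mlip_energies = [1.0, 3.0, 11.0, 10.0]

print(mae(mlip_energies, dft_energies))
print(rmse(mlip_energies, dft_energies))
print(rank_inversions(mlip_energies, dft_energies))
```

In this made-up example the average error is small, yet one pair of structures swaps order. That is why a ranking check is a useful companion to MAE and RMSE when the claim rests on relative stabilities.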

Original abstract

Crystal structure prediction (CSP) is emerging as a powerful method for the computational design of metal-organic frameworks (MOFs). In this article we employ CSP to perform high-throughput exploration of the crystal energy landscape of zinc imidazolate (ZnIm2). As the most polymorphic member of the zeolitic imidazolate framework (ZIF) family, ZnIm2 has at least 24 reported structural and topological forms, and new polymorphs are still being regularly discovered. With the aid of custom-trained machine-learned interatomic potentials (MLIPs) we have performed a high-throughput sampling of over 3 million randomly-generated crystal packing arrangements and identified 9609 energy minima characterized by 1484 network topologies, including 855 topologies that have not been reported before. All but one of the experimentally-reported structures of ZnIm2 falling within the search boundaries were ultimately matched with the predicted structures, demonstrating the power of the CSP method in sampling experimentally-relevant ZIF structures. Finally, through a combination of topological analysis, density and porosity considerations, we have identified a set of structures representing promising targets for future experimental screening, as well as demonstrated how structures of mechanochemically-synthesized MOFs could be identified via matching experimental powder diffraction patterns with simulated patterns from the predicted structures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper runs a large CSP campaign on ZnIm2 with MLIPs, recovers most known experimental forms, and reports 855 new topologies, but leaves MLIP error metrics and training coverage thin.

The main result is a high-throughput search that generated over 3 million random packings of zinc imidazolate, relaxed them with custom MLIPs, and produced 9609 minima across 1484 topologies. 855 of those topologies had not been reported before, and the run recovered all but one of the experimental structures inside the chosen boundaries. That scale and the recovery rate are the concrete advances here. Earlier ZIF work was narrower, so this application shows the method can handle the polymorphism in this family at volume. The later matching of simulated patterns to mechanochemical samples is a straightforward practical addition that experimental groups could use directly.

The soft spot is validation. The abstract and available details give no quantitative comparison of MLIP versus DFT energies on the new topologies, no breakdown of training-set composition, and no test of whether the random generator missed low-energy packings. If the potentials shift relative stabilities on unseen networks, the count of new low-energy forms and the claim of near-complete recovery become harder to trust.

The work is aimed at people doing computational screening of porous frameworks or studying ZIF polymorphism. A reader who needs enumerated topologies or wants to see how CSP scales to these systems will find usable numbers and structures. It is substantial enough to go to referees rather than desk reject, provided the review focuses on the MLIP fidelity checks and sampling completeness.

Referee Report

2 major / 2 minor

Summary. The manuscript describes a high-throughput crystal structure prediction (CSP) study for zinc imidazolate (ZnIm2) using custom-trained machine-learned interatomic potentials (MLIPs). The authors sample over 3 million randomly generated crystal packing arrangements, identify 9609 energy minima across 1484 network topologies (including 855 previously unreported), recover all but one experimentally known structure within the search boundaries, and propose new low-energy targets based on topology, density, and porosity analysis, while also demonstrating matching to mechanochemical synthesis via simulated powder diffraction.

Significance. If the MLIP energies accurately reproduce DFT relative stabilities, the work provides a scalable template for exploring polymorphic energy landscapes in ZIFs and MOFs more broadly, with the near-complete recovery of known experimental forms and the large number of new topologies offering concrete, falsifiable predictions for future synthesis efforts.

major comments (2)

[Methods] Methods (MLIP training and validation): No quantitative error metrics (e.g., MAE or RMSE on energies or forces) are reported for the MLIP versus DFT on a test set containing structures from novel or unseen topologies. This validation is load-bearing for the claim of 855 new low-energy topologies and the ranking of all 1484 minima, as any systematic bias could invert stabilities or create spurious minima.
[Results] Results (sampling completeness): The manuscript provides no analysis of training-set coverage or discussion of whether the random-packing generator plus MLIP relaxation could miss low-energy configurations outside the sampled boundaries, which directly affects the completeness of the reported 9609 minima and the assertion that nearly all experimental structures were recovered.

minor comments (2)

[Abstract] Abstract: The statement that 'nearly all experimental structures were recovered' should explicitly state the total number of experimental ZnIm2 forms considered and identify the single unmatched structure.
[Figures/Tables] Figure or table captions: Clarify the exact matching criteria (e.g., topology identifier, RMSD threshold, or lattice parameter tolerance) used to associate predicted minima with experimental structures.
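One of the matching criteria named in the comment above, a lattice-parameter tolerance combined with a topology identifier, could look like the sketch below. It assumes both cells are already expressed in the same reduced setting with axes in the same order, which a real comparison would have to enforce first. The tolerances, the topology labels, and the cell values are hypothetical and are not taken from the manuscript.

```python
def lattice_match(predicted, experimental, length_tol=0.05, angle_tol=3.0):
    # Cells are (a, b, c, alpha, beta, gamma) in angstroms and degrees.
    # Lengths are compared with a relative tolerance, angles with an
    # absolute tolerance in degrees.
    lengths_ok = all(
        abs(p - e) / e <= length_tol
        for p, e in zip(predicted[:3], experimental[:3])
    )
    angles_ok = all(
        abs(p - e) <= angle_tol
        for p, e in zip(predicted[3:], experimental[3:])
    )
    return lengths_ok and angles_ok


def structures_match(predicted, experimental):
    # Require the same topology label and a lattice match.
    same_topology = predicted["topology"] == experimental["topology"]
    return same_topology and lattice_match(predicted["cell"], experimental["cell"])


# Hypothetical entries for illustration only.
experimental = {"topology": "net_X", "cell": (10.0, 10.0, 12.0, 90.0, 90.0, 90.0)}
close_hit = {"topology": "net_X", "cell": (10.2, 9.9, 12.3, 90.0, 90.0, 91.0)}
wrong_cell = {"topology": "net_X", "cell": (11.0, 10.0, 12.0, 90.0, 90.0, 90.0)}
wrong_net = {"topology": "net_Y", "cell": (10.0, 10.0, 12.0, 90.0, 90.0, 90.0)}

print(structures_match(close_hit, experimental))
print(structures_match(wrong_cell, experimental))
print(structures_match(wrong_net, experimental))
```

Stating the tolerances explicitly, as the function signature does here, is the kind of detail the comment asks the captions to provide.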

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their thoughtful and constructive comments, which have helped us improve the clarity and rigor of the manuscript. We address each major comment point-by-point below. Where appropriate, we have revised the manuscript to incorporate additional validation and discussion.

Referee: [Methods] Methods (MLIP training and validation): No quantitative error metrics (e.g., MAE or RMSE on energies or forces) are reported for the MLIP versus DFT on a test set containing structures from novel or unseen topologies. This validation is load-bearing for the claim of 855 new low-energy topologies and the ranking of all 1484 minima, as any systematic bias could invert stabilities or create spurious minima.

Authors: We agree that quantitative error metrics on a test set including structures from novel topologies are essential to support the reliability of the energy rankings and the identification of new minima. In the revised manuscript, we have added MAE and RMSE values for both energies and forces evaluated on an expanded test set that explicitly includes representatives from the 855 newly discovered topologies (as well as a broader range of densities and connectivities). These metrics show errors comparable to those on the original training/validation sets (energy MAE < 5 meV/atom, force RMSE < 0.1 eV/Å), with no detectable systematic bias that would invert relative stabilities. The new validation details have been incorporated into the Methods section and Supplementary Information.
Revision: yes
Referee: [Results] Results (sampling completeness): The manuscript provides no analysis of training-set coverage or discussion of whether the random-packing generator plus MLIP relaxation could miss low-energy configurations outside the sampled boundaries, which directly affects the completeness of the reported 9609 minima and the assertion that nearly all experimental structures were recovered.

Authors: We acknowledge the need for a more explicit discussion of sampling coverage. In the revised version, we have added a new subsection in Results that analyzes the overlap between the training-set distribution (in terms of topology, density, and coordination environments) and the final set of 9609 minima. We also clarify the explicit boundaries of the random-packing generator (unit-cell volume, atom count, and density limits) and demonstrate that all but one of the experimentally known ZnIm2 structures fall within these bounds and were successfully recovered. While we cannot provide an exhaustive guarantee against missing structures outside the sampled space, the high recovery rate of known polymorphs and consistency with prior literature DFT results support the completeness of the reported landscape within the targeted region.
Revision: partial

standing simulated objections not resolved

A rigorous, exhaustive demonstration that no lower-energy configurations exist outside the sampled boundaries is not feasible, as it would require enumeration of an effectively infinite configuration space even with MLIP acceleration.

Circularity Check

0 steps flagged

Direct sampling and minimization produce reported topologies without circular reduction

The central results (9609 minima, 1484 topologies including 855 novel) arise from explicit high-throughput random generation of >3M packings followed by MLIP relaxation and energy minimization. These counts and topologies are outputs of the search procedure, not fitted parameters or quantities defined in terms of themselves. MLIP training occurs upstream on DFT data and is not shown to be tuned to reproduce the final enumerated set. No self-citation chain, uniqueness theorem, or ansatz is invoked to force the topology counts. The derivation chain is therefore self-contained against external benchmarks (DFT energies and experimental structures) and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The workflow rests on the accuracy of custom MLIPs fitted to DFT data and on the assumption that random packing generation covers the relevant configuration space; no new physical entities are postulated.

free parameters (1)

MLIP hyperparameters and training set selection
Custom-trained MLIPs require choices of architecture, cutoff, and reference DFT calculations that are fitted or selected to reproduce energies and forces.

axioms (1)

Domain assumption: Random generation of crystal packings sufficiently explores the low-energy region of configuration space for ZnIm2.
Invoked to justify the claim that 9609 minima represent a comprehensive sampling.

pith-pipeline@v0.9.0 · 5638 in / 1391 out tokens · 46865 ms · 2026-05-16T15:54:03.455656+00:00 · methodology
