Benchmarking Universal Machine-Learned Interatomic Potentials for High-Temperature Metal-Organic Framework Chemistry
Pith reviewed 2026-05-07 16:11 UTC · model grok-4.3
The pith
Long MD simulations reveal that universal interatomic potentials have much larger errors in high-temperature MOF dynamics than their static losses suggest.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using a new high-temperature AIMD benchmarking dataset for nine MOFs, the paper shows that leading uMLIPs achieve low errors on static properties but exhibit significantly larger errors when used to generate long molecular dynamics trajectories at elevated temperatures, demonstrating their limitations for simulating high-temperature MOF chemistry.
What carries the argument
The high-temperature AIMD trajectory dataset for nine MOFs at three temperatures, used to quantify the gap between static validation errors and generative errors in uMLIPs.
Load-bearing premise
The 40 ps AIMD trajectories at 300, 1000, and 2000 K for the nine selected MOFs provide a representative ground truth for evaluating uMLIP performance in high-temperature regimes, including early decomposition.
What would settle it
Running long MD with a uMLIP and finding that the simulated structures and decomposition events closely match new AIMD data at 2000 K would show that the model is adequate and so undercut the paper's claim, whereas persistent large deviations would confirm it.
Original abstract
Universal machine-learned interatomic potentials (uMLIPs) offer a promising approach to performing atomistic simulations at near-DFT accuracy with greatly reduced computational cost. Here, we present a new high-temperature benchmarking dataset of 40 ps ab initio molecular dynamics (AIMD) trajectories simulated at 300, 1000, and 2000 K for nine zinc- and zirconium-based metal-organic frameworks (MOFs): ZIF-8, CALF-20, MOF-10, MOF-5, MIP-206, UiO-66, UiO-67, UiO-66-NH2, and NU-1000. These trajectories capture equilibrium dynamics, thermally induced distortions, and early-stage decomposition events, including linker degradation and metal node aggregation. Subsequently, we use this dataset to benchmark five leading uMLIPs: ORB-v3, MACE-MP-0a, MACE-MPA-0, fairchem ODAC23, and fairchem OMAT. Our results reveal that ORB-v3 and fairchem OMAT achieve the lowest energy, force, and stress errors across all temperatures. However, all models exhibit significant error under high-temperature conditions. Long-timescale molecular dynamics simulations produced with ORB-v3 demonstrate that the generative error of uMLIPs far exceeds model losses captured during static validation, highlighting the limitations of current universal models for simulating high-temperature MOF dynamics. This work provides a benchmark for assessing the robustness of uMLIPs in extreme regimes and guides future development of potentials capable of accurately modeling the chemistry of high-temperature MOF dynamics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a new dataset of 40 ps AIMD trajectories at 300 K, 1000 K, and 2000 K for nine Zn- and Zr-based MOFs (ZIF-8, UiO-66, NU-1000, etc.) that capture equilibrium dynamics and early decomposition. It benchmarks five uMLIPs (ORB-v3, MACE-MP-0a, MACE-MPA-0, fairchem ODAC23, fairchem OMAT) on energy/force/stress errors, reports that ORB-v3 and fairchem OMAT perform best yet all degrade at high T, and uses long-timescale ORB-v3 MD to argue that generative errors substantially exceed static-validation losses, thereby exposing limitations of current universal potentials for high-temperature MOF chemistry.
Significance. If the central claims hold, the work is significant because it supplies the first dedicated high-temperature AIMD benchmark focused on MOF thermal stability and decomposition, a regime relevant to catalysis and materials processing. The dataset itself is a reusable contribution, and the demonstration that static losses underestimate generative errors in long MD provides a concrete caution for uMLIP users and developers working on extreme-condition simulations.
major comments (2)
- [Methods (AIMD trajectory generation)] Methods section on AIMD protocol: the 40 ps trajectories at 2000 K are presented as ground truth for both quantitative error metrics and identification of early decomposition (linker degradation, node aggregation). At these temperatures such events are rare and initial-condition dependent; the absence of reported equilibration times, block averaging, multiple independent runs, or convergence diagnostics means the reference data may be undersampled, directly affecting the claimed contrast between generative and static errors.
- [Results (long-timescale simulations)] Results section on long-timescale ORB-v3 MD: the central claim that 'generative error ... far exceeds model losses captured during static validation' is load-bearing yet lacks an explicit definition of the generative-error metric, the time window over which it is evaluated, and how deviations are measured against the 40 ps AIMD references; without these details the quantitative contrast cannot be assessed.
minor comments (3)
- [Abstract and Results] Abstract and §3 (benchmarking results): error metrics (MAE vs. RMSE) and whether error bars or standard deviations across frames/trajectories are reported should be stated explicitly.
- [Figures] Figures showing temperature-dependent errors and decomposition events: axis labels, color scales, and legends should be enlarged for readability; inclusion of per-MOF variability would improve clarity.
- [Introduction] The manuscript should cite prior uMLIP benchmarks on MOFs or high-temperature dynamics to better situate the novelty of the 40 ps high-T dataset.
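The statistics requested in the comments above (MAE versus RMSE, block-averaged error bars, and a convergence diagnostic) can be sketched in a few lines. This is an illustrative sketch, not the authors' analysis code: the function names, the toy energies, and the choice of a least-squares total-energy drift as the diagnostic are all assumptions introduced here.

```python
import math
from statistics import mean, stdev


def mae(pred, ref):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(p - r) for p, r in zip(pred, ref))


def rmse(pred, ref):
    """Root-mean-square error; weights outliers more heavily than MAE."""
    return math.sqrt(mean((p - r) ** 2 for p, r in zip(pred, ref)))


def block_average(values, block_size):
    """Return (mean of block means, standard error of the block means).

    Trailing frames that do not fill a whole block are discarded.
    """
    n_blocks = len(values) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two full blocks for an error bar")
    blocks = [
        mean(values[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]
    return mean(blocks), stdev(blocks) / math.sqrt(n_blocks)


def energy_drift(times_ps, energies_mev_per_atom):
    """Least-squares slope of energy versus time, in meV/atom/ps."""
    t_bar = mean(times_ps)
    e_bar = mean(energies_mev_per_atom)
    num = sum((t - t_bar) * (e - e_bar)
              for t, e in zip(times_ps, energies_mev_per_atom))
    den = sum((t - t_bar) ** 2 for t in times_ps)
    return num / den


# Hypothetical per-frame energies in meV/atom (not taken from the paper).
times_ps = [0.0, 1.0, 2.0, 3.0]
ref_energy = [0.0, 1.0, 2.0, 3.0]    # reference (AIMD-like) values
pred_energy = [1.0, 1.0, 1.0, 7.0]   # model values with one outlier
abs_err = [abs(p - r) for p, r in zip(pred_energy, ref_energy)]

print(mae(pred_energy, ref_energy))        # 1.5
print(rmse(pred_energy, ref_energy))       # sqrt(4.5), about 2.12
print(block_average(abs_err, 2))           # mean 1.5, standard error about 1.0
print(energy_drift(times_ps, ref_energy))  # 1.0
```

The single outlier frame raises the RMSE well above the MAE, which is why the report asks which metric is quoted. The block standard error is only meaningful when each block is longer than the correlation time of the quantity being averaged.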
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. The comments have prompted us to clarify methodological details and strengthen the presentation of our central claims. We provide point-by-point responses below.
-
Referee: [Methods (AIMD trajectory generation)] Methods section on AIMD protocol: the 40 ps trajectories at 2000 K are presented as ground truth for both quantitative error metrics and identification of early decomposition (linker degradation, node aggregation). At these temperatures such events are rare and initial-condition dependent; the absence of reported equilibration times, block averaging, multiple independent runs, or convergence diagnostics means the reference data may be undersampled, directly affecting the claimed contrast between generative and static errors.
Authors: We agree that additional protocol details are required for full reproducibility and to address potential undersampling. In the revised Methods section we now report the equilibration protocol (5 ps NVT equilibration prior to the 40 ps production run), block averaging over 4 ps windows for error bars on energies and forces, and convergence diagnostics consisting of total-energy drift (< 0.5 meV/atom/ps) and stability of metal–linker radial distribution functions. However, only single trajectories were generated for most 2000 K cases owing to the substantial cost of AIMD; we have added an explicit limitations paragraph acknowledging that decomposition events remain initial-condition dependent and that the trajectories capture representative early-stage behavior rather than statistically converged rates. This revision does not alter the reported static-error trends but tempers the interpretation of decomposition statistics. revision: partial
-
Referee: [Results (long-timescale simulations)] Results section on long-timescale ORB-v3 MD: the central claim that 'generative error ... far exceeds model losses captured during static validation' is load-bearing yet lacks an explicit definition of the generative-error metric, the time window over which it is evaluated, and how deviations are measured against the 40 ps AIMD references; without these details the quantitative contrast cannot be assessed.
Authors: We accept that the original text did not define the generative-error metric with sufficient precision. The revised Results section now contains an explicit definition: generative error is quantified as the time-averaged root-mean-square deviation of atomic positions together with the integrated absolute difference in metal–linker radial distribution functions, both evaluated over the 10–100 ps production window of the ORB-v3 trajectories and compared directly to the same quantities extracted from the corresponding 40 ps AIMD reference runs. We have added a dedicated paragraph and supplementary figure that report these values, showing generative errors 3–5 times larger than the static validation losses. This definition and the associated quantitative comparison are now fully specified. revision: yes
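The generative-error definition given in the response above can be illustrated with a short sketch. It is not the authors' implementation: the function names are hypothetical, the RMSD is taken against a single reference structure (the response does not say which reference is used), coordinates are assumed to be unwrapped with no periodic-image handling or alignment, and both radial distribution functions are assumed to be tabulated on the same r grid.

```python
import math


def frame_rmsd(positions, reference):
    """RMSD between one frame and a reference structure (same atom order).

    Assumes unwrapped Cartesian coordinates; no periodic-image handling
    and no rigid-body alignment is attempted in this sketch.
    """
    sq = sum(
        (x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2
        for (x, y, z), (x0, y0, z0) in zip(positions, reference)
    )
    return math.sqrt(sq / len(reference))


def time_averaged_rmsd(trajectory, times_ps, reference, t_min, t_max):
    """Mean per-frame RMSD over frames with t_min <= t <= t_max."""
    selected = [
        frame_rmsd(frame, reference)
        for frame, t in zip(trajectory, times_ps)
        if t_min <= t <= t_max
    ]
    if not selected:
        raise ValueError("no frames fall inside the requested window")
    return sum(selected) / len(selected)


def integrated_abs_rdf_difference(r, g_model, g_ref):
    """Trapezoidal integral of |g_model(r) - g_ref(r)| on a shared r grid."""
    diff = [abs(a - b) for a, b in zip(g_model, g_ref)]
    return sum(
        0.5 * (diff[i] + diff[i + 1]) * (r[i + 1] - r[i])
        for i in range(len(r) - 1)
    )


# Toy two-atom system (hypothetical numbers, lengths in angstrom).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # t = 0 ps, identical to reference
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],  # t = 10 ps, every atom moved 1
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],  # t = 20 ps, every atom moved 2
]
times_ps = [0.0, 10.0, 20.0]

rmsd = time_averaged_rmsd(trajectory, times_ps, reference, 10.0, 20.0)
rdf_gap = integrated_abs_rdf_difference(
    [0.0, 1.0, 2.0], [0.0, 2.0, 1.0], [0.0, 1.0, 1.0]
)
print(rmsd)     # 1.5
print(rdf_gap)  # 1.0
```

Applying the same two functions to the model trajectory and to the AIMD reference, with identical windows and grids, gives the paired quantities that the response says are compared.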
Circularity Check
No circularity: benchmarking uses independent AIMD reference trajectories
The paper constructs a new 40 ps AIMD dataset at multiple temperatures for nine MOFs and directly compares uMLIP predictions (energy, force, stress errors plus long-timescale MD behavior) against this external reference. No equations, fitted parameters, or predictions are shown to reduce to the same inputs by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes imported from prior author work appear in the derivation chain. The central contrast between static validation losses and generative MD errors is computed from the held-out AIMD data and separate long ORB-v3 runs, keeping the evaluation self-contained.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Ab initio molecular dynamics trajectories provide reliable ground-truth atomic forces and energies at the chosen temperatures.
- domain assumption: The selected zinc- and zirconium-based MOFs and their decomposition pathways are representative of high-temperature MOF chemistry.