Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models
Pith reviewed 2026-05-16 23:39 UTC · model grok-4.3
The pith
Open Materials 2024 supplies more than 110 million DFT calculations and EquiformerV2 models that reach an F1 score above 0.9 for ground-state stability and 20 meV/atom accuracy on formation energies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The OMat24 dataset contains more than 110 million DFT calculations focused on inorganic structural and compositional diversity, and the accompanying EquiformerV2 models achieve state-of-the-art performance on the Matbench Discovery leaderboard while predicting ground-state stability with an F1 score above 0.9 and formation energies with an accuracy of 20 meV/atom.
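The two headline numbers correspond to a binary stability classification (F1) and a regression error (meV/atom). A minimal sketch of how such metrics are typically computed, with illustrative conventions only: Matbench Discovery defines stability via energy above the convex hull, and the threshold and function names below are assumptions, not the leaderboard's actual evaluation code.

```python
# Illustrative headline metrics. Convention assumed here: a material is
# "stable" when its energy above the convex hull is <= 0 eV/atom.
def f1_stability(e_above_hull_true, e_above_hull_pred, threshold=0.0):
    """F1 for the binary task 'is this material on or below the hull?'."""
    tp = fp = fn = 0
    for et, ep in zip(e_above_hull_true, e_above_hull_pred):
        true_stable = et <= threshold
        pred_stable = ep <= threshold
        if pred_stable and true_stable:
            tp += 1
        elif pred_stable and not true_stable:
            fp += 1
        elif not pred_stable and true_stable:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def mae_ev_per_atom(e_true, e_pred):
    """Mean absolute error in eV/atom (20 meV/atom = 0.020 eV/atom)."""
    return sum(abs(t - p) for t, p in zip(e_true, e_pred)) / len(e_true)
```

Note that the F1 target is asymmetric: because stable materials are rare in discovery campaigns, a high F1 is much harder to achieve than a high accuracy.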
What carries the argument
EquiformerV2 graph neural network models trained on the large-scale OMat24 DFT dataset, with auxiliary denoising objectives and fine-tuning across multiple materials datasets.
If this is right
- High-accuracy stability predictions can be used to screen millions of candidate structures before any DFT or experiment.
- Larger models and auxiliary denoising tasks improve accuracy across OMat24, MPtraj, and Alexandria, indicating scalable training strategies.
- Open data and models allow fine-tuning on domain-specific datasets to reach usable performance on formation energies and stability.
- Community access removes the prior barrier of proprietary training data, enabling faster iteration on AI-assisted materials design.
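The first bullet, screening millions of candidates before any DFT, reduces to a simple filter-and-rank step. A minimal sketch under stated assumptions: the function name, input format, and the 0.1 eV/atom metastability cutoff are illustrative choices, not the paper's workflow.

```python
# Hypothetical pre-screening step: keep only candidates whose predicted
# energy above hull falls inside a metastability window, ranked most
# stable first, and pass the survivors on to DFT verification.
def screen_candidates(predictions, cutoff=0.1, top_k=None):
    """predictions: iterable of (structure_id, predicted_e_above_hull) pairs."""
    survivors = [(sid, e) for sid, e in predictions if e <= cutoff]
    survivors.sort(key=lambda pair: pair[1])  # lowest energy above hull first
    return survivors[:top_k] if top_k else survivors
```

For example, `screen_candidates([("A", 0.02), ("B", 0.5), ("C", -0.01)])` keeps only A and C, with C ranked first.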
Where Pith is reading between the lines
- If the accuracy holds on experimental benchmarks, the approach could shorten the cycle from composition idea to stable candidate by orders of magnitude for climate-relevant materials.
- The same training recipe may extend to other properties such as electronic band gaps or mechanical moduli once additional labels are added to the dataset.
- Combining these models with active-learning loops could further reduce the number of expensive DFT calculations needed for new discoveries.
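The active-learning idea in the last bullet can be sketched as a single acquisition round. All interfaces here are stand-ins: `uncertainty`, `dft_label`, and the budget are hypothetical, and real loops would also retrain the surrogate between rounds.

```python
# One round of uncertainty-driven active learning (hypothetical interfaces):
# send the candidates where the surrogate is least certain to DFT, then
# add the resulting labels to the training pool.
def active_learning_round(pool, labeled, uncertainty, dft_label, budget=10):
    """Pick the `budget` most uncertain unlabeled candidates and label them."""
    ranked = sorted(pool, key=uncertainty, reverse=True)
    picked, rest = ranked[:budget], ranked[budget:]
    labeled = labeled + [(x, dft_label(x)) for x in picked]
    return rest, labeled
```

The payoff is that each expensive DFT call is spent where the model is most likely to be wrong, rather than uniformly across candidate space.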
Load-bearing premise
Density functional theory calculations supply accurate enough representations of real material ground-state stabilities and formation energies, and the models generalize reliably to materials outside the training set.
What would settle it
Direct experimental synthesis and stability measurement of several high-confidence predictions for previously unseen compositions, or comparison of model energies against higher-accuracy methods such as quantum Monte Carlo on a held-out test set.
Original abstract
The ability to discover new materials with desirable properties is critical for numerous applications from helping mitigate climate change to advances in next generation computing hardware. AI has the potential to accelerate materials discovery and design by more effectively exploring the chemical space compared to other computational methods or by trial-and-error. While substantial progress has been made on AI for materials data, benchmarks, and models, a barrier that has emerged is the lack of publicly available training data and open pre-trained models. To address this, we present a Meta FAIR release of the Open Materials 2024 (OMat24) large-scale open dataset and an accompanying set of pre-trained models. OMat24 contains over 110 million density functional theory (DFT) calculations focused on structural and compositional diversity. Our EquiformerV2 models achieve state-of-the-art performance on the Matbench Discovery leaderboard and are capable of predicting ground-state stability and formation energies to an F1 score above 0.9 and an accuracy of 20 meV/atom, respectively. We explore the impact of model size, auxiliary denoising objectives, and fine-tuning on performance across a range of datasets including OMat24, MPtraj, and Alexandria. The open release of the OMat24 dataset and models enables the research community to build upon our efforts and drive further advancements in AI-assisted materials science.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Open Materials 2024 (OMat24) dataset containing over 110 million DFT calculations on inorganic materials, with emphasis on structural and compositional diversity. It releases pre-trained EquiformerV2 models and reports that these achieve state-of-the-art performance on the independent Matbench Discovery leaderboard, with F1 scores above 0.9 for ground-state stability and 20 meV/atom accuracy for formation energies. The work also examines the effects of model scale, auxiliary denoising objectives, and fine-tuning across OMat24, MPtraj, and Alexandria datasets.
Significance. The open release of a large-scale DFT dataset and accompanying pre-trained models is a clear strength that can accelerate community progress in AI-assisted materials discovery. If the reported generalization performance holds, the results would mark a meaningful advance in predictive accuracy for stability and formation energies. The use of an external public leaderboard for evaluation is a positive design choice that reduces circularity risk.
Major comments (2)
- [Abstract] The central claim of SOTA performance (F1 > 0.9 and 20 meV/atom) on Matbench Discovery rests on an assumption of no data leakage, yet the abstract describes no deduplication protocols, compositional splits, fingerprint-based filtering, or exclusion of Matbench test compositions/structures from OMat24.
- [Results] The headline metrics are given without error bars, detailed data-split descriptions, or explicit confirmation that OMat24 construction avoided overlap with the Matbench Discovery test partition. This confirmation is load-bearing for the generalization claim, since both datasets draw from the same inorganic DFT ecosystem.
Minor comments (1)
- [Methods] The manuscript would benefit from a dedicated subsection or table summarizing the exact train/validation/test splits used for OMat24 and any hyperparameter selection procedures.
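The kind of split the referee asks for can be made concrete with a short sketch. This is a hypothetical helper, not the paper's protocol: it groups structures by reduced chemical formula so that no composition appears in both train and test, which is the simplest guard against compositional leakage.

```python
from collections import defaultdict
import random

# Composition-disjoint split (illustrative): assign whole formulas, not
# individual structures, to train or test, so near-duplicate polymorphs
# of one composition can never straddle the split.
def composition_disjoint_split(entries, test_fraction=0.1, seed=0):
    """entries: iterable of (entry_id, reduced_formula) pairs."""
    by_formula = defaultdict(list)
    for eid, formula in entries:
        by_formula[formula].append(eid)
    formulas = sorted(by_formula)
    random.Random(seed).shuffle(formulas)
    n_test = max(1, int(len(formulas) * test_fraction))
    test_formulas = formulas[:n_test]
    train = [e for f in formulas[n_test:] for e in by_formula[f]]
    test = [e for f in test_formulas for e in by_formula[f]]
    return train, test
```

Stricter variants would additionally apply structure-fingerprint deduplication across the boundary, as the major comments request.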
Simulated Author's Rebuttal
We thank the referee for the constructive comments highlighting the need for greater transparency around data leakage prevention and performance reporting. These points strengthen the manuscript, and we have revised the abstract and results section to address them directly while preserving the original scientific claims.
Point-by-point responses
- Referee: [Abstract] The central claim of SOTA performance (F1 > 0.9 and 20 meV/atom) on Matbench Discovery rests on an assumption of no data leakage, yet the abstract describes no deduplication protocols, compositional splits, fingerprint-based filtering, or exclusion of Matbench test compositions/structures from OMat24.
  Authors: We agree that the abstract should explicitly reference the deduplication steps that support the generalization claim. In the revised version we have added one sentence to the abstract stating that OMat24 was constructed with compositional and structural deduplication (via fingerprint-based filtering and exclusion of Matbench test compositions) to ensure no overlap with the Matbench Discovery test partition. These protocols were already described in the Methods and SI of the original submission; the revision simply makes them visible at the abstract level. Revision: yes.
- Referee: [Results] The headline metrics are given without error bars, detailed data-split descriptions, or explicit confirmation that OMat24 construction avoided overlap with the Matbench Discovery test partition, which is load-bearing for the generalization claim given that both draw from the same inorganic DFT ecosystem.
  Authors: We accept the referee's observation. The revised results section now includes (i) error bars on all headline F1 and MAE values, (ii) an expanded paragraph detailing the train/validation/test splits and the exact filtering criteria applied during OMat24 construction, and (iii) an explicit statement confirming that no Matbench Discovery test compositions or structures were included in OMat24. These additions were moved from the SI into the main text for clarity; the underlying data-handling procedures remain unchanged. Revision: yes.
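The error bars promised in point (i) are often obtained by bootstrap resampling of the test set. A minimal sketch, assuming a percentile bootstrap; the paper's revised intervals may use a different procedure.

```python
import random

# Percentile bootstrap confidence interval for a headline metric
# (illustrative; e.g. pass the per-sample absolute errors and a mean
# function to get an interval on the MAE).
def bootstrap_ci(values, metric, n_boot=1000, alpha=0.05, seed=0):
    """Return (lo, hi) percentile CI for metric over resamples of values."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = [values[rng.randrange(len(values))] for _ in values]
        stats.append(metric(sample))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

For classification metrics such as F1, the same resampling is applied to (true, predicted) pairs rather than to scalar errors.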
Circularity Check
No significant circularity detected
Full rationale
The paper introduces the OMat24 dataset of 110M DFT calculations and reports EquiformerV2 model performance on the external Matbench Discovery leaderboard (F1 > 0.9, 20 meV/atom). No load-bearing step reduces to a self-definition, fitted parameter renamed as prediction, or self-citation chain; the benchmark evaluation is independent of the training data construction described. The derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Free parameters (1)
- EquiformerV2 model hyperparameters and training schedule
Axioms (1)
- Domain assumption: density functional theory calculations yield sufficiently accurate ground-state energies and stabilities for inorganic materials.
Forward citations
Cited by 20 Pith papers
- SLayerGen: a Crystal Generative Model for all Space and Layer Groups. SLayerGen generates crystals invariant to any space or layer group via autoregressive lattice and Wyckoff sampling plus equivariant diffusion, achieving gains over bulk models on diperiodic materials after correcting ...
- Lang2MLIP: End-to-End Language-to-Machine Learning Interatomic Potential Development with Autonomous Agentic Workflows. Lang2MLIP is an LLM multi-agent framework that automates end-to-end development of machine learning interatomic potentials from natural language input for heterogeneous materials systems.
- Breaking the Training Barrier of Billion-Parameter Universal Machine Learning Interatomic Potentials. MatRIS-MoE and Janus enable efficient exascale training of billion-parameter universal interatomic potentials by addressing second-order derivative computation and communication overheads.
- CrystalREPA: Transferring Physical Priors from Universal MLIPs to Crystal Generative Models. CrystalREPA closes the representation gap between crystal generators and universal MLIPs via contrastive alignment, yielding more stable and valid generated crystals while revealing that MLIP teacher quality is better...
- Compact SO(3) Equivariant Atomistic Foundation Models via Structural Pruning. Structural pruning of SO(3) equivariant atomistic models from large checkpoints yields 1.5-4x fewer parameters and 2.5-4x less pre-training compute than small models trained from scratch, while outperforming them on m...
- MatterSim-MT: A multi-task foundation model for in silico materials characterization. MatterSim-MT is a foundation model pretrained on over 35 million first-principles structures that predicts material structure, dynamics, and thermodynamics while enabling multi-task simulations of phonon splitting, fe...
- Density diversity in training data governs thermodynamic transferability of machine learning interatomic potentials. Density diversity in training data is the key factor for making machine learning interatomic potentials transferable across thermodynamic states, outperforming temperature diversity.
- VibroML: an automated toolkit for high-throughput vibrational analysis and dynamic instability remediation of crystalline materials using machine-learned potentials. VibroML automates remediation of dynamic instabilities in crystalline materials by combining MLIPs with genetic algorithms for polymorph search, finite-temperature MD validation, and compositional alloying to yield st...
- Errors that matter: Uncertainty-aware universal machine-learning potentials calibrated on experiments. PET-UAFD ensemble of ML potentials, calibrated on experimental cohesive energies and moduli, matches experimental accuracy on liquid properties and supplies uncertainty estimates via the PET-EXP protocol.
- Agentic Fusion of Large Atomic and Language Models to Accelerate Superconductor Discovery. An agentic framework fusing large atomic and language models rediscovers 66 known superconductors and guides experimental verification of four new ones with transition temperatures from 2.5 K to 6.5 K.
- AI-Driven Expansion and Application of the Alexandria Database. A combined generative model, ML potential, and graph neural network pipeline expands the Alexandria database by 1.3 million DFT-validated compounds with 99% success near the convex hull and releases training data for ...
- An experimentally validated end-to-end framework for operando modeling of intrinsically complex metallosilicates. An end-to-end framework combining domain separation, lightweight ML potentials, and de novo in silico synthesis enables quantitative atomistic modeling of mesoporous metallosilicates that matches experimental densitie...
- Systematic Fine-Tuning of MACE Interatomic Potentials for Catalysis. Fine-tuned MACE MLIPs achieve lower mean absolute errors on catalytic reaction energies and barriers than from-scratch models, with a large fine-tuned model performing best on both metallic and oxide systems including...
- OptiMat Alloys: a FAIR, living database of multi-principal element alloys enabled by a conversational agent. OptiMat Alloys is a conversational AI system that maintains a living FAIR database of multi-principal element alloy calculations and enables natural-language, on-demand computations with built-in uncertainty checks.
- Accuracy and Efficiency Benchmarks of Pretrained Machine Learning Potentials for Molecular Simulations. Benchmarks of 15 MLIPs show parameter count and training set size correlate with accuracy, architecture drives speed and memory, and explicit Coulomb terms provide no benefit.
- Comparing the latent features of universal machine-learning interatomic potentials. Different uMLIPs encode chemical space in distinct ways, with high cross-model feature reconstruction errors, and fine-tuning preserves strong pre-training bias in the latent features.
- Assessing foundational atomistic models for iron alloys under Earth's core conditions. Foundational atomistic models reproduce some structural and dynamical properties of iron alloys under core conditions but none consistently match first-principles benchmarks due to missing explicit treatment of therma...
- Accurate and Efficient Interatomic Potentials for Dislocations in InP. New ACE and MACE potentials for InP achieve at most 4% error on partial dislocation formation energies versus DFT, outperforming literature models by factors of 4-12 while being computationally faster.
- Machine Learning Interatomic Potentials for Million-Atom Simulations of Multicomponent Alloys. GRACE MLIPs train faster and predict alloy properties more accurately than NEP, but NEP's 60-fold speed advantage enables reliable million-atom simulations of shock propagation when paired with ensemble uncertainty qu...
- Inverse Design of Inorganic Compounds with Generative AI. A review of generative AI for inverse design of inorganic compounds, analyzing adaptations for their complexity in composition, geometry, symmetry, and electronic structure, with discussion of future benchmarks and sy...