AI-Driven Expansion and Application of the Alexandria Database
Pith reviewed 2026-05-16 23:14 UTC · model grok-4.3
The pith
A multi-stage AI workflow expands the ALEXANDRIA database by 1.3 million DFT-validated compounds, with a 99% success rate for candidates within 100 meV/atom of thermodynamic stability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By chaining the Matra-Genoa generative model with Orb-v2 interatomic potentials and ALIGNN energy predictions, the workflow filters candidates so that 99% of those sent to full DFT calculations lie within 100 meV/atom of thermodynamic stability. This yields 1.3 million newly validated entries and 74 thousand new stable compounds, bringing the ALEXANDRIA database to 5.8 million total structures and 175 thousand on the convex hull. The same pipeline produces a 14-million-structure out-of-equilibrium dataset that, when used to fine-tune a GRACE model, improves benchmark performance. Structural disorder statistics in the new data match experimental databases, and analysis of the hull reveals sub-linear scaling of convex-hull connectivity with database size.
What carries the argument
The multi-stage filtering pipeline that runs Matra-Genoa generation, Orb-v2 relaxation, and ALIGNN energy ranking before DFT validation.
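The staged cascade described here can be sketched in a few lines. This is a toy illustration only: Matra-Genoa, Orb-v2, and ALIGNN are replaced by random stand-ins, and the cutoffs are hypothetical, not the paper's actual thresholds.

```python
import numpy as np

def generate_candidates(n):
    """Stand-in for Matra-Genoa: emit n candidate 'structures' (here just ids)."""
    return np.arange(n)

def relax_and_energy(ids, rng):
    """Stand-in for Orb-v2 relaxation: a coarse energy above hull (eV/atom)."""
    return rng.exponential(scale=0.3, size=ids.size)

def refine_energy(e_coarse, rng):
    """Stand-in for ALIGNN re-ranking: refine coarse energies with small noise."""
    return e_coarse + rng.normal(0.0, 0.02, size=e_coarse.size)

def cascade_filter(n, cutoff_ev=0.1, seed=0):
    """Cascade: generate -> relax -> re-rank -> keep candidates below cutoff.

    Only the survivors would be sent on to full DFT validation.
    """
    rng = np.random.default_rng(seed)
    ids = generate_candidates(n)
    e1 = relax_and_energy(ids, rng)
    keep = e1 < 3 * cutoff_ev            # loose first cut to save DFT budget
    e2 = refine_energy(e1[keep], rng)
    return ids[keep][e2 < cutoff_ev]     # final DFT shortlist
```

The point of the two-stage cut is economy: the cheap potential prunes most of the 119 million candidates before the more accurate (but costlier) ranking model runs.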
If this is right
- The larger set of stable and near-stable compounds supplies more training data for machine-learning potentials, as demonstrated by the improved GRACE benchmark scores.
- Space-group and coordination-environment statistics extracted from the expanded hull can be used to test theories of phase stability networks.
- The released 14 million out-of-equilibrium structures with forces and stresses enable training of universal force fields that capture dynamic behavior beyond the convex hull.
- Sub-linear growth of convex-hull connectivity with database size implies that exhaustive enumeration of all stable phases may remain computationally tractable.
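The sub-linear connectivity point is, in effect, a claim about a scaling exponent, which a log-log fit on the released hull data could test directly. The sketch below uses synthetic counts with an arbitrarily chosen exponent of 0.8, not the paper's data.

```python
import numpy as np

# Synthetic database sizes and hull-edge counts following edges ~ size**alpha.
sizes = np.array([1e4, 3e4, 1e5, 3e5, 1e6])
alpha_true = 0.8
edges = 5.0 * sizes**alpha_true

# The slope of log(edges) vs log(sizes) estimates the scaling exponent;
# a value below 1 indicates sub-linear growth of connectivity.
slope, _ = np.polyfit(np.log(sizes), np.log(edges), 1)
print(f"fitted exponent: {slope:.2f}")
```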
Where Pith is reading between the lines
- The workflow could be applied to targeted searches for materials with specific functional properties by adding property filters after the stability stage.
- Matching experimental disorder rates suggests the generated structures can serve as realistic starting points for finite-temperature simulations.
- Releasing the full dataset under open licenses removes a common barrier for smaller research groups that lack access to large-scale DFT resources.
Load-bearing premise
The combined Matra-Genoa, Orb-v2, and ALIGNN filters select candidates without systematic bias, so that 99% of those reaching DFT truly lie within 100 meV/atom of the hull.
What would settle it
Perform independent DFT relaxations on a random subset of the 1.3 million newly added compounds and measure whether the fraction within 100 meV/atom of the hull remains at or above 99 percent.
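A back-of-the-envelope version of this check: simulate the per-compound success indicator as a Bernoulli draw in place of an actual DFT re-relaxation, and attach a confidence interval to the observed fraction. All numbers here are illustrative.

```python
import math
import random

def validate_success_rate(n_sample=10_000, p_true=0.99, seed=1):
    """Simulate re-validating n_sample compounds, each passing with prob p_true.

    Returns the observed fraction and a normal-approximation 95% CI.
    In the real check, the Bernoulli draw would be replaced by an
    independent DFT relaxation and a hull-distance test.
    """
    rng = random.Random(seed)
    hits = sum(rng.random() < p_true for _ in range(n_sample))
    p_hat = hits / n_sample
    half = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n_sample)
    return p_hat, (p_hat - half, p_hat + half)

p_hat, (lo, hi) = validate_success_rate()
print(f"observed {p_hat:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

At a true rate of 99%, a random subset of ~10,000 compounds pins the fraction to within a few tenths of a percent, so the proposed audit is cheap relative to the 1.3 million validated entries.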
Original abstract
We present a novel multi-stage workflow for computational materials discovery that achieves a 99% success rate in identifying compounds within 100 meV/atom of thermodynamic stability, with a threefold improvement over previous approaches. By combining the Matra-Genoa generative model, Orb-v2 universal machine learning interatomic potential, and ALIGNN graph neural network for energy prediction, we generated 119 million candidate structures and added 1.3 million DFT-validated compounds to the ALEXANDRIA database, including 74 thousand new stable materials. The expanded ALEXANDRIA database now contains 5.8 million structures with 175 thousand compounds on the convex hull. Predicted structural disorder rates (37-43%) match experimental databases, unlike other recent AI-generated datasets. Analysis reveals fundamental patterns in space group distributions, coordination environments, and phase stability networks, including sub-linear scaling of convex hull connectivity. We release the complete dataset, including sAlex25 with 14 million out-of-equilibrium structures containing forces and stresses for training universal force fields. We demonstrate that fine-tuning a GRACE model on this data improves benchmark accuracy. All data, models, and workflows are freely available under Creative Commons licenses.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a multi-stage AI workflow combining the Matra-Genoa generative model, Orb-v2 universal ML interatomic potential, and ALIGNN graph neural network to generate 119 million candidate structures, perform DFT validation on selected compounds, and expand the ALEXANDRIA database by 1.3 million entries (including 74 thousand new stable materials). It claims a 99% success rate for compounds within 100 meV/atom of thermodynamic stability (threefold improvement over prior methods), reports that predicted structural disorder rates (37-43%) match experimental databases, analyzes space-group and coordination patterns, and releases the full dataset plus sAlex25 (14 million out-of-equilibrium structures with forces/stresses) for training universal force fields, demonstrating improved GRACE model performance after fine-tuning.
Significance. If the 99% success rate and lack of systematic ML bias hold, the work provides a major expansion of a validated materials database (now 5.8 million structures, 175 thousand on the convex hull) with open data and models, plus large-scale training data that improves downstream ML accuracy. The explicit match to experimental disorder statistics and sub-linear hull-connectivity scaling are strengths that distinguish it from other AI-generated datasets and support broader use in materials discovery.
major comments (2)
- [Abstract and Results] The central 99% success-rate claim for structures within 100 meV/atom of the hull rests on the unverified assumption that the Matra-Genoa + Orb-v2 + ALIGNN cascade introduces no net energy bias favoring near-hull candidates. No error histogram, calibration curve on a near-hull test set, or direct ML-vs-DFT energy distribution for the final 1.3 M compounds is provided, leaving open the possibility that model under-prediction in selected space groups or coordinations inflates the observed rate.
- [Methods] The multi-stage filtering procedure (Matra-Genoa generative step followed by Orb-v2 and ALIGNN surrogates) requires explicit quantification of how selection thresholds and post-hoc rules affect the final DFT-validated set; without this, it is impossible to confirm that the threefold improvement is not partly an artifact of the filtering rather than intrinsic model performance.
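The bias diagnostic requested in the first major comment reduces to comparing surrogate and DFT energies on a held-out near-hull set. The sketch below uses synthetic energies, with an artificial -5 meV/atom bias injected so the statistic has something to detect.

```python
import numpy as np

rng = np.random.default_rng(7)
# Synthetic DFT ground-truth energies above the hull (eV/atom), near-hull set.
e_dft = rng.uniform(0.0, 0.2, size=5000)
# Synthetic surrogate predictions with an injected -5 meV/atom bias.
e_ml = e_dft + rng.normal(-0.005, 0.03, size=5000)

err = e_ml - e_dft
bias = float(np.mean(err))            # negative => systematic under-prediction
mae = float(np.mean(np.abs(err)))
hist, bin_edges = np.histogram(err, bins=20)  # the error histogram the referee asks for
print(f"bias = {bias*1000:.1f} meV/atom, MAE = {mae*1000:.1f} meV/atom")
```

A mean signed error significantly below zero on such a set would be exactly the kind of under-prediction that could inflate an apparent success rate.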
minor comments (2)
- [Results] The description of sAlex25 and its use for fine-tuning GRACE would benefit from a table summarizing benchmark improvements (e.g., force MAE before/after) to make the training-data value concrete.
- [Figures] Figure captions for space-group and coordination-environment distributions should explicitly state the binning or normalization used so that sub-linear hull-connectivity scaling can be directly compared to prior literature.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and the opportunity to improve the clarity of our work. We address the major comments point by point below, with revisions planned to strengthen the supporting evidence for our claims.
Point-by-point responses
-
Referee: [Abstract and Results] The central 99% success-rate claim for structures within 100 meV/atom of the hull rests on the unverified assumption that the Matra-Genoa + Orb-v2 + ALIGNN cascade introduces no net energy bias favoring near-hull candidates. No error histogram, calibration curve on a near-hull test set, or direct ML-vs-DFT energy distribution for the final 1.3 M compounds is provided, leaving open the possibility that model under-prediction in selected space groups or coordinations inflates the observed rate.
Authors: We appreciate the referee's emphasis on rigorously validating the absence of systematic bias in the ML cascade. The reported 99% success rate is computed from direct DFT energies on the final selected compounds, which serve as the ground truth. To address the concern directly, the revised manuscript will include (i) an error histogram of ML-predicted versus DFT energies on a held-out near-hull validation set, (ii) a calibration curve for the ALIGNN surrogate restricted to low-energy candidates, and (iii) a side-by-side ML-versus-DFT energy distribution for a statistically representative subset of the 1.3 million DFT-validated entries. These additions will allow readers to quantify any residual bias and confirm that the threefold improvement is not an artifact of under-prediction in particular space groups or coordinations. revision: yes
-
Referee: [Methods] The multi-stage filtering procedure (Matra-Genoa generative step followed by Orb-v2 and ALIGNN surrogates) requires explicit quantification of how selection thresholds and post-hoc rules affect the final DFT-validated set; without this, it is impossible to confirm that the threefold improvement is not partly an artifact of the filtering rather than intrinsic model performance.
Authors: We agree that greater transparency on the filtering pipeline is necessary. The revised Methods section will be expanded to report retention fractions after each stage (Matra-Genoa generation, Orb-v2 energy cutoff, ALIGNN ranking, and post-hoc stability rules), together with the precise numerical thresholds employed. We will also include a supplementary table showing how varying these thresholds alters the final DFT-validated composition and the resulting success rate. This quantification will demonstrate that the observed improvement arises primarily from the generative model's ability to produce high-quality candidates rather than from overly aggressive filtering alone. revision: yes
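The promised retention table is straightforward to assemble once per-stage counts are known. In this sketch only the first and last counts (119 million generated, 1.3 million DFT-validated) come from the abstract; the two intermediate counts are placeholders.

```python
def retention_table(stages):
    """Return (name, count, fraction-of-previous-stage) rows for a filter cascade."""
    rows, prev = [], None
    for name, n in stages:
        frac = None if prev is None else n / prev
        rows.append((name, n, frac))
        prev = n
    return rows

stages = [
    ("generated (Matra-Genoa)", 119_000_000),  # from the abstract
    ("after Orb-v2 cutoff", 8_000_000),        # placeholder count
    ("after ALIGNN ranking", 1_600_000),       # placeholder count
    ("DFT-validated", 1_300_000),              # from the abstract
]
for name, n, frac in retention_table(stages):
    tail = "" if frac is None else f"  ({frac:.1%} of previous stage)"
    print(f"{name:26s}{n:>12,d}{tail}")
```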
Circularity Check
No circularity in derivation chain
Full rationale
The paper's central result—a 99% success rate for near-hull compounds—is an empirical count obtained by running independent DFT calculations on structures proposed by pre-trained generative and surrogate models (Matra-Genoa, Orb-v2, ALIGNN). No parameters are fitted inside the target dataset and then reused to generate the reported success metric; the DFT validation step lies outside the ML pipeline. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify the workflow or the reported statistics. The derivation therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: DFT calculations supply reliable ground-truth energies for thermodynamic stability assessment.
Forward citations
Cited by 1 Pith paper
- OptiMat Alloys: a FAIR, living database of multi-principal element alloys enabled by a conversational agent. OptiMat Alloys is a conversational AI system that maintains a living FAIR database of multi-principal element alloy calculations and enables natural-language, on-demand computations with built-in uncertainty checks.
Reference graph
Works this paper leans on
- [1] We used the M3GNet energy to compute directly the distance to the DFT convex hull. We then selected the 88 thousand structures predicted to have the smallest distance to the hull for further DFT relaxation. This dataset is labeled m3gnet/m3gnet in Table I and Figure 2.
- [2] We used the FAENet model to predict the distance to the convex hull and selected the 69 thousand compounds closest to the hull for further DFT relaxation. This dataset is labeled m3gnet/faenet in Table I and Figure 2.
- [3] We used the ALIGNN model to predict the distance to the convex hull and selected the 143 thousand compounds closest to the hull for further DFT relaxation. This dataset is labeled m3gnet/alignn in Table I and Figure 2. The analysis of the DFT energies showed clearly that the distance to the hull obtained directly from M3GNet was not a good estimator of...
- [4] Provides a rigorous mathematical foundation for machine learning interatomic potentials, unifying both local and message-passing graph neural network (GNN) architectures in a single framework. GRACE generalizes the Atomic Cluster Expansion (ACE) [67], which builds on a complete basis for local, star graphs, by introducing a complete set of tree graph cluste... (2022)
- [5] A. Merchant, S. Batzner, S. S. Schoenholz, M. Aykol, G. Cheon, and E. D. Cubuk, Nature 624, 80–85 (2023)
- [6] S. D. Griesemer, B. Baldassarri, R. Zhu, J. Shen, K. Pal, C. W. Park, and C. Wolverton, Science Advances 11, 10.1126/sciadv.adq1431 (2025)
- [7] J. Schmidt, T. F. Cerqueira, A. H. Romero, A. Loew, F. Jäger, H.-C. Wang, S. Botti, and M. A. Marques, Materials Today Physics 48, 101560 (2024)
- [8] J. Schmidt, N. Hoffmann, H. Wang, P. Borlido, P. J. M. A. Carriço, T. F. T. Cerqueira, S. Botti, and M. A. L. Marques, Adv. Mater. 35, 2210788 (2023)
- [9] H. J. Kulik, T. Hammerschmidt, J. Schmidt, S. Botti, M. A. L. Marques, M. Boley, M. Scheffler, M. Todorović, P. Rinke, C. Oses, A. Smolyanyuk, S. Curtarolo, A. Tkatchenko, A. P. Bartók, S. Manzhos, M. Ihara, T. Carrington, J. Behler, O. Isayev, M. Veit, A. Grisafi, J. Nigam, M. Ceriotti, K. T. Schütt, J. Westermayr, M. Gastegger, R. J. Maurer, B. Kalita, ... (2022)
- [10] Z. W. Ulissi, K. Tran, J. Yoon, M. S. Shuaibi, L. Mingjie, N. Zhan, K. Broderick, and J. R. Kitchin, Computational Catalysis (Royal Society of Chemistry, 2024) pp. 224–279
- [11] J. Abed, J. Kim, M. Shuaibi, B. Wander, B. Duijf, S. Mahesh, H. Lee, V. Gharakhanyan, S. Hoogland, E. Irtem, J. Lan, N. Schouten, A. U. Vijayakumar, J. Hattrick-Simpers, J. R. Kitchin, Z. W. Ulissi, A. van Vugt, E. H. Sargent, D. Sinton, and C. L. Zitnick, Open Catalyst Experiments 2024 (OCx24): Bridging Experiments and Computational Models (2024)
- [12]
- [13] J. Schmidt, L. Pettersson, C. Verdozzi, S. Botti, and M. A. L. Marques, Sci. Adv. 7, eabi7948 (2021)
- [14] M. Neumann, J. Gin, B. Rhodes, S. Bennett, Z. Li, H. Choubisa, A. Hussey, and J. Godwin, Orb: A Fast, Scalable Neural Network Potential (2024)
- [15] I. Batatia, D. P. Kovacs, G. N. C. Simm, C. Ortner, and G. Csanyi, in Adv. Neural Inf. Process. Syst., edited by A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho (2022)
- [16]
- [17] A. Bochkarev, Y. Lysogorskiy, and R. Drautz, Physical Review X 14, 021036 (2024)
- [18] L. Barroso-Luque, M. Shuaibi, X. Fu, B. M. Wood, M. Dzamba, M. Gao, A. Rizvi, C. L. Zitnick, and Z. W. Ulissi, Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models, arXiv preprint arXiv:2410.12771, 10.48550/arXiv.2410.12771 (2024)
- [19]
- [20]
- [21] J. Zeng, D. Zhang, A. Peng, X. Zhang, S. He, Y. Wang, X. Liu, H. Bi, Y. Li, C. Cai, C. Zhang, Y. Du, J.-X. Zhu, P. Mo, Z. Huang, Q. Zeng, S. Shi, X. Qin, Z. Yu, C. Luo, Y. Ding, Y.-P. Liu, R. Shi, Z. Wang, S. L. Bore, J. Chang, Z. Deng, Z. Ding, S. Han, W. Jiang, G. Ke, Z. Liu, D. Lu, K. Muraoka, H. Oliaei, A. K. Singh, H. Que, W. Xu, Z. Xu, Y.-B. Zhuang, J. Dai...
- [22] Y. Lysogorskiy, A. Bochkarev, and R. Drautz, Graph Atomic Cluster Expansion for Foundational Machine Learning Interatomic Potentials (2025), arXiv:2508.17936 [cond-mat]
- [23] J. Riebesell, R. E. A. Goodall, P. Benner, Y. Chiang, B. Deng, G. Ceder, M. Asta, A. A. Lee, A. Jain, and K. A. Persson, arXiv, 10.48550/ARXIV.2308.14920 (2023)
- [24] A. Loew, D. Sun, H.-C. Wang, S. Botti, and M. A. L. Marques, npj Computational Materials 11, 178 (2025)
- [25] P. Eastman, R. Galvelis, R. P. Peláez, C. R. A. Abreu, S. E. Farr, E. Gallicchio, A. Gorenko, M. M. Henry, F. Hu, J. Huang, A. Krämer, J. Michel, J. A. Mitchell, V. S. Pande, J. P. Rodrigues, J. Rodriguez-Guerra, A. C. Simmonett, S. Singh, J. Swails, P. Turner, Y. Wang, I. Zhang, J. D. Chodera, G. D. Fabritiis, and T. E. Markland, OpenMM 8: Molecular Dyn...
- [26]
- [27]
- [28]
- [29]
- [30] H.-C. Wang, J. Schmidt, M. A. L. Marques, L. Wirtz, and A. H. Romero, 2D Mater. 10, 035007 (2023)
- [31] A. Vishina, O. Eriksson, and H. C. Herper, Acta Materialia 261, 119348 (2023)
- [32] T. F. T. Cerqueira, A. Sanna, and M. A. L. Marques, Advanced Materials 36, 10.1002/adma.202307085 (2023)
- [33] T. F. T. Cerqueira, Y. Fang, I. Errea, A. Sanna, and M. A. L. Marques, Advanced Functional Materials 34, 10.1002/adfm.202404043 (2024)
- [34] T. H. B. da Silva, T. Cavignac, T. F. T. Cerqueira, H.-C. Wang, and M. A. L. Marques, Materials Horizons, 10.1039/d4mh01753f (2025)
- [35] J. Schmidt, J. Shi, P. Borlido, L. Chen, S. Botti, and M. A. L. Marques, Chem. Mater. 29, 5090–5103 (2017)
- [36] H.-C. Wang, S. Botti, and M. A. L. Marques, npj Comput. Mater. 7, 12 (2021)
- [37] A. Merchant, S. Batzner, S. S. Schoenholz, M. Aykol, G. Cheon, and E. D. Cubuk, Nature 624, 80 (2023)
- [38] K. Jakob, A. Walsh, K. Reuter, and J. T. Margraf, ChemRxiv, 10.26434/chemrxiv-2025-f52qs (2025)
- [39] T. F. T. Cerqueira, A. Sanna, and M. A. L. Marques, Adv. Mater. 36, 2307085 (2024)
- [40]
- [41] S. Fredericks, K. Parrish, D. Sayre, and Q. Zhu, Comput. Phys. Commun. 261, 107810 (2021)
- [42] A. R. Oganov, ed., Modern Methods of Crystal Structure Prediction (Wiley-VCH, Weinheim, Germany, 2011)
- [43] A. R. Oganov, A. O. Lyakhov, and M. Valle, Accounts of Chemical Research 44, 227 (2011)
- [44] Y. Wang, J. Lv, L. Zhu, and Y. Ma, Computer Physics Communications 183, 2063 (2012)
- [45] S. Goedecker, The Journal of Chemical Physics 120, 9911 (2004)
- [46]
- [47] P.-P. D. Breuck, H. A. Piracha, G.-M. Rignanese, and M. A. L. Marques, A Generative Material Transformer Using Wyckoff Representation (2025)
- [48]
- [49] M. Neumann, J. Gin, B. Rhodes, S. Bennett, Z. Li, H. Choubisa, A. Hussey, and J. Godwin, Orb: A Fast, Scalable Neural Network Potential (2024)
- [50] J. Schmidt, L. Chen, S. Botti, and M. A. L. Marques, J. Chem. Phys. 148, 241728 (2018)
- [51] D. Zagorac, H. Müller, S. Ruehl, J. Zagorac, and S. Rehme, Journal of Applied Crystallography 52, 918 (2019)
- [52]
- [53] A. J. C. Wilson, Acta Crystallographica Section A: Foundations of Crystallography 44, 715 (1988)
- [54] A. J. C. Wilson, Acta Crystallographica Section A: Foundations of Crystallography 46, 742 (1990)
- [55] S. W. Wukovitz and T. O. Yeates, Nature Structural & Molecular Biology 2, 1062 (1995)
- [56] H. Pan, A. M. Ganose, M. Horton, M. Aykol, K. A. Persson, N. E. R. Zimmermann, and A. Jain, Inorganic Chemistry 60, 1590 (2021)
- [57] S. P. Ong, W. D. Richards, A. Jain, G. Hautier, M. Kocher, S. Cholia, D. Gunter, V. L. Chevrier, K. A. Persson, and G. Ceder, Computational Materials Science 68, 314 (2013)
- [58] S. V. Krivovichev, Coordination Chemistry Reviews 498, 215484 (2024)
- [59]
- [60] V. I. Hegde, M. Aykol, S. Kirklin, and C. Wolverton, Science Advances 6, 10.1126/sciadv.aay5606 (2020)
- [61] G. Benedini, A. Loew, M. Hellstrom, S. Botti, and M. A. L. Marques, Universal Machine Learning Potential for Systems with Reduced Dimensionality (2025), arXiv:2508.15614 [cond-mat]
- [62]
- [63]
- [64] G. Kresse and J. Furthmüller, Comp. Mater. Sci. 6, 15–50 (1996)
- [65]
- [66] P. E. Blöchl, Phys. Rev. B 50, 17953–17979 (1994)
- [67] J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. 77, 3865–3868 (1996)
- [68]
- [69] P. Ramachandran, B. Zoph, and Q. V. Le, Searching for Activation Functions (2017)
- [70] I. Loshchilov and F. Hutter, Decoupled Weight Decay Regularization (2017)
- [71]
- [72] J. Riebesell, H. D. Yang, R. Goodall, S. G. Baird, M.-H. Chiu, B. B. Maranville, Colin, J. George, J. Wang, and T. Keane, janosh/pymatviz: v0.17.3 (2025)