arXiv: 2604.15380 · v1 · submitted 2026-04-15 · cond-mat.mtrl-sci · cs.AI

Exascale Multi-Task Graph Foundation Models for Imbalanced, Multi-Fidelity Atomistic Data

Pith reviewed 2026-05-10 12:21 UTC · model grok-4.3

classification cond-mat.mtrl-sci · cs.AI
keywords graph neural networks · multi-task learning · materials discovery · exascale computing · atomistic simulations · foundation models · high-throughput screening · transfer learning

The pith

A multi-task graph foundation model trained on 544 million atomistic structures screens 1.1 billion candidates in 50 seconds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to train one graph neural network model jointly on 16 first-principles datasets that together hold more than 544 million atomic structures spanning 85 elements. The multi-task design uses separate output heads for each dataset and runs at exascale with a scalable data pipeline on supercomputers. Once trained, the model evaluates over a billion new structures in under a minute, a workload that would otherwise demand years of direct computation. It also adapts to new materials problems using only small amounts of extra data through fine-tuning. This makes it possible to explore chemical design spaces that remain out of reach for conventional methods.

Core claim

Joint training of a PaiNN-based message-passing model on 16 open first-principles datasets using per-dataset heads and a scalable ADIOS2/DDStore pipeline produces a foundation model that evaluates 1.1 billion atomistic structures in 50 seconds on Frontier while supporting fine-tuning across twelve chemically diverse downstream tasks.
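
To put the headline figure in perspective, the short calculation below converts the reported numbers (1.1 billion structures, 50 seconds) into an aggregate throughput. The per-structure first-principles cost used for the comparison is a purely hypothetical placeholder, not a value from the paper; it only illustrates why the abstract speaks of "years" of computation.

```python
# Figures reported in the paper's abstract.
N_STRUCTURES = 1.1e9   # structures screened
WALL_SECONDS = 50.0    # reported wall-clock time on Frontier

# HYPOTHETICAL assumption for illustration only (not from the paper):
# one first-principles calculation costs one node-hour.
ASSUMED_DFT_NODE_HOURS_PER_STRUCTURE = 1.0


def aggregate_throughput(n_structures: float, seconds: float) -> float:
    """Structures evaluated per second across the whole machine."""
    return n_structures / seconds


def dft_node_years(n_structures: float, node_hours_each: float) -> float:
    """Total first-principles cost in node-years under the assumed unit cost."""
    hours_per_year = 24 * 365
    return n_structures * node_hours_each / hours_per_year


throughput = aggregate_throughput(N_STRUCTURES, WALL_SECONDS)
node_years = dft_node_years(N_STRUCTURES, ASSUMED_DFT_NODE_HOURS_PER_STRUCTURE)

print(f"{throughput:.2e} structures/s")
print(f"{node_years:.0f} node-years under the assumed DFT cost")
```

The throughput follows directly from the reported figures; the node-year estimate scales linearly with whatever per-structure cost is assumed, so it should be read as an order-of-magnitude illustration only.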

What carries the argument

Multi-task architecture with per-dataset heads, built on the HydraGNN framework and trained at exascale via an ADIOS2/DDStore data pipeline.
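
The idea of a shared representation with per-dataset heads can be sketched in a few lines. The toy below is not HydraGNN or PaiNN: `SharedEncoder`, `LinearHead`, `MultiTaskModel`, and the dataset names are hypothetical, and a single dense layer stands in for the message-passing backbone. It only shows the routing pattern, where every sample passes through the same encoder and is then sent to the head belonging to its source dataset.

```python
import math
import random


class SharedEncoder:
    """Toy stand-in for the shared message-passing backbone (hypothetical)."""

    def __init__(self, in_dim, hidden_dim, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                  for _ in range(hidden_dim)]

    def __call__(self, x):
        # One dense layer with tanh; a real backbone would pass messages on a graph.
        return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)))
                for row in self.w]


class LinearHead:
    """Dataset-specific output head mapping the shared embedding to a scalar."""

    def __init__(self, hidden_dim, rng):
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.b = 0.0

    def __call__(self, h):
        return sum(wi * hi for wi, hi in zip(self.w, h)) + self.b


class MultiTaskModel:
    """One shared encoder, one head per data source."""

    def __init__(self, dataset_names, in_dim=4, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.encoder = SharedEncoder(in_dim, hidden_dim, rng)
        self.heads = {name: LinearHead(hidden_dim, rng) for name in dataset_names}

    def predict(self, x, dataset):
        h = self.encoder(x)            # shared representation
        return self.heads[dataset](h)  # routed to the head for this data source


# Hypothetical dataset names and input features, for illustration only.
model = MultiTaskModel(["dataset_a", "dataset_b"])
x = [0.5, -1.0, 2.0, 0.1]
pred_a = model.predict(x, "dataset_a")
pred_b = model.predict(x, "dataset_b")
print(pred_a, pred_b)
```

Because each head has its own parameters, the same structure can receive different predictions depending on which dataset's labeling convention (for example, its level of theory) is requested, while the encoder weights are shared by all sources.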

Load-bearing premise

Joint multi-task training on the 16 imbalanced multi-fidelity datasets produces a model whose predictions remain accurate for chemically diverse structures outside the training distribution.

What would settle it

Running independent first-principles calculations on a collection of structures with chemical compositions absent from the original 16 datasets and comparing the model's predicted energies or forces against those results.
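
A minimal sketch of such a comparison, assuming the reference and predicted energies are already available as equal-length lists, is shown below. The numbers are placeholders chosen for illustration and are not results from the paper; `mae` and `rmse` are hypothetical helper names.

```python
import math


def mae(pred, ref):
    """Mean absolute error between predictions and reference values."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def rmse(pred, ref):
    """Root-mean-square error between predictions and reference values."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


# Placeholder energies in eV/atom (illustrative, NOT from the paper).
dft_energies = [-3.0, -2.5, -4.0, -1.0]
model_energies = [-2.9, -2.7, -4.0, -1.3]

print(f"MAE  = {mae(model_energies, dft_energies):.3f} eV/atom")
print(f"RMSE = {rmse(model_energies, dft_energies):.3f} eV/atom")
```

Reporting both metrics is useful because RMSE weights large individual errors more heavily than MAE, which matters when a few out-of-distribution structures dominate the error.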

Figures

Figures reproduced from arXiv: 2604.15380 by Ashwin M. Aji, Benjamin Stump, Isaac Lyngaas, Jong Youl Choi, Jorda Polo, Karl W. Schulz, Kshitij Mehta, Linda Ungerboeck, Massimiliano Lupo Pasini, Richard Messerly, Rylie Weaver.

Figure 1: Multi-task learning training for MLIPs on multi-source, multi-fidelity data, with shared message-passing representations and dataset-specific supervision
Figure 2: Validation-loss trajectories for all trials from the six Frontier HPO ...
Figure 3: Strong and weak scaling results, along with time decomposition, are ...
Figure 4: Validation-loss trajectories for all fine-tuning trials on the OQMD ...
Figure 5: Fused Gradient Inference Pipeline. The encoder executes once and its ...
Original abstract

We present an exascale workflow for materials discovery using atomistic graph foundation models built on HydraGNN. We jointly train on 16 open first-principles datasets (544+ million structures covering 85+ elements) using a multi-task architecture with per-dataset heads and a scalable ADIOS2/DDStore data pipeline. On Frontier, we execute six large-scale DeepHyper hyperparameter optimization campaigns in FP64 and promote the top-performing message-passing models to sustained 2,048-node training, yielding a PaiNN-based lead model. The resulting model enables billion-scale screening, evaluating 1.1 billion atomistic structures in 50 seconds, compressing a workload that would require years of first-principles computation, and supports data-scarce fine-tuning across diverse downstream tasks. We quantify precision-performance tradeoffs (BF16/FP32/FP64), demonstrate transfer across twelve chemically diverse downstream tasks, and establish seamless strong- and weak-scaling across Frontier, Aurora, and Perlmutter. This work allows fast and reliable exploration of vast chemical design spaces that are otherwise inaccessible to first-principles methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents an exascale multi-task graph foundation model workflow based on HydraGNN for atomistic materials data. It describes joint training on 16 imbalanced, multi-fidelity first-principles datasets (544+ million structures spanning 85+ elements) using per-dataset heads and a scalable ADIOS2/DDStore pipeline, followed by DeepHyper hyperparameter optimization and sustained training on up to 2,048 nodes of the Frontier supercomputer. The lead PaiNN-based model is applied to screen 1.1 billion structures in 50 seconds and to fine-tune on 12 downstream tasks, with reported strong/weak scaling across Frontier, Aurora, and Perlmutter plus precision trade-offs (BF16/FP32/FP64).

Significance. If the accuracy and generalization claims are substantiated, the work would demonstrate a practical route to billion-scale atomistic screening that compresses years of DFT-equivalent effort into seconds of inference, while handling real-world data imbalances and multi-fidelity sources. The explicit scaling results on production supercomputers and the multi-task architecture for data-scarce transfer are concrete strengths that could influence future foundation-model efforts in materials. The absence of supporting accuracy numbers, however, prevents assessing whether these throughput gains translate into reliable scientific utility.

major comments (2)
  1. [Abstract] Abstract: the central claim that the model 'enables billion-scale screening' and 'compresses a workload that would require years of first-principles computation' while supporting 'fast and reliable exploration' is unsupported by any reported accuracy metrics, validation splits, error bars, or ablation studies on the multi-task heads. No MAE, RMSE, or correlation values are supplied for either the training datasets or the 1.1 billion screened structures.
  2. [Screening results] Billion-scale screening description: the 1.1 billion structures are not characterized with respect to chemical diversity, elemental coverage, or distributional overlap with the 544 M training structures. No hold-out DFT comparison, uncertainty quantification, or out-of-distribution error analysis is provided for this set, which is load-bearing for the claim that the 50-second throughput replaces first-principles methods.
minor comments (1)
  1. [Abstract] The abstract states that precision-performance trade-offs (BF16/FP32/FP64) are quantified, yet no numerical values or associated tables/figures are referenced in the provided summary; ensure these results are explicitly tabulated with throughput and accuracy numbers.
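
To make the accuracy side of the precision trade-off concrete, the sketch below emulates rounding a double-precision value to FP32 and to BF16 using only the standard library. This is not the paper's code: `round_to_fp32`, `round_to_bf16`, and the sample value are hypothetical, and BF16 is emulated by keeping the top 16 bits of the FP32 encoding with round-to-nearest-even, for finite inputs only.

```python
import struct


def round_to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_to_bf16(x: float) -> float:
    """Emulate BF16: keep the top 16 bits of the FP32 encoding.

    Uses round-to-nearest-even on the discarded 16 bits. Finite inputs only.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # rounding bias, ties go to even
    bits &= 0xFFFF0000                   # drop the low 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Illustrative value only (think of it as an energy in eV/atom).
value = -3.14159265358979

err_fp32 = abs(round_to_fp32(value) - value)
err_bf16 = abs(round_to_bf16(value) - value)

print(f"FP32 rounding error: {err_fp32:.3e}")
print(f"BF16 rounding error: {err_bf16:.3e}")
```

BF16 keeps the FP32 exponent range but only 8 significant bits, so its representable values near 1 are spaced 2^-7 apart; this is the kind of resolution loss that a tabulated accuracy-versus-throughput comparison would need to quantify.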

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which have helped us strengthen the manuscript by making accuracy metrics and screening-set characterization more explicit. We have revised the abstract, added a dedicated subsection on the 1.1 billion structures, and included uncertainty quantification and limited DFT validation on a representative subset. Below we respond point by point to the major comments.

  1. Referee: [Abstract] Abstract: the central claim that the model 'enables billion-scale screening' and 'compresses a workload that would require years of first-principles computation' while supporting 'fast and reliable exploration' is unsupported by any reported accuracy metrics, validation splits, error bars, or ablation studies on the multi-task heads. No MAE, RMSE, or correlation values are supplied for either the training datasets or the 1.1 billion screened structures.

    Authors: We agree that the abstract as originally submitted did not contain explicit numerical accuracy figures. The body of the manuscript reports per-dataset MAE/RMSE on held-out validation splits (typically 5-10% of each dataset) together with multi-task vs. single-task ablation results; these values are now summarized in the revised abstract (e.g., average MAE of X eV/atom across the 16 tasks with standard deviation). We have also added a sentence on the validation protocol and error bars. For the 1.1 billion structures we now report ensemble-based uncertainty estimates (predictive variance) and note that direct DFT labels do not exist for the full set. revision: yes

  2. Referee: [Screening results] Billion-scale screening description: the 1.1 billion structures are not characterized with respect to chemical diversity, elemental coverage, or distributional overlap with the 544 M training structures. No hold-out DFT comparison, uncertainty quantification, or out-of-distribution error analysis is provided for this set, which is load-bearing for the claim that the 50-second throughput replaces first-principles methods.

    Authors: We accept that the original text provided insufficient characterization. The revised manuscript now includes: (i) elemental coverage statistics confirming the same 85+ elements, (ii) t-SNE and Wasserstein-distance comparisons demonstrating substantial distributional overlap with the training data, and (iii) uncertainty quantification via Monte-Carlo dropout together with an OOD flag based on embedding distance. A hold-out DFT comparison on the entire 1.1 billion structures is impossible without performing the very calculations the model is intended to replace; however, we have added DFT results on a randomly sampled subset of 2,000 structures and report the corresponding MAE, thereby providing a limited but direct accuracy anchor for the screening claim. revision: partial
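
An OOD flag based on embedding distance could look roughly like the sketch below. The rebuttal does not specify the distance measure or the threshold, so both choices here are assumptions: the distance is Euclidean distance to the centroid of the training embeddings, and the threshold is a quantile of the training distances. The function names and the two-dimensional toy embeddings are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def ood_flags(train_embeddings, query_embeddings, quantile=0.95):
    """Flag queries lying farther from the training centroid than the
    chosen quantile of training-set distances (assumed criterion)."""
    c = centroid(train_embeddings)
    train_d = sorted(euclidean(v, c) for v in train_embeddings)
    idx = min(len(train_d) - 1, int(quantile * len(train_d)))
    threshold = train_d[idx]
    flags = [euclidean(q, c) > threshold for q in query_embeddings]
    return flags, threshold


# Toy 2-D embeddings, for illustration only.
train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.5, 0.5], [5.0, 5.0]]

flags, threshold = ood_flags(train, queries)
print(flags, round(threshold, 4))
```

A single-centroid criterion is the simplest possible choice; for training data that forms several clusters, a nearest-neighbor distance would be a more natural variant of the same idea.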

Circularity Check

0 steps flagged

No circularity: empirical scaling demonstration with independent throughput measurement

The paper describes a multi-task GNN training workflow on 16 external open datasets (544 M structures), followed by measured inference throughput on Frontier for a separate 1.1 B structure screening set. No equations, predictions, or first-principles results are derived; the central claims rest on wall-clock timing and transfer-task metrics that are not algebraically forced by the training data or model definition. Self-citations to HydraGNN and PaiNN are for architecture reuse, not load-bearing uniqueness theorems. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit model equations, hyperparameters, or derivation steps; ledger entries cannot be populated from available text.

pith-pipeline@v0.9.0 · 5549 in / 1017 out tokens · 46409 ms · 2026-05-10T12:21:52.928888+00:00 · methodology
