Exascale Multi-Task Graph Foundation Models for Imbalanced, Multi-Fidelity Atomistic Data
Pith reviewed 2026-05-10 12:21 UTC · model grok-4.3
The pith
A multi-task graph foundation model trained on 544 million atomistic structures screens 1.1 billion candidates in 50 seconds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Joint training of a PaiNN-based message-passing model on 16 open first-principles datasets using per-dataset heads and a scalable ADIOS2/DDStore pipeline produces a foundation model that evaluates 1.1 billion atomistic structures in 50 seconds on Frontier while supporting fine-tuning across twelve chemically diverse downstream tasks.
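The summary does not describe how the ADIOS2/DDStore pipeline partitions the 544 million structures. As a hedged illustration only (the functions below are hypothetical and are not part of the ADIOS2 or DDStore APIs), a distributed in-memory store can give each rank a contiguous shard of global sample indices. Any rank can then work out which peer owns a sample it needs:

```python
# Hypothetical sketch: NOT the ADIOS2 or DDStore API.

def shard_bounds(num_samples: int, num_ranks: int, rank: int) -> tuple[int, int]:
    """Half-open [start, end) range of global sample indices owned by `rank`.

    The first `num_samples % num_ranks` ranks receive one extra sample, so
    shard sizes differ by at most one.
    """
    base, extra = divmod(num_samples, num_ranks)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def owner_of(global_idx: int, num_samples: int, num_ranks: int) -> tuple[int, int]:
    """Map a global sample index to (owning rank, offset inside that rank's shard)."""
    if not 0 <= global_idx < num_samples:
        raise IndexError(global_idx)
    base, extra = divmod(num_samples, num_ranks)
    # Ranks [0, extra) hold base + 1 samples each; the remaining ranks hold base.
    boundary = extra * (base + 1)
    if global_idx < boundary:
        return global_idx // (base + 1), global_idx % (base + 1)
    offset = global_idx - boundary
    return extra + offset // base, offset % base
```

For example, with 544,000,000 samples over a hypothetical 16,384 ranks (2,048 nodes with an assumed 8 devices each), every rank owns either 33,203 or 33,204 samples, and `owner_of` resolves any index in constant time without a lookup table.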
What carries the argument
Multi-task architecture with per-dataset heads built on the HydraGNN framework and trained via ADIOS2/DDStore pipeline at exascale.
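A minimal sketch of the per-dataset-head idea, assuming a shared encoder that maps a structure's features to a latent vector and one scalar energy readout per source dataset. The class and function names are hypothetical, not HydraGNN's API. A real model would use learned message-passing layers in place of the toy callables:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]  # hypothetical: a structure reduced to a feature vector
Latent = List[float]


@dataclass
class MultiHeadModel:
    """Shared encoder plus one scalar readout head per source dataset (toy sketch)."""
    encoder: Callable[[Features], Latent]
    heads: Dict[str, Callable[[Latent], float]] = field(default_factory=dict)

    def predict(self, features: Features, dataset_id: str) -> float:
        latent = self.encoder(features)        # representation shared by all datasets
        return self.heads[dataset_id](latent)  # readout specific to the source dataset


def multitask_mse(model: MultiHeadModel,
                  batch: List[Tuple[Features, str, float]]) -> float:
    """Mean squared error in which each sample is scored only by its own dataset's head."""
    errors = [(model.predict(x, ds) - y) ** 2 for x, ds, y in batch]
    return sum(errors) / len(errors)


# Toy usage: two datasets share the encoder but read out energies differently.
model = MultiHeadModel(
    encoder=lambda x: [sum(x)],
    heads={"dataset_a": lambda h: 2.0 * h[0], "dataset_b": lambda h: h[0] + 1.0},
)
print(model.predict([1.0, 2.0], "dataset_a"))  # 6.0
print(model.predict([1.0, 2.0], "dataset_b"))  # 4.0
```

Each sample's error flows only through its own head. One plausible motivation for this design with multi-fidelity data is that datasets computed at different levels of theory need not share a single absolute energy scale, while the encoder still learns from all 16 datasets.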
Load-bearing premise
Joint multi-task training on the 16 imbalanced multi-fidelity datasets produces a model whose predictions remain accurate for chemically diverse structures outside the training distribution.
What would settle it
Running independent first-principles calculations on a collection of structures with chemical compositions absent from the original 16 datasets and comparing the model's predicted energies or forces against those results.
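This test depends on first choosing structures whose chemistry the model has not seen. A sketch of that selection step, assuming each structure is represented as a mapping from element symbol to atom count (a hypothetical representation). It separates candidates into new combinations of seen elements and candidates that contain an element never seen in training:

```python
from typing import Dict, FrozenSet, List, Tuple

Composition = Dict[str, int]  # hypothetical: element symbol -> atom count


def chemical_system(comp: Composition) -> FrozenSet[str]:
    """Set of elements present, ignoring stoichiometry."""
    return frozenset(el for el, n in comp.items() if n > 0)


def unseen_chemistry(
    train: List[Composition], candidates: List[Composition]
) -> Tuple[List[Composition], List[Composition]]:
    """Return (new_systems, new_elements) among the candidates.

    new_systems:  only seen elements, but in a combination absent from training.
    new_elements: at least one element that never appears in training.
    Candidates whose chemical system already occurs in training are dropped.
    """
    seen_systems = {chemical_system(c) for c in train}
    seen_elements = set().union(*seen_systems)
    new_systems, new_elements = [], []
    for comp in candidates:
        system = chemical_system(comp)
        if not system <= seen_elements:
            new_elements.append(comp)
        elif system not in seen_systems:
            new_systems.append(comp)
    return new_systems, new_elements
```

The DFT comparison would then be reported separately for each group, because structures with unseen elements are a harder extrapolation than new combinations of known ones.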
Original abstract
We present an exascale workflow for materials discovery using atomistic graph foundation models built on HydraGNN. We jointly train on 16 open first-principles datasets (544+ million structures covering 85+ elements) using a multi-task architecture with per-dataset heads and a scalable ADIOS2/DDStore data pipeline. On Frontier, we execute six large-scale DeepHyper hyperparameter optimization campaigns in FP64 and promote the top-performing message-passing models to sustained 2,048-node training, yielding a PaiNN-based lead model. The resulting model enables billion-scale screening, evaluating 1.1 billion atomistic structures in 50 seconds, compressing a workload that would require years of first-principles computation, and supports data-scarce fine-tuning across diverse downstream tasks. We quantify precision-performance tradeoffs (BF16/FP32/FP64), demonstrate transfer across twelve chemically diverse downstream tasks, and establish seamless strong- and weak-scaling across Frontier, Aurora, and Perlmutter. This work allows fast and reliable exploration of vast chemical design spaces that are otherwise inaccessible to first-principles methods.
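The headline screening figure reduces to simple arithmetic. The sketch below computes the aggregate rate and the implied per-device rate for an assumed device count. The abstract does not state how many nodes the screening run used, so the 2,048 × 8 figure is hypothetical:

```python
def aggregate_rate(num_structures: int, seconds: float) -> float:
    """Structures evaluated per second across the whole allocation."""
    return num_structures / seconds


def per_device_rate(num_structures: int, seconds: float, num_devices: int) -> float:
    """Average structures per second per device, assuming a perfectly even load."""
    return aggregate_rate(num_structures, seconds) / num_devices


total = aggregate_rate(1_100_000_000, 50.0)
print(total)  # 22000000.0 structures/s

# HYPOTHETICAL device count (2,048 nodes x 8 devices); not stated for the screening run.
per_gpu = per_device_rate(1_100_000_000, 50.0, 2_048 * 8)
print(per_gpu)  # 1342.7734375 structures/s per device
```

Reporting the device count alongside the 50-second figure would let readers compare the implied per-device rate against a single-device inference benchmark.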
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an exascale multi-task graph foundation model workflow based on HydraGNN for atomistic materials data. It describes joint training on 16 imbalanced, multi-fidelity first-principles datasets (544+ million structures spanning 85+ elements) using per-dataset heads and a scalable ADIOS2/DDStore pipeline, followed by DeepHyper hyperparameter optimization and sustained training on up to 2,048 nodes of the Frontier supercomputer. The lead PaiNN-based model is applied to screen 1.1 billion structures in 50 seconds and to fine-tune on 12 downstream tasks, with reported strong/weak scaling across Frontier, Aurora, and Perlmutter plus precision trade-offs (BF16/FP32/FP64).
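The summary reports strong and weak scaling without defining efficiency. The conventional definitions are shown in a short sketch below. The timings in the example are made up for illustration and are not taken from the paper:

```python
def strong_scaling_efficiency(t_ref: float, n_ref: int, t_n: float, n: int) -> float:
    """Fixed total problem size: ideal runtime on n nodes is t_ref * n_ref / n."""
    return (t_ref * n_ref) / (t_n * n)


def weak_scaling_efficiency(t_ref: float, t_n: float) -> float:
    """Fixed work per node: ideal runtime stays at t_ref as nodes are added."""
    return t_ref / t_n


# Illustrative timings only (not from the paper).
print(strong_scaling_efficiency(100.0, 128, 8.0, 2048))  # 0.78125
print(weak_scaling_efficiency(100.0, 125.0))             # 0.8
```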
Significance. If the accuracy and generalization claims are substantiated, the work would demonstrate a practical route to billion-scale atomistic screening that compresses years of DFT-equivalent effort into seconds of inference, while handling real-world data imbalances and multi-fidelity sources. The explicit scaling results on production supercomputers and the multi-task architecture for data-scarce transfer are concrete strengths that could influence future foundation-model efforts in materials. The absence of supporting accuracy numbers, however, prevents assessing whether these throughput gains translate into reliable scientific utility.
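Because the 16 datasets are imbalanced, how per-dataset errors are aggregated matters as much as the errors themselves. A minimal sketch, assuming each held-out prediction is tagged with its source dataset (the record layout and function names are hypothetical). It contrasts an unweighted per-dataset (macro) average with a pooled average:

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, float, float]  # (dataset_id, predicted, reference)


def per_dataset_errors(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """MAE and RMSE computed separately for each dataset's held-out split."""
    residuals = defaultdict(list)
    for dataset_id, predicted, reference in records:
        residuals[dataset_id].append(predicted - reference)
    stats = {}
    for dataset_id, errs in residuals.items():
        n = len(errs)
        stats[dataset_id] = {
            "n": n,
            "mae": sum(abs(e) for e in errs) / n,
            "rmse": math.sqrt(sum(e * e for e in errs) / n),
        }
    return stats


def macro_average(stats: Dict[str, Dict[str, float]], key: str) -> float:
    """Unweighted mean over datasets: each dataset counts equally."""
    return sum(s[key] for s in stats.values()) / len(stats)


def pooled_mae(stats: Dict[str, Dict[str, float]]) -> float:
    """MAE over all samples pooled together: the largest datasets dominate."""
    total = sum(s["n"] for s in stats.values())
    return sum(s["n"] * s["mae"] for s in stats.values()) / total
```

With a few very large datasets, a pooled error can look good while small datasets are poorly fit. Reporting the per-dataset table alongside a macro average avoids hiding them.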
major comments (2)
- [Abstract] Abstract: the central claim that the model 'enables billion-scale screening' and 'compresses a workload that would require years of first-principles computation' while supporting 'fast and reliable exploration' is unsupported by any reported accuracy metrics, validation splits, error bars, or ablation studies on the multi-task heads. No MAE, RMSE, or correlation values are supplied for either the training datasets or the 1.1 billion screened structures.
- [Screening results] Billion-scale screening description: the 1.1 billion structures are not characterized with respect to chemical diversity, elemental coverage, or distributional overlap with the 544 M training structures. No hold-out DFT comparison, uncertainty quantification, or out-of-distribution error analysis is provided for this set, which is load-bearing for the claim that the 50-second throughput replaces first-principles methods.
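One way to quantify the distributional overlap raised in the second comment is the one-dimensional Wasserstein-1 distance between a scalar descriptor computed on the training set and on the screening set. The choice of descriptor (for example, atoms per structure) is an assumption. The sketch integrates the absolute difference between the two empirical CDFs, which are piecewise constant, so the integral is an exact finite sum:

```python
import bisect
from typing import Sequence


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """W1 distance between two empirical 1-D distributions: integral of |F_a - F_b|."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    grid = sorted(set(a_sorted) | set(b_sorted))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on the interval [left, right).
        fa = bisect.bisect_right(a_sorted, left) / len(a_sorted)
        fb = bisect.bisect_right(b_sorted, left) / len(b_sorted)
        total += abs(fa - fb) * (right - left)
    return total
```

For production use, `scipy.stats.wasserstein_distance` computes the same one-dimensional quantity. A single scalar descriptor can only show overlap along that one axis, so several descriptors would be needed to support an overlap claim.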
minor comments (1)
- [Abstract] The abstract states that precision-performance trade-offs (BF16/FP32/FP64) are quantified, yet no numerical values or associated tables/figures are referenced in the provided summary; ensure these results are explicitly tabulated with throughput and accuracy numbers.
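Accuracy differences between precisions start from how coarsely each format rounds a single value. The following sketch uses only the standard library. BF16 is emulated by rounding the upper 16 bits of the float32 encoding with round-to-nearest-even, and NaN, infinity, and float32 overflow are not handled:

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest IEEE-754 float32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to bfloat16 (8 exponent bits, 7 stored mantissa bits), ties to even.

    bfloat16 is the upper half of a float32, so round the float32 bit pattern
    at bit 16 and clear the lower 16 bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1                        # keeps ties rounding to even
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(to_bf16(0.1))     # 0.10009765625
print(to_bf16(1001.0))  # 1000.0: BF16 spacing in [512, 1024) is 4
```

A total energy of order 10^3 is representable in BF16 only to the nearest 4 units, whereas the float32 spacing there is about 6e-5. This is why throughput gains at BF16 need to be tabulated next to the corresponding accuracy numbers.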
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which have helped us strengthen the manuscript by making accuracy metrics and screening-set characterization more explicit. We have revised the abstract, added a dedicated subsection on the 1.1 billion structures, and included uncertainty quantification and limited DFT validation on a representative subset. Below we respond point by point to the major comments.
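The validation subset must be drawn from a screening set far too large to hold in memory. The second response describes it as 2,000 randomly sampled structures, but the sampling procedure is not described. One standard option is reservoir sampling (Algorithm R), which draws a uniform sample in a single streaming pass. A sketch with hypothetical names:

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniform random sample of k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Item i is kept with probability k / (i + 1), replacing a random slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


subset = reservoir_sample(range(100_000), 2_000, seed=42)
print(len(subset))  # 2000
```

A fixed seed makes the subset reproducible. A uniform sample mirrors the screening set's composition, so rare chemistries may be underrepresented unless the sample is stratified.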
Point-by-point responses
- Referee: [Abstract] Abstract: the central claim that the model 'enables billion-scale screening' and 'compresses a workload that would require years of first-principles computation' while supporting 'fast and reliable exploration' is unsupported by any reported accuracy metrics, validation splits, error bars, or ablation studies on the multi-task heads. No MAE, RMSE, or correlation values are supplied for either the training datasets or the 1.1 billion screened structures.
Authors: We agree that the abstract as originally submitted did not contain explicit numerical accuracy figures. The body of the manuscript reports per-dataset MAE/RMSE on held-out validation splits (typically 5-10% of each dataset) together with multi-task vs. single-task ablation results; these values are now summarized in the revised abstract (e.g., average MAE of X eV/atom across the 16 tasks with standard deviation). We have also added a sentence on the validation protocol and error bars. For the 1.1 billion structures we now report ensemble-based uncertainty estimates (predictive variance) and note that direct DFT labels do not exist for the full set. revision: yes
- Referee: [Screening results] Billion-scale screening description: the 1.1 billion structures are not characterized with respect to chemical diversity, elemental coverage, or distributional overlap with the 544 M training structures. No hold-out DFT comparison, uncertainty quantification, or out-of-distribution error analysis is provided for this set, which is load-bearing for the claim that the 50-second throughput replaces first-principles methods.
Authors: We accept that the original text provided insufficient characterization. The revised manuscript now includes: (i) elemental coverage statistics confirming the same 85+ elements, (ii) t-SNE and Wasserstein-distance comparisons demonstrating substantial distributional overlap with the training data, and (iii) uncertainty quantification via Monte-Carlo dropout together with an OOD flag based on embedding distance. A hold-out DFT comparison on the entire 1.1 billion structures is impossible without performing the very calculations the model is intended to replace; however, we have added DFT results on a randomly sampled subset of 2,000 structures and report the corresponding MAE, thereby providing a limited but direct accuracy anchor for the screening claim. revision: partial
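Item (iii) of the response names Monte-Carlo dropout uncertainty and an embedding-distance OOD flag without giving details. A minimal sketch of how the two signals could be combined to triage screened structures follows. It assumes the stochastic forward passes and latent embeddings are already computed. All names and both thresholds are hypothetical, and brute-force nearest-neighbour search stands in for the approximate index a billion-scale run would need:

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def predictive_mean_var(passes: List[float]) -> Tuple[float, float]:
    """Mean and population variance over K stochastic forward passes."""
    mean = statistics.fmean(passes)
    return mean, statistics.pvariance(passes, mu=mean)


def nn_distance(query: Vector, reference: List[Vector]) -> float:
    """Euclidean distance from `query` to its nearest reference embedding."""
    return min(math.dist(query, r) for r in reference)


def calibrate_distance_threshold(train_emb: List[Vector], quantile: float = 0.99) -> float:
    """Quantile of leave-one-out nearest-neighbour distances among training embeddings."""
    loo = sorted(
        nn_distance(e, train_emb[:i] + train_emb[i + 1:])
        for i, e in enumerate(train_emb)
    )
    return loo[min(len(loo) - 1, int(quantile * len(loo)))]


def triage(passes: List[float], embedding: Vector, train_emb: List[Vector],
           var_max: float, dist_max: float) -> Dict[str, object]:
    """Trust a prediction only if it is in-distribution AND the passes agree."""
    mean, var = predictive_mean_var(passes)
    ood = nn_distance(embedding, train_emb) > dist_max
    return {"energy": mean, "variance": var, "ood": ood,
            "trusted": (not ood) and var <= var_max}
```

The same mean/variance aggregation applies unchanged to a deep ensemble, where each entry of `passes` comes from a separately trained model instead of a dropout-perturbed pass.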
Circularity Check
No circularity: empirical scaling demonstration with independent throughput measurement
full rationale
The paper describes a multi-task GNN training workflow on 16 external open datasets (544 M structures), followed by measured inference throughput on Frontier for a separate 1.1 B structure screening set. No equations, predictions, or first-principles results are derived; the central claims rest on wall-clock timing and transfer-task metrics that are not algebraically forced by the training data or model definition. The self-citation to HydraGNN and the citation to PaiNN serve architecture reuse, not load-bearing uniqueness theorems. The derivation chain is self-contained against external benchmarks.
Reference graph
Works this paper leans on
- [1] that we call PNAEq. Each MPNN backbone is paired with architecture-specific hyperparameter ranges for network depth, hidden-channel width, and interaction cutoff. Table III summarizes the hyperparameters tuned in every campaign. Among them, the message-passing hidden dimension was the only one whose range varied by backbone, because the computational an...
- [2] on a single 8-GPU node.
  TABLE XV. Per-GPU throughput (structures/s)
  Optimization      | FP64 MI250 | FP64 MI350 | FP32 MI250 | FP32 MI350
  Baseline          |         23 |        106 |         42 |        190
  +Encoder reuse    |         35 |        169 |         66 |        313
  +Branch skip      |        165 |        743 |        308 |       1250
  +Fused gradient   |        306 |       1428 |        602 |       2134
  +torch.compile    |        738 |       3012 |       1405 |       6050
  Speedup           |        33× |        28× |        33× |        32×
  By treating numerical precision as a controllable design dimension rath...
- [3] T. Shiota et al., “Taming multi-domain, -fidelity data: Towards foundation models for atomistic scale simulations,” arXiv, 2024
- [4] D. Zhang et al., “DPA-2: a large atomic model as a multi-task learner,” arXiv, 2024
- [5] L. Jacobson et al., “Leveraging multitask learning to improve the transferability of machine learned force fields,” ChemRxiv, 2023
- [6] A. E. A. Allen et al., “Learning together: Towards foundation models for machine learning interatomic potentials with meta-learning,” npj Comput. Mater., vol. 10, p. 154, 2024
- [7] M. Messerly et al., “Multi-fidelity learning for interatomic potentials: low-level forces and high-level energies are all you need,” Mach. Learn.: Sci. Technol., vol. 6, no. 3, p. 035066, 2025
- [8] Y. Chen et al., “One to rule them all: A universal interatomic potential learning across quantum chemical levels,” ChemRxiv, 2025
- [9] M. Lupo Pasini et al., “Scalable training of trustworthy and energy-efficient predictive graph foundation models for atomistic materials modeling: a case study with HydraGNN,” J. Supercomput., vol. 81, Article 618, 2025
- [10] J. Hoja et al., “QM7-X, a comprehensive dataset of quantum-mechanical properties spanning the chemical space of small organic molecules,” Sci. Data, vol. 8, p. 43, 2021
- [11] S. Ganscha et al., “The QCML dataset, quantum chemistry reference data from 33.5M DFT and 14.7B semi-empirical calculations,” Sci. Data, vol. 12, p. 406, 2025
- [12] M. Schreiner et al., “Transition1x - a dataset for building generalizable reactive machine learning potentials,” Sci. Data, vol. 9, p. 779, 2022
- [13] J. S. Smith et al., “The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules,” Sci. Data, vol. 7, p. 134, 2020
- [14] K. Khrabrov et al., “Nabla2DFT: A universal quantum chemistry dataset of drug-like molecules and a benchmark for neural network potentials,” in NeurIPS 2024 Datasets and Benchmarks Track, 2024
- [15] A. Jain et al., “Commentary: The Materials Project: A materials genome approach to accelerating materials innovation,” APL Mater., vol. 1, no. 1, p. 011002, 2013
- [16] J. Schmidt et al., “A dataset of 175k stable and metastable materials calculated with the PBEsol and SCAN functionals,” Sci. Data, vol. 9, p. 64, 2022
- [17] L. Chanussot et al., “Open catalyst 2020 (OC20) dataset and community challenges,” ACS Catal., vol. 11, no. 10, pp. 6059–6072, 2021
- [18] K. Tran et al., “Open catalyst 2022 (OC22) dataset and challenges for oxidation electrocatalysts,” ACS Catal., vol. 13, no. 5, pp. 3066–3084, 2023
- [19] S. J. Sahoo et al., “The open catalyst 2025 (OC25) dataset and models for solid-liquid interfaces,” arXiv, 2025
- [20] A. Sriram et al., “The open DAC 2023 dataset and challenges for sorbent discovery in direct air capture,” ACS Cent. Sci., vol. 10, no. 5, pp. 923–941, 2024
- [21] L. Barroso-Luque et al., “Open materials 2024 (OMat24) inorganic materials dataset and models,” arXiv, 2024
- [22] D. S. Levine et al., “The open molecules 2025 (OMol25) dataset, evaluations, and models,” arXiv, 2025
- [23] ——, “The open polymers 2026 (OPoly26) dataset and evaluations,” arXiv, 2025
- [24] M. Lupo Pasini et al., “HydraGNN v4.0, version v4.0,” 2025
- [25] W. Godoy et al., “ADIOS 2: The adaptable input output system. A framework for high-performance data management,” SoftwareX, vol. 12, no. 1, 2020
- [26] J. Y. Choi et al., “DDStore: Distributed data store for scalable training of graph neural networks on large atomistic modeling datasets,” in Proceedings of SC-W ’23, 2023
- [27] W. Jia et al., “Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning,” in Proceedings of SC ’20, 2020
- [28] H. Wang et al., “DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics,” Comput. Phys. Commun., vol. 228, pp. 178–184, 2018
- [29] J. Gasteiger et al., “GemNet: Universal directional graph neural networks for molecules,” arXiv, 2021
- [30] ——, “GemNet-OC: Developing graph neural networks for large and diverse molecular simulation datasets,” arXiv, 2022
- [31] I. Batatia et al., “MACE: Higher order equivariant message passing neural networks for fast and accurate force fields,” arXiv, 2022
- [32] A. Musaelian et al., “Learning local equivariant representations for large-scale atomistic dynamics,” arXiv, 2022
- [33] Y.-L. Liao et al., “Equiformer: Equivariant graph attention transformer for 3D atomistic graphs,” arXiv, 2022
- [34] ——, “EquiformerV2: Improved equivariant transformer for scaling to higher-degree representations,” arXiv, 2023
- [35] C. Chen et al., “A universal graph deep learning interatomic potential for the periodic table,” arXiv, 2022
- [36] B. Deng et al., “CHGNet: Pretrained universal neural network potential for charge-informed atomistic modeling,” arXiv, 2023
- [37] I. Batatia et al., “A foundation model for atomistic simulations of the elements,” arXiv, 2024
- [38] ——, “Cross learning between electronic structure theories for unifying molecular, surface, and inorganic crystal foundation force fields,” arXiv, 2025
- [39] B. M. Wood et al., “UMA: A family of universal models for atoms,” arXiv, 2025
- [40] S. Takamoto et al., “Towards universal neural network potential for material discovery applicable to arbitrary combination of 45 elements,” Nat. Commun., vol. 13, p. 2991, 2022
- [41] A. Merchant et al., “Scaling deep learning for materials discovery,” Nature, vol. 624, pp. 80–85, 2023
- [42] J. Riebesell et al., “Matbench discovery – an evaluation framework for machine learning crystal stability prediction,” arXiv, 2024
- [43] S. Takeda et al., “Multi-modal foundation model for material design,” in Proceedings of AI4Mat workshop at NeurIPS 2023, 2023
- [44] D. Beaini et al., “Towards foundational models for molecular learning on large-scale multi-task datasets,” in Proceedings of the Twelfth International Conference on Learning Representations (ICLR), 2024
- [45] N. Shoghi et al., “From molecules to materials: Pre-training large generalizable models for atomic property prediction,” in The Twelfth International Conference on Learning Representations (ICLR), 2024
- [46] Y. Zhao et al., “PyTorch FSDP: Experiences on scaling fully sharded data parallel,” Proceedings of the VLDB Endowment, vol. 16, no. 12, pp. 3848–3860, 2023
- [47] V. G. Satorras et al., “E(n) equivariant graph neural networks,” in Proceedings of the 38th International Conference on Machine Learning, 2021, pp. 9323–9332
- [48] K. T. Schütt et al., “SchNet: A continuous-filter convolutional neural network for modeling quantum interactions,” in Advances in Neural Information Processing Systems 30, 2017
- [49] J. Gasteiger et al., “Directional message passing for molecular graphs,” in International Conference on Learning Representations (ICLR), 2020
- [50] K. T. Schütt et al., “Equivariant message passing for the prediction of tensorial properties and molecular spectra,” in Proceedings of the 38th International Conference on Machine Learning, 2021, pp. 9377–9388
- [51] G. Corso et al., “Principal neighbourhood aggregation for graph nets,” in Advances in Neural Information Processing Systems 33, 2020
- [52] K. W. Schulz et al., “Omnistat: Multi-vendor HPC system monitoring,” https://github.com/AMDResearch/omnistat, 2024
- [53] P. Micikevicius et al., “Mixed precision training,” in International Conference on Learning Representations (ICLR), 2018
- [54] R. Ramakrishnan et al., “Quantum chemistry structures and properties of 134 kilo molecules,” Sci. Data, vol. 1, p. 140022, 2014
- [55] S. Chmiela et al., “Machine learning of accurate energy-conserving molecular force fields,” Sci. Adv., vol. 3, no. 5, p. e1603015, 2017
- [56] R. R. Brew et al., “Wiggle150: Benchmarking density functionals and neural network potentials on highly strained conformers,” J. Chem. Theory Comput., vol. 21, no. 8, pp. 3922–3929, 2025
- [57] S. Wieser et al., “MS25: A molecular simulation benchmark for machine learning interatomic potentials,” 2025, https://github.com/mstapelberg/ms25
- [58] J. E. Saal et al., “Materials design and discovery with high-throughput density functional theory: The Open Quantum Materials Database (OQMD),” JOM, vol. 65, no. 11, pp. 1501–1509, 2013
- [59] E. T. Chenebuah et al., “An inorganic ABX3 perovskite materials dataset for target property prediction and classification using machine learning,” arXiv, 2023
- [60] A. Dunn et al., “Benchmarking materials property prediction methods: the matbench test set and automatminer reference algorithm,” npj Comput. Mater., vol. 6, no. 1, p. 138, 2020