Training speedups via batching for geometric learning: an analysis of static and dynamic algorithms

Claudia Draxl; Daniel T. Speckhard; Jonathan Godwin; Sebastian Kehl; Tim Bechtel

arxiv: 2502.00944 · v4 · submitted 2025-02-02 · 💻 cs.LG

Daniel T. Speckhard, Tim Bechtel, Sebastian Kehl, Jonathan Godwin, Claudia Draxl

Pith reviewed 2026-05-23 04:12 UTC · model grok-4.3

classification 💻 cs.LG

keywords: graph neural networks · batching algorithms · static batching · dynamic batching · training speedup · QM9 dataset · AFLOW database · geometric deep learning

The pith

Changing the batching algorithm for graph neural networks can speed up training by up to 2.7 times, though the best choice depends on the data, model, batch size, hardware, and training length.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests static batching, where groups of graphs are fixed before training starts, against dynamic batching, where groups are assembled during each training pass. Experiments on the QM9 molecular dataset and the AFLOW materials database show that one method can finish training more than twice as fast as the other, but which one wins changes with the model architecture, batch size, and even the number of steps run. In a few specific combinations of these factors, the two batching styles also produce noticeably different final model accuracy or error metrics. A reader would care because graph neural networks are trained on large chemistry and materials datasets where each hour of compute time matters.

Core claim

The authors establish that static and dynamic batching algorithms for graph neural networks yield different wall-clock training times on the QM9 and AFLOW datasets, with observed speed ratios reaching 2.7 in favor of one algorithm or the other depending on batch size, model, hardware, and total steps; they further report that, for selected combinations of these variables, the two algorithms produce statistically significant differences in model performance metrics.
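To make "statistically significant differences" concrete, here is a minimal sketch of a two-sample Student's t statistic, the test named in the figure captions. The function name `student_t` and the RMSE values are illustrative assumptions, not the authors' code or data, and the pooled-variance form is assumed rather than confirmed by the paper.

```python
import math
import statistics


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic using a pooled variance estimate."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, dof


# Made-up test RMSE values from repeated runs; NOT numbers from the paper.
rmse_static = [0.101, 0.104, 0.099, 0.103, 0.102]
rmse_dynamic = [0.108, 0.110, 0.107, 0.111, 0.109]

t_stat, dof = student_t(rmse_static, rmse_dynamic)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

A large absolute t value relative to the t distribution with `dof` degrees of freedom indicates that the two batching methods likely differ in mean RMSE. If SciPy is available, `scipy.stats.ttest_ind` computes the same statistic together with a p-value.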

What carries the argument

The direct comparison of static batching (precomputed fixed batches) versus dynamic batching (batches assembled on the fly) when applied to graph neural network training loops.
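The contrast can be sketched in a few lines. This is a minimal illustration under assumptions the page does not confirm: static batching is taken to mean a fixed number of graphs per batch with the node total padded up to a power of two, and dynamic batching is taken to mean greedily filling a fixed node budget. All function names and node counts are hypothetical.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def static_batches(node_counts, graphs_per_batch):
    """Fixed number of graphs per batch; pad node total up to a power of two."""
    batches = []
    for start in range(0, len(node_counts), graphs_per_batch):
        chunk = node_counts[start:start + graphs_per_batch]
        batches.append(
            {"graphs": chunk, "padded_nodes": next_power_of_two(sum(chunk))}
        )
    return batches


def dynamic_batches(node_counts, node_budget):
    """Greedily add graphs until the next one would exceed the node budget."""
    batches, current = [], []
    for n in node_counts:
        if n > node_budget:
            raise ValueError("single graph exceeds the node budget")
        if current and sum(current) + n > node_budget:
            batches.append({"graphs": current, "padded_nodes": node_budget})
            current = []
        current.append(n)
    if current:
        batches.append({"graphs": current, "padded_nodes": node_budget})
    return batches


def padding_fraction(batches):
    """Share of padded node slots that hold no real node."""
    real = sum(sum(b["graphs"]) for b in batches)
    padded = sum(b["padded_nodes"] for b in batches)
    return 1 - real / padded


# Hypothetical node counts per graph; not taken from QM9 or AFLOW.
node_counts = [5, 18, 9, 12, 7, 21, 3, 16]
static = static_batches(node_counts, graphs_per_batch=4)
dynamic = dynamic_batches(node_counts, node_budget=48)
print(padding_fraction(static), padding_fraction(dynamic))
```

The trade-off this exposes is between time spent assembling batches and time wasted computing on padding, which is one plausible reason the faster method can change with dataset and batch size.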

If this is right

For any given GNN training job the faster batching method must be identified by direct timing rather than assumed in advance.
Speed gains from the better batching choice are largest at particular batch sizes and early in training.
In some dataset-model-batch-size triples the batching method also changes the final learned model quality.
Hardware platform influences which batching algorithm finishes first.
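Since the faster method has to be found by measurement, a small timing harness is a natural starting point. The sketch below is an assumption about how such a measurement could be set up, not the authors' benchmark: it times the batching step and the update step separately, then reports a running average of the combined time and a speedup ratio. `toy_batch` and `toy_update` are hypothetical stand-ins for real batching and gradient-update calls.

```python
import time


def time_steps(make_batch, update, num_steps):
    """Return per-step (batching_seconds, update_seconds) tuples."""
    timings = []
    for step in range(num_steps):
        t0 = time.perf_counter()
        batch = make_batch(step)
        t1 = time.perf_counter()
        update(batch)
        t2 = time.perf_counter()
        timings.append((t1 - t0, t2 - t1))
    return timings


def running_mean_combined(timings):
    """Running average of batching + update time after each step."""
    means, total = [], 0.0
    for i, (batch_s, update_s) in enumerate(timings, start=1):
        total += batch_s + update_s
        means.append(total / i)
    return means


def speedup(slow_timings, fast_timings):
    """Ratio of mean combined step times; values above 1 favor the second run."""
    slow = running_mean_combined(slow_timings)[-1]
    fast = running_mean_combined(fast_timings)[-1]
    return slow / fast


# Stand-in workloads (hypothetical); replace with real batching and update calls.
def toy_batch(step):
    return list(range(1000))


def toy_update(batch):
    return sum(x * x for x in batch)


timings = time_steps(toy_batch, toy_update, num_steps=5)
print(running_mean_combined(timings)[-1])
```

On accelerators with asynchronous dispatch, the update call must be forced to finish before the clock is read (for example with `block_until_ready()` on a JAX array), otherwise the update time is underestimated.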

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

An adaptive scheduler that switches between static and dynamic batching mid-training could capture the best of both for long runs.
The same batching comparison could be repeated on non-graph geometric models such as point-cloud networks to test generality.
Memory-access patterns in dynamic batching may explain part of the speed variation and could be measured separately.

Load-bearing premise

The measured speed differences and occasional metric differences come from the batching choice itself rather than from unmeasured differences in code implementation, random seeds, or hardware behavior.

What would settle it

Re-running the QM9 and AFLOW experiments with an independent code implementation of both batching methods on the same hardware and obtaining speed ratios near 1.0 with no metric differences would falsify the central claim.

Figures

Figures reproduced from arXiv: 2502.00944 by Claudia Draxl, Daniel T. Speckhard, Jonathan Godwin, Sebastian Kehl, Tim Bechtel.

**Figure 2.** The running average of the combined time (sum of the batching step and gradient-update step) required per training step as a function of the total number of training steps run. Here only a single iteration is run for the batch size 32, MPEU model and QM9 dataset for the dynamic, static-64 and static-2^N algorithms. Recompilation. For fewer training steps, the number of recompilations required in t…

**Figure 3.** Left: number of recompilations on the QM9 dataset after two million training steps in the …

**Figure 4.** Timing measurements while varying the batch size for SchNet (left two columns), MPEU …

**Figure 5.** Speedup, on GPU, when switching from the slowest algorithm in terms of combined training time per step for the PaiNN model (not including the static-constant model) for the AFLOW (top) and QM9 (bottom) datasets. For both datasets the slowest algorithm is the static-2^N algorithm. If the static-constant algorithm is included the speedup increases to a maximum of 12.5. One can either use the t-test stati…

**Figure 6.** The mean results for the same experiments are shown in Fig. 7. The difference in the two …

**Figure 7.** Mean batching (top) and mean gradient-update times (middle), and mean combined time …

**Figure 8.** Mean batching time (upper row), mean gradient-update time (middle row), and the mean …

**Figure 9.** Median batching time (upper row), median gradient-update time (middle row), and the …

**Figure 10.** Mean batching time (upper row), mean gradient-update time (middle row), and the mean …

**Figure 11.** Median batching time (upper row), median gradient-update time (middle row), and the …

**Figure 12.** Mean test RMSE curves for the MPEU model (left) and SchNet (right) on QM9 test for …

**Figure 13.** Heatmap of pairwise Student’s t-test values on the distribution of test RMSE values for …

**Figure 14.** Heatmap of p-values from the pairwise Student’s t-test on the distribution of test RMSE …
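The captions refer to recompilations and to the static-64, static-constant and power-of-two (2^N) static variants. One way to see why the padding rule could matter is to count distinct padded batch shapes. The sketch assumes that a JIT compiler compiles once per new input shape (as `jax.jit` does), and that the variant names mean "round up to a multiple of 64", "round up to a power of two" and "always pad to one constant". Neither reading is confirmed by this page, and the batch sizes are hypothetical.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def round_up_to_multiple(n, multiple):
    """Smallest multiple of `multiple` that is >= n."""
    return -(-n // multiple) * multiple


def count_distinct_shapes(batch_node_totals, pad):
    """Unique padded sizes: a proxy for one compilation per new shape."""
    return len({pad(total) for total in batch_node_totals})


# Hypothetical per-batch node totals; not measured on QM9 or AFLOW.
totals = [30, 44, 47, 61, 70, 95, 120, 129]

shapes_pow2 = count_distinct_shapes(totals, next_power_of_two)
shapes_64 = count_distinct_shapes(totals, lambda n: round_up_to_multiple(n, 64))
shapes_const = count_distinct_shapes(totals, lambda n: 256)
print(shapes_pow2, shapes_64, shapes_const)
```

Fewer distinct shapes means fewer compilations but more padding per batch, so under these assumptions the compilation cost dominates short runs and the padding cost dominates long ones.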

Original abstract

Graph neural networks (GNN) have shown promising results for several domains such as materials science, chemistry, and the social sciences. GNN models often contain millions of parameters, and like other neural network (NN) models, are often fed only a fraction of the graphs that make up the training dataset in batches to update model parameters. The effect of batching algorithms on training time and model performance has been thoroughly explored for NNs but not yet for GNNs. We analyze two different batching algorithms for graph-based models, namely static and dynamic batching for two datasets, the QM9 dataset of small molecules and the AFLOW materials database. Our experiments show that changing the batching algorithm can provide up to a 2.7x speedup, but the fastest algorithm depends on the data, model, batch size, hardware, and number of training steps run. Experiments show that for a select number of combinations of batch size, dataset, and model, significant differences in model learning metrics are observed between static and dynamic batching algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Batching choice speeds GNN training up to 2.7x on QM9 and AFLOW but the best option varies with data, model, batch size, hardware, and steps.

The paper's main finding is that for graph neural networks, the choice between static and dynamic batching can lead to training speedups as large as 2.7 times on the QM9 molecule dataset and the AFLOW materials database. However, which algorithm is faster changes with the data, the model, batch size, hardware, and even the number of training steps. In some cases, the batching method also affects the model's learning metrics. This work takes ideas that are well known for ordinary neural networks and applies them to GNNs in a materials and chemistry context. The experiments compare the two batching approaches directly on these two datasets and document the speed differences along with the occasional metric shifts. That comparison itself is new for this type of model and data.

The paper does a decent job of showing that batching is not neutral for GNN training time. The results are presented with the right caveats about dependence on many factors, which keeps the claims grounded. On the downside, the speedups are highly conditional, so the practical takeaway is mostly to try both and see. The work is limited to two specific datasets and does not explore why the differences occur or how they generalize. The abstract suggests the experiments isolate the batching effect, and the stress-test note finds no internal inconsistency in the qualified claim.

This paper is aimed at people who train GNNs for chemistry or materials applications and want to reduce wall time. It is not a big theoretical step, but the empirical comparison is solid enough to be worth refereeing. A reader working on similar problems would find the numbers useful for deciding whether to experiment with batching strategies. I would bring this to a reading group focused on practical ML for science. I would not cite it unless referencing the specific speedup numbers for these datasets. It deserves peer review because the experiments address a real implementation question with direct measurements.

Referee Report

1 major / 0 minor

Summary. The manuscript examines the effects of static versus dynamic batching algorithms on training time and model performance for graph neural networks on the QM9 molecular dataset and AFLOW materials database. Experiments indicate that switching batching algorithms can yield speedups up to 2.7x, but the fastest choice depends on the dataset, model, batch size, hardware, and number of training steps; for select combinations of these factors, significant differences in learning metrics are also observed between the algorithms.

Significance. If the experimental comparisons hold after addressing controls, the work extends batching analysis from standard neural networks to GNNs and supplies practical, factor-dependent guidance for training efficiency in chemistry and materials applications. The qualified claims avoid overgeneralization and the direct experimental focus on geometric models addresses a documented gap.

major comments (1)

Experiments section: the central claim that observed speedups and metric differences arise from the static/dynamic batching choice requires explicit controls or ablations to exclude confounds such as implementation-specific data loading, memory allocation differences, or hardware variability; without these, attribution to batching remains unverified and load-bearing for the reported 2.7x figure and metric observations.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. The major comment highlights a valid concern about experimental controls, which we address below by outlining planned revisions to strengthen attribution of results to the batching algorithms.

Referee: Experiments section: the central claim that observed speedups and metric differences arise from the static/dynamic batching choice requires explicit controls or ablations to exclude confounds such as implementation-specific data loading, memory allocation differences, or hardware variability; without these, attribution to batching remains unverified and load-bearing for the reported 2.7x figure and metric observations.

Authors: We agree that explicit controls are needed to isolate the batching effect. Both algorithms were implemented within the same PyTorch Geometric-based codebase, sharing identical data loaders, preprocessing, and memory allocation paths, with the sole difference being the batch construction method (precomputed static batches versus on-the-fly dynamic padding). Hardware was held constant by executing all runs on the same GPU cluster nodes. To further verify attribution, the revised manuscript will include: (1) timing breakdowns separating data loading from model forward/backward passes, (2) an ablation fixing batch sizes and padding patterns while varying only the algorithm, and (3) repeated runs with fixed seeds across multiple hardware instances to quantify variability. These additions will be reported in an expanded Experiments section with updated figures.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity: purely empirical comparison

The paper reports experimental timings and learning metrics for static versus dynamic batching on QM9 and AFLOW with GNN models. No derivation chain, equations, fitted parameters renamed as predictions, or self-citation load-bearing steps exist. All claims rest on direct runtime and accuracy measurements under varying batch sizes, hardware, and step counts, with explicit qualification that fastest algorithm is data- and configuration-dependent. This matches the reader's 0.0 assessment and contains none of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical benchmarking study; it introduces no free parameters, mathematical axioms, or invented entities. All claims rest on experimental observations of training runs.

pith-pipeline@v0.9.0 · 5726 in / 1317 out tokens · 75462 ms · 2026-05-23T04:12:41.055343+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We analyze two different batching algorithms for graph-based models, namely static and dynamic batching... changing the batching algorithm can provide up to a 2.7x speedup"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "The effect of batching algorithms on training time and model performance has been thoroughly explored for NNs but not yet for GNNs."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · 7 internal anchors

[1] ALLAMANIS, M., MIR, A., AND PATI, S. ptgnn: A pytorch gnn library, 2022.
[2] BECHTEL, T., SPECKHARD, D. T., GODWIN, J., AND DRAXL, C. Band-gap regression with architecture-optimized message-passing neural networks. arXiv preprint arXiv:2309.06348 (2023).
[3] BISHOP, C. M., AND NASRABADI, N. M. Pattern recognition and machine learning, vol. 4. Springer, 2006.
[4] BOTTOU, L., AND BOUSQUET, O. The tradeoffs of large scale learning. Advances in neural information processing systems 20 (2007).
[5] BOTTOU, L., ET AL. Stochastic gradient learning in neural networks. Proceedings of Neuro-Nîmes 91, 8 (1991), 12.
[6] BRADBURY, J., FROSTIG, R., HAWKINS, P., JOHNSON, M. J., LEARY, C., MACLAURIN, D., NECULA, G., PASZKE, A., VANDERPLAS, J., WANDERMAN-MILNE, S., AND ZHANG, Q. JAX: composable transformations of Python+NumPy programs, 2018.
[7] BYRD, R. H., CHIN, G. M., NOCEDAL, J., AND WU, Y. Sample size selection in optimization methods for machine learning. Mathematical programming 134, 1 (2012), 127–155.
[8] CHEN, C., AND ONG, S. P. A universal graph deep learning interatomic potential for the periodic table. Nature Computational Science 2, 11 (2022), 718–728.
[9] CHOUDHARY, K., YILDIRIM, T., SIDERIUS, D. W., KUSNE, A. G., MCDANNALD, A., AND ORTIZ-MONTALVO, D. L. Graph neural network predictions of metal organic framework co2 adsorption properties. Computational Materials Science 210 (2022), 111388.
[10] CURTAROLO, S., SETYAWAN, W., HART, G. L., JAHNATEK, M., CHEPULSKII, R. V., TAYLOR, R. H., WANG, S., XUE, J., YANG, K., LEVY, O., ET AL. Aflow: An automatic framework for high-throughput materials discovery. Computational Materials Science 58 (2012), 218–226.
[11] DECARLO, L. T. On the meaning and use of kurtosis. Psychological methods 2, 3 (1997), 292.
[12] FERLUDIN, O., EIGENWILLIG, A., BLAIS, M., ZELLE, D., PFEIFER, J., SANCHEZ-GONZALEZ, A., LI, W. L. S., ABU-EL-HAIJA, S., BATTAGLIA, P., BULUT, N., HALCROW, J., DE ALMEIDA, F. M. G., GONNET, P., JIANG, L., KOTHARI, P., LATTANZI, S., LINHARES, A., MAYER, B., MIRROKNI, V., PALOWITCH, J., PARADKAR, M., SHE, J., TSITSULIN, A., VILLELA, K., WANG, L., WONG, D., AND...
[13] FEY, M., AND LENSSEN, J. E. Fast graph representation learning with pytorch geometric. arXiv preprint arXiv:1903.02428 (2019).
[14] GAO, Y., YANG, H., ZHANG, P., ZHOU, C., AND HU, Y. Graph neural architecture search. In International joint conference on artificial intelligence (2021), International Joint Conference on Artificial Intelligence.
[15] GODWIN*, J., KECK*, T., BATTAGLIA, P., BAPST, V., KIPF, T., LI, Y., STACHENFELD, K., VELIČKOVIĆ, P., AND SANCHEZ-GONZALEZ, A. Jraph: A library for graph neural networks in jax, 2020.
[16] HARTMANN, K., KROIS, J., AND RUDOLPH, A. Statistics and geodata analysis using r, 2023.
[17] HENNESSY, J. L., AND PATTERSON, D. A. Computer architecture: a quantitative approach. Elsevier, 2011.
[18] JØRGENSEN, P. B., JACOBSEN, K. W., AND SCHMIDT, M. N. Neural message passing with edge updates for predicting properties of molecules and materials. arXiv preprint arXiv:1806.03146 (2018).
[19] KIPF, T. N., AND WELLING, M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
[20] KOCHURA, Y., GORDIENKO, Y., TARAN, V., GORDIENKO, N., ROKOVYI, A., ALIENIN, O., AND STIRENKO, S. Batch size influence on performance of graphic and tensor processing units during training and inference phases. In International Conference on Computer Science, Engineering and Education Applications (2019), Springer, pp. 658–668.
[21] KOLLER, D., AND FRIEDMAN, N. Probabilistic graphical models: principles and techniques. MIT press, 2009.
[22] KOROLEV, V., AND MITROFANOV, A. The carbon footprint of predicting co2 storage capacity in metal-organic frameworks within neural networks. Iscience 27, 5 (2024).
[23] KUBEČKA, J., AYOUBI, D., TANG, Z., KNATTRUP, Y., ENGSVANG, M., WU, H., AND ELM, J. Accurate modeling of the potential energy surface of atmospheric molecular clusters boosted by neural networks. Environmental Science: Advances 3, 10 (2024), 1438–1451.
[24] LI, S.-C., WU, H., MENON, A., SPIEKERMANN, K. A., LI, Y.-P., AND GREEN, W. H. When do quantum mechanical descriptors help graph neural networks to predict chemical properties? Journal of the American Chemical Society 146, 33 (2024), 23103–23120.
[25] LISTER, R., AND STONE, J. V. An empirical study of the time complexity of various error functions with conjugate gradient backpropagation. In Proceedings of ICNN’95-International Conference on Neural Networks (1995), vol. 1, IEEE, pp. 237–241.
[26] LIU, Z., AND ZHOU, J. Introduction to graph neural networks. Springer Nature, 2022.
[27] NEUMANN, M., GIN, J., RHODES, B., BENNETT, S., LI, Z., CHOUBISA, H., HUSSEY, A., AND GODWIN, J. Orb: A fast, scalable neural network potential. arXiv preprint arXiv:2410.22570 (2024).
[28] PATTERSON, D., GONZALEZ, J., LE, Q., LIANG, C., MUNGUIA, L.-M., ROTHCHILD, D., SO, D., TEXIER, M., AND DEAN, J. Carbon emissions and large neural network training. arXiv preprint arXiv:2104.10350 (2021).
[29] RAMAKRISHNAN, R., DRAL, P. O., RUPP, M., AND VON LILIENFELD, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Scientific data 1, 1 (2014), 1–7.
[30] ROSS, S. M. Introductory statistics. Academic Press, 2017.
[31] SANCHEZ-GONZALEZ, A., GODWIN, J., PFAFF, T., YING, R., LESKOVEC, J., AND BATTAGLIA, P. Learning to simulate complex physics with graph networks. In International conference on machine learning (2020), PMLR, pp. 8459–8468.
[32] SCHAARSCHMIDT, M., RIVIERE, M., GANOSE, A. M., SPENCER, J. S., GAUNT, A. L., KIRKPATRICK, J., AXELROD, S., BATTAGLIA, P. W., AND GODWIN, J. Learned force fields are ready for ground state catalyst discovery. arXiv preprint arXiv:2209.12466 (2022).
[33] SCHÜTT, K., UNKE, O., AND GASTEGGER, M. Equivariant message passing for the prediction of tensorial properties and molecular spectra. In International Conference on Machine Learning (2021), PMLR, pp. 9377–9388.
[34] SCHÜTT, K. T., SAUCEDA, H. E., KINDERMANS, P.-J., TKATCHENKO, A., AND MÜLLER, K.-R. Schnet–a deep learning architecture for molecules and materials. The Journal of Chemical Physics 148, 24 (2018).
[35] SPECKHARD, D., BECHTEL, T., GHIRINGHELLI, L. M., KUBAN, M., RIGAMONTI, S., AND DRAXL, C. How big is big data? Faraday Discussions (2025).
[36] SPECKHARD, D. T. Graph topology estimation of power grids using pairwise mutual information of time series data. arXiv preprint arXiv:2505.11517 (2025).
[37] SPECKHARD, D. T., MISIUNAS, K., PEREL, S., ZHU, T., CARLILE, S., AND SLANEY, M. Neural architecture search for energy-efficient always-on audio machine learning. Neural Computing and Applications 35, 16 (2023), 12133–12144.
[38] TANG, H., ORTIS, A., AND BATTIATO, S. The impact of padding on image classification by using pre-trained convolutional neural networks. In Image Analysis and Processing–ICIAP 2019: 20th International Conference, Trento, Italy, September 9–13, 2019, Proceedings, Part II 20 (2019), Springer, pp. 337–344.
[39] XU, K., HU, W., LESKOVEC, J., AND JEGELKA, S. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826 (2018).
[40] ZHAO, H., LIU, F., LI, L., AND LUO, C. A novel softplus linear unit for deep convolutional neural networks. Applied Intelligence 48 (2018), 1707–1720.
[41] ZOPH, B. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578 (2016).

A Model descriptions

In this section we describe the three models used to evaluate the batching algorithms in greater detail. The models take in a graph, G, composed of nodes (or vertices) and edges G(n, e) [26, 21, 36]. The nodes are represented by feat...

[1] [1]

ptgnn: A pytorch gnn library, 2022

ALLAMANIS, M., MIR, A.,ANDPATI, S. ptgnn: A pytorch gnn library, 2022

work page 2022

[2] [2]

T., GODWIN, J.,ANDDRAXL, C

BECHTEL, T., SPECKHARD, D. T., GODWIN, J.,ANDDRAXL, C. Band-gap regression with architecture-optimized message-passing neural networks.arXiv preprint arXiv:2309.06348 (2023)

work page arXiv 2023

[3] [3]

M.,ANDNASRABADI, N

BISHOP, C. M.,ANDNASRABADI, N. M.Pattern recognition and machine learning, vol. 4. Springer, 2006

work page 2006

[4] [4]

The tradeoffs of large scale learning.Advances in neural information processing systems 20(2007)

BOTTOU, L.,ANDBOUSQUET, O. The tradeoffs of large scale learning.Advances in neural information processing systems 20(2007)

work page 2007

[5] [5]

Stochastic gradient learning in neural networks.Proceedings of Neuro- Nımes 91, 8 (1991), 12

BOTTOU, L.,ET AL. Stochastic gradient learning in neural networks.Proceedings of Neuro- Nımes 91, 8 (1991), 12

work page 1991

[6] [6]

J., LEARY, C., MACLAURIN, D., NECULA, G., PASZKE, A., VANDERPLAS, J., WANDERMAN-MILNE, S.,ANDZHANG, Q

BRADBURY, J., FROSTIG, R., HAWKINS, P., JOHNSON, M. J., LEARY, C., MACLAURIN, D., NECULA, G., PASZKE, A., VANDERPLAS, J., WANDERMAN-MILNE, S.,ANDZHANG, Q. JAX: composable transformations of Python+NumPy programs, 2018

work page 2018

[7] [7]

H., CHIN, G

BYRD, R. H., CHIN, G. M., NOCEDAL, J.,ANDWU, Y. Sample size selection in optimization methods for machine learning.Mathematical programming 134, 1 (2012), 127–155

work page 2012

[8] [8]

CHEN, C.,ANDONG, S. P. A universal graph deep learning interatomic potential for the periodic table.Nature Computational Science 2, 11 (2022), 718–728

work page 2022

[9] [9]

W., KUSNE, A

CHOUDHARY, K., YILDIRIM, T., SIDERIUS, D. W., KUSNE, A. G., MCDANNALD, A.,AND ORTIZ-MONTALVO, D. L. Graph neural network predictions of metal organic framework co2 adsorption properties.Computational Materials Science 210(2022), 111388

work page 2022

[10] [10]

L., JAHNATEK, M., CHEPULSKII, R

CURTAROLO, S., SETYAWAN, W., HART, G. L., JAHNATEK, M., CHEPULSKII, R. V., TAYLOR, R. H., WANG, S., XUE, J., YANG, K., LEVY, O.,ET AL. Aflow: An automatic framework for high-throughput materials discovery.Computational Materials Science 58 (2012), 218–226

work page 2012

[11] [11]

DECARLO, L. T. On the meaning and use of kurtosis.Psychological methods 2, 3 (1997), 292

work page 1997

[12] [12]

FERLUDIN, O., EIGENWILLIG, A., BLAIS, M., ZELLE, D., PFEIFER, J., SANCHEZ- GONZALEZ, A., LI, W. L. S., ABU-EL-HAIJA, S., BATTAGLIA, P., BULUT, N., HALCROW, J.,DEALMEIDA, F. M. G., GONNET, P., JIANG, L., KOTHARI, P., LATTANZI, S., LINHARES, A., MAYER, B., MIRROKNI, V., PALOWITCH, J., PARADKAR, M., SHE, J., TSITSULIN, A., VILLELA, K., WANG, L., WONG, D.,AND...

work page arXiv 2023

[13] [13]

FEY, M.,ANDLENSSEN, J. E. Fast graph representation learning with pytorch geometric. arXiv preprint arXiv:1903.02428(2019)

work page internal anchor Pith review Pith/arXiv arXiv 1903

[14] [14]

Graph neural architecture search

GAO, Y., YANG, H., ZHANG, P., ZHOU, C.,ANDHU, Y. Graph neural architecture search. In International joint conference on artificial intelligence(2021), International Joint Conference on Artificial Intelligence

work page 2021

[15] [15]

Jraph: A library for graph neural networks in jax., 2020

GODWIN*, J., KECK*, T., BATTAGLIA, P., BAPST, V., KIPF, T., LI, Y., STACHENFELD, K., VELI ˇCKOVI ´C, P.,ANDSANCHEZ-GONZALEZ, A. Jraph: A library for graph neural networks in jax., 2020

work page 2020

[16] [16]

Statistics and geodata analysis using r, 2023

HARTMANN, K., KROIS, J.,ANDRUDOLPH, A. Statistics and geodata analysis using r, 2023

work page 2023

[17] [17]

L.,ANDPATTERSON, D

HENNESSY, J. L.,ANDPATTERSON, D. A.Computer architecture: a quantitative approach. Elsevier, 2011

work page 2011

[18] [18]

Neural Message Passing with Edge Updates for Predicting Properties of Molecules and Materials

JØRGENSEN, P. B., JACOBSEN, K. W.,ANDSCHMIDT, M. N. Neural message pass- ing with edge updates for predicting properties of molecules and materials.arXiv preprint arXiv:1806.03146(2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[19] [19]

Semi-Supervised Classification with Graph Convolutional Networks

KIPF, T. N.,ANDWELLING, M. Semi-supervised classification with graph convolutional networks.arXiv preprint arXiv:1609.02907(2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016

[20] [20]

Batch size influence on performance of graphic and tensor processing units during training and inference phases

KOCHURA, Y., GORDIENKO, Y., TARAN, V., GORDIENKO, N., ROKOVYI, A., ALIENIN, O.,ANDSTIRENKO, S. Batch size influence on performance of graphic and tensor processing units during training and inference phases. InInternational Conference on Computer Science, Engineering and Education Applications(2019), Springer, pp. 658–668

work page 2019

[21] [21]

MIT press, 2009

KOLLER, D.,ANDFRIEDMAN, N.Probabilistic graphical models: principles and techniques. MIT press, 2009

work page 2009

[22] KOROLEV, V., AND MITROFANOV, A. The carbon footprint of predicting CO2 storage capacity in metal-organic frameworks within neural networks. iScience 27, 5 (2024)

[23] KUBEČKA, J., AYOUBI, D., TANG, Z., KNATTRUP, Y., ENGSVANG, M., WU, H., AND ELM, J. Accurate modeling of the potential energy surface of atmospheric molecular clusters boosted by neural networks. Environmental Science: Advances 3, 10 (2024), 1438–1451

[24] LI, S.-C., WU, H., MENON, A., SPIEKERMANN, K. A., LI, Y.-P., AND GREEN, W. H. When do quantum mechanical descriptors help graph neural networks to predict chemical properties? Journal of the American Chemical Society 146, 33 (2024), 23103–23120

[25] LISTER, R., AND STONE, J. V. An empirical study of the time complexity of various error functions with conjugate gradient backpropagation. In Proceedings of ICNN’95-International Conference on Neural Networks (1995), vol. 1, IEEE, pp. 237–241

[26] LIU, Z., AND ZHOU, J. Introduction to graph neural networks. Springer Nature, 2022

[27] NEUMANN, M., GIN, J., RHODES, B., BENNETT, S., LI, Z., CHOUBISA, H., HUSSEY, A., AND GODWIN, J. Orb: A fast, scalable neural network potential. arXiv preprint arXiv:2410.22570 (2024)

[28] PATTERSON, D., GONZALEZ, J., LE, Q., LIANG, C., MUNGUIA, L.-M., ROTHCHILD, D., SO, D., TEXIER, M., AND DEAN, J. Carbon emissions and large neural network training. arXiv preprint arXiv:2104.10350 (2021)

[29] RAMAKRISHNAN, R., DRAL, P. O., RUPP, M., AND VON LILIENFELD, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Scientific data 1, 1 (2014), 1–7

[30] ROSS, S. M. Introductory statistics. Academic Press, 2017

[31] SANCHEZ-GONZALEZ, A., GODWIN, J., PFAFF, T., YING, R., LESKOVEC, J., AND BATTAGLIA, P. Learning to simulate complex physics with graph networks. In International conference on machine learning (2020), PMLR, pp. 8459–8468

[32] SCHAARSCHMIDT, M., RIVIERE, M., GANOSE, A. M., SPENCER, J. S., GAUNT, A. L., KIRKPATRICK, J., AXELROD, S., BATTAGLIA, P. W., AND GODWIN, J. Learned force fields are ready for ground state catalyst discovery. arXiv preprint arXiv:2209.12466 (2022)

[33] SCHÜTT, K., UNKE, O., AND GASTEGGER, M. Equivariant message passing for the prediction of tensorial properties and molecular spectra. In International Conference on Machine Learning (2021), PMLR, pp. 9377–9388

[34] SCHÜTT, K. T., SAUCEDA, H. E., KINDERMANS, P.-J., TKATCHENKO, A., AND MÜLLER, K.-R. SchNet – a deep learning architecture for molecules and materials. The Journal of Chemical Physics 148, 24 (2018)

[35] SPECKHARD, D., BECHTEL, T., GHIRINGHELLI, L. M., KUBAN, M., RIGAMONTI, S., AND DRAXL, C. How big is big data? Faraday Discussions (2025)

[36] SPECKHARD, D. T. Graph topology estimation of power grids using pairwise mutual information of time series data. arXiv preprint arXiv:2505.11517 (2025)

[37] SPECKHARD, D. T., MISIUNAS, K., PEREL, S., ZHU, T., CARLILE, S., AND SLANEY, M. Neural architecture search for energy-efficient always-on audio machine learning. Neural Computing and Applications 35, 16 (2023), 12133–12144

[38] TANG, H., ORTIS, A., AND BATTIATO, S. The impact of padding on image classification by using pre-trained convolutional neural networks. In Image Analysis and Processing–ICIAP 2019: 20th International Conference, Trento, Italy, September 9–13, 2019, Proceedings, Part II 20 (2019), Springer, pp. 337–344

[39] XU, K., HU, W., LESKOVEC, J., AND JEGELKA, S. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826 (2018)

[40] ZHAO, H., LIU, F., LI, L., AND LUO, C. A novel softplus linear unit for deep convolutional neural networks. Applied Intelligence 48 (2018), 1707–1720

[41] ZOPH, B. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578 (2016)

A Model descriptions

In this section we describe the three models used to evaluate the batching algorithms in greater detail. The models take in a graph, G, composed of nodes (or vertices) and edges, G(n, e) [26, 21, 36]. The nodes are represented by feat...