Machine Learning for Electrode Materials: Property Prediction via Composition

Arpit Mishra; Cameron Hargreaves; Gian-Marco Rignanese; Hao Wu

arxiv: 2603.07805 · v2 · submitted 2026-03-08 · ❄️ cond-mat.mtrl-sci

Pith reviewed 2026-05-15 14:12 UTC · model grok-4.3

keywords: machine learning · battery electrodes · property prediction · CrabNet · composition-based modeling · materials discovery · MODNet · clustering

The pith

CrabNet outperforms MODNet and random forest models when predicting battery electrode properties from composition alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper benchmarks three machine learning approaches for forecasting properties such as voltage and capacity in battery electrode materials, using only their chemical makeup as input. It tests MODNet, CrabNet, and a random forest built on Magpie descriptors against the Materials Project Battery Explorer data and finds CrabNet ahead on accuracy metrics. Multiple validation steps, including bootstrap resampling and two cross-validation schemes, plus t-SNE and DBSCAN clustering on the learned features, support the ranking. The work matters because reliable compositional screening can narrow the search space for new electrode candidates before costly synthesis and testing begin. The authors also flag practical limits when moving these models from benchmark to industrial workflows.
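To make "using only their chemical makeup as input" concrete, here is a minimal sketch of Magpie-style featurization: a formula is turned into atomic fractions, and fraction-weighted statistics of an elemental property become the feature vector. The function names, the flat-formula parser, and the six-element lookup tables are illustrative assumptions, not the paper's pipeline, which presumably relies on an existing featurization library with a full elemental table.

```python
import re
from collections import defaultdict

# Pauling electronegativities and standard atomic masses for a handful of
# elements only; a real descriptor set covers the whole periodic table.
ELECTRONEGATIVITY = {"Li": 0.98, "O": 3.44, "P": 2.19, "Mn": 1.55, "Fe": 1.83, "Co": 1.88}
ATOMIC_MASS = {"Li": 6.94, "O": 15.999, "P": 30.974, "Mn": 54.938, "Fe": 55.845, "Co": 58.933}


def parse_formula(formula):
    """Return {element: atomic fraction} for a flat formula such as 'LiFePO4'.

    Assumption: no parentheses or hydrates; this parser is hypothetical and
    only meant for illustration.
    """
    counts = defaultdict(float)
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] += float(amount) if amount else 1.0
    total = sum(counts.values())
    return {element: n / total for element, n in counts.items()}


def featurize(formula, table):
    """Fraction-weighted statistics of one elemental property for a composition."""
    fractions = parse_formula(formula)
    values = [table[element] for element in fractions]
    mean = sum(frac * table[element] for element, frac in fractions.items())
    avg_dev = sum(frac * abs(table[element] - mean) for element, frac in fractions.items())
    return {
        "mean": mean,
        "avg_dev": avg_dev,
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


if __name__ == "__main__":
    print(featurize("LiCoO2", ELECTRONEGATIVITY))
    print(featurize("LiFePO4", ATOMIC_MASS))
```

Stacking such statistics over many elemental properties gives a fixed-length vector that a random forest can consume, whatever the number of elements in the formula.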

Core claim

CrabNet consistently records higher predictive accuracy than MODNet and the Magpie random forest across voltage, capacity, and energy-density targets on the Materials Project Battery Explorer dataset. Superiority persists under bootstrap resampling, leave-one-cluster-out cross-validation, and stratified five-fold cross-validation. Unsupervised clustering applied to MODNet-derived features produces coherent material groupings without any property labels supplied in advance.
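The two cross-validation schemes named above can be sketched in a few lines. This is an illustration under stated assumptions: the cluster labels are taken as given (for example from a prior clustering step), and stratification of a continuous target is done by sorting and dealing neighbouring values across folds, which is one common choice and not necessarily the criterion the authors used. Both function names are hypothetical.

```python
import random


def leave_one_cluster_out(cluster_labels):
    """Yield (held_out_cluster, train_idx, test_idx), one split per cluster.

    Every sample of the held-out cluster goes to the test set, so the model
    is scored on a chemical family it has not seen during training.
    """
    for held_out in sorted(set(cluster_labels)):
        test = [i for i, c in enumerate(cluster_labels) if c == held_out]
        train = [i for i, c in enumerate(cluster_labels) if c != held_out]
        yield held_out, train, test


def stratified_kfold(targets, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) folds with similar target distributions.

    Samples are sorted by target value and cut into consecutive groups of
    n_folds; each group is shuffled and one member is dealt to each fold.
    """
    rng = random.Random(seed)
    order = sorted(range(len(targets)), key=lambda i: targets[i])
    folds = [[] for _ in range(n_folds)]
    for start in range(0, len(order), n_folds):
        group = order[start:start + n_folds]
        rng.shuffle(group)
        for fold, index in zip(folds, group):
            fold.append(index)
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


if __name__ == "__main__":
    voltages = [2.0 + 0.1 * i for i in range(20)]  # placeholder targets
    for train, test in stratified_kfold(voltages):
        print("test fold:", test)
    for held_out, train, test in leave_one_cluster_out([0, 0, 1, 2, 2, 1]):
        print("held-out cluster", held_out, "-> test", test)
```

Leave-one-cluster-out is the harsher of the two tests, since it probes extrapolation to unseen material families, while the stratified folds mainly measure interpolation within the sampled distribution.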

What carries the argument

CrabNet (the Compositionally Restricted Attention-Based Network), a neural network that applies self-attention over the elements of a composition to predict scalar properties.

If this is right

Compositional ML models can rapidly screen large libraries of candidate electrode formulas for promising voltage or capacity values.
Attention-based architectures capture composition-property relationships more effectively than feature-engineered random forests for this task.
Unsupervised clustering on model embeddings reveals natural families of battery materials without requiring labeled data.
Early-stage ML screening is feasible for battery research despite limits on data quality and model transfer to real devices.
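The clustering step behind the third point can be illustrated with a compact DBSCAN written from scratch. This is a sketch, not the authors' code: the 2-D points stand in for a t-SNE projection of learned features, and the `eps` and `min_samples` values are arbitrary placeholders. In practice a library implementation would normally be used.

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    A point is a core point when at least min_samples points (itself
    included) lie within distance eps of it.
    """
    def neighbours(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


if __name__ == "__main__":
    # Placeholder 2-D coordinates standing in for embedded materials.
    embedded = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 10)]
    print(dbscan(embedded, eps=0.3, min_samples=3))
```

No property label enters the procedure, which is the sense in which the groupings are found "without requiring labeled data". The result does depend on `eps` and `min_samples`, so reported clusters are only interpretable alongside those settings.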

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Retraining CrabNet on experimental rather than computed data could test whether the accuracy advantage survives outside the Materials Project.
The discovered clusters might highlight under-explored compositional regions for new electrode candidates.
Hybrid models that inject physical constraints could address the practical limitations the paper identifies.
Extending the benchmark to additional properties such as cycle life would show how far the compositional approach generalizes.

Load-bearing premise

The Materials Project Battery Explorer dataset and its chosen property labels accurately reflect real electrode behavior without major bias from computational methods or data collection.

What would settle it

Retraining and testing the three models on an independent set of experimentally measured electrode properties and finding that CrabNet no longer leads in accuracy metrics would refute the claim of consistent outperformance.

Original abstract

In this work, we benchmark three leading Machine Learning (ML) frameworks (MODNet, CrabNet, and a random forest model based on Magpie features) for predicting properties of battery electrode materials using the Materials Project Battery Explorer dataset. We evaluate these models based on predictive accuracy, visualize numerical features using two-dimensional embeddings, and quantify performance using standard metrics. Our results demonstrate that CrabNet consistently outperforms the other models across all tests. To validate these findings, we employ robust statistical methods: bootstrap resampling and two cross-validation (CV) strategies (leave-one-cluster-out and stratified 5-fold CV), comparing each model against a control baseline. In addition, we apply unsupervised clustering on MODNet-derived features using t-SNE and DBSCAN, revealing coherent material groupings without prior labels. This analysis confirms the robustness of the evaluated models and underscores the potential of ML-driven approaches for accelerating electrode materials discovery. However, our study also identifies practical limitations and quantifies challenges associated with integrating ML models into materials science workflows. Despite these constraints, our findings suggest that ML models are highly effective for early-stage compositional screening in the battery industry. This work provides a foundation for future research on ML applications in materials discovery.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

CrabNet edges out the baselines on this MP battery dataset with clean stats, but the ranking sits on DFT labels whose biases could flip the result for real materials.

This paper runs a head-to-head benchmark of MODNet, CrabNet, and a Magpie random forest on the Materials Project Battery Explorer set for electrode properties. CrabNet comes out ahead on the usual accuracy metrics, and the authors back that with bootstrap resampling plus two cross-validation schemes: leave-one-cluster-out and stratified 5-fold. They also add t-SNE plus DBSCAN clustering to show the features group materials sensibly without labels. That combination of comparative numbers and basic robustness checks is the main deliverable.

The execution looks careful for what it is: public data, off-the-shelf models, and transparent validation steps. No new algorithm or feature engineering appears, which keeps the scope modest but also avoids overreach.

The soft spot is exactly what the stress-test note flags. All labels come from GGA(+U) calculations whose voltage and capacity errors are known to depend on chemistry. If CrabNet happens to exploit the same chemical patterns that the DFT errors follow, the observed win is partly an artifact of the surrogate rather than a general model advantage. The paper does not test transfer to experimental values or even flag how large those DFT discrepancies are in this specific dataset, so the practical takeaway stays provisional.

This is useful reading for groups that already run ML screening on battery compositions and want a quick reference on which of these three frameworks performed best on the MP set. It will not change how anyone builds new models, but the numbers and clustering visuals can shorten someone else's setup time.

I would send it to peer review. The statistical controls are in place and the comparison is fair; the main revision needed is a clearer discussion of how much the DFT biases might affect the ranking.

Referee Report

2 major / 2 minor

Summary. The manuscript benchmarks three ML models (MODNet, CrabNet, and a random forest built on Magpie features) for predicting properties of battery electrode materials from composition, using data drawn from the Materials Project Battery Explorer dataset. It reports that CrabNet achieves superior accuracy on standard metrics, validates this via bootstrap resampling plus leave-one-cluster-out and stratified 5-fold CV, and supplements the analysis with t-SNE/DBSCAN clustering of MODNet-derived features to reveal unlabeled material groupings. The work concludes that ML approaches, particularly CrabNet, are effective for early-stage compositional screening despite practical workflow limitations.

Significance. If the central empirical ranking holds under broader validation, the study supplies a concrete, statistically supported comparison of off-the-shelf frameworks for battery-materials property prediction and demonstrates the utility of unsupervised clustering for exploratory analysis. The explicit use of bootstrap resampling and two distinct CV protocols is a positive methodological feature that strengthens internal reproducibility claims.

major comments (2)

[Abstract / Results] Abstract and Results sections: the central claim that 'CrabNet consistently outperforms the other models across all tests' rests exclusively on GGA(+U) computed labels from the Materials Project. Because these labels carry known composition-dependent systematic errors (e.g., underestimation of transition-metal redox voltages), any model that better exploits the same chemical features used to generate the surrogate labels could appear superior without necessarily generalizing to experimental ground truth. The CV and bootstrap procedures control only for statistical variation within the computed set and do not test transferability.
[Methods] Methods: hyperparameter selection, exact data-split indices, and any post-hoc sample exclusions are not described. Without these details it is impossible to determine whether the reported CrabNet advantage is robust to reasonable modeling choices or whether it partly reflects an advantageous (but unreported) tuning protocol.

minor comments (2)

[Abstract] The abstract refers to 'standard metrics' without naming them (MAE, RMSE, R², etc.); the main text should list the precise metrics and report numerical values with uncertainties for each model and property.
[Figures] Figure captions for the t-SNE/DBSCAN embeddings should state the perplexity, learning rate, and DBSCAN parameters used, as these choices affect the apparent cluster coherence.
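As an illustration of the first minor comment, the named metrics and a bootstrap uncertainty on each can be computed as follows. This is a generic sketch with placeholder numbers; the percentile bootstrap shown here is one standard option and is an assumption, since the paper's exact resampling procedure is not described in this summary.

```python
import math
import random


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination; undefined when y_true has zero variance."""
    mean_true = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant target")
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def bootstrap_ci(y_true, y_pred, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric over resampled test points."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower_index = int(n_resamples * alpha / 2)
    upper_index = n_resamples - 1 - lower_index
    return stats[lower_index], stats[upper_index]


if __name__ == "__main__":
    # Placeholder voltages in volts, not values from the paper.
    y_true = [3.0, 3.5, 4.0, 2.5]
    y_pred = [3.1, 3.4, 4.2, 2.5]
    print("MAE :", mae(y_true, y_pred), bootstrap_ci(y_true, y_pred, mae))
    print("RMSE:", rmse(y_true, y_pred))
    print("R2  :", r2(y_true, y_pred))
```

Reporting each metric as a point value with its interval, per model and per property, lets a reader judge whether the gap between two models exceeds the resampling noise.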

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The two major comments highlight important limitations in scope and reproducibility that we address below. We agree that our benchmarking is confined to computed labels and will strengthen the discussion of this point; we will also expand the Methods section with the requested details to ensure full reproducibility.

Referee: [Abstract / Results] Abstract and Results sections: the central claim that 'CrabNet consistently outperforms the other models across all tests' rests exclusively on GGA(+U) computed labels from the Materials Project. Because these labels carry known composition-dependent systematic errors (e.g., underestimation of transition-metal redox voltages), any model that better exploits the same chemical features used to generate the surrogate labels could appear superior without necessarily generalizing to experimental ground truth. The CV and bootstrap procedures control only for statistical variation within the computed set and do not test transferability.

Authors: We agree that all reported results are obtained on GGA(+U) formation energies and voltages from the Materials Project and that these labels contain known systematic biases relative to experiment. Our central claim is therefore limited to relative model performance on this specific computational dataset, which is the standard practice for compositional screening studies. The bootstrap and CV protocols demonstrate that the CrabNet advantage is statistically robust within the computed data distribution, but they do not address transfer to experimental values. In the revised manuscript we will (i) explicitly qualify the claim in the abstract and conclusions to refer to “computed properties from the Materials Project,” (ii) add a dedicated paragraph in the Discussion section acknowledging the composition-dependent errors in the labels and the absence of experimental validation, and (iii) suggest that future work should benchmark against experimental electrode datasets once they become sufficiently large.

revision: partial

Referee: [Methods] Methods: hyperparameter selection, exact data-split indices, and any post-hoc sample exclusions are not described. Without these details it is impossible to determine whether the reported CrabNet advantage is robust to reasonable modeling choices or whether it partly reflects an advantageous (but unreported) tuning protocol.

Authors: We apologize for the incomplete Methods description. The revised manuscript will contain a new subsection “Hyperparameter Optimization and Data Splits” that specifies: (a) the exact hyperparameter search spaces and optimization procedure (random search with 5-fold inner CV for CrabNet and MODNet; default scikit-learn settings for the random forest after a small grid search), (b) the random seeds and stratification criteria used to generate the train/test partitions, and (c) the precise criteria and number of samples excluded (primarily entries with missing voltage or formation-energy values). We will also deposit the full list of material IDs used in each split as supplementary data to allow exact reproduction.

revision: yes
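One way to make the promised split deposit checkable is to generate the partition from a fixed seed and publish it as a manifest with a fingerprint. The sketch below is an assumption about how this could be done, not the authors' procedure. The function name, the 80/20 ratio, the seed, and the `id-N` identifiers are all hypothetical placeholders.

```python
import hashlib
import json
import random


def make_split_manifest(material_ids, test_fraction=0.2, seed=42):
    """Build a reproducible train/test manifest keyed by material identifier.

    IDs are sorted before shuffling so the result does not depend on the
    order in which the dataset happened to be loaded.
    """
    ids = sorted(material_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = max(1, round(test_fraction * len(ids)))
    manifest = {
        "seed": seed,
        "test": sorted(ids[:n_test]),
        "train": sorted(ids[n_test:]),
    }
    # Fingerprint of the split, so a reader can confirm they hold the same one.
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    manifest["sha256"] = hashlib.sha256(payload).hexdigest()
    return manifest


if __name__ == "__main__":
    ids = [f"id-{i}" for i in range(10)]  # placeholder identifiers
    manifest = make_split_manifest(ids)
    print(json.dumps(manifest, indent=2))
```

Depositing the manifest alongside the seed means a third party can verify the split by comparing a single hash instead of diffing ID lists by hand.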

Circularity Check

0 steps flagged

No circularity: standard empirical benchmarking on held-out data

The paper reports a direct comparison of three ML models (MODNet, CrabNet, Magpie-RF) trained and evaluated on the Materials Project Battery Explorer dataset. Performance is quantified via standard metrics on held-out folds using leave-one-cluster-out and stratified 5-fold CV plus bootstrap resampling. No derivation, equation, or central claim reduces to a fitted parameter renamed as a prediction, a self-referential definition, or a load-bearing self-citation chain. The ranking of models is an independent empirical result conditioned on the chosen dataset and features; it does not presuppose its own outcome by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The study rests on the assumption that the public dataset labels are reliable ground truth and that standard ML training procedures produce generalizable predictors for electrode properties. No explicit free parameters or invented entities are named in the abstract.

pith-pipeline@v0.9.0 · 5518 in / 959 out tokens · 61727 ms · 2026-05-15T14:12:14.770974+00:00 · methodology

[1] [1]

Figure 4 shows a 2D-map of the t-SNE embeddings of MODNet and CrabNet features

clearly separates materials on a planar map, with compounds that share the same working ion found clustering tightly together. Figure 4 shows a 2D-map of the t-SNE embeddings of MODNet and CrabNet features. The points have been colored according to their gravimetric and volumetric capacity using equidistance segmentation of the target variables to define ...

work page

[2] [2]

A. G. Olabi, Q. Abbas, P. A. Shinde, M. A. Abdelkareem, Rechargeable batteries: Technological advancement, challenges, current and emerging applications.Energy266, 126408 (2023), doi: 10.1016/j.energy.2022.126408

work page doi:10.1016/j.energy.2022.126408 2023

[3] [3]

M. E. Sotomayor,et al., Ultra-thick battery electrodes for high gravimetric and volumetric energy density Li-ion batteries.Journal of Power Sources437, 226923 (2019), doi:10.1016/j. jpowsour.2019.226923

work page doi:10.1016/j 2019

[4] [4]

J. Wu,et al., Building efficient ion pathway in highly densified thick electrodes with high gravimetric and volumetric energy densities.Nano Letters21(21), 9339–9346 (2021), doi: 10.1021/acs.nanolett.1c03724

work page doi:10.1021/acs.nanolett.1c03724 2021

[5] [5]

Zhang, C

Y. Zhang, C. Ling, A strategy to apply machine learning to small datasets in materials science. Npj Computational Materials4(1), 25 (2018), doi:10.1038/s41524-018-0081-z. 24

work page doi:10.1038/s41524-018-0081-z 2018

[6] [6]

Wei,et al., Machine learning in materials science.InfoMat1(3), 338–358 (2019), doi: 10.1002/inf2.12028

J. Wei,et al., Machine learning in materials science.InfoMat1(3), 338–358 (2019), doi: 10.1002/inf2.12028

work page doi:10.1002/inf2.12028 2019

[7] [7]

S. P. Ong,et al., The Materials Application Programming Interface (API): A simple, flexible and efficient API for materials data based on REpresentational State Transfer (REST) principles. Computational Materials Science97, 209–215 (2015), doi:10.1016/j.commatsci.2014.10.037

work page doi:10.1016/j.commatsci.2014.10.037 2015

[8] [8]

C. J. Hargreaves,et al., A database of experimentally measured lithium solid electrolyte conductivities evaluated with machine learning.npj Computational Materials9(1), 9 (2023), doi:10.1038/s41524-022-00951-z

work page doi:10.1038/s41524-022-00951-z 2023

[9] [9]

C. W. Andersen,et al., OPTIMADE, an API for exchanging materials data.Scientific data 8(1), 217 (2021), doi:10.1038/s41597-021-00974-z

work page doi:10.1038/s41597-021-00974-z 2021

[10] [10]

M. K. Horton,et al., Accelerated data-driven materials science with the Materials Project. Nature Materialspp. 1–11 (2025), doi:10.1038/s41563-025-02272-0

work page doi:10.1038/s41563-025-02272-0 2025

[11] [11]

Choudhary,et al., Recent advances and applications of deep learning methods in materials science.npj Computational Materials8(1), 59 (2022), doi:10.1038/s41524-022-00734-6

K. Choudhary,et al., Recent advances and applications of deep learning methods in materials science.npj Computational Materials8(1), 59 (2022), doi:10.1038/s41524-022-00734-6

work page doi:10.1038/s41524-022-00734-6 2022

[12] [12]

A. D. Sendek,et al., Machine learning-assisted discovery of solid Li-ion conducting materials. Chemistry of Materials31(2), 342–352 (2018), doi:10.1021/acs.chemmater.8b03272

work page doi:10.1021/acs.chemmater.8b03272 2018

[13] [13]

Zhou,et al., Machine learning assisted prediction of cathode materials for Zn-ion batteries

L. Zhou,et al., Machine learning assisted prediction of cathode materials for Zn-ion batteries. Advanced Theory and Simulations4(9), 2100196 (2021), doi:10.1002/adts.202100196

work page doi:10.1002/adts.202100196 2021

[14] [14]

M. L. Adam,et al., Navigating materials chemical space to discover new battery electrodes using machine learning.Energy Storage Materials65, 103090 (2024), doi:10.1016/j.ensm. 2023.103090

work page doi:10.1016/j.ensm 2024

[15] [15]

Zhang, Y

Z. Zhang, Y. Wang, S. Li, S. Li, M. Chen, Interpretable Machine Learning Prediction of Voltage and Specific Capacity for Electrode Materials.Advanced Theory and Simulations 7(8), 2400227 (2024), doi:10.1002/adts.202400227

work page doi:10.1002/adts.202400227 2024

[16] [16]

2018 , issn =

L. Ward,et al., Matminer: An open source toolkit for materials data mining.Computational Materials Science152, 60–69 (2018), doi:10.1016/j.commatsci.2018.05.018. 25

work page doi:10.1016/j.commatsci.2018.05.018 2018

[17] [17]

Materials Project Battery Explorer,https://legacy.materialsproject.org/#search/ batteries/

work page

[18] [18]

Charting the complete elastic properties of inorganic crystalline compounds

M. de Jong,et al., Charting the complete elastic properties of inorganic crystalline compounds. Scientific Data2(2015), doi:10.1038/sdata.2015.9

work page doi:10.1038/sdata.2015.9 2015

[19] [19]

F. Zhou, M. Cococcioni, C. A. Marianetti, D. Morgan, G. Ceder, First-principles prediction of redox potentials in transition-metal compounds with LDA+U.Physical Review B70, 235121 (2004), doi:10.1103/PhysRevB.70.235121

work page doi:10.1103/physrevb.70.235121 2004

[20] [20]

L. Wang, T. Maxisch, G. Ceder, A First-Principles Approach to Studying the Thermal Stability of Oxide Cathode Materials.Chemistry of Materials19(3), 543–552 (2007), doi:10.1021/ cm0620943

work page 2007

[21] [21]

S. P. Ong, A. Jain, G. Hautier, B. Kang, G. Ceder, Thermal stabilities of delithiated olivine MPO4 (M=Fe, Mn) cathodes investigated using first principles calculations.Electrochemistry Communications12(3), 427–430 (2010), doi:10.1016/j.elecom.2010.01.010

work page doi:10.1016/j.elecom.2010.01.010 2010

[22] S. P. Ong, et al., Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Computational Materials Science 68, 314–319 (2013), doi:10.1016/j.commatsci.2012.10.028

[23] L. Ward, A. Agrawal, A. Choudhary, C. Wolverton, A general-purpose machine learning framework for predicting properties of inorganic materials. npj Computational Materials 2(1), 1–7 (2016), doi:10.1038/npjcompumats.2016.28

[24] K. Choudhary, et al., The joint automated repository for various integrated simulations (JARVIS) for data-driven materials design. npj Computational Materials 6(1), 173 (2020), doi:10.1038/s41524-020-00440-1

[25] V. Tshitoyan, et al., Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571(7763), 95–98 (2019), doi:10.1038/s41586-019-1335-8

[26] D. Jha, et al., ElemNet: Deep learning the chemistry of materials from only elemental composition. Scientific Reports 8(1), 17593 (2018), doi:10.1038/s41598-018-35934-y

[27] P.-P. De Breuck, G. Hautier, G.-M. Rignanese, Materials property prediction for limited datasets enabled by feature selection and joint learning with MODNet. npj Computational Materials 7(1), 83 (2021), doi:10.1038/s41524-021-00552-2

[28] A. Y.-T. Wang, S. K. Kauwe, R. J. Murdock, T. D. Sparks, Compositionally restricted attention-based network for materials property predictions. npj Computational Materials 7(1), 77 (2021), doi:10.1038/s41524-021-00545-1

[29] A. Y.-T. Wang, M. S. Mahmoud, M. Czasny, A. Gurlo, CrabNet for Explainable Deep Learning in Materials Science: Bridging the Gap Between Academia and Industry. Integrating Materials and Manufacturing Innovation 11(1), 41–56 (2022), doi:10.1007/s40192-021-00247-y

[30] D. Heath, S. Kasif, S. Salzberg, k-DT: A multi-tree learning method, in Proc. of the Second Int. Workshop on Multistrategy Learning (1993), pp. 138–149

[31] P.-P. De Breuck, M. L. Evans, G.-M. Rignanese, Robust model benchmarking and bias-imbalance in data-driven materials science: a case study on MODNet. Journal of Physics: Condensed Matter 33(40), 404002 (2021), doi:10.1088/1361-648X/ac1280

[32] A. Kraskov, H. Stögbauer, P. Grassberger, Estimating mutual information. Physical Review E—Statistical, Nonlinear, and Soft Matter Physics 69(6), 066138 (2004), doi:10.1103/PhysRevE.69.066138

[33] A. Vaswani, et al., Attention is all you need. Advances in Neural Information Processing Systems 30, 5998–6008 (2017)

[34] Y. Liu, Y. Wang, J. Zhang, New machine learning algorithm: Random forest, in International Conference on Information Computing and Applications (Springer) (2012), pp. 246–252, doi:10.1007/978-3-642-34062-8_32

[35] K. Khan, S. U. Rehman, K. Aziz, S. Fong, S. Sarasvady, DBSCAN: Past, present and future, in The Fifth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT 2014) (IEEE) (2014), pp. 232–238, doi:10.1109/ICADIWT.2014.6814687

[36] C. J. Hargreaves, M. S. Dyer, M. W. Gaultois, V. A. Kurlin, M. J. Rosseinsky, The earth mover’s distance as a metric for the space of inorganic compositions. Chemistry of Materials 32(24), 10610–10620 (2020), doi:10.1021/acs.chemmater.0c03381

[37] S. Vallender, Calculation of the Wasserstein distance between probability distributions on the line. Theory of Probability & Its Applications 18(4), 784–786 (1974)

[38] D. M. Allen, The relationship between variable selection and data agumentation and a method for prediction. Technometrics 16(1), 125–127 (1974), doi:10.2307/1267500

[39] M. Stone, Cross-validation: A review. Statistics: A Journal of Theoretical and Applied Statistics 9(1), 127–139 (1978)

[40] P. Refaeilzadeh, L. Tang, H. Liu, Cross-validation, in Encyclopedia of Database Systems (Springer) (2009), pp. 532–538, doi:10.1007/978-0-387-39940-9_565

[41] A. Dunn, Q. Wang, A. Ganose, D. Dopp, A. Jain, Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Computational Materials 6(1), 138 (2020), doi:10.1038/s41524-020-00406-3

[42] G. J. Székely, M. L. Rizzo, N. K. Bakirov, Measuring and Testing Dependence by Correlation of Distances. The Annals of Statistics 35(6), 2769–2794 (2007), doi:10.1214/009053607000000505