Machine Learning for Electrode Materials: Property Prediction via Composition
Pith reviewed 2026-05-15 14:12 UTC · model grok-4.3
The pith
CrabNet outperforms MODNet and random forest models when predicting battery electrode properties from composition alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CrabNet consistently records higher predictive accuracy than MODNet and the Magpie random forest across voltage, capacity, and energy-density targets on the Materials Project Battery Explorer dataset. Superiority persists under bootstrap resampling, leave-one-cluster-out cross-validation, and stratified five-fold cross-validation. Unsupervised clustering applied to MODNet-derived features produces coherent material groupings without any property labels supplied in advance.
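As a rough illustration of the bootstrap check described above, the sketch below resamples test rows with replacement and tracks the difference in mean absolute error (MAE) between two models. It is a minimal sketch, not the paper's procedure: the function names, the choice of MAE, the 95% percentile interval, and the toy voltages are all assumptions made for illustration.

```python
import random

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def paired_bootstrap_mae_diff(y_true, pred_a, pred_b, n_boot=2000, seed=0):
    """Bootstrap MAE(A) - MAE(B) by resampling test rows with replacement.

    Returns (mean difference, lower bound, upper bound) of a 95% percentile interval.
    """
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # same rows for both models
        yt = [y_true[i] for i in idx]
        diff = mae(yt, [pred_a[i] for i in idx]) - mae(yt, [pred_b[i] for i in idx])
        diffs.append(diff)
    diffs.sort()
    lo = diffs[int(0.025 * n_boot)]
    hi = diffs[int(0.975 * n_boot) - 1]
    return sum(diffs) / n_boot, lo, hi

# Hypothetical toy voltages in volts; these are not values from the paper.
y_true = [3.4, 4.1, 2.8, 3.9, 3.0, 4.3, 2.5, 3.7]
pred_a = [3.5, 4.0, 2.9, 3.8, 3.1, 4.2, 2.6, 3.6]  # each prediction off by about 0.1 V
pred_b = [3.9, 3.6, 3.3, 3.4, 3.5, 3.8, 3.0, 3.2]  # each prediction off by about 0.5 V
mean_diff, lo, hi = paired_bootstrap_mae_diff(y_true, pred_a, pred_b)
```

An interval that lies entirely below zero would indicate that model A has the lower error on this test set; it says nothing about performance outside the sampled data.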
What carries the argument
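CrabNet (Compositionally Restricted Attention-Based Network), a self-attention model that treats a material's composition as a set of elements with their fractional amounts and predicts scalar properties from that input alone.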
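A composition-only model receives no crystal structure, only which elements are present and in what proportion. The sketch below shows that raw input for a single formula. The helper name and the simple regex parser are assumptions for illustration; CrabNet's own featurization (element embeddings plus encodings of the fractions) is not reproduced here.

```python
import re
from collections import defaultdict

def fractional_composition(formula):
    """Parse a flat formula such as 'LiFePO4' into element -> atomic fraction.

    Handles only simple formulas without parentheses or hydrates;
    amounts may be integers or decimals.
    """
    counts = defaultdict(float)
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] += float(amount) if amount else 1.0
    total = sum(counts.values())
    return {element: n / total for element, n in counts.items()}

# LiFePO4 has 7 atoms per formula unit: Li, Fe, P contribute 1/7 each, O contributes 4/7.
tokens = fractional_composition("LiFePO4")
```

Each (element, fraction) pair plays the role of one input token; polymorphs with the same formula are indistinguishable under this representation.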
If this is right
- Compositional ML models can rapidly screen large libraries of candidate electrode formulas for promising voltage or capacity values.
- Attention-based architectures capture composition-property relationships more effectively than feature-engineered random forests for this task.
- Unsupervised clustering on model embeddings reveals natural families of battery materials without requiring labeled data.
- Early-stage ML screening is feasible for battery research despite limits on data quality and model transfer to real devices.
Where Pith is reading between the lines
- Retraining CrabNet on experimental rather than computed data could test whether the accuracy advantage survives outside the Materials Project.
- The discovered clusters might highlight under-explored compositional regions for new electrode candidates.
- Hybrid models that inject physical constraints could address the practical limitations the paper identifies.
- Extending the benchmark to additional properties such as cycle life would show how far the compositional approach generalizes.
Load-bearing premise
The Materials Project Battery Explorer dataset and its chosen property labels accurately reflect real electrode behavior without major bias from computational methods or data collection.
What would settle it
The claim of consistent outperformance would be refuted if the three models, retrained and tested on an independent set of experimentally measured electrode properties, no longer showed CrabNet leading on the accuracy metrics.
Original abstract
In this work, we benchmark three leading Machine Learning (ML) frameworks (MODNet, CrabNet, and a random forest model based on Magpie features) for predicting properties of battery electrode materials using the Materials Project Battery Explorer dataset. We evaluate these models based on predictive accuracy, visualize numerical features using two-dimensional embeddings, and quantify performance using standard metrics. Our results demonstrate that CrabNet consistently outperforms the other models across all tests. To validate these findings, we employ robust statistical methods: bootstrap resampling and two cross-validation (CV) strategies (leave one cluster out and stratified 5-fold CV), comparing each model against a control baseline. In addition, we apply unsupervised clustering on MODNet-derived features using t-SNE and DBSCAN, revealing coherent material groupings without prior labels. This analysis confirms the robustness of the evaluated models and underscores the potential of ML-driven approaches for accelerating electrode materials discovery. However, our study also identifies practical limitations and quantifies challenges associated with integrating ML models into materials science workflows. Despite these constraints, our findings suggest that ML models are highly effective for early-stage compositional screening in the battery industry. This work provides a foundation for future research on ML applications in materials discovery.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript benchmarks three ML models (MODNet, CrabNet, and a Magpie-feature-based random forest) for predicting properties of battery electrode materials from composition, using entries drawn from the Materials Project Battery Explorer dataset. It reports that CrabNet achieves superior accuracy on standard metrics, validates this via bootstrap resampling plus leave-one-cluster-out and stratified 5-fold CV, and supplements the analysis with t-SNE/DBSCAN clustering of MODNet-derived features to reveal unlabeled material groupings. The work concludes that ML approaches, particularly CrabNet, are effective for early-stage compositional screening despite practical workflow limitations.
Significance. If the central empirical ranking holds under broader validation, the study supplies a concrete, statistically supported comparison of off-the-shelf frameworks for battery-materials property prediction and demonstrates the utility of unsupervised clustering for exploratory analysis. The explicit use of bootstrap resampling and two distinct CV protocols is a positive methodological feature that strengthens internal reproducibility claims.
major comments (2)
- [Abstract / Results] Abstract and Results sections: the central claim that 'CrabNet consistently outperforms the other models across all tests' rests exclusively on GGA(+U) computed labels from the Materials Project. Because these labels carry known composition-dependent systematic errors (e.g., underestimation of transition-metal redox voltages), any model that better exploits the same chemical features used to generate the surrogate labels could appear superior without necessarily generalizing to experimental ground truth. The CV and bootstrap procedures control only for statistical variation within the computed set and do not test transferability.
- [Methods] Methods: hyperparameter selection, exact data-split indices, and any post-hoc sample exclusions are not described. Without these details it is impossible to determine whether the reported CrabNet advantage is robust to reasonable modeling choices or whether it partly reflects an advantageous (but unreported) tuning protocol.
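Leave-one-cluster-out CV, one of the two protocols at issue in these comments, holds out every sample of one cluster at a time so that the model is always tested on a group it has not seen. The sketch below generates such splits. The helper name and the toy cluster labels are hypothetical, and how the paper defines its clusters is not stated here.

```python
def leave_one_cluster_out(cluster_labels):
    """Yield (held_out_cluster, train_indices, test_indices) for each cluster.

    All samples of the held-out cluster form the test set;
    every other sample goes to the training set.
    """
    for held_out in sorted(set(cluster_labels)):
        test = [i for i, c in enumerate(cluster_labels) if c == held_out]
        train = [i for i, c in enumerate(cluster_labels) if c != held_out]
        yield held_out, train, test

# Hypothetical cluster assignments for eight materials.
labels = [0, 0, 1, 2, 1, 0, 2, 2]
splits = list(leave_one_cluster_out(labels))
```

Splits of this kind probe extrapolation across clusters within the computed dataset. They still share the same label source, so they do not address the transferability concern raised in the first comment.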
minor comments (2)
- [Abstract] The abstract refers to 'standard metrics' without naming them (MAE, RMSE, R², etc.); the main text should list the precise metrics and report numerical values with uncertainties for each model and property.
- [Figures] Figure captions for the t-SNE/DBSCAN embeddings should state the perplexity, learning rate, and DBSCAN parameters used, as these choices affect the apparent cluster coherence.
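The request for clustering parameters matters because DBSCAN output can change completely with its settings. The minimal pure-Python DBSCAN below is an illustrative reimplementation, not the paper's code or any library's API. It is run on hypothetical 2D points standing in for an embedding, under three values of `eps`.

```python
import math

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one label per point, with -1 marking noise.

    A point is a core point if at least min_samples points (itself included)
    lie within distance eps of it.
    """
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # expand only through core points
    return labels

# Two hypothetical groups of four points each, far apart from one another.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
tight = dbscan(points, eps=1.5, min_samples=3)   # two clusters
loose = dbscan(points, eps=10.0, min_samples=3)  # everything merges into one cluster
strict = dbscan(points, eps=0.5, min_samples=3)  # every point is labeled noise
```

The same points yield two clusters, one cluster, or pure noise depending on `eps` alone, which is why the apparent coherence of a cluster map cannot be judged without the parameters.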
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. The two major comments highlight important limitations in scope and reproducibility that we address below. We agree that our benchmarking is confined to computed labels and will strengthen the discussion of this point; we will also expand the Methods section with the requested details to ensure full reproducibility.
Point-by-point responses
-
Referee: [Abstract / Results] Abstract and Results sections: the central claim that 'CrabNet consistently outperforms the other models across all tests' rests exclusively on GGA(+U) computed labels from the Materials Project. Because these labels carry known composition-dependent systematic errors (e.g., underestimation of transition-metal redox voltages), any model that better exploits the same chemical features used to generate the surrogate labels could appear superior without necessarily generalizing to experimental ground truth. The CV and bootstrap procedures control only for statistical variation within the computed set and do not test transferability.
Authors: We agree that all reported results are obtained on GGA(+U) formation energies and voltages from the Materials Project and that these labels contain known systematic biases relative to experiment. Our central claim is therefore limited to relative model performance on this specific computational dataset, which is the standard practice for compositional screening studies. The bootstrap and CV protocols demonstrate that the CrabNet advantage is statistically robust within the computed data distribution, but they do not address transfer to experimental values. In the revised manuscript we will (i) explicitly qualify the claim in the abstract and conclusions to refer to “computed properties from the Materials Project,” (ii) add a dedicated paragraph in the Discussion section acknowledging the composition-dependent errors in the labels and the absence of experimental validation, and (iii) suggest that future work should benchmark against experimental electrode datasets once they become sufficiently large. revision: partial
-
Referee: [Methods] Methods: hyperparameter selection, exact data-split indices, and any post-hoc sample exclusions are not described. Without these details it is impossible to determine whether the reported CrabNet advantage is robust to reasonable modeling choices or whether it partly reflects an advantageous (but unreported) tuning protocol.
Authors: We apologize for the incomplete Methods description. The revised manuscript will contain a new subsection “Hyperparameter Optimization and Data Splits” that specifies: (a) the exact hyperparameter search spaces and optimization procedure (random search with 5-fold inner CV for CrabNet and MODNet; default scikit-learn settings for the random forest after a small grid search), (b) the random seeds and stratification criteria used to generate the train/test partitions, and (c) the precise criteria and number of samples excluded (primarily entries with missing voltage or formation-energy values). We will also deposit the full list of material IDs used in each split as supplementary data to allow exact reproduction. revision: yes
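One way the promised seeds and stratification criteria could be made explicit is sketched below. It is an assumption for illustration, not the authors' protocol: continuous targets are ranked, cut into equal-count bins, and each bin is shuffled with a fixed seed and dealt across folds. The function name, the bin count, and the seed are all hypothetical.

```python
import random

def stratified_regression_folds(y, n_folds=5, n_bins=4, seed=42):
    """Assign each sample to a fold so that every fold spans the target range.

    Samples are ranked by target value, cut into n_bins equal-count bins,
    and each bin's members are shuffled with a fixed seed and dealt
    round-robin across folds. Returns one fold index per sample.
    """
    rng = random.Random(seed)
    order = sorted(range(len(y)), key=lambda i: y[i])
    fold_of = [None] * len(y)
    bin_size = -(-len(y) // n_bins)  # ceiling division
    next_fold = 0
    for start in range(0, len(order), bin_size):
        members = order[start:start + bin_size]
        rng.shuffle(members)
        for i in members:
            fold_of[i] = next_fold % n_folds
            next_fold += 1
    return fold_of

# Hypothetical targets; 20 samples, 4 bins of 5, dealt into 5 folds of 4.
y = [0.1 * i for i in range(20)]
folds = stratified_regression_folds(y)
```

Because the seed is fixed, rerunning the function reproduces the same assignment, and publishing the resulting fold index next to each material ID would allow the exact partitions to be rebuilt.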
Circularity Check
No circularity: standard empirical benchmarking on held-out data
Full rationale
The paper reports a direct comparison of three ML models (MODNet, CrabNet, Magpie-RF) trained and evaluated on the Materials Project Battery Explorer dataset. Performance is quantified via standard metrics on held-out folds using leave-one-cluster-out and stratified 5-fold CV plus bootstrap resampling. No derivation, equation, or central claim reduces to a fitted parameter renamed as a prediction, a self-referential definition, or a load-bearing self-citation chain. The ranking of models is an independent empirical result conditioned on the chosen dataset and features; it does not presuppose its own outcome by construction.
Reference graph
Works this paper leans on
-
[1]
clearly separates materials on a planar map, with compounds that share the same working ion found clustering tightly together. Figure 4 shows a 2D-map of the t-SNE embeddings of MODNet and CrabNet features. The points have been colored according to their gravimetric and volumetric capacity using equidistance segmentation of the target variables to define ...
-
[2]
A. G. Olabi, Q. Abbas, P. A. Shinde, M. A. Abdelkareem, Rechargeable batteries: Technological advancement, challenges, current and emerging applications. Energy 266, 126408 (2023), doi:10.1016/j.energy.2022.126408
-
[3]
M. E. Sotomayor, et al., Ultra-thick battery electrodes for high gravimetric and volumetric energy density Li-ion batteries. Journal of Power Sources 437, 226923 (2019), doi:10.1016/j.jpowsour.2019.226923
-
[4]
J. Wu, et al., Building efficient ion pathway in highly densified thick electrodes with high gravimetric and volumetric energy densities. Nano Letters 21(21), 9339–9346 (2021), doi:10.1021/acs.nanolett.1c03724
-
[5]
Y. Zhang, C. Ling, A strategy to apply machine learning to small datasets in materials science. npj Computational Materials 4(1), 25 (2018), doi:10.1038/s41524-018-0081-z
-
[6]
J. Wei, et al., Machine learning in materials science. InfoMat 1(3), 338–358 (2019), doi:10.1002/inf2.12028
-
[7]
S. P. Ong, et al., The Materials Application Programming Interface (API): A simple, flexible and efficient API for materials data based on REpresentational State Transfer (REST) principles. Computational Materials Science 97, 209–215 (2015), doi:10.1016/j.commatsci.2014.10.037
-
[8]
C. J. Hargreaves, et al., A database of experimentally measured lithium solid electrolyte conductivities evaluated with machine learning. npj Computational Materials 9(1), 9 (2023), doi:10.1038/s41524-022-00951-z
-
[9]
C. W. Andersen, et al., OPTIMADE, an API for exchanging materials data. Scientific Data 8(1), 217 (2021), doi:10.1038/s41597-021-00974-z
-
[10]
M. K. Horton, et al., Accelerated data-driven materials science with the Materials Project. Nature Materials, pp. 1–11 (2025), doi:10.1038/s41563-025-02272-0
-
[11]
K. Choudhary, et al., Recent advances and applications of deep learning methods in materials science. npj Computational Materials 8(1), 59 (2022), doi:10.1038/s41524-022-00734-6
-
[12]
A. D. Sendek, et al., Machine learning-assisted discovery of solid Li-ion conducting materials. Chemistry of Materials 31(2), 342–352 (2018), doi:10.1021/acs.chemmater.8b03272
-
[13]
L. Zhou, et al., Machine learning assisted prediction of cathode materials for Zn-ion batteries. Advanced Theory and Simulations 4(9), 2100196 (2021), doi:10.1002/adts.202100196
-
[14]
M. L. Adam, et al., Navigating materials chemical space to discover new battery electrodes using machine learning. Energy Storage Materials 65, 103090 (2024), doi:10.1016/j.ensm.2023.103090
-
[15]
Z. Zhang, Y. Wang, S. Li, S. Li, M. Chen, Interpretable Machine Learning Prediction of Voltage and Specific Capacity for Electrode Materials. Advanced Theory and Simulations 7(8), 2400227 (2024), doi:10.1002/adts.202400227
-
[16]
L. Ward, et al., Matminer: An open source toolkit for materials data mining. Computational Materials Science 152, 60–69 (2018), doi:10.1016/j.commatsci.2018.05.018
-
[17]
Materials Project Battery Explorer, https://legacy.materialsproject.org/#search/batteries/
-
[18]
M. de Jong, et al., Charting the complete elastic properties of inorganic crystalline compounds. Scientific Data 2 (2015), doi:10.1038/sdata.2015.9
-
[19]
F. Zhou, M. Cococcioni, C. A. Marianetti, D. Morgan, G. Ceder, First-principles prediction of redox potentials in transition-metal compounds with LDA+U. Physical Review B 70, 235121 (2004), doi:10.1103/PhysRevB.70.235121
-
[20]
L. Wang, T. Maxisch, G. Ceder, A First-Principles Approach to Studying the Thermal Stability of Oxide Cathode Materials. Chemistry of Materials 19(3), 543–552 (2007), doi:10.1021/cm0620943
-
[21]
S. P. Ong, A. Jain, G. Hautier, B. Kang, G. Ceder, Thermal stabilities of delithiated olivine MPO4 (M=Fe, Mn) cathodes investigated using first principles calculations. Electrochemistry Communications 12(3), 427–430 (2010), doi:10.1016/j.elecom.2010.01.010
-
[22]
S. P. Ong, et al., Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Computational Materials Science 68, 314–319 (2013), doi:10.1016/j.commatsci.2012.10.028
-
[23]
L. Ward, A. Agrawal, A. Choudhary, C. Wolverton, A general-purpose machine learning framework for predicting properties of inorganic materials. npj Computational Materials 2(1), 1–7 (2016), doi:10.1038/npjcompumats.2016.28
-
[24]
K. Choudhary, et al., The joint automated repository for various integrated simulations (JARVIS) for data-driven materials design. npj Computational Materials 6(1), 173 (2020), doi:10.1038/s41524-020-00440-1
-
[25]
V. Tshitoyan, et al., Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571(7763), 95–98 (2019), doi:10.1038/s41586-019-1335-8
-
[26]
D. Jha, et al., Elemnet: Deep learning the chemistry of materials from only elemental composition. Scientific Reports 8(1), 17593 (2018), doi:10.1038/s41598-018-35934-y
-
[27]
P.-P. De Breuck, G. Hautier, G.-M. Rignanese, Materials property prediction for limited datasets enabled by feature selection and joint learning with MODNet. npj Computational Materials 7(1), 83 (2021), doi:10.1038/s41524-021-00552-2
-
[28]
A. Y.-T. Wang, S. K. Kauwe, R. J. Murdock, T. D. Sparks, Compositionally restricted attention-based network for materials property predictions. npj Computational Materials 7(1), 77 (2021), doi:10.1038/s41524-021-00545-1
-
[29]
A. Y.-T. Wang, M. S. Mahmoud, M. Czasny, A. Gurlo, CrabNet for Explainable Deep Learning in Materials Science: Bridging the Gap Between Academia and Industry. Integrating Materials and Manufacturing Innovation 11(1), 41–56 (2022), doi:10.1007/s40192-021-00247-y
- [30]
-
[31]
P.-P. De Breuck, M. L. Evans, G.-M. Rignanese, Robust model benchmarking and bias-imbalance in data-driven materials science: a case study on MODNet. Journal of Physics: Condensed Matter 33(40), 404002 (2021), doi:10.1088/1361-648X/ac1280
-
[32]
A. Kraskov, H. Stögbauer, P. Grassberger, Estimating mutual information. Physical Review E 69(6), 066138 (2004), doi:10.1103/PhysRevE.69.066138
-
[33]
A. Vaswani, et al., Attention is all you need. Advances in Neural Information Processing Systems 30, 5998–6008 (2017)
-
[34]
Y. Liu, Y. Wang, J. Zhang, New machine learning algorithm: Random forest, in International conference on information computing and applications (Springer) (2012), pp. 246–252, doi:10.1007/978-3-642-34062-8_32
-
[35]
K. Khan, S. U. Rehman, K. Aziz, S. Fong, S. Sarasvady, DBSCAN: Past, present and future, in The fifth international conference on the applications of digital information and web technologies (ICADIWT 2014) (IEEE) (2014), pp. 232–238, doi:10.1109/ICADIWT.2014.6814687
-
[36]
C. J. Hargreaves, M. S. Dyer, M. W. Gaultois, V. A. Kurlin, M. J. Rosseinsky, The earth mover’s distance as a metric for the space of inorganic compositions. Chemistry of Materials 32(24), 10610–10620 (2020), doi:10.1021/acs.chemmater.0c03381
-
[37]
S. Vallender, Calculation of the Wasserstein distance between probability distributions on the line. Theory of Probability & Its Applications 18(4), 784–786 (1974)
-
[38]
D. M. Allen, The relationship between variable selection and data agumentation and a method for prediction. Technometrics 16(1), 125–127 (1974), doi:10.2307/1267500
-
[39]
M. Stone, Cross-validation: A review. Statistics: A Journal of Theoretical and Applied Statistics 9(1), 127–139 (1978)
-
[40]
P. Refaeilzadeh, L. Tang, H. Liu, Cross-validation, in Encyclopedia of database systems (Springer), pp. 532–538 (2009), doi:10.1007/978-0-387-39940-9_565
-
[41]
A. Dunn, Q. Wang, A. Ganose, D. Dopp, A. Jain, Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Computational Materials 6(1), 138 (2020), doi:10.1038/s41524-020-00406-3
-
[42]
G. J. Székely, M. L. Rizzo, N. K. Bakirov, Measuring and Testing Dependence by Correlation of Distances. The Annals of Statistics 35(6), 2769–2794 (2007), doi:10.1214/009053607000000505