pith.

arxiv: 2603.07805 · v2 · submitted 2026-03-08 · ❄️ cond-mat.mtrl-sci

Machine Learning for Electrode Materials: Property Prediction via Composition

Pith reviewed 2026-05-15 14:12 UTC · model grok-4.3

classification ❄️ cond-mat.mtrl-sci
keywords machine learning · battery electrodes · property prediction · CrabNet · composition-based modeling · materials discovery · MODNet · clustering

The pith

CrabNet outperforms MODNet and random forest models when predicting battery electrode properties from composition alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper benchmarks three machine learning approaches for forecasting properties such as voltage and capacity in battery electrode materials, using only their chemical composition as input. It tests MODNet, CrabNet, and a random forest built on Magpie descriptors on the Materials Project Battery Explorer data and finds CrabNet ahead on accuracy metrics. Multiple validation steps, including bootstrap resampling and two cross-validation schemes, support the ranking, and t-SNE and DBSCAN clustering on MODNet-derived features reveals unlabeled material groupings. The work matters because reliable compositional screening can narrow the search space for new electrode candidates before costly synthesis and testing begin. The authors also flag practical limits when moving these models from benchmark to industrial workflows.
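The Magpie baseline turns a formula into fixed-length statistics of elemental properties and feeds them to a random forest. The sketch below illustrates that idea under stated assumptions: the two-property table (Pauling electronegativity and atomic number), the `magpie_style_features` helper, and the four example compositions are hypothetical stand-ins. The real Magpie set (available through matminer) uses many more properties and statistics, and the targets here are random placeholders, not Battery Explorer data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical elemental property table: (Pauling electronegativity, atomic number).
# Real Magpie descriptors cover many more properties; this is an illustration only.
ELEMENT_PROPS = {
    "Li": (0.98, 3), "O": (3.44, 8), "P": (2.19, 15),
    "Mn": (1.55, 25), "Fe": (1.83, 26), "Co": (1.88, 27), "Ni": (1.91, 28),
}


def magpie_style_features(composition):
    """Return fraction-weighted mean, min, max and range of each elemental property."""
    elements = list(composition)
    amounts = np.array([composition[el] for el in elements], dtype=float)
    fractions = amounts / amounts.sum()
    props = np.array([ELEMENT_PROPS[el] for el in elements], dtype=float)  # (n_elements, n_props)
    mean = fractions @ props
    pmin, pmax = props.min(axis=0), props.max(axis=0)
    return np.concatenate([mean, pmin, pmax, pmax - pmin])


compositions = [
    {"Li": 1, "Fe": 1, "P": 1, "O": 4},  # LiFePO4
    {"Li": 1, "Co": 1, "O": 2},          # LiCoO2
    {"Li": 1, "Mn": 2, "O": 4},          # LiMn2O4
    {"Li": 1, "Ni": 1, "O": 2},          # LiNiO2
]
X = np.vstack([magpie_style_features(c) for c in compositions])

# Synthetic placeholder targets, NOT real voltages from the Battery Explorer.
y = np.random.default_rng(0).uniform(2.0, 5.0, size=len(X))
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
predictions = model.predict(X)
```

Because every statistic is symmetric in the elements, the features do not depend on the order in which the elements of a formula are listed.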

Core claim

CrabNet consistently records higher predictive accuracy than MODNet and the Magpie random forest across voltage, capacity, and energy-density targets on the Materials Project Battery Explorer dataset. Superiority persists under bootstrap resampling, leave-one-cluster-out cross-validation, and stratified five-fold cross-validation. Unsupervised clustering applied to MODNet-derived features produces coherent material groupings without any property labels supplied in advance.
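Bootstrap resampling can be applied in several ways, and the paper's exact procedure is not described here. A minimal sketch, assuming a paired design in which two models are scored by mean absolute error (MAE) on the same test set, resamples test indices with replacement and reports a 95% percentile interval for the MAE difference. The function name and the synthetic predictions are illustrative.

```python
import numpy as np


def bootstrap_mae_difference(y_true, pred_a, pred_b, n_boot=2000, seed=0):
    """Paired bootstrap of MAE(model A) - MAE(model B) on one test set.

    Each resample draws test indices with replacement and recomputes both MAEs
    on the same indices; returns the observed difference and a 95% percentile interval.
    """
    rng = np.random.default_rng(seed)
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    err_a = np.abs(y_true - pred_a)
    err_b = np.abs(y_true - pred_b)
    n = len(y_true)
    idx = rng.integers(0, n, size=(n_boot, n))            # one row of indices per resample
    diffs = err_a[idx].mean(axis=1) - err_b[idx].mean(axis=1)
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return err_a.mean() - err_b.mean(), (lo, hi)


rng = np.random.default_rng(1)
y_true = rng.normal(3.5, 0.8, size=500)          # synthetic targets
pred_a = y_true + rng.normal(0, 0.2, size=500)   # synthetic, less noisy model
pred_b = y_true + rng.normal(0, 0.4, size=500)   # synthetic, noisier model
diff, (lo, hi) = bootstrap_mae_difference(y_true, pred_a, pred_b)
```

An interval lying entirely below zero indicates that model A's lower MAE is unlikely to be an artifact of this particular test sample; it says nothing about transfer to data drawn from a different distribution.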

What carries the argument

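CrabNet (Compositionally Restricted Attention-Based Network), a transformer-style model that applies self-attention across the elements of a composition, each represented by a learned element embedding combined with an encoding of its fractional amount, to predict scalar properties.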
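The core mechanism can be illustrated with a single self-attention layer over the elements of one composition. This NumPy sketch is not CrabNet's implementation: the random element embeddings stand in for learned ones, the sinusoidal fractional encoding and its scaling constants are simplifications, and the final fraction-weighted pooling is an assumption chosen for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fractional_encoding(frac, d):
    """Sinusoidal encoding of fractional amounts (hypothetical scaling constants)."""
    i = np.arange(d // 2)
    angles = frac[:, None] * 1000.0 / (10000.0 ** (2 * i / d))
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


def attention_readout(element_emb, frac, Wq, Wk, Wv, w_out):
    """One self-attention layer over a composition's elements, then pooling to a scalar."""
    h = element_emb + fractional_encoding(frac, element_emb.shape[1])
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]))  # (n_elements, n_elements), rows sum to 1
    per_element = (attn @ v) @ w_out                 # one scalar contribution per element
    return float(frac @ per_element), attn           # fraction-weighted pooling


d = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
w_out = rng.normal(size=d)
emb = rng.normal(size=(4, d))                        # stand-ins for Li, Fe, P, O embeddings
frac = np.array([1.0, 1.0, 1.0, 4.0]) / 7.0          # LiFePO4 fractional amounts
y_hat, attn = attention_readout(emb, frac, Wq, Wk, Wv, w_out)
```

Permuting the element order leaves the prediction unchanged, which is the property a composition model needs, since a chemical formula has no intrinsic element ordering.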

If this is right

  • Compositional ML models can rapidly screen large libraries of candidate electrode formulas for promising voltage or capacity values.
  • Attention-based architectures such as CrabNet capture composition-property relationships more effectively than feature-engineered random forests for this task.
  • Unsupervised clustering on model embeddings reveals natural families of battery materials without requiring labeled data.
  • Early-stage ML screening is feasible for battery research despite limits on data quality and model transfer to real devices.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Retraining CrabNet on experimental rather than computed data could test whether the accuracy advantage survives outside the Materials Project.
  • The discovered clusters might highlight under-explored compositional regions for new electrode candidates.
  • Hybrid models that inject physical constraints could address the practical limitations the paper identifies.
  • Extending the benchmark to additional properties such as cycle life would show how far the compositional approach generalizes.

Load-bearing premise

The Materials Project Battery Explorer dataset and its chosen property labels accurately reflect real electrode behavior without major bias from computational methods or data collection.

What would settle it

Retraining and testing the three models on an independent set of experimentally measured electrode properties and finding that CrabNet no longer leads in accuracy metrics would refute the claim of consistent outperformance.

Original abstract

In this work, we benchmark three leading Machine Learning (ML) frameworks (MODNet, CrabNet, and a random forest model based on Magpie features) for predicting properties of battery electrode materials using the Materials Project Battery Explorer dataset. We evaluate these models based on predictive accuracy, visualize numerical features using two-dimensional embeddings, and quantify performance using standard metrics. Our results demonstrate that CrabNet consistently outperforms the other models across all tests. To validate these findings, we employ robust statistical methods: bootstrap resampling and two cross-validation (CV) strategies (leave-one-cluster-out and stratified 5-fold CV), comparing each model against a control baseline. In addition, we apply unsupervised clustering on MODNet-derived features using t-SNE and DBSCAN, revealing coherent material groupings without prior labels. This analysis confirms the robustness of the evaluated models and underscores the potential of ML-driven approaches for accelerating electrode materials discovery. However, our study also identifies practical limitations and quantifies challenges associated with integrating ML models into materials science workflows. Despite these constraints, our findings suggest that ML models are highly effective for early-stage compositional screening in the battery industry. This work provides a foundation for future research on ML applications in materials discovery.
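The two cross-validation strategies and the control baseline named in the abstract can be sketched with scikit-learn. The details are assumptions, not the authors' settings: stratification of the continuous target uses quantile bins, the clusters for leave-one-cluster-out come from k-means on the features, the control baseline is a `DummyRegressor` that predicts the training mean, and the features and target are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneGroupOut, StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))                        # synthetic stand-in features
y = 2.0 * X[:, 0] + rng.normal(scale=0.3, size=300)   # synthetic target

# Stratified 5-fold CV on a continuous target: stratify on quintile bins of y.
y_bins = np.digitize(y, np.quantile(y, [0.2, 0.4, 0.6, 0.8]))
strat_splits = list(StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y_bins))

# Leave-one-cluster-out CV: each k-means cluster is held out once as the test set.
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
loco_splits = list(LeaveOneGroupOut().split(X, y, groups=clusters))


def mean_mae(model, splits):
    """Mean absolute error averaged over the given (train, test) index splits."""
    scores = cross_val_score(model, X, y, cv=splits, scoring="neg_mean_absolute_error")
    return -scores.mean()


results = {}
for name, splits in [("stratified 5-fold", strat_splits), ("leave-one-cluster-out", loco_splits)]:
    results[name] = {
        "random forest": mean_mae(RandomForestRegressor(n_estimators=100, random_state=0), splits),
        "mean baseline": mean_mae(DummyRegressor(strategy="mean"), splits),
    }
    print(name, {k: round(v, 3) for k, v in results[name].items()})
```

Leave-one-cluster-out errors are usually larger than stratified ones, because each test fold lies in a region of feature space the model never saw during training; comparing every model against the same baseline under both protocols helps separate genuine signal from a favorable split.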

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript benchmarks three ML models—MODNet, CrabNet, and a Magpie-feature-based random forest—for predicting compositional properties of battery electrode materials drawn from the Materials Project Battery Explorer dataset. It reports that CrabNet achieves superior accuracy on standard metrics, validates this via bootstrap resampling plus leave-one-cluster-out and stratified 5-fold CV, and supplements the analysis with t-SNE/DBSCAN clustering of MODNet-derived features to reveal unlabeled material groupings. The work concludes that ML approaches, particularly CrabNet, are effective for early-stage compositional screening despite practical workflow limitations.

Significance. If the central empirical ranking holds under broader validation, the study supplies a concrete, statistically supported comparison of off-the-shelf frameworks for battery-materials property prediction and demonstrates the utility of unsupervised clustering for exploratory analysis. The explicit use of bootstrap resampling and two distinct CV protocols is a positive methodological feature that strengthens internal reproducibility claims.

major comments (2)
  1. [Abstract / Results] Abstract and Results sections: the central claim that 'CrabNet consistently outperforms the other models across all tests' rests exclusively on GGA(+U) computed labels from the Materials Project. Because these labels carry known composition-dependent systematic errors (e.g., underestimation of transition-metal redox voltages), any model that better exploits the same chemical features used to generate the surrogate labels could appear superior without necessarily generalizing to experimental ground truth. The CV and bootstrap procedures control only for statistical variation within the computed set and do not test transferability.
  2. [Methods] Methods: hyperparameter selection, exact data-split indices, and any post-hoc sample exclusions are not described. Without these details it is impossible to determine whether the reported CrabNet advantage is robust to reasonable modeling choices or whether it partly reflects an advantageous (but unreported) tuning protocol.
minor comments (2)
  1. [Abstract] The abstract refers to 'standard metrics' without naming them (MAE, RMSE, R², etc.); the main text should list the precise metrics and report numerical values with uncertainties for each model and property.
  2. [Figures] Figure captions for the t-SNE/DBSCAN embeddings should state the perplexity, learning rate, and DBSCAN parameters used, as these choices affect the apparent cluster coherence.
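The parameters the referee asks for enter the pipeline at two points: the t-SNE embedding and the DBSCAN clustering of that embedding. The sketch below uses synthetic features in place of MODNet-derived ones and placeholder parameter values rather than the paper's, and keeps the settings in one place so they can be copied verbatim into a caption. It relies on scikit-learn's `TSNE` and `DBSCAN`; DBSCAN labels noise points as -1.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for learned features: three separated groups in 32 dimensions.
centers = rng.normal(scale=10.0, size=(3, 32))
features = np.vstack([c + rng.normal(size=(100, 32)) for c in centers])

# Placeholder values; these are the settings a figure caption should report.
TSNE_PARAMS = dict(n_components=2, perplexity=30.0, learning_rate=200.0, init="pca", random_state=0)
DBSCAN_PARAMS = dict(eps=3.0, min_samples=10)

embedding = TSNE(**TSNE_PARAMS).fit_transform(StandardScaler().fit_transform(features))
labels = DBSCAN(**DBSCAN_PARAMS).fit_predict(embedding)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"t-SNE {TSNE_PARAMS} | DBSCAN {DBSCAN_PARAMS} -> {n_clusters} clusters, {n_noise} noise points")
```

Re-running with a different perplexity or `eps` can merge or split the apparent groups, which is why both sets of values belong in the caption.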

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The two major comments highlight important limitations in scope and reproducibility that we address below. We agree that our benchmarking is confined to computed labels and will strengthen the discussion of this point; we will also expand the Methods section with the requested details to ensure full reproducibility.

Point-by-point responses
  1. Referee: [Abstract / Results] Abstract and Results sections: the central claim that 'CrabNet consistently outperforms the other models across all tests' rests exclusively on GGA(+U) computed labels from the Materials Project. Because these labels carry known composition-dependent systematic errors (e.g., underestimation of transition-metal redox voltages), any model that better exploits the same chemical features used to generate the surrogate labels could appear superior without necessarily generalizing to experimental ground truth. The CV and bootstrap procedures control only for statistical variation within the computed set and do not test transferability.

    Authors: We agree that all reported results are obtained on GGA(+U) formation energies and voltages from the Materials Project and that these labels contain known systematic biases relative to experiment. Our central claim is therefore limited to relative model performance on this specific computational dataset, which is the standard practice for compositional screening studies. The bootstrap and CV protocols demonstrate that the CrabNet advantage is statistically robust within the computed data distribution, but they do not address transfer to experimental values. In the revised manuscript we will (i) explicitly qualify the claim in the abstract and conclusions to refer to “computed properties from the Materials Project,” (ii) add a dedicated paragraph in the Discussion section acknowledging the composition-dependent errors in the labels and the absence of experimental validation, and (iii) suggest that future work should benchmark against experimental electrode datasets once they become sufficiently large. revision: partial

  2. Referee: [Methods] Methods: hyperparameter selection, exact data-split indices, and any post-hoc sample exclusions are not described. Without these details it is impossible to determine whether the reported CrabNet advantage is robust to reasonable modeling choices or whether it partly reflects an advantageous (but unreported) tuning protocol.

    Authors: We apologize for the incomplete Methods description. The revised manuscript will contain a new subsection “Hyperparameter Optimization and Data Splits” that specifies: (a) the exact hyperparameter search spaces and optimization procedure (random search with 5-fold inner CV for CrabNet and MODNet; scikit-learn default settings for the random forest, refined by a small grid search), (b) the random seeds and stratification criteria used to generate the train/test partitions, and (c) the precise criteria and number of samples excluded (primarily entries with missing voltage or formation-energy values). We will also deposit the full list of material IDs used in each split as supplementary data to allow exact reproduction. revision: yes
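A nested protocol of the kind this response describes can be sketched with scikit-learn. In this hypothetical example a random forest stands in for CrabNet and MODNet (whose own training interfaces are not shown), the search space and seed are placeholders, and the data are synthetic. The point is the structure: hyperparameters are chosen by random search with an inner 5-fold CV, performance is measured only on outer folds, and the outer test indices are kept so the splits can be deposited.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, RandomizedSearchCV, cross_val_score

SEED = 42  # hypothetical seed; every seed used should be reported
rng = np.random.default_rng(SEED)
X = rng.normal(size=(150, 6))                                       # synthetic features
y = X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.2, size=150)  # synthetic target

search_space = {  # placeholder search space
    "n_estimators": [25, 50],
    "max_depth": [None, 4, 8],
    "min_samples_leaf": [1, 3, 5],
}
inner_cv = KFold(n_splits=5, shuffle=True, random_state=SEED)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=SEED + 1)

# Inner loop: random search scored by 5-fold CV on each outer training part only.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=SEED),
    param_distributions=search_space,
    n_iter=4,
    cv=inner_cv,
    scoring="neg_mean_absolute_error",
    random_state=SEED,
)
# Outer loop: each outer test fold is never seen during tuning.
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="neg_mean_absolute_error")
print(f"nested-CV MAE: {-outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")

# Record the exact outer test indices (in practice, map them to material IDs).
outer_test_indices = [test.tolist() for _, test in outer_cv.split(X)]
```

Reporting the mean and spread of the outer-fold scores, rather than the best inner-loop score, avoids the optimistic bias that an unreported tuning protocol could introduce.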

Circularity Check

0 steps flagged

No circularity: standard empirical benchmarking on held-out data

full rationale

The paper reports a direct comparison of three ML models (MODNet, CrabNet, Magpie-RF) trained and evaluated on the Materials Project Battery Explorer dataset. Performance is quantified via standard metrics on held-out folds using leave-one-cluster-out and stratified 5-fold CV plus bootstrap resampling. No derivation, equation, or central claim reduces to a fitted parameter renamed as a prediction, a self-referential definition, or a load-bearing self-citation chain. The ranking of models is an independent empirical result conditioned on the chosen dataset and features; it does not presuppose its own outcome by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The study rests on the assumption that the public dataset labels are reliable ground truth and that standard ML training procedures produce generalizable predictors for electrode properties. No explicit free parameters or invented entities are named in the abstract.

pith-pipeline@v0.9.0 · 5518 in / 959 out tokens · 61727 ms · 2026-05-15T14:12:14.770974+00:00 · methodology


Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages

  1. [1]

    clearly separates materials on a planar map, with compounds that share the same working ion clustering tightly together. Figure 4 shows a 2D map of the t-SNE embeddings of MODNet and CrabNet features. The points have been colored according to their gravimetric and volumetric capacity using equidistance segmentation of the target variables to define ...

  2. [2]

    A. G. Olabi, Q. Abbas, P. A. Shinde, M. A. Abdelkareem, Rechargeable batteries: Technological advancement, challenges, current and emerging applications. Energy 266, 126408 (2023), doi:10.1016/j.energy.2022.126408

  3. [3]

    M. E. Sotomayor, et al., Ultra-thick battery electrodes for high gravimetric and volumetric energy density Li-ion batteries. Journal of Power Sources 437, 226923 (2019), doi:10.1016/j.jpowsour.2019.226923

  4. [4]

    J. Wu, et al., Building efficient ion pathway in highly densified thick electrodes with high gravimetric and volumetric energy densities. Nano Letters 21(21), 9339–9346 (2021), doi:10.1021/acs.nanolett.1c03724

  5. [5]

    Y. Zhang, C. Ling, A strategy to apply machine learning to small datasets in materials science. npj Computational Materials 4(1), 25 (2018), doi:10.1038/s41524-018-0081-z

  6. [6]

    J. Wei, et al., Machine learning in materials science. InfoMat 1(3), 338–358 (2019), doi:10.1002/inf2.12028

  7. [7]

    S. P. Ong, et al., The Materials Application Programming Interface (API): A simple, flexible and efficient API for materials data based on REpresentational State Transfer (REST) principles. Computational Materials Science 97, 209–215 (2015), doi:10.1016/j.commatsci.2014.10.037

  8. [8]

    C. J. Hargreaves, et al., A database of experimentally measured lithium solid electrolyte conductivities evaluated with machine learning. npj Computational Materials 9(1), 9 (2023), doi:10.1038/s41524-022-00951-z

  9. [9]

    C. W. Andersen, et al., OPTIMADE, an API for exchanging materials data. Scientific Data 8(1), 217 (2021), doi:10.1038/s41597-021-00974-z

  10. [10]

    M. K. Horton, et al., Accelerated data-driven materials science with the Materials Project. Nature Materials, pp. 1–11 (2025), doi:10.1038/s41563-025-02272-0

  11. [11]

    K. Choudhary, et al., Recent advances and applications of deep learning methods in materials science. npj Computational Materials 8(1), 59 (2022), doi:10.1038/s41524-022-00734-6

  12. [12]

    A. D. Sendek, et al., Machine learning-assisted discovery of solid Li-ion conducting materials. Chemistry of Materials 31(2), 342–352 (2018), doi:10.1021/acs.chemmater.8b03272

  13. [13]

    L. Zhou, et al., Machine learning assisted prediction of cathode materials for Zn-ion batteries. Advanced Theory and Simulations 4(9), 2100196 (2021), doi:10.1002/adts.202100196

  14. [14]

    M. L. Adam, et al., Navigating materials chemical space to discover new battery electrodes using machine learning. Energy Storage Materials 65, 103090 (2024), doi:10.1016/j.ensm.2023.103090

  15. [15]

    Z. Zhang, Y. Wang, S. Li, S. Li, M. Chen, Interpretable Machine Learning Prediction of Voltage and Specific Capacity for Electrode Materials. Advanced Theory and Simulations 7(8), 2400227 (2024), doi:10.1002/adts.202400227

  16. [16]

    L. Ward, et al., Matminer: An open source toolkit for materials data mining. Computational Materials Science 152, 60–69 (2018), doi:10.1016/j.commatsci.2018.05.018

  17. [17]

    Materials Project Battery Explorer, https://legacy.materialsproject.org/#search/batteries/

  18. [18]

    M. de Jong, et al., Charting the complete elastic properties of inorganic crystalline compounds. Scientific Data 2 (2015), doi:10.1038/sdata.2015.9

  19. [19]

    F. Zhou, M. Cococcioni, C. A. Marianetti, D. Morgan, G. Ceder, First-principles prediction of redox potentials in transition-metal compounds with LDA+U. Physical Review B 70, 235121 (2004), doi:10.1103/PhysRevB.70.235121

  20. [20]

    L. Wang, T. Maxisch, G. Ceder, A First-Principles Approach to Studying the Thermal Stability of Oxide Cathode Materials. Chemistry of Materials 19(3), 543–552 (2007), doi:10.1021/cm0620943

  21. [21]

    S. P. Ong, A. Jain, G. Hautier, B. Kang, G. Ceder, Thermal stabilities of delithiated olivine MPO4 (M=Fe, Mn) cathodes investigated using first principles calculations. Electrochemistry Communications 12(3), 427–430 (2010), doi:10.1016/j.elecom.2010.01.010

  22. [22]

    S. P. Ong, et al., Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Computational Materials Science 68, 314–319 (2013), doi:10.1016/j.commatsci.2012.10.028

  23. [23]

    L. Ward, A. Agrawal, A. Choudhary, C. Wolverton, A general-purpose machine learning framework for predicting properties of inorganic materials. npj Computational Materials 2(1), 1–7 (2016), doi:10.1038/npjcompumats.2016.28

  24. [24]

    K. Choudhary, et al., The joint automated repository for various integrated simulations (JARVIS) for data-driven materials design. npj Computational Materials 6(1), 173 (2020), doi:10.1038/s41524-020-00440-1

  25. [25]

    V. Tshitoyan, et al., Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571(7763), 95–98 (2019), doi:10.1038/s41586-019-1335-8

  26. [26]

    D. Jha, et al., ElemNet: Deep learning the chemistry of materials from only elemental composition. Scientific Reports 8(1), 17593 (2018), doi:10.1038/s41598-018-35934-y

  27. [27]

    P.-P. De Breuck, G. Hautier, G.-M. Rignanese, Materials property prediction for limited datasets enabled by feature selection and joint learning with MODNet. npj Computational Materials 7(1), 83 (2021), doi:10.1038/s41524-021-00552-2

  28. [28]

    A. Y.-T. Wang, S. K. Kauwe, R. J. Murdock, T. D. Sparks, Compositionally restricted attention-based network for materials property predictions. npj Computational Materials 7(1), 77 (2021), doi:10.1038/s41524-021-00545-1

  29. [29]

    A. Y.-T. Wang, M. S. Mahmoud, M. Czasny, A. Gurlo, CrabNet for Explainable Deep Learning in Materials Science: Bridging the Gap Between Academia and Industry. Integrating Materials and Manufacturing Innovation 11(1), 41–56 (2022), doi:10.1007/s40192-021-00247-y

  30. [30]

    D. Heath, S. Kasif, S. Salzberg, k-DT: A multi-tree learning method, in Proc. of the Second Int. Workshop on Multistrategy Learning (1993), pp. 138–149

  31. [31]

    P.-P. De Breuck, M. L. Evans, G.-M. Rignanese, Robust model benchmarking and bias-imbalance in data-driven materials science: a case study on MODNet. Journal of Physics: Condensed Matter 33(40), 404002 (2021), doi:10.1088/1361-648X/ac1280

  32. [32]

    A. Kraskov, H. Stögbauer, P. Grassberger, Estimating mutual information. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 69(6), 066138 (2004), doi:10.1103/PhysRevE.69.066138

  33. [33]

    A. Vaswani, et al., Attention is all you need. Advances in Neural Information Processing Systems 30, 5998–6008 (2017)

  34. [34]

    Y. Liu, Y. Wang, J. Zhang, New machine learning algorithm: Random forest, in International Conference on Information Computing and Applications (Springer) (2012), pp. 246–252, doi:10.1007/978-3-642-34062-8_32

  35. [35]

    K. Khan, S. U. Rehman, K. Aziz, S. Fong, S. Sarasvady, DBSCAN: Past, present and future, in The Fifth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT 2014) (IEEE) (2014), pp. 232–238, doi:10.1109/ICADIWT.2014.6814687

  36. [36]

    C. J. Hargreaves, M. S. Dyer, M. W. Gaultois, V. A. Kurlin, M. J. Rosseinsky, The earth mover’s distance as a metric for the space of inorganic compositions. Chemistry of Materials 32(24), 10610–10620 (2020), doi:10.1021/acs.chemmater.0c03381

  37. [37]

    S. Vallender, Calculation of the Wasserstein distance between probability distributions on the line. Theory of Probability & Its Applications 18(4), 784–786 (1974)

  38. [38]

    D. M. Allen, The relationship between variable selection and data agumentation and a method for prediction. Technometrics 16(1), 125–127 (1974), doi:10.2307/1267500

  39. [39]

    M. Stone, Cross-validation: A review. Statistics: A Journal of Theoretical and Applied Statistics 9(1), 127–139 (1978)

  40. [40]

    P. Refaeilzadeh, L. Tang, H. Liu, Cross-validation, in Encyclopedia of Database Systems (Springer), pp. 532–538 (2009), doi:10.1007/978-0-387-39940-9_565

  41. [41]

    A. Dunn, Q. Wang, A. Ganose, D. Dopp, A. Jain, Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Computational Materials 6(1), 138 (2020), doi:10.1038/s41524-020-00406-3

  42. [42]

    G. J. Székely, M. L. Rizzo, N. K. Bakirov, Measuring and Testing Dependence by Correlation of Distances. The Annals of Statistics 35(6), 2769–2794 (2007), doi:10.1214/009053607000000505