Computational references are not experiments: pre-registered validation of machine-learned sodium-cathode voltages
Pith reviewed 2026-06-26 13:56 UTC · model grok-4.3
The pith
Machine-learned sodium-cathode voltage predictions fail pre-registered validation against experiment because the DFT references are the dominant error source.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
On an operator-audited set of six known Na-ion cathodes, the raw held-out mean absolute error is 0.67 V, and the upper 95% confidence bound on the cross-validated bias-corrected error is 1.09 V. The residual is strongly voltage-dependent (r = -0.94), so no additive calibration is valid. On the two compounds that allow a three-way comparison of prediction, database reference, and experiment, the Materials Project PBE+U reference lies about 0.54 V below experiment, while the model prediction is closer to measurement.
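The three quantities in this claim can be illustrated with a minimal sketch. It assumes that "cross-validated bias-corrected error" means a leave-one-out additive correction and that the correlation is taken between residual and experimental voltage; neither is confirmed by this summary. All function names are hypothetical and the voltages are synthetic placeholders, not the paper's data.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def mean_absolute_error(pred, ref):
    """Raw MAE between predicted and reference voltages (V)."""
    return mean([abs(p - r) for p, r in zip(pred, ref)])


def loo_bias_corrected_abs_errors(pred, ref):
    """Leave-one-out additive correction (an ASSUMED reading of the metric).

    For each held-out compound, the mean residual of the remaining
    compounds is subtracted from its own residual.
    """
    res = [p - r for p, r in zip(pred, ref)]
    n = len(res)
    total = sum(res)
    return [abs(res[i] - (total - res[i]) / (n - 1)) for i in range(n)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# SYNTHETIC placeholder voltages (V); these are NOT the paper's data.
v_exp = [2.5, 2.9, 3.2, 3.4, 3.8, 4.1]
v_pred = [3.0, 3.2, 3.3, 3.3, 3.4, 3.5]
residual = [p - e for p, e in zip(v_pred, v_exp)]

raw_mae = mean_absolute_error(v_pred, v_exp)
loo_mae = mean(loo_bias_corrected_abs_errors(v_pred, v_exp))
r = pearson_r(v_exp, residual)
print(f"raw MAE = {raw_mae:.2f} V, LOO-corrected MAE = {loo_mae:.2f} V, r = {r:.2f}")
```

In the synthetic example the model over-predicts at low voltage and under-predicts at high voltage, so the residual correlates strongly and negatively with voltage and the leave-one-out offset does little to reduce the error.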
What carries the argument
The pre-registered validation against an operator-audited experimental test set that separates model error from reference error.
If this is right
- The screen is retired because it cannot be treated as verified against experiment.
- No additive calibration of the model is valid because residuals vary strongly with voltage.
- At least 70% of the targeted Na substitution space has already been published according to a prior-art screen.
- A calibration audit of the DFT ledger against four benchmark Li couples is now pre-registered.
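The second point above can be made explicit with a short derivation. It is a sketch that assumes the residual is approximately linear in experimental voltage, which the reported correlation suggests but this summary does not confirm. The symbols $e_i$, $a$, $b$, $c$, and $\varepsilon_i$ are introduced here for illustration only.

```latex
\begin{align}
e_i &= \hat{V}_i - V_i^{\mathrm{exp}} = a + b\,V_i^{\mathrm{exp}} + \varepsilon_i, \\
c^{*} &= \arg\min_{c} \sum_i (e_i - c)^2 = \bar{e} = a + b\,\bar{V}^{\mathrm{exp}} + \bar{\varepsilon}, \\
e_i - c^{*} &= b\,\bigl(V_i^{\mathrm{exp}} - \bar{V}^{\mathrm{exp}}\bigr) + (\varepsilon_i - \bar{\varepsilon}).
\end{align}
```

With $b \neq 0$, the error left after the best constant offset still grows linearly with distance from the mean voltage, so a single additive shift cannot remove it.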
Where Pith is reading between the lines
- Many other machine-learning battery screens that rely on the same class of DFT references may be limited by reference accuracy rather than model capacity.
- Direct experimental anchoring or improved DFT functionals for voltage prediction would be needed before such screens can be considered reliable.
- The finding that at least 70% of the targeted Na substitution space is already published suggests limited additional yield from further unanchored computational enumeration in this chemical space.
Load-bearing premise
The small operator-audited collection of literature experimental voltages forms an unbiased and representative ground-truth set.
What would settle it
New experimental measurements or additional audited literature values showing that the Materials Project PBE+U voltages lie within 0.2 V of experiment on a larger set of Na cathodes.
Original abstract
Machine-learning screens for battery materials are trained and judged almost entirely against computed reference voltages, and those references carry their own systematic errors. We report a case in which this matters quantitatively: our own screening stack (a graph-network voltage screen, a prior-art triage layer, and a local PBE+U bench) fails pre-registered validation against experiment-anchored literature values. Verdict thresholds, failure modes, and the primary metric were committed before analysis. On an operator-audited set of known Na-ion cathodes (n = 6 after one documented exclusion; verdict unchanged at n = 7), the raw held-out mean absolute error was 0.67 V, the pre-registered conservative metric, the upper 95% confidence bound of the cross-validated bias-corrected error, was 1.09 V, and the residual was strongly voltage-dependent (r = -0.94), so no additive calibration is valid. On the two compounds where prediction, database reference, and experiment could all be compared, the Materials Project PBE+U reference sat about 0.54 V below measurement: the reference, not the model, dominated the error. A prior-art screen found at least 70% of the targeted Na substitution space already published. We retire the screen, bound what "verified" means for our DFT ledger, and pre-register a calibration audit of it against four benchmark Li couples.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports a pre-registered validation of a machine-learned graph-network voltage screen for Na-ion cathode materials against operator-audited experimental literature values. On a set of n=6 known Na-ion cathodes (after one documented exclusion), the raw held-out MAE is 0.67 V, the pre-registered conservative metric (the upper 95% confidence bound of the cross-validated bias-corrected error) is 1.09 V, and the residual shows strong voltage dependence (r = -0.94), so no additive calibration is valid. On the two compounds where prediction, database reference, and experiment can all be compared, the Materials Project PBE+U reference sits about 0.54 V below experiment, suggesting that the computational reference, not the ML model, dominates the error. The authors retire the screen and pre-register a calibration audit of their DFT ledger against four benchmark Li couples.
Significance. If the findings hold, this work highlights a critical limitation in ML materials screening: reliance on computed references can lead to misleading performance assessments. The explicit pre-registration of metrics, thresholds, and failure modes, along with transparent documentation of the test set curation, strengthens the credibility of the validation process and provides a model for rigorous benchmarking in the field.
major comments (1)
- [Abstract / Validation results] The operator-audited selection of the n=6 (or n=7) test set with one documented exclusion is load-bearing for the central claim that the ML model fails validation and that the DFT reference dominates the error. With such a small sample, it is not demonstrated that the curation process avoids selection bias that could produce the observed voltage-dependent residual pattern (r=-0.94) and the attribution of error to the references even if the model itself is unbiased.
minor comments (1)
- [Abstract] The residual correlation (r = -0.94) would benefit from a statement of the exact number of points used to compute it, given the small n.
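To illustrate why the point count matters, the following minimal sketch computes an approximate confidence interval for a Pearson correlation using the Fisher z-transform. The value n = 6 is an assumption taken from the stated test-set size, the function name `fisher_ci` is hypothetical, and the method presumes roughly bivariate-normal data, so at this sample size it is only a rough guide.

```python
from math import atanh, sqrt, tanh


def fisher_ci(r, n, z_crit=1.96):
    """Approximate two-sided CI for a Pearson r via the Fisher z-transform.

    z_crit=1.96 corresponds to a 95% interval. The standard error
    1/sqrt(n - 3) requires n > 3.
    """
    if n <= 3:
        raise ValueError("Fisher interval needs n > 3")
    z = atanh(r)
    half_width = z_crit / sqrt(n - 3)
    return tanh(z - half_width), tanh(z + half_width)


# r is the reported value; n = 6 is an ASSUMPTION about the point count.
lo, hi = fisher_ci(-0.94, 6)
print(f"approximate 95% CI for r: [{lo:.2f}, {hi:.2f}]")
```

Under these assumptions the interval is wide, roughly -0.99 to -0.54, although it still excludes zero; with n = 7 it would narrow somewhat, which is why the exact count is worth stating.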
Simulated Author's Rebuttal
We thank the referee for their review and for emphasizing the importance of test-set curation in a small-sample validation. We address the single major comment below.
Referee: [Abstract / Validation results] The operator-audited selection of the n=6 (or n=7) test set with one documented exclusion is load-bearing for the central claim that the ML model fails validation and that the DFT reference dominates the error. With such a small sample, it is not demonstrated that the curation process avoids selection bias that could produce the observed voltage-dependent residual pattern (r=-0.94) and the attribution of error to the references even if the model itself is unbiased.
Authors: The test set was defined in the pre-registration as every documented Na-ion cathode in the experimental literature that satisfied the stated inclusion criteria; the single exclusion and its rationale were recorded before any residual analysis. Operator auditing verified compliance with those criteria without reference to model outputs or voltage values. We agree that n=6 inherently limits statistical power to exclude every conceivable selection bias. However, the pre-registration of the full protocol (test-set definition, metric, failure threshold, and analysis plan) before model evaluation removes the most common source of post-hoc bias. The observed r = -0.94 voltage dependence is independently corroborated by the direct DFT-vs-experiment comparison on the two compounds where all three quantities exist, showing a consistent 0.54 V underestimation by the PBE+U reference irrespective of the ML model. This external anchor supports attribution of the dominant error to the computational ledger rather than to curation artifacts. No revision is required.
revision: no
Circularity Check
No circularity: central result is direct comparison to external experimental literature values
The paper reports a pre-registered held-out MAE, a bias-corrected upper confidence bound, and a residual correlation, all computed directly from operator-audited literature experimental voltages (n=6/7) for known Na-ion cathodes. These quantities are not derived from any internal equations, fitted parameters, or self-citations; they are straightforward statistical summaries against independent external ground truth. The description of the screening stack references prior work, but that work is not load-bearing for the validation metrics or the failure conclusion. No self-definitional, fitted-input-as-prediction, or ansatz-smuggling steps appear in the derivation chain.
Axiom & Free-Parameter Ledger
axioms (1)
- Standard math: standard statistical procedures for mean absolute error, cross-validated bias correction, and 95% confidence bounds apply without modification to the n=6 sample.
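As an illustration of what such a confidence bound involves at n=6, here is a minimal sketch of a percentile-bootstrap upper bound on a mean absolute error. Whether the paper uses a percentile bootstrap, and whether its bound is one-sided or the upper end of a two-sided interval, is not stated in this summary; the function name, the default quantile, and the error values are all assumptions, and the errors are synthetic.

```python
import random


def bootstrap_upper_bound(abs_errors, quantile=0.975, n_boot=5000, seed=0):
    """Percentile-bootstrap upper bound on the mean of abs_errors.

    quantile=0.975 gives the upper end of a two-sided 95% interval;
    use 0.95 for a one-sided bound (the choice here is an ASSUMPTION).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(abs_errors)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(abs_errors, k=n)) / n for _ in range(n_boot)
    )
    idx = min(n_boot - 1, int(quantile * n_boot))
    return means[idx]


# SYNTHETIC absolute errors (V); these are NOT the paper's data.
errors = [0.5, 0.3, 0.1, 0.1, 0.4, 0.6]
upper = bootstrap_upper_bound(errors)
print(f"mean = {sum(errors) / len(errors):.2f} V, upper bound = {upper:.2f} V")
```

With only six points the resampled means are coarse and the bound can sit well above the sample mean, which is consistent with a conservative metric exceeding the raw MAE.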