arxiv: 2606.11870 · v1 · pith:4NPQIZEInew · submitted 2026-06-10 · ❄️ cond-mat.mtrl-sci · cs.LG

Modelling magnetic material properties with uncertainty-aware neural networks

Pith reviewed 2026-06-27 09:15 UTC · model grok-4.3

classification ❄️ cond-mat.mtrl-sci cs.LG
keywords uncertainty quantification · neural networks · magnetic properties · permanent magnets · coercivity · graph neural networks · material discovery · predictive uncertainty

The pith

Uncertainty quantification via neural networks makes magnetic property predictions more trustworthy and transfers to new tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that adding uncertainty estimation to machine learning models for magnetic materials improves the reliability of those models when data is scarce or when predicting for new compositions. It first benchmarks several models on intrinsic property prediction using Gaussian negative log-likelihood loss and dropout to generate uncertainty values, then moves the same uncertainty features to a graph neural network that predicts coercivity from microstructure graphs. A sympathetic reader would care because this approach lets researchers know which model outputs to trust before committing to expensive experiments on candidate permanent magnet materials.

Core claim

The authors demonstrate that uncertainty quantification not only enhances the trustworthiness of predictions but is also transferable across different modeling tasks: standard models for intrinsic magnetic properties and a graph neural network for coercivity both benefit from Gaussian negative log-likelihood loss and dropout-based Bayesian approximation.
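
As a minimal sketch of what a Gaussian negative log-likelihood loss computes, assume the network outputs a mean and a log-variance for each target. The function name `gaussian_nll` and the log-variance parameterization are illustrative assumptions, not details taken from the paper, whose framework and exact loss implementation are not shown here.

```python
import math


def gaussian_nll(y_true, mu, log_var):
    """Mean Gaussian negative log-likelihood over a batch.

    Each prediction consists of a mean mu_i and a log-variance log_var_i
    (an assumed parameterization), so the variance exp(log_var_i) is
    always positive.
    """
    total = 0.0
    for y, m, s in zip(y_true, mu, log_var):
        var = math.exp(s)
        # 0.5 * [log(2*pi) + log(var) + squared residual / var]
        total += 0.5 * (math.log(2.0 * math.pi) + s + (y - m) ** 2 / var)
    return total / len(y_true)
```

With unit variance the loss reduces to half the squared error plus a constant. A confidently wrong prediction (small variance, large residual) is penalized far more heavily than a cautious one, which is what encourages the predicted variance to track the observed noise.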

What carries the argument

Gaussian negative log-likelihood loss combined with dropout-based Bayesian approximation to produce predictive uncertainty estimates in neural networks.
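
One common way to combine the two ingredients is sketched below, under the assumption that each of M stochastic forward passes (dropout left active at inference) returns a mean and a variance for the same input. The split into epistemic variance (spread of the pass means) and aleatoric variance (average predicted variance) is a widely used convention and is assumed here; the function name `aggregate_mc_dropout` is hypothetical and the paper's exact formulas may differ.

```python
from statistics import fmean, pvariance


def aggregate_mc_dropout(means, variances):
    """Combine M stochastic forward passes for a single input.

    means[i] and variances[i] are the mean and variance predicted by the
    i-th pass with dropout kept active at inference time.
    Returns (prediction, epistemic, aleatoric, total) where the last
    three are variances.
    """
    mu = fmean(means)                 # final prediction: mean over passes
    epistemic = pvariance(means, mu)  # disagreement between passes
    aleatoric = fmean(variances)      # average predicted noise variance
    return mu, epistemic, aleatoric, epistemic + aleatoric
```

For example, `aggregate_mc_dropout([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` yields a prediction of 2.0 with most of the total variance attributed to disagreement between passes rather than to predicted noise.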

If this is right

  • Researchers can use the uncertainty values to flag and set aside low-confidence predictions during material screening.
  • The same uncertainty machinery can be reused when moving from simple property models to graph-based microstructure models without new calibration.
  • Out-of-distribution cases in compositional design spaces become detectable rather than hidden.
  • Model outputs for coercivity gain an accompanying reliability score that was previously unavailable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The transferability could support modular uncertainty components that plug into multiple material prediction pipelines.
  • High-uncertainty predictions could guide targeted experiments in an active-learning loop for permanent magnets.
  • Similar uncertainty handling might extend to other physical properties where data sparsity and out-of-distribution needs are common.

Load-bearing premise

The chosen loss and dropout methods produce well-calibrated uncertainty estimates that remain useful when models encounter new material structures.

What would settle it

A collection of previously unseen magnetic material compositions where the model's reported uncertainty shows no correlation with its actual prediction error.

Figures

Figures reproduced from arXiv: 2606.11870 by Akihito Kinoshita, Akira Kato, Alexander Kovacs, Clemens Wager, Harald Oezelt, Hayate Yamano, Heisam Moustafa, Hyuga Hosoi, Masao Yano, Noritsugu Sakuma, Qais Ali, Tetsuya Shoji, Thomas Schrefl.

Figure 1: Aleatoric uncertainty arises from the intrinsic randomness of the observations which …
Figure 2: for a visual explanation of the model architecture. A significant advantage of integrating MC dropout over ensemble methods is that only a single model needs to be trained, rather than a large number of separate models. This reduces training time and computational costs. This effect scales with increasing model complexity. The mean µ of the M stochastic predictions y_i represents the final prediction value…
Figure 3: Procedure to obtain a confidence curve: With this routine a plot can be created that visually quantifies how informed a model’s uncertainty estimates are. First, an uncertainty-aware machine learning model is trained and predicts on an unseen test dataset. The machine learning model produces a prediction value µ and uncertainty estimates σ for each predicted target. Then we sort the predictions by the corr…
Figure 4: Distribution of experimental measurement labels of all 1519 datapoints. The plot shows …
Figure 5: (A) Evaluation plots of the Gaussian process model predicting spontaneous magnetization µ₀Ms with an R²-score of 94% on the 5-fold CV test folds. (B) The increasing confidence curve indicates that the model’s uncertainty estimation is meaningless. The light blue shadow indicates the standard deviation from the 10 repetitions of the uncertainty evaluation method.
Figure 6: (A) Evaluation plots of the Gaussian process model predicting spontaneous magnetization µ₀Ms with an R²-score of 97% on the 5-fold CV test folds. (B) The increasing confidence curve indicates that the model’s uncertainty estimation is meaningless. The light blue shadow indicates the standard deviation from the 10 repetitions of the uncertainty evaluation method.
Figure 7: Only the epistemic uncertainty σₑ² was evaluated for this plot, because the random forest model only estimates this type of uncertainty. (A) Evaluation plots of the random forest model predicting spontaneous magnetization µ₀Ms with an R²-score of 92% on the 5-fold CV test folds. (B) The confidence curve shows a slight negative slope. The light blue shadow indicates the standard deviation from the 10 repeti…
Figure 8: Only the epistemic uncertainty σₑ² was evaluated for this plot, because the random forest model only estimates this type of uncertainty. (A) Evaluation plots of the random forest model predicting anisotropy field µ₀Ha with an R²-score of 94% on the 5-fold CV test folds. (B) The declining confidence curve indicates good uncertainty estimation. We observe an S-shape that indicates that highly uncertain predi…
Figure 9: (A) Evaluation plots of the Bayesian neural network model predicting spontaneous magnetization µ₀Ms with an R²-score of 90% on the 5-fold CV test folds. The uncertainty estimation colored in the residual plot (upper panel) does not coincide with the residuals. (B) The aleatoric and the total uncertainty’s confidence curves decline slowly but steadily. The epistemic uncertainty’s confidence curve declines at…
Figure 10: (A) Evaluation plots of the Bayesian neural network model predicting the anisotropy field µ₀Ha, achieving an R²-score of 92% on the 5-fold CV test folds. (B) The confidence curves corresponding to all three uncertainty estimations (epistemic, aleatoric and their combination) show a consistent and steep decline, indicating that higher predicted uncertainty reliably corresponds to larger prediction errors de…
Figure 11: (A) Residual and measured vs. predicted plot of the graph neural network model predicting coercivity µ₀Hc with an R² of 94% on the 5-fold CV test folds. (B) The confidence curve, which is evaluated on the test set, demonstrates the effectiveness of the model’s uncertainty estimates, especially for the aleatoric and total uncertainty. The model was trained using 70% of the total dataset and predicted on a tes…
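
The confidence-curve procedure outlined in the Figure 3 caption can be sketched roughly as follows. The caption is truncated, so the steps after sorting are an assumption based on the usual construction: discard predictions from most to least uncertain and track the mean absolute error of what remains. The function name, the `steps` parameter, and the choice of MAE as the error statistic are all illustrative.

```python
def confidence_curve(y_true, mu, sigma, steps=10):
    """Return (fraction_removed, MAE of remaining points) pairs.

    Points are discarded from most to least uncertain. If the
    uncertainties are informative, the MAE should fall as more of the
    high-uncertainty predictions are removed.
    """
    abs_err = [abs(y - m) for y, m in zip(y_true, mu)]
    # Indices ordered from least to most uncertain.
    order = sorted(range(len(sigma)), key=lambda i: sigma[i])
    n = len(order)
    curve = []
    for s in range(steps):
        keep = n - (n * s) // steps  # number of points retained (>= 1)
        kept = order[:keep]
        mae = sum(abs_err[i] for i in kept) / keep
        curve.append((s / steps, mae))
    return curve
```

A declining curve matches the captions' reading of good uncertainty estimation, whereas a flat or increasing curve corresponds to the cases the captions describe as uninformative.
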
original abstract

Machine learning is increasingly applied to accelerate the discovery of novel materials by exploring large compositional and structural design spaces. Yet, the scarcity of high-quality data and the frequent need for out-of-distribution prediction introduce substantial uncertainty, making the assessment of model reliability essential. In this work, we investigate uncertainty quantification as a means to evaluate model confidence in the context of permanent magnet research. In a first study, we benchmark classical and modern machine learning models for predicting intrinsic magnetic properties, focusing on the quality of their uncertainty estimates. We apply Gaussian negative log-likelihood loss and dropout-based Bayesian approximation as practical strategies for estimating predictive uncertainty. In a second study, we transfer these architectural features for uncertainty estimation to a more complex task: predicting coercivity from microstructural information using a graph neural network. Together, these studies demonstrate that uncertainty quantification not only enhances the trustworthiness of predictions but is also transferable across different modeling tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper investigates uncertainty quantification (UQ) in machine learning for magnetic materials, with two studies. The first benchmarks classical and modern ML models on intrinsic magnetic properties using Gaussian negative log-likelihood loss and dropout-based Bayesian approximation to estimate predictive uncertainty. The second transfers these UQ techniques to a graph neural network task predicting coercivity from microstructural data. The central claim is that UQ enhances the trustworthiness of predictions and is transferable across modeling tasks in permanent magnet research.

Significance. If the empirical results demonstrate well-calibrated uncertainties that improve decision-making in data-scarce and out-of-distribution settings, the work could offer practical tools for reliable ML-assisted materials discovery. The transferability across intrinsic-property regression and GNN-based microstructure tasks would be a notable contribution if supported by quantitative evidence such as calibration metrics and ablation studies.

major comments (2)
  1. [Abstract] The central claim that the two studies 'demonstrate' enhanced trustworthiness and transferability of UQ cannot be evaluated, as the provided text contains no results, validation metrics, error bars, calibration plots, or ablation studies comparing models with and without UQ.
  2. [Abstract] The weakest assumption, that Gaussian NLL loss combined with dropout produces well-calibrated uncertainties useful for OOD predictions in magnetic-property modeling, is stated but not tested in the visible material; without explicit calibration or OOD experiments, the transferability claim remains ungrounded.
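
To make the request for an explicit calibration test concrete, one simple check is interval coverage: if the predictive distribution is Gaussian and well calibrated, roughly 68% of test targets should fall within one predicted standard deviation of the predicted mean. The sketch below is illustrative only; the function names are hypothetical and this is not a metric the report attributes to the paper.

```python
from statistics import NormalDist


def empirical_coverage(y_true, mu, sigma, z=1.0):
    """Fraction of targets that lie within mu +/- z * sigma."""
    inside = sum(abs(y - m) <= z * s for y, m, s in zip(y_true, mu, sigma))
    return inside / len(y_true)


def nominal_coverage(z=1.0):
    """Coverage expected from a perfectly calibrated Gaussian predictor."""
    nd = NormalDist()
    return nd.cdf(z) - nd.cdf(-z)
```

Comparing `empirical_coverage(...)` against `nominal_coverage(z)` for several values of `z`, separately on in-distribution and out-of-distribution test sets, would show whether the uncertainties are over- or under-confident in each regime.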

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their comments on the abstract. We address each point below and note that the full manuscript contains the supporting results, metrics, and experiments referenced in the studies.

point-by-point responses
  1. Referee: [Abstract] The central claim that the two studies 'demonstrate' enhanced trustworthiness and transferability of UQ cannot be evaluated, as the provided text contains no results, validation metrics, error bars, calibration plots, or ablation studies comparing models with and without UQ.

    Authors: The abstract serves as a concise summary and conventionally omits detailed numerical results, plots, or metrics, which appear in the main text. The manuscript reports validation metrics, error bars, calibration plots, and ablation studies comparing models with and without UQ for both the intrinsic-property benchmarks and the GNN coercivity task. We will revise the abstract to more precisely indicate that the demonstrations rest on the empirical evidence detailed in the results sections.
    revision: yes

  2. Referee: [Abstract] The weakest assumption, that Gaussian NLL loss combined with dropout produces well-calibrated uncertainties useful for OOD predictions in magnetic-property modeling, is stated but not tested in the visible material; without explicit calibration or OOD experiments, the transferability claim remains ungrounded.

    Authors: The full manuscript contains explicit calibration assessments and out-of-distribution experiments for uncertainties obtained with Gaussian negative log-likelihood loss and dropout-based approximation. These are presented for the classical and modern ML models on intrinsic magnetic properties and transferred to the graph neural network microstructure task. We will revise the abstract wording to better reflect the presence of these tests in the manuscript.
    revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper presents an empirical study applying Gaussian NLL loss and dropout-based uncertainty estimation first to intrinsic magnetic property prediction and then to a GNN-based coercivity task. No equations, fitted parameters, or derivation chains are described in the provided text. The central claim of transferability is an experimental observation rather than a mathematical reduction to inputs by construction. No self-citations, ansatzes, or uniqueness theorems appear as load-bearing elements. The work is therefore self-contained against external benchmarks with no detectable circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no information on free parameters, axioms, or invented entities.
