arXiv: 2604.08122 · v1 · submitted 2026-04-09 · ❄️ cond-mat.mtrl-sci

Unveiling the Core of Materials Properties via SISSO and Sensitivity Analysis

Pith reviewed 2026-05-10 16:54 UTC · model grok-4.3

classification ❄️ cond-mat.mtrl-sci
keywords SISSO · sensitivity analysis · perovskites · lattice constant · materials genes · symbolic regression · interpretability · valence orbital radii

The pith

Derivative-based sensitivity analysis on SISSO models identifies valence orbital radii and nuclear charges as key to perovskite lattice constants.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper applies the SISSO symbolic regression method to find analytical expressions that link materials parameters to the equilibrium lattice constant of perovskites. Different combinations of these parameters, called materials genes, can yield models of comparable accuracy. To resolve this non-uniqueness, the authors develop a derivative-based sensitivity analysis that ranks each gene's contribution and reveals when separate gene sets carry equivalent information. The analysis shows that valence orbital radii, nuclear charges, and their products are the dominant quantities. A reader cares because the method turns a set of competing, equally accurate symbolic expressions into a consistent ranking of physical descriptors that can guide materials design.

Core claim

The derivative-based sensitivity analysis applied to equally accurate SISSO models for the perovskite equilibrium lattice constant reveals that distinct gene combinations encode equivalent information. It identifies the valence orbital radii, nuclear charges, and their products as the key quantities governing this property.

What carries the argument

Derivative-based sensitivity analysis applied to SISSO models, which quantifies each materials gene's importance by measuring the effect of small perturbations on the predicted lattice constant.
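
As a rough illustration of this idea, the sketch below perturbs each input of a symbolic expression by a small step and averages the resulting derivative over a dataset. The expression in `model`, the input names `r_a`, `r_b`, and `z_b`, the synthetic data, and the scaling by each input's standard deviation are all assumptions made for this example; the paper's own sensitivity measure and SISSO models are not reproduced here.

```python
import numpy as np


def model(X):
    """Hypothetical symbolic expression standing in for a fitted SISSO model."""
    r_a, r_b, z_b = X[:, 0], X[:, 1], X[:, 2]
    return 2.0 + 0.4 * r_a + 1.5 * r_b + 0.01 * r_b * z_b


def scaled_sensitivities(f, X, rel_step=1e-4):
    """Mean absolute partial derivative per input, scaled by the input's spread.

    Derivatives are estimated with central finite differences, and the
    scores are normalized so that they sum to one.
    """
    n_features = X.shape[1]
    scores = np.zeros(n_features)
    for j in range(n_features):
        spread = np.std(X[:, j])
        h = rel_step * spread  # small perturbation relative to the data range
        X_plus, X_minus = X.copy(), X.copy()
        X_plus[:, j] += h
        X_minus[:, j] -= h
        dfdx = (f(X_plus) - f(X_minus)) / (2.0 * h)
        scores[j] = np.mean(np.abs(dfdx)) * spread
    return scores / scores.sum()


# Synthetic inputs (assumed ranges, not taken from the paper's dataset).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(0.5, 1.5, 500),    # r_a: stand-in for one orbital radius
    rng.uniform(0.5, 1.5, 500),    # r_b: stand-in for another orbital radius
    rng.uniform(20.0, 40.0, 500),  # z_b: stand-in for a nuclear charge
])

scores = scaled_sensitivities(model, X)
for name, score in zip(["r_a", "r_b", "z_b"], scores):
    print(f"{name}: {score:.3f}")
```

Scaling the derivative by the input's standard deviation is one way to make inputs with different units comparable; other normalizations would give different rankings, which is the scaling concern raised under "Load-bearing premise" below.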

Load-bearing premise

The derivative-based sensitivity measure correctly identifies physically meaningful quantities without being dominated by correlations or scaling choices already present in the SISSO feature pool.

What would settle it

An independent calculation or experiment that finds no strong correlation between perovskite lattice constants and the products of valence orbital radii with nuclear charges, after controlling for other structural variables, would falsify the identification of these quantities as the governing factors.

Figures

Figures reproduced from arXiv: 2604.08122 by Lucas Foppa, Matthias Scheffler.

Figure 1. Three-dimensional materials-property map as defined by the SISSO model for the equilibrium lattice ...
Figure 2. (a) The scaled partial effects ...

Original abstract

Interpretable AI can reveal physical principles governing intricate materials properties by uncovering explicit relationships between physical parameters and target properties. The sure-independence screening and sparsifying operator (SISSO) symbolic-regression approach identifies analytical expressions that correlate a target property with a small set of parameters, termed materials genes, selected from a large pool of candidates. However, multiple gene combinations can yield equally accurate SISSO models, with individual genes contributing with different weights. Here, we establish a derivative-based sensitivity analysis that resolves the non-uniqueness of symbolic-regression descriptions, enhances interpretability, thereby enabling deeper physical insight. This analysis reveals how distinct gene combinations encode equivalent information and identifies valence orbital radii, nuclear charges, and their products as the key quantities governing the equilibrium lattice constant of perovskites.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a derivative-based sensitivity analysis as a post-processing step on SISSO symbolic-regression outputs to resolve non-uniqueness among equally accurate models. Applied to the equilibrium lattice constant of perovskites, the analysis is claimed to reveal that distinct gene combinations encode equivalent information and to identify valence orbital radii, nuclear charges, and their products as the dominant physical quantities.

Significance. If the sensitivity measure can be shown to extract physically meaningful rankings independent of feature-pool construction artifacts, the approach would strengthen the interpretability of SISSO for materials properties by turning model ambiguity into a source of insight rather than a limitation. The work directly targets a practical weakness in symbolic regression for science.

major comments (2)
  1. [§3 (Sensitivity Analysis)] The derivative-based sensitivity is introduced without orthogonalization, variance-inflation-factor diagnostics, or ablation of the candidate pool. Because the SISSO feature pool is itself generated by algebraic combinations, products, and scalings of precisely the same parameters later ranked as important (valence orbital radii, nuclear charges), the reported equivalence of gene combinations and the final ranking risk recovering pool-construction choices rather than independent physical importance.
  2. [Results on perovskites; no numbered subsection given] The central claim that the analysis identifies the governing quantities is presented without quantitative validation, error analysis, cross-validation against held-out data, or comparison to alternative interpretability techniques such as permutation importance or SHAP values. This absence leaves the physical interpretation unsupported at the level required for the claim.
minor comments (2)
  1. The term 'materials genes' is used in the abstract and introduction without an explicit definition or reference to its prior usage in the SISSO literature on first appearance.
  2. Figure captions should explicitly state the number of perovskites in the dataset and the size of the feature pool to allow readers to assess the scale of the non-uniqueness problem.
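
The collinearity concern in major comment 1 can be made concrete with a variance inflation factor (VIF) check on a toy feature pool. The sketch below is only an illustration under stated assumptions: the primary parameters `r` and `z`, their ranges, and the three-feature pool are hypothetical and far smaller than a real SISSO candidate pool. It uses the standard identity that the VIFs are the diagonal entries of the inverse correlation matrix of the features.

```python
import numpy as np

# Hypothetical primary parameters (not the paper's data).
rng = np.random.default_rng(1)
n_samples = 300
r = rng.uniform(0.5, 1.5, n_samples)    # stand-in for a valence orbital radius
z = rng.uniform(20.0, 40.0, n_samples)  # stand-in for a nuclear charge

# Toy candidate pool: the primary parameters plus their product.
names = ["r", "Z", "r*Z"]
pool = np.column_stack([r, z, r * z])


def variance_inflation_factors(features):
    """Return one VIF per column: the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(features, rowvar=False)
    return np.diag(np.linalg.inv(corr))


vif = variance_inflation_factors(pool)
for name, value in zip(names, vif):
    print(f"VIF({name}) = {value:.1f}")
```

Values well above roughly 10 are a common rule-of-thumb sign of strong collinearity. A product feature built from parameters already in the pool is expected to score high, which is why a ranking over such a pool can partly reflect how the pool was constructed.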

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive feedback on our manuscript. Below, we provide point-by-point responses to the major comments. We have revised the manuscript to incorporate additional analyses and clarifications as detailed in our responses.

  1. Referee: The derivative-based sensitivity is introduced without orthogonalization, variance-inflation-factor diagnostics, or ablation of the candidate pool. Because the SISSO feature pool is itself generated by algebraic combinations, products, and scalings of precisely the same parameters later ranked as important (valence orbital radii, nuclear charges), the reported equivalence of gene combinations and the final ranking risk recovering pool-construction choices rather than independent physical importance.

    Authors: We thank the referee for highlighting this important consideration. The SISSO feature pool is indeed constructed from operations on the input parameters, which is standard in symbolic regression to generate candidate expressions. Our derivative-based sensitivity analysis evaluates the contribution of each selected gene to the target property by computing partial derivatives with respect to the input features, thereby capturing the effective physical influence rather than the algebraic construction. The equivalence of different gene combinations is demonstrated by their similar sensitivity profiles to the underlying quantities like valence orbital radii and nuclear charges. While we did not include VIF or orthogonalization (as the models are nonlinear symbolic expressions, not linear regressions), we have added an ablation study of the candidate pool in the revised manuscript to demonstrate that the rankings are robust. This shows that the identified dominant quantities are not artifacts of pool construction. revision: partial

  2. Referee: The central claim that the analysis identifies the governing quantities is presented without quantitative validation, error analysis, cross-validation against held-out data, or comparison to alternative interpretability techniques such as permutation importance or SHAP values. This absence leaves the physical interpretation unsupported at the level required for the claim.

    Authors: The physical interpretation is grounded in the consistency of the sensitivity analysis across multiple high-accuracy SISSO models, which all point to the same key quantities, and these quantities have established roles in determining lattice constants in perovskites. To provide quantitative support, we have now included error bars on the sensitivity measures derived from the model ensemble and performed cross-validation by training on subsets of the data and evaluating the stability of the identified genes. Regarding alternative techniques, permutation importance and SHAP are primarily for opaque machine learning models, whereas our method leverages the explicit symbolic form; we have added a brief discussion comparing the insights obtained. We believe this addresses the concern and supports the claim more robustly. revision: yes
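
Error bars of the kind described in this response could be produced in several ways; one option, assumed here, is a bootstrap over the training data. The sketch below refits the coefficients of a fixed symbolic form on resampled data and records the spread of the resulting sensitivities. The functional form, the synthetic data, and the names `design` and `sensitivities` are hypothetical, and a fuller analysis would rerun the SISSO feature selection on every resample rather than keep the expression fixed.

```python
import numpy as np

# Synthetic data from an assumed expression plus noise (not the paper's dataset).
rng = np.random.default_rng(0)
n_samples = 200
r_b = rng.uniform(0.5, 1.5, n_samples)
z_b = rng.uniform(20.0, 40.0, n_samples)
X = np.column_stack([r_b, z_b])
y = 2.0 + 1.5 * r_b + 0.01 * r_b * z_b + rng.normal(0.0, 0.02, n_samples)


def design(X):
    """Design matrix for the fixed form c0 + c1*r_b + c2*r_b*z_b."""
    return np.column_stack([np.ones(len(X)), X[:, 0], X[:, 0] * X[:, 1]])


def sensitivities(coef, X):
    """Normalized mean |analytic partial derivative| times input spread."""
    _, c1, c2 = coef
    d_rb = c1 + c2 * X[:, 1]  # derivative with respect to r_b
    d_zb = c2 * X[:, 0]       # derivative with respect to z_b
    raw = np.array([
        np.mean(np.abs(d_rb)) * X[:, 0].std(),
        np.mean(np.abs(d_zb)) * X[:, 1].std(),
    ])
    return raw / raw.sum()


# Bootstrap: resample rows with replacement, refit, recompute sensitivities.
boot = []
for _ in range(200):
    idx = rng.integers(0, n_samples, n_samples)
    coef, *_ = np.linalg.lstsq(design(X[idx]), y[idx], rcond=None)
    boot.append(sensitivities(coef, X[idx]))
boot = np.array(boot)

mean_s, std_s = boot.mean(axis=0), boot.std(axis=0)
for name, m, s in zip(["r_b", "z_b"], mean_s, std_s):
    print(f"{name}: {m:.3f} +/- {s:.3f}")
```

If the ranking of inputs stays the same across resamples and the standard deviations are small compared with the gaps between the mean scores, the ranking can be considered stable for this fixed expression.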

Circularity Check

0 steps flagged

No significant circularity detected

The paper applies the SISSO symbolic-regression method to select analytical expressions (materials genes) from a pre-defined candidate feature pool, then introduces a derivative-based sensitivity analysis as a subsequent, independent post-processing step to address model non-uniqueness and rank physical contributions. No load-bearing step reduces by construction to its own inputs: the sensitivity metric is not defined in terms of the fitted SISSO coefficients or gene weights, the key quantities (valence orbital radii, nuclear charges, products) are not presupposed by the pool construction in a self-definitional way within the presented workflow, and no self-citation chain is invoked to justify uniqueness or the central claim. The derivation remains self-contained against the external perovskite dataset and the explicit separation of regression and sensitivity stages.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on the established SISSO framework and the assumption that multiple gene sets can be equally accurate; no new free parameters, invented entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)
  • domain assumption: Multiple distinct combinations of materials genes can produce SISSO models of comparable accuracy.
    Explicitly stated as the motivation for developing the sensitivity analysis.
