Unveiling the Core of Materials Properties via SISSO and Sensitivity Analysis
Pith reviewed 2026-05-10 16:54 UTC · model grok-4.3
The pith
Derivative-based sensitivity analysis on SISSO models identifies valence orbital radii and nuclear charges as key to perovskite lattice constants.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The derivative-based sensitivity analysis applied to equally accurate SISSO models for the perovskite equilibrium lattice constant reveals that distinct gene combinations encode equivalent information. It identifies the valence orbital radii, nuclear charges, and their products as the key quantities governing this property.
What carries the argument
Derivative-based sensitivity analysis applied to SISSO models, which quantifies each materials gene's importance by measuring the effect of small perturbations on the predicted lattice constant.
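To make the mechanism concrete, here is a minimal sketch of a derivative-based sensitivity score. It is illustrative only: the expression in `model`, the feature names, the finite-difference step, and the choice to scale each derivative by the input's standard deviation are all assumptions made for this example, not the paper's actual models or definition.

```python
import numpy as np

def model(x):
    """Hypothetical SISSO-like expression (NOT taken from the paper).

    Columns of x: r_A, r_B (valence orbital radii), Z_B (nuclear charge).
    """
    r_a, r_b, z_b = x[:, 0], x[:, 1], x[:, 2]
    return 1.5 + 0.8 * r_a + 1.2 * r_b + 0.01 * r_b * z_b

def derivative_sensitivity(f, x, rel_step=1e-4):
    """Mean absolute central-difference derivative per input column,
    scaled by that column's standard deviation (an assumed normalization)
    and normalized so the scores sum to one."""
    n_features = x.shape[1]
    scores = np.zeros(n_features)
    for j in range(n_features):
        # Small per-sample perturbation of feature j only.
        h = rel_step * np.maximum(np.abs(x[:, j]), 1.0)
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[:, j] += h
        x_minus[:, j] -= h
        grad = (f(x_plus) - f(x_minus)) / (2.0 * h)
        scores[j] = np.mean(np.abs(grad)) * np.std(x[:, j])
    return scores / scores.sum()

# Synthetic inputs standing in for a perovskite dataset (assumed ranges).
rng = np.random.default_rng(0)
x = np.column_stack([
    rng.uniform(1.0, 2.5, 200),   # r_A
    rng.uniform(0.5, 1.5, 200),   # r_B
    rng.uniform(20, 50, 200),     # Z_B
])
sens = derivative_sensitivity(model, x)
for name, s in zip(["r_A", "r_B", "Z_B"], sens):
    print(f"{name}: {s:.3f}")
```

Because the derivative is taken with respect to the primary inputs rather than the composite genes, two expressions built from different genes can in principle be compared on the same footing. Whether the scaling choice affects the ranking is exactly the concern raised under the load-bearing premise.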
Load-bearing premise
The derivative-based sensitivity measure correctly identifies physically meaningful quantities without being dominated by correlations or scaling choices already present in the SISSO feature pool.
What would settle it
An independent calculation or experiment that finds no strong correlation between perovskite lattice constants and the products of valence orbital radii with nuclear charges, after controlling for other structural variables, would falsify the identification of these quantities as the governing factors.
Original abstract
Interpretable AI can reveal physical principles governing intricate materials properties by uncovering explicit relationships between physical parameters and target properties. The sure-independence screening and sparsifying operator (SISSO) symbolic-regression approach identifies analytical expressions that correlate a target property with a small set of parameters, termed materials genes, selected from a large pool of candidates. However, multiple gene combinations can yield equally accurate SISSO models, with individual genes contributing with different weights. Here, we establish a derivative-based sensitivity analysis that resolves the non-uniqueness of symbolic-regression descriptions, enhances interpretability, thereby enabling deeper physical insight. This analysis reveals how distinct gene combinations encode equivalent information and identifies valence orbital radii, nuclear charges, and their products as the key quantities governing the equilibrium lattice constant of perovskites.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a derivative-based sensitivity analysis as a post-processing step on SISSO symbolic-regression outputs to resolve non-uniqueness among equally accurate models. Applied to the equilibrium lattice constant of perovskites, the analysis is claimed to reveal that distinct gene combinations encode equivalent information and to identify valence orbital radii, nuclear charges, and their products as the dominant physical quantities.
Significance. If the sensitivity measure can be shown to extract physically meaningful rankings independent of feature-pool construction artifacts, the approach would strengthen the interpretability of SISSO for materials properties by turning model ambiguity into a source of insight rather than a limitation. The work directly targets a practical weakness in symbolic regression for science.
major comments (2)
- [§3 (Sensitivity Analysis)] The derivative-based sensitivity is introduced without orthogonalization, variance-inflation-factor diagnostics, or ablation of the candidate pool. Because the SISSO feature pool is itself generated by algebraic combinations, products, and scalings of precisely the same parameters later ranked as important (valence orbital radii, nuclear charges), the reported equivalence of gene combinations and the final ranking risk recovering pool-construction choices rather than independent physical importance.
- [Results on perovskites; no numbered subsection given] The central claim that the analysis identifies the governing quantities is presented without quantitative validation, error analysis, cross-validation against held-out data, or comparison to alternative interpretability techniques such as permutation importance or SHAP values. This absence leaves the physical interpretation unsupported at the level required for the claim.
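The collinearity concern in the first major comment can be illustrated with a variance-inflation-factor check. The sketch below is a generic diagnostic under stated assumptions: the toy feature pool (`r_b`, `z_b`, their product, and an unrelated noise column) is synthetic and hypothetical, and the VIF is computed with plain least squares rather than with any tool used by the authors.

```python
import numpy as np

def variance_inflation_factors(features):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 is obtained by ordinary least
    squares of column j on all other columns plus an intercept."""
    n_samples, n_features = features.shape
    vifs = np.empty(n_features)
    for j in range(n_features):
        target = features[:, j]
        others = np.delete(features, j, axis=1)
        design = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        ss_res = residual @ residual
        ss_tot = np.sum((target - target.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        # Guard against division by zero for perfectly collinear columns.
        vifs[j] = 1.0 / max(1.0 - r2, 1e-12)
    return vifs

# Hypothetical mini feature pool: two primary features, their product,
# and an independent noise column for reference.
rng = np.random.default_rng(1)
r_b = rng.uniform(0.5, 1.5, 300)
z_b = rng.uniform(20, 50, 300)
noise = rng.normal(size=300)
pool = np.column_stack([r_b, z_b, r_b * z_b, noise])
vifs = variance_inflation_factors(pool)
for name, v in zip(["r_B", "Z_B", "r_B*Z_B", "noise"], vifs):
    print(f"{name}: VIF = {v:.1f}")
```

In this toy pool the product column is largely explained by its two factors, so its VIF is well above that of the independent column. This is the kind of built-in dependence the referee asks the authors to rule out as the source of the reported ranking.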
minor comments (2)
- The term 'materials genes' is used in the abstract and introduction without an explicit definition or reference to its prior usage in the SISSO literature on first appearance.
- Figure captions should explicitly state the number of perovskites in the dataset and the size of the feature pool to allow readers to assess the scale of the non-uniqueness problem.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and constructive feedback on our manuscript. Below, we provide point-by-point responses to the major comments. We have revised the manuscript to incorporate additional analyses and clarifications as detailed in our responses.
- Referee: The derivative-based sensitivity is introduced without orthogonalization, variance-inflation-factor diagnostics, or ablation of the candidate pool. Because the SISSO feature pool is itself generated by algebraic combinations, products, and scalings of precisely the same parameters later ranked as important (valence orbital radii, nuclear charges), the reported equivalence of gene combinations and the final ranking risk recovering pool-construction choices rather than independent physical importance.
  Authors: We thank the referee for highlighting this important consideration. The SISSO feature pool is indeed constructed from operations on the input parameters, which is standard in symbolic regression to generate candidate expressions. Our derivative-based sensitivity analysis evaluates the contribution of each selected gene to the target property by computing partial derivatives with respect to the input features, thereby capturing the effective physical influence rather than the algebraic construction. The equivalence of different gene combinations is demonstrated by their similar sensitivity profiles to the underlying quantities like valence orbital radii and nuclear charges. While we did not include VIF or orthogonalization (as the models are nonlinear symbolic expressions, not linear regressions), we have added an ablation study of the candidate pool in the revised manuscript to demonstrate that the rankings are robust. This shows that the identified dominant quantities are not artifacts of pool construction.
  Revision: partial
- Referee: The central claim that the analysis identifies the governing quantities is presented without quantitative validation, error analysis, cross-validation against held-out data, or comparison to alternative interpretability techniques such as permutation importance or SHAP values. This absence leaves the physical interpretation unsupported at the level required for the claim.
  Authors: The physical interpretation is grounded in the consistency of the sensitivity analysis across multiple high-accuracy SISSO models, which all point to the same key quantities, and these quantities have established roles in determining lattice constants in perovskites. To provide quantitative support, we have now included error bars on the sensitivity measures derived from the model ensemble and performed cross-validation by training on subsets of the data and evaluating the stability of the identified genes. Regarding alternative techniques, permutation importance and SHAP are primarily for opaque machine learning models, whereas our method leverages the explicit symbolic form; we have added a brief discussion comparing the insights obtained. We believe this addresses the concern and supports the claim more robustly.
  Revision: yes
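For context on the comparison discussed in the second exchange: permutation importance only requires a callable predictor, so it can also be evaluated on an explicit symbolic expression. The following is a minimal sketch under stated assumptions. `symbolic_model` is a hypothetical stand-in expression, the data are synthetic, and the RMSE-increase score with repeat-based error bars is one common convention, not the procedure used in the manuscript.

```python
import numpy as np

def symbolic_model(x):
    """Hypothetical explicit expression standing in for a SISSO model."""
    return 1.5 + 0.8 * x[:, 0] + 0.01 * x[:, 1] * x[:, 2]

def permutation_importance(f, x, y, n_repeats=20, seed=0):
    """Mean and standard deviation (over repeats) of the RMSE increase
    observed when a single input column is randomly shuffled."""
    rng = np.random.default_rng(seed)
    base_rmse = np.sqrt(np.mean((f(x) - y) ** 2))
    increases = np.zeros((n_repeats, x.shape[1]))
    for r in range(n_repeats):
        for j in range(x.shape[1]):
            x_perm = x.copy()
            # Shuffling column j breaks its relation to the target.
            x_perm[:, j] = rng.permutation(x[:, j])
            rmse = np.sqrt(np.mean((f(x_perm) - y) ** 2))
            increases[r, j] = rmse - base_rmse
    return increases.mean(axis=0), increases.std(axis=0)

# Synthetic data (assumed ranges); the last column is not used by the model.
rng = np.random.default_rng(2)
x = np.column_stack([
    rng.uniform(1.0, 2.5, 200),
    rng.uniform(0.5, 1.5, 200),
    rng.uniform(20, 50, 200),
    rng.normal(size=200),
])
y = symbolic_model(x) + rng.normal(scale=0.01, size=200)
mean_inc, std_inc = permutation_importance(symbolic_model, x, y)
for name, m, s in zip(["x0", "x1", "x2", "unused"], mean_inc, std_inc):
    print(f"{name}: {m:.4f} +/- {s:.4f}")
```

A column the expression never reads gets an importance of exactly zero, which gives a simple sanity check when lining such scores up against derivative-based rankings.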
Circularity Check
No significant circularity detected
The paper applies the SISSO symbolic-regression method to select analytical expressions (materials genes) from a pre-defined candidate feature pool, then introduces a derivative-based sensitivity analysis as a subsequent, independent post-processing step to address model non-uniqueness and rank physical contributions. No load-bearing step reduces by construction to its own inputs: the sensitivity metric is not defined in terms of the fitted SISSO coefficients or gene weights, the key quantities (valence orbital radii, nuclear charges, products) are not presupposed by the pool construction in a self-definitional way within the presented workflow, and no self-citation chain is invoked to justify uniqueness or the central claim. The derivation remains self-contained against the external perovskite dataset and the explicit separation of regression and sensitivity stages.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Multiple distinct combinations of materials genes can produce SISSO models of comparable accuracy.