ML-guided screening of chalcogenide perovskites as solar energy materials
Pith reviewed 2026-05-15 19:39 UTC · model grok-4.3
The pith
A SISSO-derived tolerance factor more accurately identifies perovskite-forming chalcogenide compositions than standard criteria.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using a curated experimental dataset of halide and chalcogenide compounds, we derive a new tolerance factor via the SISSO algorithm that more accurately distinguishes perovskite-forming compositions than established tolerance-factor-based screening criteria. This descriptor is combined with generative crystal structure prediction, composition-based bandgap estimation, and machine-learning-based feasibility assessment to systematically explore a wide chemical space of hypothetical chalcogenide perovskites. The resulting candidates are further evaluated using sustainability indicators, enabling multi-objective ranking tailored to both single-junction and tandem photovoltaic architectures.
What carries the argument
SISSO-derived tolerance factor: an interpretable analytical descriptor obtained from the sure independence screening and sparsifying operator algorithm applied to experimental data, used to distinguish perovskite-forming compositions more accurately than traditional tolerance factors.
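For context, the kind of established criteria such a descriptor is compared against can be computed in a few lines. The sketch below implements the classical Goldschmidt tolerance factor and the factor proposed by Bartel et al. (2019). It does not reproduce the SISSO-derived expression, which is not given here. The function names and the ionic radii used for BaZrS3 are illustrative assumptions, not values taken from the paper.

```python
import math


def goldschmidt_t(r_a: float, r_b: float, r_x: float) -> float:
    """Goldschmidt factor: t = (r_A + r_X) / (sqrt(2) * (r_B + r_X))."""
    return (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))


def bartel_tau(r_a: float, r_b: float, r_x: float, n_a: int) -> float:
    """Bartel et al. factor: tau = r_X/r_B - n_A * (n_A - (r_A/r_B) / ln(r_A/r_B)).

    n_a is the oxidation state of the A cation. Requires r_A > r_B so that
    the logarithm is positive.
    """
    ratio = r_a / r_b
    return r_x / r_b - n_a * (n_a - ratio / math.log(ratio))


# Assumed Shannon-type radii in angstroms for BaZrS3:
# Ba2+ = 1.61, Zr4+ = 0.72, S2- = 1.84 (illustrative inputs only).
t = goldschmidt_t(1.61, 0.72, 1.84)
tau = bartel_tau(1.61, 0.72, 1.84, n_a=2)
print(f"t = {t:.3f}, tau = {tau:.3f}")
```

In the original Bartel et al. work, tau below 4.18 is read as perovskite-compatible. Any threshold for the new SISSO-derived factor would have to come from the paper itself.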
If this is right
- Identifies several promising and previously unexplored chalcogenide perovskites as solar absorber candidates.
- Provides a transferable screening strategy for chemically constrained materials spaces.
- Enables multi-objective ranking that balances optoelectronic performance, experimental viability, and long-term sustainability.
- Supports evaluation for both single-junction and tandem photovoltaic architectures.
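The multi-objective ranking mentioned above can be illustrated with a Pareto filter, one common way to balance several criteria without fixing weights. This is a minimal sketch and not the paper's ranking procedure: the candidate names, scores, and target bandgaps (about 1.34 eV for a single junction, about 1.70 eV for a tandem top cell) are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str              # hypothetical label, not a composition from the paper
    bandgap_ev: float      # predicted bandgap in eV
    feasibility: float     # 0..1, higher is better (assumed scale)
    sustainability: float  # 0..1, higher is better (assumed scale)


def objectives(c: Candidate, target_gap_ev: float) -> tuple:
    """Return a tuple in which larger is better for every entry."""
    return (-abs(c.bandgap_ev - target_gap_ev), c.feasibility, c.sustainability)


def dominates(a: tuple, b: tuple) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(cands: list, target_gap_ev: float) -> list:
    """Keep the candidates that no other candidate dominates."""
    scored = [(c, objectives(c, target_gap_ev)) for c in cands]
    return [c for c, s in scored if not any(dominates(o, s) for _, o in scored)]


candidates = [
    Candidate("cand_A", 1.35, 0.60, 0.70),
    Candidate("cand_B", 1.75, 0.80, 0.50),
    Candidate("cand_C", 1.40, 0.55, 0.65),
    Candidate("cand_D", 2.30, 0.40, 0.40),
]

single = pareto_front(candidates, target_gap_ev=1.34)  # assumed single-junction target
tandem = pareto_front(candidates, target_gap_ev=1.70)  # assumed tandem top-cell target
print([c.name for c in single], [c.name for c in tandem])
```

Changing only the target bandgap changes which candidates survive, which is why a ranking tailored to the device architecture can differ from a single global ranking.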
Where Pith is reading between the lines
- The method could be extended to other perovskite families or related materials classes by applying the same data-driven descriptor derivation.
- If the feasibility models prove accurate, this approach could reduce the number of failed synthesis attempts in the lab by prioritizing high-viability compositions.
- Connecting the tolerance factor to specific structural features might reveal why certain chalcogenides fail to form perovskites.
- Testable extension: apply the same SISSO procedure to a dataset of oxide perovskites to see if a similar improvement occurs.
Load-bearing premise
The curated experimental dataset of halide and chalcogenide compounds is representative enough for the SISSO-derived tolerance factor to generalize to unexplored chalcogenide perovskites and for the ML models to reflect real experimental outcomes.
What would settle it
Experimentally synthesizing a top-ranked hypothetical chalcogenide perovskite candidate and confirming whether it forms the perovskite phase or a competing phase would test the predictive power of the new tolerance factor and screening framework.
Original abstract
Chalcogenide perovskites have emerged as promising absorber materials for next-generation photovoltaic devices, yet their experimental realization remains limited by competing phases, structural polymorphism, and synthetic challenges. Here, we present a fully data-driven and experimentally grounded screening and ranking framework to assess the stability and experimental feasibility of chalcogenide perovskites, integrating interpretable analytical descriptors, machine-learning models, and sustainability metrics. Using a curated experimental dataset of halide and chalcogenide compounds, we derive a new tolerance factor via the SISSO (sure independence screening and sparsifying operator) algorithm that more accurately distinguishes perovskite-forming compositions than established tolerance-factor-based screening criteria. This descriptor is combined with generative crystal structure prediction, composition-based bandgap estimation, and machine-learning-based feasibility assessment to systematically explore a wide chemical space of hypothetical chalcogenide perovskites. The resulting candidates are further evaluated using sustainability indicators, enabling multi-objective ranking tailored to both single-junction and tandem photovoltaic architectures. Beyond identifying several promising and previously unexplored chalcogenide perovskites, this work demonstrates a transferable screening strategy for chemically constrained materials spaces that balances optoelectronic performance, experimental viability, and long-term sustainability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a data-driven screening framework for chalcogenide perovskites as solar absorbers. It derives a new tolerance factor via the SISSO algorithm trained on a curated experimental dataset that combines halide and chalcogenide compounds, claiming superior accuracy in distinguishing perovskite-forming compositions compared to established criteria. This descriptor is integrated with generative structure prediction, composition-based bandgap models, ML feasibility assessment, and sustainability metrics to rank hypothetical chalcogenide perovskites for single-junction and tandem PV applications, identifying several promising candidates.
Significance. If the central claim of improved generalization holds, the work would provide a transferable, multi-objective screening strategy for chemically constrained spaces that balances stability, optoelectronic performance, and sustainability. The explicit use of an interpretable SISSO-derived descriptor and integration of experimental grounding with ML models would strengthen data-driven materials discovery pipelines, particularly for emerging chalcogenide systems where experimental data remain sparse.
major comments (2)
- [Abstract and methods (SISSO derivation)] The central claim that the SISSO-derived tolerance factor more accurately distinguishes perovskite-forming compositions rests on a mixed halide/chalcogenide training set without reported chemistry-specific hold-out validation. A leave-one-chemistry-out test (train on halides, evaluate on chalcogenides) is needed to confirm that the descriptor captures chalcogenide-specific formation rules rather than halide-dominated correlations, as chalcogenides exhibit distinct bonding and ionic-radius regimes.
- [Results (model validation)] No quantitative validation metrics, error bars, or hold-out performance numbers are provided for the new tolerance factor or the downstream ML feasibility and bandgap models. The soundness assessment notes that the curated dataset lacks visible controls for overfitting or data leakage, which directly undermines the generalization claim to unexplored chalcogenide perovskites.
minor comments (2)
- [Methods] Clarify the exact composition of the curated experimental dataset (number of chalcogenide vs. halide entries) and any preprocessing steps to allow reproducibility.
- [Methods] The abstract mentions 'generative crystal structure prediction' but does not specify the algorithm or validation against known structures; add a brief description and reference.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback on our manuscript. We address each of the major comments below and commit to revising the manuscript to incorporate additional validation steps as suggested.
- Referee: [Abstract and methods (SISSO derivation)] The central claim that the SISSO-derived tolerance factor more accurately distinguishes perovskite-forming compositions rests on a mixed halide/chalcogenide training set without reported chemistry-specific hold-out validation. A leave-one-chemistry-out test (train on halides, evaluate on chalcogenides) is needed to confirm that the descriptor captures chalcogenide-specific formation rules rather than halide-dominated correlations, as chalcogenides exhibit distinct bonding and ionic-radius regimes.
  Authors: We agree with the referee that a leave-one-chemistry-out validation is necessary to rigorously test the applicability of the SISSO-derived tolerance factor to chalcogenides. Although the original training set was mixed to leverage available data, we will implement the suggested test in the revised version by training on halides only and evaluating on chalcogenides. The results, including performance metrics, will be added to the methods and results sections to support the claim of improved generalization.
  Revision: yes
- Referee: [Results (model validation)] No quantitative validation metrics, error bars, or hold-out performance numbers are provided for the new tolerance factor or the downstream ML feasibility and bandgap models. The soundness assessment notes that the curated dataset lacks visible controls for overfitting or data leakage, which directly undermines the generalization claim to unexplored chalcogenide perovskites.
  Authors: We acknowledge that the manuscript would benefit from more explicit quantitative validation details. In the revision, we will provide hold-out performance metrics, error bars, and descriptions of controls for overfitting and data leakage for the tolerance factor as well as the ML models for feasibility and bandgap prediction. This will include details on dataset splitting strategies and any regularization techniques employed.
  Revision: yes
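The validation discussed in this exchange (train on halides, evaluate on chalcogenides, and report an error bar) can be sketched as follows. This is a minimal illustration under stated assumptions: the descriptor values and labels are synthetic, the one-dimensional threshold rule stands in for whatever classifier the authors use, and all function names are hypothetical.

```python
import random


def accuracy(values, labels, cut):
    """Fraction of samples where the rule 'value < cut means perovskite' is correct."""
    return sum((v < cut) == y for v, y in zip(values, labels)) / len(values)


def fit_threshold(values, labels):
    """Pick the cutoff among observed values that maximizes training accuracy."""
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(values)):
        acc = accuracy(values, labels, cut)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut


def bootstrap_ci(values, labels, cut, n_boot=1000, seed=0):
    """Percentile 95% interval for held-out accuracy, resampling with replacement."""
    rng = random.Random(seed)
    idx = range(len(values))
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(idx) for _ in idx]
        stats.append(accuracy([values[i] for i in sample],
                              [labels[i] for i in sample], cut))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Synthetic rows: (chemistry, descriptor value, forms perovskite). Not real data.
data = [
    ("halide", 3.9, True), ("halide", 4.0, True), ("halide", 4.1, True),
    ("halide", 4.3, False), ("halide", 4.5, False), ("halide", 4.8, False),
    ("chalcogenide", 4.0, True), ("chalcogenide", 4.1, True),
    ("chalcogenide", 4.2, False), ("chalcogenide", 4.6, False),
]

train = [(v, y) for chem, v, y in data if chem == "halide"]
test = [(v, y) for chem, v, y in data if chem == "chalcogenide"]

cut = fit_threshold(*zip(*train))          # fitted on halides only
train_acc = accuracy(*zip(*train), cut)
test_vals, test_labels = zip(*test)
test_acc = accuracy(test_vals, test_labels, cut)  # evaluated on chalcogenides only
lo, hi = bootstrap_ci(test_vals, test_labels, cut)
print(f"cut={cut}, train acc={train_acc:.2f}, held-out acc={test_acc:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

In this toy case the rule fits the halides perfectly but misclassifies one chalcogenide, which is the kind of transfer gap a mixed training set would hide. With only a handful of held-out compounds the bootstrap interval is wide, so any such number should be read with the sample size in mind.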
Circularity Check
No circularity: SISSO tolerance factor derived from external experimental data
The paper applies the SISSO algorithm to a curated external experimental dataset of halide and chalcogenide compounds to obtain a new tolerance factor, then uses it for screening hypothetical compositions. This is a standard train-on-known/predict-on-unknown workflow with no self-referential definitions, no fitted parameters renamed as predictions within the same dataset, and no load-bearing self-citations or uniqueness theorems imported from the authors' prior work. Downstream ML feasibility and bandgap models are described as composition-based but are not shown to reduce to the tolerance-factor inputs by construction. The provided abstract and reader summary contain no equations or sections exhibiting the enumerated circularity patterns. The noted concern about mixed halide/chalcogenide training data is a generalization/validity issue, not a circularity reduction.
Axiom & Free-Parameter Ledger
free parameters (1)
- SISSO-derived tolerance factor coefficients
axioms (2)
- Standard math: SISSO algorithm identifies optimal low-dimensional descriptors from a large feature space
- Domain assumption: Composition-based bandgap estimation and ML feasibility models generalize to hypothetical compositions