
arxiv: 2602.21812 · v1 · submitted 2026-02-25 · ❄️ cond-mat.mtrl-sci

ML-guided screening of chalcogenide perovskites as solar energy materials

Pith reviewed 2026-05-15 19:39 UTC · model grok-4.3

classification ❄️ cond-mat.mtrl-sci
keywords chalcogenide perovskites · solar energy materials · tolerance factor · SISSO algorithm · machine learning screening · photovoltaics · sustainability metrics

The pith

A SISSO-derived tolerance factor more accurately identifies perovskite-forming chalcogenide compositions than standard criteria.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a machine-learning guided framework to screen hypothetical chalcogenide perovskites for use as solar energy absorbers. It starts from a curated set of known halide and chalcogenide compounds and uses the SISSO algorithm to extract a new tolerance factor that better predicts which compositions will form the perovskite structure. This descriptor is then paired with crystal structure prediction, bandgap estimates, and feasibility models to rank candidates according to performance, experimental viability, and sustainability for photovoltaic applications. A sympathetic reader would care because chalcogenide perovskites offer potential for stable, lead-free solar cells but are hard to synthesize, and better screening reduces wasted experimental effort on unpromising compositions.

Core claim

Using a curated experimental dataset of halide and chalcogenide compounds, we derive a new tolerance factor via the SISSO algorithm that more accurately distinguishes perovskite-forming compositions than established tolerance-factor-based screening criteria. This descriptor is combined with generative crystal structure prediction, composition-based bandgap estimation, and machine-learning-based feasibility assessment to systematically explore a wide chemical space of hypothetical chalcogenide perovskites. The resulting candidates are further evaluated using sustainability indicators, enabling multi-objective ranking tailored to both single-junction and tandem photovoltaic architectures.

What carries the argument

SISSO-derived tolerance factor: an interpretable analytical descriptor obtained from the sure independence screening and sparsifying operator algorithm applied to experimental data, used to distinguish perovskite-forming compositions more accurately than traditional tolerance factors.
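
For orientation, the baseline that such descriptors are measured against is the classical Goldschmidt tolerance factor, t = (r_A + r_X) / (sqrt(2) (r_B + r_X)). The sketch below computes only that baseline; it is not the paper's SISSO-derived descriptor, whose functional form is not given in this summary. The ionic radii are illustrative Shannon-type values assumed for this example, and the names `goldschmidt_t` and `RADII` are hypothetical.

```python
import math


def goldschmidt_t(r_a: float, r_b: float, r_x: float) -> float:
    """Classical Goldschmidt tolerance factor for an ABX3 composition."""
    return (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))


# Illustrative ionic radii in angstroms (assumed values, not taken from the paper).
RADII = {"Ba": 1.61, "Zr": 0.72, "S": 1.84}

t_bazrs3 = goldschmidt_t(RADII["Ba"], RADII["Zr"], RADII["S"])
print(f"BaZrS3: t = {t_bazrs3:.3f}")
```

A data-driven descriptor replaces this fixed geometric ratio with an expression fitted to experimental labels, which is where the generalization questions raised in the referee report come in.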

If this is right

  • Identifies several promising and previously unexplored chalcogenide perovskites as solar absorber candidates.
  • Provides a transferable screening strategy for chemically constrained materials spaces.
  • Enables multi-objective ranking that balances optoelectronic performance, experimental viability, and long-term sustainability.
  • Supports evaluation for both single-junction and tandem photovoltaic architectures.
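
The multi-objective ranking can be pictured as a Pareto filter over two quantities to be minimized: the deviation of the estimated bandgap from a target value, and the supply risk. The sketch below is a minimal illustration under that assumption; the targets 1.34 eV (single-junction) and 1.71 eV (tandem top cell) come from the Figure 6 caption, while the candidate names and numbers are hypothetical placeholders and `pareto_front` is not the authors' code.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    bandgap_ev: float
    supply_risk: float


def pareto_front(cands, target_gap_ev):
    """Names of candidates not dominated in (|Eg - target|, supply risk); both minimized."""
    scored = [(abs(c.bandgap_ev - target_gap_ev), c.supply_risk, c) for c in cands]
    front = []
    for d1, s1, cand in scored:
        # A candidate is dominated if another one is no worse on both
        # objectives and strictly better on at least one.
        dominated = any(
            d2 <= d1 and s2 <= s1 and (d2 < d1 or s2 < s1)
            for d2, s2, _ in scored
        )
        if not dominated:
            front.append(cand.name)
    return front


# Hypothetical candidates; names and values are placeholders, not results.
CANDS = [
    Candidate("X1", 1.30, 0.60),
    Candidate("X2", 1.75, 0.30),
    Candidate("X3", 1.50, 0.20),
    Candidate("X4", 2.20, 0.70),
]

single_junction = pareto_front(CANDS, 1.34)
tandem_top_cell = pareto_front(CANDS, 1.71)
print(single_junction, tandem_top_cell)
```

Changing the target bandgap changes which candidates survive, which is why the same pool can yield different shortlists for the two architectures.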

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could be extended to other perovskite families or related materials classes by applying the same data-driven descriptor derivation.
  • If the feasibility models prove accurate, this approach could reduce the number of failed synthesis attempts in the lab by prioritizing high-viability compositions.
  • Connecting the tolerance factor to specific structural features might reveal why certain chalcogenides fail to form perovskites.
  • Testable extension: apply the same SISSO procedure to a dataset of oxide perovskites to see if a similar improvement occurs.

Load-bearing premise

The curated experimental dataset of halide and chalcogenide compounds is representative enough for the SISSO-derived tolerance factor to generalize to unexplored chalcogenide perovskites and for the ML models to reflect real experimental outcomes.

What would settle it

Experimentally synthesizing a top-ranked hypothetical chalcogenide perovskite candidate and confirming whether it forms the perovskite phase or a competing phase would test the predictive power of the new tolerance factor and screening framework.

Figures

Figures reproduced from arXiv: 2602.21812 by Diego A. Garzón, José A. Márquez, Lauri Himanen, Luisa Andrade, Sascha Sadewasser.

Figure 1: Overview of the ML-guided screening pipeline employed to assess chalcogenide perovskites, combining experimentally motivated descriptors and machine-learning models to evaluate structural stability, crystal structure, experimental plausibility, and photovoltaic suitability.
Figure 2: a. Jess et al. tolerance factor (t_Jess) distribution on the experimental ABX3 data used for training; b. SISSO-derived τ* tolerance factor distribution on the experimental ABX3 data; c. Logistic-calibrated probability of perovskite-type stability based on τ* as a function of the Jess et al. tolerance factor for the experimental ABX3 data; in each plot the stability region is delimited with a green backgr…
Figure 3: The effects of ionic radii on the stability of chalcogenide perovskites for a. ABS3 and b. ABSe3 compounds. The scattered symbols correspond to experimentally synthesized materials; materials with a perovskite-type structure are represented with squares and a bold label, while non-perovskite structures are triangles.
Figure 4: Some of the crystal structures generated with CrystaLLM, and their respective space groups, based on the formulae that are predicted stable with τ*, including some specific compounds as examples: a. SmZrS3, b. BaSnSe3, c. DyTlS3, d. CuTlSe3, e. MnGaSe3, f. CoNiSe3, g. TbBiS3. Yellow circles correspond to the chalcogenide anions, while the other colors are for the A and B cations.
Figure 5: CrabNet-estimated bandgaps for predicted chalcogenide perovskites displayed on an element–element matrix for a. ABS3 and b. ABSe3 compositions. Color indicates the predicted bandgap value, while squares outlined in black correspond to compounds predicted to adopt a corner-sharing perovskite-type structure according to CrystaLLM.
Figure 6: Evaluation of material sustainability using the supply risk (SR), derived from the Herfindahl-Hirschman Index (HHI) and ESG scores, as a function of the deviation of the estimated bandgap from the optimal values for (a) single-junction photovoltaics (E_g^opt = 1.34 eV) and (b) the top cell in a tandem photovoltaic configuration (E_g^opt = 1.71 eV). Red triangles indicate Pareto-optimal materials that simulta…
Figure 7: Summarizes the combined assessment of the statistical synthesizability indicator, sustainability, and optoelectronic suitability for the most promising candidates. The figure reports CLS and SR values, with the predicted bandgap encoded by color and Pareto-optimal materials highlighted for single-junction (triangles) and tandem (squares) configurations. BaZrS3 emerges as the most promising candidate for t…
Original abstract

Chalcogenide perovskites have emerged as promising absorber materials for next-generation photovoltaic devices, yet their experimental realization remains limited by competing phases, structural polymorphism, and synthetic challenges. Here, we present a fully data-driven and experimentally grounded screening and ranking framework to assess the stability and experimental feasibility of chalcogenide perovskites, integrating interpretable analytical descriptors, machine-learning models, and sustainability metrics. Using a curated experimental dataset of halide and chalcogenide compounds, we derive a new tolerance factor via the SISSO (sure independence screening and sparsifying operator) algorithm that more accurately distinguishes perovskite-forming compositions than established tolerance-factor-based screening criteria. This descriptor is combined with generative crystal structure prediction, composition-based bandgap estimation, and machine-learning-based feasibility assessment to systematically explore a wide chemical space of hypothetical chalcogenide perovskites. The resulting candidates are further evaluated using sustainability indicators, enabling multi-objective ranking tailored to both single-junction and tandem photovoltaic architectures. Beyond identifying several promising and previously unexplored chalcogenide perovskites, this work demonstrates a transferable screening strategy for chemically constrained materials spaces that balances optoelectronic performance, experimental viability, and long-term sustainability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a data-driven screening framework for chalcogenide perovskites as solar absorbers. It derives a new tolerance factor via the SISSO algorithm trained on a curated experimental dataset of mixed halide and chalcogenide compounds, claiming superior accuracy in distinguishing perovskite-forming compositions compared to established criteria. This descriptor is integrated with generative structure prediction, composition-based bandgap models, ML feasibility assessment, and sustainability metrics to rank hypothetical chalcogenide perovskites for single-junction and tandem PV applications, identifying several promising candidates.

Significance. If the central claim of improved generalization holds, the work would provide a transferable, multi-objective screening strategy for chemically constrained spaces that balances stability, optoelectronic performance, and sustainability. The explicit use of an interpretable SISSO-derived descriptor and integration of experimental grounding with ML models would strengthen data-driven materials discovery pipelines, particularly for emerging chalcogenide systems where experimental data remain sparse.

major comments (2)
  1. [Abstract and methods (SISSO derivation)] The central claim that the SISSO-derived tolerance factor more accurately distinguishes perovskite-forming compositions rests on a mixed halide/chalcogenide training set without reported chemistry-specific hold-out validation. A leave-one-chemistry-out test (train on halides, evaluate on chalcogenides) is needed to confirm that the descriptor captures chalcogenide-specific formation rules rather than halide-dominated correlations, as chalcogenides exhibit distinct bonding and ionic-radius regimes.
  2. [Results (model validation)] No quantitative validation metrics, error bars, or hold-out performance numbers are provided for the new tolerance factor or the downstream ML feasibility and bandgap models. The soundness assessment notes that the curated dataset lacks visible controls for overfitting or data leakage, which directly undermines the generalization claim to unexplored chalcogenide perovskites.
minor comments (2)
  1. [Methods] Clarify the exact composition of the curated experimental dataset (number of chalcogenide vs. halide entries) and any preprocessing steps to allow reproducibility.
  2. [Methods] The abstract mentions 'generative crystal structure prediction' but does not specify the algorithm or validation against known structures; add a brief description and reference.
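
The leave-one-chemistry-out test requested in major comment 1 can be sketched as follows: fit the decision boundary on one anion family only, then score it on the family that was held out. Everything here is an assumption for illustration: the descriptor values are synthetic, the rule "perovskite if value < cut" is a stand-in for whatever criterion the paper uses, and `fit_cut` and `leave_one_family_out` are hypothetical helpers.

```python
def accuracy(values, labels, cut):
    """Fraction of samples classified correctly by 'perovskite if value < cut'."""
    return sum((v < cut) == y for v, y in zip(values, labels)) / len(values)


def fit_cut(values, labels):
    """Choose the cut that maximizes training accuracy."""
    pts = sorted(set(values))
    cuts = [pts[0] - 1.0] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [pts[-1] + 1.0]
    return max(cuts, key=lambda c: accuracy(values, labels, c))


def leave_one_family_out(rows, held_out):
    """Train on every family except `held_out`, then evaluate on `held_out`."""
    train = [(v, y) for fam, v, y in rows if fam != held_out]
    test = [(v, y) for fam, v, y in rows if fam == held_out]
    cut = fit_cut(*zip(*train))
    return cut, accuracy(*zip(*train), cut), accuracy(*zip(*test), cut)


# Synthetic (family, descriptor value, forms perovskite) rows; not real data.
ROWS = [
    ("halide", 3.2, True), ("halide", 3.6, True), ("halide", 3.9, True),
    ("halide", 4.5, False), ("halide", 5.1, False),
    ("chalcogenide", 3.5, True), ("chalcogenide", 4.0, False),
    ("chalcogenide", 4.4, False), ("chalcogenide", 4.8, False),
]

cut, train_acc, held_out_acc = leave_one_family_out(ROWS, "chalcogenide")
print(f"cut={cut:.2f} train={train_acc:.2f} held-out={held_out_acc:.2f}")
```

In this toy data the halide-fitted boundary is perfect on halides but misclassifies one chalcogenide, which is the kind of gap between in-family and cross-family performance the referee asks the authors to report.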

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We address each of the major comments below and commit to revising the manuscript to incorporate additional validation steps as suggested.

  1. Referee: [Abstract and methods (SISSO derivation)] The central claim that the SISSO-derived tolerance factor more accurately distinguishes perovskite-forming compositions rests on a mixed halide/chalcogenide training set without reported chemistry-specific hold-out validation. A leave-one-chemistry-out test (train on halides, evaluate on chalcogenides) is needed to confirm that the descriptor captures chalcogenide-specific formation rules rather than halide-dominated correlations, as chalcogenides exhibit distinct bonding and ionic-radius regimes.

    Authors: We agree with the referee that a leave-one-chemistry-out validation is necessary to rigorously test the applicability of the SISSO-derived tolerance factor to chalcogenides. Although the original training set was mixed to leverage available data, we will implement the suggested test in the revised version by training on halides only and evaluating on chalcogenides. The results, including performance metrics, will be added to the methods and results sections to support the claim of improved generalization.
    Revision: yes

  2. Referee: [Results (model validation)] No quantitative validation metrics, error bars, or hold-out performance numbers are provided for the new tolerance factor or the downstream ML feasibility and bandgap models. The soundness assessment notes that the curated dataset lacks visible controls for overfitting or data leakage, which directly undermines the generalization claim to unexplored chalcogenide perovskites.

    Authors: We acknowledge that the manuscript would benefit from more explicit quantitative validation details. In the revision, we will provide hold-out performance metrics, error bars, and descriptions of controls for overfitting and data leakage for the tolerance factor as well as the ML models for feasibility and bandgap prediction. This will include details on dataset splitting strategies and any regularization techniques employed.
    Revision: yes
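
One common way to attach error bars to a hold-out accuracy is a percentile bootstrap over the per-sample outcomes. The sketch below assumes that approach; the rebuttal does not say which method the authors will use, the outcome vector is synthetic, and `bootstrap_accuracy_ci` is a hypothetical helper.

```python
import random


def bootstrap_accuracy_ci(correct, n_boot=2000, alpha=0.05, seed=0):
    """Point accuracy and a percentile bootstrap interval from 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(correct)
    # Resample the outcomes with replacement and record each resampled accuracy.
    stats = sorted(
        sum(rng.choice(correct) for _ in range(n)) / n for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return sum(correct) / n, lo, hi


# Synthetic hold-out outcomes: 16 correct and 4 incorrect predictions.
OUTCOMES = [1] * 16 + [0] * 4

point, lower, upper = bootstrap_accuracy_ci(OUTCOMES)
print(f"accuracy = {point:.2f} (95% interval {lower:.2f} to {upper:.2f})")
```

With only a few dozen held-out compounds the interval is wide, which is worth stating alongside any headline accuracy.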

Circularity Check

0 steps flagged

No circularity: SISSO tolerance factor derived from external experimental data


The paper applies the SISSO algorithm to a curated external experimental dataset of halide and chalcogenide compounds to obtain a new tolerance factor, then uses it for screening hypothetical compositions. This is a standard train-on-known/predict-on-unknown workflow with no self-referential definitions, no fitted parameters renamed as predictions within the same dataset, and no load-bearing self-citations or uniqueness theorems imported from the authors' prior work. Downstream ML feasibility and bandgap models are described as composition-based but are not shown to reduce to the tolerance-factor inputs by construction. The provided abstract and reader summary contain no equations or sections exhibiting the enumerated circularity patterns. The noted concern about mixed halide/chalcogenide training data is a generalization/validity issue, not a circularity reduction.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The framework rests on a curated experimental dataset whose size and selection criteria are unspecified, plus standard assumptions of the SISSO algorithm and generative structure predictors.

free parameters (1)
  • SISSO-derived tolerance factor coefficients
    Fitted via SISSO to the curated halide and chalcogenide dataset to maximize separation of perovskite-forming compositions.
axioms (2)
  • [standard math] SISSO algorithm identifies optimal low-dimensional descriptors from a large feature space
    Invoked to derive the new tolerance factor from experimental data.
  • [domain assumption] Composition-based bandgap estimation and ML feasibility models generalize to hypothetical compositions
    Required for ranking unexplored chalcogenide perovskites.
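
To make the first axiom concrete, the idea of searching a constructed feature space for a low-dimensional descriptor can be illustrated with a brute-force toy: combine primary features with simple operators and score each one-dimensional candidate by its best single-threshold accuracy. This is not the SISSO algorithm itself, which screens a far larger space and then applies a sparsifying step; the data, feature names, and helpers below are all hypothetical.

```python
import itertools
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def best_threshold_accuracy(values, labels):
    """Best accuracy of a single cut on `values`, trying both orientations."""
    pts = sorted(set(values))
    cuts = [pts[0] - 1.0] + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    best = 0.0
    for cut in cuts:
        hits = sum((v > cut) == y for v, y in zip(values, labels)) / len(values)
        best = max(best, hits, 1 - hits)
    return best


def score_descriptors(rows, labels):
    """Score every 'a op b' combination of two distinct primary features."""
    names = list(rows[0])
    scores = {}
    for (a, b), (sym, fn) in itertools.product(itertools.permutations(names, 2), OPS.items()):
        values = [fn(r[a], r[b]) for r in rows]
        scores[f"{a} {sym} {b}"] = best_threshold_accuracy(values, labels)
    return scores


# Synthetic radii in angstroms and labels; placeholders, not experimental data.
ROWS = [
    {"r_a": 1.6, "r_b": 0.7, "r_x": 1.8},
    {"r_a": 1.4, "r_b": 0.6, "r_x": 1.8},
    {"r_a": 1.2, "r_b": 0.9, "r_x": 1.8},
    {"r_a": 1.0, "r_b": 0.8, "r_x": 2.0},
]
LABELS = [True, True, False, False]

scores = score_descriptors(ROWS, LABELS)
best_name = max(scores, key=scores.get)
print(best_name, scores[best_name])
```

On four samples many candidates separate the classes perfectly, which illustrates the overfitting risk behind the referee's request for hold-out validation: a large candidate space fitted to a small dataset will always find something that works in-sample.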



Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages

  1. [1] Sopiha, K. V., Comparotto, C., Márquez, J. A. & Scragg, J. J. S. Chalcogenide Perovskites: Tantalizing Prospects, Challenging Materials. Advanced Optical Materials 10, 2101704 (2022)

  2. [2] Green, M. et al. Solar Cell Efficiency Tables (Version 66). Progress in Photovoltaics: Research and Applications 33, 795–810 (2025)

  3. [3] Chowdhury, T. A. et al. Stability of perovskite solar cells: issues and prospects. RSC Advances 13, 1787–1810 (2023)

  4. [4] Agarwal, S., Vincent, K. C. & Agrawal, R. From synthesis to application: a review of BaZrS3 chalcogenide perovskites. Nanoscale 17, 4250–4300 (2025)

  5. [5] Basera, P. & Bhattacharya, S. Chalcogenide Perovskites (ABS3; A = Ba, Ca, Sr; B = Hf, Sn): An Emerging Class of Semiconductors for Optoelectronics. J. Phys. Chem. Lett. 13, 6439–6446 (2022)

  6. [6] Chakravorty, A., Adhikari, S. & Johari, P. Unlocking the optoelectronic potential of AGeX3 (A = Ca, Sr, Ba; X = S, Se): A sustainable alternative in chalcogenide perovskites. Journal of Chemical Physics 163, 234708 (2025)

  7. [7] Adhikari, S. & Johari, P. Optimizing lead-free chalcogenide perovskites for high-efficiency photovoltaics via alloying. Phys. Rev. B 112, 085206 (2025)

  8. [8] Adhikari, S., Das, S. & Johari, P. Post-transition metal Sn-based chalcogenide perovskites: a promising lead-free and transition metal alternative for stable, high-performance photovoltaics. J. Mater. Chem. C 13, 7792–7805 (2025)

  9. [9] Nishigaki, Y. et al. Extraordinary Strong Band-Edge Absorption in Distorted Chalcogenide Perovskites. Solar RRL 4, 1900555 (2020)

  10. [10] Liang, Y. et al. Parametric Study on Controllable Growth of SrZrS3 Thin Films with Good Conductivity for Photodetectors. Nano Research 16, 7867–7873 (2023)

  11. [11] Lelieveld, R. & IJdo, D. J. W. Sulphides with the GdFeO3 Structure. Acta Crystallographica Section B 36, 2223–2226 (1980)

  12. [12] Comparotto, C., Ström, P., Donzel-Gargand, O., Kubart, T. & Scragg, J. J. S. Synthesis of BaZrS3 Perovskite Thin Films at a Moderate Temperature on Conductive Substrates. ACS Applied Energy Materials 5, 6335–6343 (2022)

  13. [13] Comparotto, C. et al. Thermodynamic insights into the Ba–S system for the formation of BaZrS3 perovskites and other Ba sulfides. Journal of Materials Chemistry A 13, 9983–9991 (2025)

  14. [14] Breternitz, J. & Schorr, S. What Defines a Perovskite? Advanced Energy Materials 8, 1802366 (2018)

  15. [15] Rose, G. Ueber einige neue Mineralien des Urals. Journal für Praktische Chemie 19, 459–468 (1840)

  16. [16] Goldschmidt, V. M. Die Gesetze der Krystallochemie. Naturwissenschaften 14, 477–485 (1926)

  17. [17] Jess, A., Yang, R. & Hages, C. J. On the Phase Stability of Chalcogenide Perovskites. Chemistry of Materials 34, 6894–6901 (2022)

  18. [18] Turnley, J. W., Agarwal, S. & Agrawal, R. Rethinking tolerance factor analysis for chalcogenide perovskites. Materials Horizons 11, 4802–4808 (2024)

  19. [19] Huo, K. Z., Wei, S.-H. & Yin, W.-J. High-throughput screening of chalcogenide single perovskites by first-principles calculations for photovoltaics. J. Phys. D: Appl. Phys. 51, 474003 (2018)

  20. [20] Cao, Y., Dai, S., Wang, X. et al. High-throughput screening of potentially ductile and low thermal conductivity ABX3 (X = S, Se, Te) thermoelectric perovskites. Appl. Phys. Lett. 124, 092101 (2024)

  21. [21] Singh, N. et al. High-throughput and data-driven search for stable optoelectronic AMSe3 materials. J. Mater. Chem. A 13, 9192–9210 (2025)

  22. [22] Fatima, S. K. et al. Lead-free perovskites for next-generation applications: a comprehensive computational and data-driven review. Materials Advances 6, 7634–7661 (2025)

  23. [23] Ouyang, R., Curtarolo, S., Ahmetcik, E., Scheffler, M. & Ghiringhelli, L. M. SISSO: A compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates. Phys. Rev. Materials 2, 083802 (2018)

  24. [24] Wang, A. Y.-T., Kauwe, S. K., Murdock, R. J. & Sparks, T. D. Compositionally restricted attention-based network for materials property predictions. npj Computational Materials 7, 77 (2021)

  25. [25] De Breuck, P.-P., Hautier, G. & Rignanese, G.-M. Materials property prediction for limited datasets enabled by feature selection and joint learning with MODNet. npj Computational Materials 7, 83 (2021)

  26. [26] De Breuck, P.-P., Wang, H.-C., Rignanese, G.-M., Botti, S. & Marques, M. A. Generative AI for crystal structures: a review. npj Computational Materials (2025)

  27. [27] Antunes, L. M., Butler, K. T. & Grau-Crespo, R. Crystal structure generation with autoregressive large language modeling. Nature Communications 15, 10570 (2024)

  28. [28] Gu, G. H., Jang, J., Noh, J., Walsh, A. & Jung, Y. Perovskite synthesizability using graph neural networks. npj Computational Materials 8, 71 (2022)

  29. [29] Bartel, C. J. et al. New tolerance factor to predict the stability of perovskite oxides and halides. Science Advances 5, eaav0693 (2019)

  30. [30] Carr, A., Glinberg, T., Stull, N., Neilson, J. R. & Bartel, C. J. Origins of chalcogenide perovskite instability. J. Mater. Chem. C 13, 19183–19195 (2025)

  31. [31] Yamaoka, S., Fukunaga, O. & Saito, S. Preparations of BaSnS3, SrSnS3 and PbSnS3 at high pressure. Mater. Res. Bull. 5, 789–794 (1970)

  32. [32] Ju, M.-G., Dai, J., Ma, L. & Zeng, X. C. Perovskite Chalcogenides with Optimal Bandgap and Desired Optical Absorption for Photovoltaic Devices. Adv. Energy Mater. 7, 1700216 (2017)

  33. [33] Shockley, W. & Queisser, H. J. Detailed Balance Limit of Efficiency of p–n Junction Solar Cells. Journal of Applied Physics 32, 510–519 (1961)

  34. [34] Nomine, A. et al. The most sustainable high entropy alloys for the future. Preprint, Research Square. 2023

  35. [35] Ward, L., Agrawal, A., Choudhary, A. & Wolverton, C. A general-purpose machine learning framework for predicting properties of inorganic materials. npj Computational Materials 2, 16028 (2016)

  36. [37] Cerqueira, T. F. T., Wang, H., Botti, S. & Marques, M. A. L. A non-orthogonal representation of the chemical space. arXiv preprint arXiv:2406.19761 (2025). Accessed: 2025-10-15

  37. [38] Wang, A. Y. T., Kauwe, S. K., Murdock, R. J. & Sparks, T. D. CrabNet for explainable deep learning in materials science. Journal of Materials Research and Technology 10, 415–428 (2021)

  38. [39] Dunn, A., Wang, Q., Ganose, A., Dopp, D., Jain, A., et al. Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm. npj Computational Materials 6 (2020)

  39. [40] Breuck, P.-P. D., Heymans, G. & Rignanese, G.-M. Accurate experimental band gap predictions with multi-fidelity correction learning. Journal of Materials Informatics 2, 10 (2022)

  40. [41] Vasylenko, A. et al. Digital features of chemical elements extracted from local geometries in crystal structures. Digital Discovery 4, 477–485 (2025)

  41. [42] Vincent, K. C., Agarwal, S., Turnley, J. W. & Agrawal, R. Liquid Flux–Assisted Mechanism for Modest Temperature Synthesis of Large-Grain BaZrS3 and BaHfS3 Chalcogenide Perovskites. Advanced Energy and Sustainability Research 4 (2023)

  42. [43] Zhang, H. et al. P-Type Transparent Conducting Material Realized by Composite Thin Film of Chalcogenide Perovskite LaScS3 and Graphene. Advanced Functional Materials 35, 2542382 (2025)

  43. [44] Fell, A. et al. Elucidating the efficiency limit of silicon-based monolithic tandem cells through the combination of Auger and Shockley–Queisser limits. EES Solar 1, 1030–1039 (2025)

  44. [45] Jang, J., Gu, G. H., Noh, J., Kim, J. & Jung, Y. Structure-Based Synthesizability Prediction of Crystals Using Partially Supervised Learning. Journal of the American Chemical Society 142, 18836–18843 (2020)

  45. [46] Jain, A. et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials 1, 011002 (2013)

  46. [47] Chen, J. et al. Navigating phase diagram complexity to guide robotic inorganic materials synthesis. Nature Synthesis 3, 606–614 (2024). Open access article

  47. [48] Kim, S., Jung, Y. & Schrier, J. Large Language Models for Inorganic Synthesis Predictions. J. Am. Chem. Soc. 146, 19654–19659 (2024)

  48. [49] Kim, S., Schrier, J. & Jung, Y. Explainable Synthesizability Prediction of Inorganic Crystal Polymorphs Using Large Language Models. Angew. Chem. Int. Ed. 64, e202423950 (2025)

  49. [50] Miessler, G. L., Fischer, P. J. & Tarr, D. A. Inorganic Chemistry, 5th ed. (Pearson, Boston, 2014)

  50. [51] Greenwood, N. N. & Earnshaw, A. Chemistry of the Elements, 2nd ed. (Butterworth-Heinemann, Oxford, 1997)

  51. [52] Scheidgen, M. et al. NOMAD: A distributed web-based platform for managing materials science research data. Journal of Open Source Software 8, 5388 (2023)

  52. [53] Márquez, J. A. & Scheidgen, M. Perovskite Solar Cell Database Project. 2024

  53. [54] Dong, Q. & Cole, J. M. Auto-generated database of semiconductor band gaps using ChemDataExtractor. Scientific Data 9 (2022)

  54. [55] U.S. Geological Survey. Data release for Mineral Commodity Summaries 2025. U.S. Geological Survey data release. 2025

  55. [56] World Bank. Environment, Social and Governance Data. World Bank Data Catalog. Accessed: 2025-10-16. 2023

  56. [57] Jiang, X., Liu, G., Xie, J. & Hu, Z. Boosting SISSO performance on small sample datasets by using Random Forests prescreening for complex feature selection. Front. Phys. 20, 14209 (2025)

  57. [58] Schilling-Wilhelmi, M. et al. From text to insight: large language models for chemical data extraction. Chemical Society Reviews (2025)

  58. [59] Antunes, L., Butler, K. & Grau-Crespo, R. Supporting data for: Crystal Structure Generation with Autoregressive Large Language Modeling, version v1. 2024

  59. [60] Saal, J. E., Kirklin, S., Aykol, M., Meredig, B. & Wolverton, C. Materials Design and Discovery with High-Throughput Density Functional Theory: The Open Quantum Materials Database (OQMD). JOM 65, 1501–1509 (2013)

  60. [61] Shabih, S. et al. An autonomous living database for perovskite photovoltaics. arXiv preprint arXiv:2601.17807 (2026)

  61. [62] Tshitoyan, V. et al. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571, 95–98 (2019)

  62. [63] Garzón, D. A., Himanen, L., Andrade, L., Sadewasser, S. & Márquez, J. A. chalcogenide-perovskite-screening, version 1.0.1. 2026

Supplementary Information

This Supplementary Information provides additional analyses supporting the main results of the screening workflow. Section S1 reports the classification performance of the tolerance-factor models and...