Genome-Guided Interpretable Screening of Phase-Stable, Lead-Free Double Perovskite Absorbers for All-Inorganic Semiconductors, Sensors, and Photovoltaics with DFT-Validated Design Rules
Pith reviewed 2026-05-25 05:53 UTC · model grok-4.3
The pith
A genome-guided ML framework narrows 13,088 lead-free compositions to five DFT-validated phase-stable double perovskites.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Applying the staged inverse-design constraint stack to 13,088 charge-balanced, lead-free compositions reduces the search space to five DFT-validated, phase-stable semiconductors—Rb2SnMnBr6, Cs2CdSnBr6, Cs2CdSnI6, Cs2KGaI6, and Cs2AgAlBr6—that lie on the convex hull (E_hull <= 0 meV/atom), preserve ordered double-perovskite structures, and exhibit strong optical absorption (alpha peak ~1e5 cm^-1).
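The charge-balance constraint behind the 13,088-composition pool can be illustrated with a short sketch. It assumes the neutrality condition 2·q_A + q_B + q_B' + 6·q_X = 0 for A2BB'X6 and a small, hypothetical element table with nominal oxidation states. The element pool and charge assignments actually used in the paper are not stated in this summary.

```python
from itertools import combinations

# Hypothetical element pool with nominal oxidation states (illustration only;
# not the paper's actual enumeration table).
A_SITE = {"Cs": 1, "Rb": 1}
B_SITE = {"Ag": 1, "K": 1, "Cd": 2, "Sn": 2, "Mn": 2, "Al": 3, "Ga": 3}
X_SITE = {"Br": -1, "I": -1}


def is_charge_balanced(q_a, q_b, q_b2, q_x):
    """A2BB'X6 is neutral when 2*q_A + q_B + q_B' + 6*q_X == 0."""
    return 2 * q_a + q_b + q_b2 + 6 * q_x == 0


def enumerate_balanced():
    """Return (A, B, B', X) tuples with two distinct B-site cations and zero net charge."""
    hits = []
    for a, q_a in A_SITE.items():
        for x, q_x in X_SITE.items():
            # Unordered pairs of distinct B-site cations, in alphabetical order.
            for b, b2 in combinations(sorted(B_SITE), 2):
                if is_charge_balanced(q_a, B_SITE[b], B_SITE[b2], q_x):
                    hits.append((a, b, b2, x))
    return hits
```

Under these assumed charges, the five reported candidates all pass the neutrality test (for example Cs2AgAlBr6 pairs a +1 and a +3 cation, while Cs2CdSnBr6 pairs two +2 cations), whereas a +1/+2 pairing such as Ag with Cd is rejected.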
What carries the argument
A staged inverse-design constraint stack that sequentially applies a recall-optimized stability classifier, an XGBoost band-gap regressor, and structural validation, with the surrogate models built on four descriptor families: packing, bonding, polarization, and electronic identity.
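A minimal sketch of how such a staged stack might be organized is given here. The `Candidate` fields, the 0.3 probability cutoff, and the 1.0 to 3.0 eV band-gap window are illustrative assumptions, not values taken from the paper; in a real pipeline the scores would come from the trained classifier and regressor rather than being stored on the record.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    formula: str
    p_stable: float  # stand-in for the stability classifier's probability
    gap_ev: float    # stand-in for the regressor's band-gap prediction (eV)
    ordered: bool    # stand-in for the outcome of a structural check


Stage = Tuple[str, Callable[[Candidate], bool]]

# Hypothetical stage definitions; thresholds are placeholders.
STAGES: List[Stage] = [
    ("stability", lambda c: c.p_stable >= 0.3),
    ("band gap", lambda c: 1.0 <= c.gap_ev <= 3.0),
    ("structure", lambda c: c.ordered),
]


def run_stack(pool: List[Candidate], stages: List[Stage]):
    """Apply each filter in order; return survivors and a per-stage count log."""
    log = [("input", len(pool))]
    survivors = list(pool)
    for name, keep in stages:
        survivors = [c for c in survivors if keep(c)]
        log.append((name, len(survivors)))
    return survivors, log
```

Keeping the per-stage counts in `log` makes it straightforward to report how many compositions remain after each filter.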
If this is right
- The five candidates are thermodynamically stable and maintain ordered double-perovskite structures suitable for all-inorganic devices.
- Packing descriptors control structural formability while bonding descriptors govern near-edge optical transitions.
- Optoelectronic response descriptors regulate the dielectric response (epsilon0 in the range 4.6-8.2) and exciton screening.
- The genotype-phenotype analysis supplies concrete design rules for further lead-free double perovskite discovery.
Where Pith is reading between the lines
- The same staged stack could be tested on single perovskites or other halide families to check transferability.
- Thin-film growth and measured absorption spectra on any of the five candidates would provide an external check on the predicted alpha values.
- If the hierarchical descriptor ranking holds, future searches can safely de-emphasize polarization descriptors until packing and bonding are satisfied.
Load-bearing premise
The machine learning models trained on the 1,221 DFT compounds generalize accurately to the 13,088 screened compositions without significant errors from extrapolation or descriptor limitations.
What would settle it
A new DFT calculation showing that any one of the five listed candidates has a hull energy above 0 meV/atom would falsify the claim that the screening produced phase-stable materials.
Original abstract
The discovery of stable, lead-free halide perovskites for optoelectronic applications is constrained by vast compositional space and limited interpretability of conventional screening approaches. We present a genome-guided, physics-informed framework that decodes thermodynamic stability and optoelectronic behavior through four physically interpretable descriptor families: packing, bonding, polarization, and electronic identity. Trained on 1,221 DFT-calculated A2BB'X6 compounds, machine-learning surrogates achieve robust predictive performance, with a recall-optimized stability classifier (ROC-AUC = 0.92) and an XGBoost regressor for band-gap prediction (R2 = 0.93 on held-out data). Applying a staged inverse-design constraint stack to 13,088 charge-balanced, lead-free compositions reduces the search space to five DFT-validated, phase-stable semiconductors: Rb2SnMnBr6, Cs2CdSnBr6, Cs2CdSnI6, Cs2KGaI6, and Cs2AgAlBr6. These candidates lie on the convex hull (E_hull <= 0 meV/atom), preserve ordered double-perovskite structures, and exhibit strong optical absorption (alpha peak ~1e5 cm^-1). Genotype-phenotype coupling analysis reveals a hierarchical control mechanism: packing genes define structural formability, bonding genes govern near-edge optical transitions and conductivity, and optoelectronic response genes regulate dielectric response and exciton screening (epsilon0 = 4.6-8.2). This work establishes a generalizable paradigm for interpretable inverse design, linking descriptor-level genomics to experimentally relevant optoelectronic phenotypes and providing design rules for discovering stable, lead-free double perovskites for photovoltaics, sensing, and transparent electronic applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a genome-guided ML framework using four descriptor families (packing, bonding, polarization, electronic identity) trained on 1,221 DFT A2BB'X6 compounds. A recall-optimized stability classifier (ROC-AUC 0.92) and XGBoost band-gap regressor (R2 0.93 on held-out data) are applied via a staged inverse-design constraint stack to filter 13,088 charge-balanced lead-free compositions, yielding five DFT-validated candidates (Rb2SnMnBr6, Cs2CdSnBr6, Cs2CdSnI6, Cs2KGaI6, Cs2AgAlBr6) that lie on the convex hull (E_hull <= 0 meV/atom), retain ordered double-perovskite structures, and show strong absorption (~1e5 cm^-1). The work also derives hierarchical design rules linking descriptors to optoelectronic phenotypes.
Significance. If the ML generalization holds, the paper offers a concrete, interpretable inverse-design pipeline that reduces a large compositional space to experimentally relevant candidates while providing physics-based design rules. The combination of ML surrogates with final DFT validation on survivors and the explicit genotype-phenotype mapping are strengths that could aid discovery in halide perovskites for PV and sensors. The significance is limited by the absence of evidence that the screening process itself is reliable outside the training manifold.
major comments (3)
- [Results (screening pipeline)] Results section on screening pipeline: The central claim that the staged constraint stack correctly reduces 13,088 compositions to five phase-stable candidates rests on the assumption that the stability classifier and band-gap regressor generalize to the full screened space, yet no out-of-distribution test set, descriptor-space coverage analysis, or uncertainty quantification is reported comparing the 13,088 to the 1,221 training compounds.
- [Methods (ML model training)] Methods (ML model training) and Abstract: The reported ROC-AUC 0.92 and R2 0.93 are obtained on held-out data drawn from the same 1,221 DFT set; without an independent test on compositions far from this manifold, the reduction step cannot be confirmed as complete or free of systematic false negatives/positives.
- [DFT validation] DFT validation paragraph: Validation that the five survivors satisfy E_hull <= 0 and ordered structures confirms only those specific compounds, not that the ML-driven filtering correctly identified all (or the only) stable candidates from the 13,088.
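One simple form of the descriptor-space coverage analysis requested in the major comments is a nearest-neighbor distance check. The sketch below is an assumed approach, not something reported in the paper: it standardizes descriptors with training-set statistics, takes a quantile of the leave-one-out nearest-neighbor distances inside the training set as a cutoff, and flags screened compositions that sit farther than that from any training compound. The function names and the 0.95 quantile are hypothetical.

```python
import math


def standardize(train, points):
    """Z-score both sets using the training mean and standard deviation."""
    dims = len(train[0])
    mean = [sum(r[j] for r in train) / len(train) for j in range(dims)]
    # Fall back to 1.0 when a descriptor is constant across the training set.
    std = [
        math.sqrt(sum((r[j] - mean[j]) ** 2 for r in train) / len(train)) or 1.0
        for j in range(dims)
    ]

    def scale(rows):
        return [[(r[j] - mean[j]) / std[j] for j in range(dims)] for r in rows]

    return scale(train), scale(points)


def nn_distance(point, reference):
    """Euclidean distance from point to its nearest neighbor in reference."""
    return min(math.dist(point, r) for r in reference)


def flag_out_of_coverage(train, screened, quantile=0.95):
    """Return one boolean per screened point; True means far from all training data."""
    z_train, z_screen = standardize(train, screened)
    # Leave-one-out nearest-neighbor distances characterize training density.
    loo = sorted(
        nn_distance(p, z_train[:i] + z_train[i + 1:]) for i, p in enumerate(z_train)
    )
    cutoff = loo[min(len(loo) - 1, int(quantile * len(loo)))]
    return [nn_distance(p, z_train) > cutoff for p in z_screen]
```

A check of this kind does not quantify prediction error, but the fraction of flagged compositions gives a rough indication of how much of the screened pool lies outside the training manifold.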
minor comments (2)
- [Abstract] Abstract: The phrase 'genome-guided' is used without a clear definition of what constitutes the 'genome' versus the four descriptor families; a brief clarification would improve readability.
- [Figures] Figure captions (assumed from typical structure): Ensure all figures showing the constraint stack explicitly label the number of compositions remaining after each stage.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. The comments highlight important considerations regarding generalization of the ML models. We respond to each major comment below and indicate planned revisions where appropriate.
- Referee: Results section on screening pipeline: The central claim that the staged constraint stack correctly reduces 13,088 compositions to five phase-stable candidates rests on the assumption that the stability classifier and band-gap regressor generalize to the full screened space, yet no out-of-distribution test set, descriptor-space coverage analysis, or uncertainty quantification is reported comparing the 13,088 to the 1,221 training compounds.
  Authors: We agree that explicit OOD testing, coverage analysis, and uncertainty quantification would provide stronger evidence of generalization. The four descriptor families were selected for physical transferability across A2BB'X6 chemistries, and the 1,221 training compounds were chosen to span diverse elemental combinations. The final DFT validation of the five candidates on the convex hull offers supporting evidence that the pipeline identified viable compounds. We will add a new subsection discussing descriptor-space overlap between training and screened sets along with a limitations paragraph on the absence of formal OOD metrics.
  Revision: partial
- Referee: Methods (ML model training) and Abstract: The reported ROC-AUC 0.92 and R2 0.93 are obtained on held-out data drawn from the same 1,221 DFT set; without an independent test on compositions far from this manifold, the reduction step cannot be confirmed as complete or free of systematic false negatives/positives.
  Authors: The performance figures are from stratified cross-validation within the 1,221-compound DFT dataset. The stability classifier was explicitly recall-optimized to reduce the risk of discarding stable phases. While an external far-manifold test set is absent, the genome-guided descriptors encode packing, bonding, polarization, and electronic features expected to remain relevant beyond the training distribution. We will revise the abstract and methods sections to explicitly state that metrics reflect in-distribution performance and to note the reliance on final DFT validation for the survivors.
  Revision: yes
- Referee: DFT validation paragraph: Validation that the five survivors satisfy E_hull <= 0 and ordered structures confirms only those specific compounds, not that the ML-driven filtering correctly identified all (or the only) stable candidates from the 13,088.
  Authors: We concur that DFT validation establishes the stability of the five reported candidates but does not demonstrate that the pipeline recovered every stable composition or that no other stable phases exist among the 13,088. The objective is to deliver an interpretable, efficient inverse-design workflow that yields experimentally actionable leads together with genotype-phenotype design rules, rather than an exhaustive enumeration. The staged stack was constructed to be conservative via high-recall filtering. We will expand the discussion to clarify this scope and emphasize that the five compounds constitute validated, promising starting points.
  Revision: partial
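The "recall-optimized" and "high-recall filtering" language in the responses can be made concrete with a small sketch. It shows one common way to tune a classifier for recall: on validation data, pick the highest probability threshold whose recall on the stable class still meets a target. This is an assumed procedure for illustration; the summary does not say how the paper's classifier was actually optimized, and the function names and the 0.95 default are hypothetical.

```python
def recall_at(threshold, probs, labels):
    """Recall on the positive (stable) class when predicting p >= threshold."""
    positives = sum(labels)
    true_pos = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= threshold)
    return true_pos / positives if positives else 0.0


def threshold_for_recall(probs, labels, target=0.95):
    """Highest score threshold whose recall meets the target.

    Scanning thresholds from high to low keeps as much precision as possible
    while still satisfying the recall constraint.
    """
    for t in sorted(set(probs), reverse=True):
        if recall_at(t, probs, labels) >= target:
            return t
    return min(probs)
```

Lowering the threshold in this way trades more false positives for fewer discarded stable phases, which is acceptable when every survivor is later checked by DFT.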
Circularity Check
No significant circularity; ML screening is a surrogate step with external DFT grounding on final candidates
full rationale
The derivation trains ML surrogates (stability classifier, band-gap regressor) on 1,221 DFT compounds, applies them to filter 13,088 compositions, and then performs independent DFT validation on the five survivors to confirm E_hull <= 0, structure, and absorption. This is a standard surrogate-assisted discovery workflow whose central claims rest on the external DFT benchmarks rather than on the ML outputs alone. No self-definitional steps, no fitted inputs renamed as predictions, and no load-bearing self-citations appear in the provided text. The approach is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Density functional theory calculations provide sufficiently accurate ground-truth labels for thermodynamic stability (E_hull) and band gaps in double perovskites.
- domain assumption The four descriptor families (packing, bonding, polarization, electronic identity) capture the dominant physics controlling phase stability and optoelectronic behavior.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "genome-guided, physics-informed screening framework that decodes thermodynamic stability and optoelectronic behavior through four physically interpretable descriptor families–packing, bonding, polarization, and electronic identity. Trained on 1,221 DFT-calculated A₂BB′X₆ compounds, machine-learning surrogates achieve robust predictive performance, with a recall-optimized stability classifier (ROC–AUC = 0.92) and an XGBoost regressor for band-gap prediction (R² = 0.93 on held-out test data)."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "Applying a staged inverse-design constraint stack to 13,088 charge-balanced, lead-free compositions reduces the search space to five DFT-validated, phase-stable semiconductors"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.