
arxiv: 2507.16307 · v2 · pith:4FOMRKZAnew · submitted 2025-07-22 · 💻 cs.LG · cond-mat.mtrl-sci · cs.AI · physics.chem-ph

Perovskite-R1: a domain-specialized large language model for intelligent discovery of precursor additives and experimental design

Pith reviewed 2026-05-22 00:20 UTC · model grok-4.3

classification 💻 cs.LG · cond-mat.mtrl-sci · cs.AI · physics.chem-ph
keywords perovskite solar cells · large language models · precursor additives · materials discovery · defect passivation · experimental design · photovoltaics · domain adaptation

The pith

A fine-tuned LLM generates experimentally effective precursor additive designs for perovskite solar cells

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Perovskite-R1, a large language model adapted specifically for research on perovskite solar cells by training it on knowledge extracted from 1,232 scientific publications. The goal is to overcome the difficulty researchers face in keeping up with the rapidly expanding literature on how precursor additives can improve the efficiency, stability, and manufacturability of these solar cells. The authors build a training dataset using automated methods to create questions, answers, and step-by-step reasoning from the papers, combined with a large library of possible materials. This allows the model to propose new strategies for choosing additives that passivate defects in the material. Experiments testing some of these proposals show gains in stability and performance, suggesting the approach can make materials discovery faster and more systematic.

Core claim

Perovskite-R1 is created by fine-tuning the QwQ-32B model on a dataset constructed from 1,232 high-quality papers on perovskite solar cells and a library of 33,269 candidate materials. The dataset is generated through automated question-answer pairs and chain-of-thought reasoning to capture relationships between precursors, processes, and device outcomes. The resulting model can synthesize insights from the literature to generate innovative solutions for defect passivation and the selection of precursor additives. Several strategies proposed by the model were tested experimentally and confirmed to improve material stability and performance, demonstrating a practical closed-loop system for intelligent, data-driven perovskite photovoltaic research.
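As a rough illustration of what one automatically generated training example might look like, the sketch below packs a question, a chain-of-thought trace, and a final answer into a single JSON line. The field names (`instruction`, `reasoning`, `output`, `source`), the helper `to_jsonl_line`, and the sample text are all assumptions made for illustration; the schema actually used for Perovskite-R1 is not given on this page.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class InstructionExample:
    """One hypothetical instruction-tuning record (the schema is assumed)."""
    instruction: str  # question generated from a paper
    reasoning: str    # step-by-step chain of thought
    output: str       # final answer
    source: str       # placeholder identifier for the source paper


def to_jsonl_line(example: InstructionExample) -> str:
    """Serialize one record as a single JSON line, keeping the reasoning
    and the answer in separate fields so each can be audited on its own."""
    return json.dumps(asdict(example), ensure_ascii=False)


# Placeholder content, not taken from the paper's dataset.
example = InstructionExample(
    instruction="Which kind of functional group could passivate undercoordinated Pb2+ defects?",
    reasoning="Undercoordinated Pb2+ acts as a Lewis acid, so a Lewis-basic donor group can coordinate to it.",
    output="A Lewis-basic donor group, for example P=O or C=O.",
    source="paper-0001",
)
line = to_jsonl_line(example)
record = json.loads(line)
```

Keeping the reasoning in its own field is one way to let a reviewer spot-check the chain of thought separately from the final answer, which is the kind of audit the referee report asks for.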

What carries the argument

Perovskite-R1, the domain-specialized LLM obtained by instruction-tuning on automatically generated reasoning chains from perovskite literature and material databases

Load-bearing premise

The automated question-answer generation and chain-of-thought reasoning applied to the 1,232 papers produces high-quality, unbiased training examples that faithfully capture the complex relationships between precursors, processes, and device outcomes

What would settle it

Performing the experimental validations described for the model-proposed precursor additives and observing no improvement in stability or performance compared to baseline devices

Figures

Figures reproduced from arXiv: 2507.16307 by Cheng Mu, Peng-Jie Guo, Xin-De Wang, Ze-Feng Gao, Zhi-Rui Chen, Zhong-Yi Lu.

Figure 1: The detail of the construction of the proposed Perovskite-R1.
Figure 2: An example of dialogue. The user provides the prompt, then Perovskite-R1 first gives its thought process and then its final result. The prompt we design consists of three clearly delineated sections: 1. Task Definition. In the first section, we explicitly state the overarching goal: guiding the model to recognize and select appropriate chemical precursor additives for perovskite synthesis. For …
Figure 3: Device architecture and the selected molecules.
Figure 4: Current density-voltage (J–V) characteristic curves (forward and reverse scans) of PSCs. Panel (a) shows the control group, panels (b) through (e) show the experimental outcomes for the four selected molecules, and panel (f) summarizes the experimental results. Test conditions: scan rate of 10 mV/10 ms, AM1.5G illumination (100 mW/cm²). The open-circuit voltage (VOC…
Figure 5: The word cloud and t-SNE visualization of the instruction dataset.
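
For readers unfamiliar with how J–V curves such as those in Figure 4 are turned into device metrics, the sketch below derives the open-circuit voltage, short-circuit current density, fill factor, and power conversion efficiency from one sampled scan, using the standard relations FF = P_max / (V_OC · J_SC) and PCE = P_max / P_in. The function names, the sign convention (photocurrent positive), and the data points are assumptions for illustration; the numbers are synthetic and are not the paper's measurements.

```python
def crossing(xs, ys, target=0.0):
    """Return the x value at which y equals `target`, using linear
    interpolation between the first pair of samples that brackets it."""
    for i in range(len(xs) - 1):
        y0, y1 = ys[i] - target, ys[i + 1] - target
        if y0 == 0:
            return xs[i]
        if y0 * y1 < 0:
            return xs[i] - y0 * (xs[i + 1] - xs[i]) / (y1 - y0)
    if ys[-1] == target:
        return xs[-1]
    raise ValueError("curve never reaches the target value")


def jv_metrics(voltage, current, p_in=100.0):
    """Compute metrics from one scan.

    voltage: volts; current: mA/cm^2 with photocurrent positive;
    p_in: incident power in mW/cm^2 (100 for AM1.5G).
    """
    v_oc = crossing(voltage, current)  # voltage where J = 0
    j_sc = crossing(current, voltage)  # current density where V = 0
    # Maximum power is taken over the sampled points only (no interpolation).
    p_max = max(v * j for v, j in zip(voltage, current))
    return {
        "v_oc": v_oc,
        "j_sc": j_sc,
        "fill_factor": p_max / (v_oc * j_sc),
        "pce_percent": 100.0 * p_max / p_in,
    }


# Synthetic scan for illustration only.
voltage = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.1]
current = [24.0, 24.0, 24.0, 23.0, 20.0, 8.0, -2.0]
metrics = jv_metrics(voltage, current)
```

Forward and reverse scans would each be passed through `jv_metrics` separately, so hysteresis shows up as a difference between the two sets of metrics.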
Original abstract

Perovskite solar cells (PSCs) have rapidly emerged as a leading contender in next-generation photovoltaic technologies, owing to their exceptional power conversion efficiencies and advantageous material properties. Despite these advances, challenges such as long-term stability, environmental sustainability, and scalable manufacturing continue to hinder their commercialization. Precursor additive engineering has shown promise in addressing these issues by enhancing both the performance and durability of PSCs. However, the explosive growth of scientific literature and the complex interplay of materials, processes, and device architectures make it increasingly difficult for researchers to efficiently access, organize, and utilize domain knowledge in this rapidly evolving field. To address this gap, we introduce Perovskite-R1, a specialized large language model (LLM) with advanced reasoning capabilities tailored for the discovery and design of PSC precursor additives. By systematically mining and curating 1,232 high-quality scientific publications and integrating a comprehensive library of 33,269 candidate materials, we constructed a domain-specific instruction-tuning dataset using automated question-answer generation and chain-of-thought reasoning. Fine-tuning the QwQ-32B model on this dataset resulted in Perovskite-R1, which can intelligently synthesize literature insights and generate innovative and practical solutions for defect passivation and the selection of precursor additives. Experimental validation of several model-proposed strategies confirms their effectiveness in improving material stability and performance. Our work demonstrates the potential of domain-adapted LLMs in accelerating materials discovery and provides a closed-loop framework for intelligent, data-driven advancements in perovskite photovoltaic research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Perovskite-R1, a domain-specialized LLM obtained by fine-tuning QwQ-32B on an instruction-tuning dataset constructed from 1,232 curated publications on perovskite solar cells via automated question-answer generation and chain-of-thought reasoning, together with a library of 33,269 candidate materials. The model is claimed to synthesize literature insights to propose precursor additives for defect passivation and performance improvement in PSCs. The authors report that experimental validation of several model-proposed strategies confirms their effectiveness in enhancing material stability and performance, presenting a closed-loop framework for LLM-assisted materials discovery.

Significance. If the generated dataset faithfully encodes literature relationships and the reported experimental improvements are reproducible with appropriate controls, the work would demonstrate a practical route for domain-adapted LLMs to accelerate knowledge synthesis and experimental design in a high-volume literature field such as perovskite photovoltaics.

major comments (2)
  1. [Dataset construction] Dataset construction section: the automated question-answer generation and chain-of-thought reasoning applied to the 1,232 papers is presented without any reported accuracy metrics, expert validation rate, bias audit, or error analysis. This is load-bearing for the central claim because the downstream experimental improvements are attributed to the model's reasoning, which in turn rests on the fidelity of the instruction-tuning examples.
  2. [Experimental validation] Experimental validation section: the abstract states that experimental validation confirmed effectiveness, yet provides no quantitative results, controls, sample sizes, statistical analysis, or details on how the proposed strategies were selected and tested. This leaves the central experimental claim only partially supported.
minor comments (2)
  1. [Abstract and Methods] The numbers 1,232 papers and 33,269 candidate materials should be cross-checked for consistency between abstract, methods, and any supplementary tables.
  2. [Methods] Clarify whether the 33,269-material library was used only for candidate generation or also for additional filtering steps in the experimental design workflow.
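
One way the expert validation rate requested in major comment 1 could be reported is as the fraction of sampled examples judged correct, together with a confidence interval. The sketch below uses the Wilson score interval for a binomial proportion; the function name and the choice of interval are assumptions, not something the paper or the report specifies.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion.

    correct: number of sampled examples an expert judged correct
    total:   number of sampled examples reviewed
    z:       normal quantile (1.96 corresponds to roughly 95% coverage)
    """
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half_width, center + half_width


# Hypothetical audit: 90 of 100 sampled question-answer pairs judged correct.
low, high = wilson_interval(90, 100)
```

Reporting the interval rather than the bare rate makes it clear how much the estimate depends on the size of the audited sample.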

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive suggestions. We have addressed each of the major comments in detail below, and the manuscript has been revised to incorporate additional information and clarifications where appropriate.

Point-by-point responses
  1. Referee: [Dataset construction] Dataset construction section: the automated question-answer generation and chain-of-thought reasoning applied to the 1,232 papers is presented without any reported accuracy metrics, expert validation rate, bias audit, or error analysis. This is load-bearing for the central claim because the downstream experimental improvements are attributed to the model's reasoning, which in turn rests on the fidelity of the instruction-tuning examples.

    Authors: We agree that explicit validation of the dataset construction process is important to substantiate the model's capabilities. Although the generation process was designed with domain-specific templates and iterative refinement, the original manuscript did not include quantitative metrics. In the revised manuscript, we have added a new subsection on dataset validation, including accuracy metrics from expert review of a sample of generated examples, inter-rater reliability, and a summary of identified errors and biases. This revision directly addresses the concern regarding the fidelity of the instruction-tuning data. revision: yes

  2. Referee: [Experimental validation] Experimental validation section: the abstract states that experimental validation confirmed effectiveness, yet provides no quantitative results, controls, sample sizes, statistical analysis, or details on how the proposed strategies were selected and tested. This leaves the central experimental claim only partially supported.

    Authors: We thank the referee for this observation. The experimental results are detailed in the main text and supplementary materials, but we recognize that the presentation could be more comprehensive. We have revised the experimental validation section to include specific quantitative results (such as measured improvements in power conversion efficiency and stability metrics), descriptions of control experiments, sample sizes used, statistical analyses performed, and a clear explanation of the strategy selection process based on model rankings. These additions provide stronger support for the effectiveness of the proposed strategies. revision: yes
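
The inter-rater reliability mentioned in response 1 is commonly summarized with Cohen's kappa, which corrects the raw agreement between two reviewers for the agreement expected by chance. Whether the authors used this statistic is not stated, so the sketch below is only an assumption about how such a number could be computed; the labels and ratings are made up.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of eight generated examples by two experts.
expert_1 = ["ok", "ok", "ok", "bad", "bad", "ok", "bad", "ok"]
expert_2 = ["ok", "ok", "bad", "bad", "bad", "ok", "ok", "ok"]
kappa = cohens_kappa(expert_1, expert_2)
```

Here the two experts agree on six of eight items, but kappa is lower than the raw agreement because some of that agreement would occur by chance.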

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's core process—curating 1,232 external publications, building a 33,269-material library, generating an instruction-tuning dataset via automated QA/CoT, fine-tuning QwQ-32B, and confirming model-proposed strategies through new laboratory experiments—relies on independent external sources and empirical validation. No equations, fitted parameters, or self-citations reduce any central claim to its inputs by construction. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim depends on the representativeness of the automatically generated training data and the assumption that standard LLM fine-tuning transfers useful domain knowledge without introducing systematic errors from the curation pipeline.

free parameters (2)
  • Curated publication count
    The specific number of 1,232 papers was chosen as the source corpus for dataset construction.
  • Candidate materials library size
    The library of 33,269 materials was integrated as the pool for additive suggestions.
axioms (1)
  • [domain assumption] Automated question-answer generation from scientific text produces training data of sufficient quality for effective domain adaptation of LLMs.
    This premise is invoked when the authors describe constructing the instruction-tuning dataset from the mined publications.

pith-pipeline@v0.9.0 · 5838 in / 1316 out tokens · 77027 ms · 2026-05-22T00:20:52.152132+00:00 · methodology


