Perovskite-R1: a domain-specialized large language model for intelligent discovery of precursor additives and experimental design

Cheng Mu; Peng-Jie Guo; Xin-De Wang; Ze-Feng Gao; Zhi-Rui Chen; Zhong-Yi Lu

arXiv: 2507.16307 · v2 · pith:4FOMRKZAnew · submitted 2025-07-22 · 💻 cs.LG · cond-mat.mtrl-sci · cs.AI · physics.chem-ph


Pith reviewed 2026-05-22 00:20 UTC · model grok-4.3


keywords: perovskite solar cells, large language models, precursor additives, materials discovery, defect passivation, experimental design, photovoltaics, domain adaptation


The pith

A fine-tuned LLM generates experimentally effective precursor additive designs for perovskite solar cells

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Perovskite-R1, a large language model adapted specifically for research on perovskite solar cells by training it on knowledge extracted from 1,232 scientific publications. The goal is to overcome the difficulty researchers face in keeping up with the rapidly expanding literature on how precursor additives can improve the efficiency, stability, and manufacturability of these solar cells. The authors build a training dataset using automated methods to create questions, answers, and step-by-step reasoning from the papers, combined with a large library of possible materials. This allows the model to propose new strategies for choosing additives that passivate defects in the material. Experiments testing some of these proposals show gains in stability and performance, suggesting the approach can make materials discovery faster and more systematic.

Core claim

Perovskite-R1 is created by fine-tuning the QwQ-32B model on a dataset constructed from 1,232 high-quality papers on perovskite solar cells and a library of 33,269 candidate materials. The dataset is generated through automated question-answer pairs and chain-of-thought reasoning to capture relationships between precursors, processes, and device outcomes. The resulting model can synthesize insights from the literature to generate innovative solutions for defect passivation and the selection of precursor additives. Several strategies proposed by the model were tested experimentally and confirmed to improve material stability and performance, demonstrating a practical closed-loop framework for intelligent, data-driven perovskite photovoltaic research.
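To make the dataset description concrete, here is a minimal sketch of what one instruction-tuning record combining a question, a chain-of-thought trace, and an answer might look like. The paper's actual schema is not given in this summary, so the class name, the field names, the `<think>` delimiter, and the example text are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class InstructionRecord:
    """One hypothetical training example (all field names are assumptions)."""
    source_id: str   # identifier of the paper the example was generated from
    question: str
    reasoning: str   # chain-of-thought text
    answer: str

    def to_chat_sample(self) -> dict:
        # Place the reasoning inside <think> tags ahead of the final answer.
        # This delimiter is an assumption, not a detail taken from the paper.
        response = f"<think>\n{self.reasoning}\n</think>\n{self.answer}"
        return {"instruction": self.question, "output": response}


# Illustrative content only; it is not drawn from the paper's dataset.
record = InstructionRecord(
    source_id="paper-0001",  # placeholder identifier
    question="Which kind of functional group could passivate undercoordinated Pb2+ sites?",
    reasoning="Undercoordinated Pb2+ acts as a Lewis acid, so a Lewis-base donor group can coordinate to it.",
    answer="A Lewis-base group such as a carbonyl (C=O).",
)

sample = record.to_chat_sample()
print(json.dumps(sample, indent=2))
```

Keeping the source identifier next to each generated example is one way to let a reviewer trace a question back to the paper it came from, which matters for the fidelity concern raised under the load-bearing premise.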

What carries the argument

Perovskite-R1, the domain-specialized LLM obtained by instruction-tuning on automatically generated reasoning chains from perovskite literature and material databases

Load-bearing premise

The automated question-answer generation and chain-of-thought reasoning applied to the 1,232 papers produces high-quality, unbiased training examples that faithfully capture the complex relationships between precursors, processes, and device outcomes

What would settle it

Performing the experimental validations described for the model-proposed precursor additives and observing no improvement in stability or performance compared to baseline devices

Figures

Figures reproduced from arXiv: 2507.16307 by Cheng Mu, Peng-Jie Guo, Xin-De Wang, Ze-Feng Gao, Zhi-Rui Chen, Zhong-Yi Lu.

**Figure 2.** An example dialogue. The user provides the prompt; Perovskite-R1 first gives its thought process and then its final result. The prompt consists of three clearly delineated sections. The first, Task Definition, explicitly states the overarching goal: guiding the model to recognize and select appropriate chemical precursor additives for perovskite synthesis. …

**Figure 3.** Device architecture and the selected molecules.

**Figure 4.** Current density-voltage (J–V) characteristic curves (forward and reverse scans) of PSCs. Panel (a) shows the control group, panels (b) through (e) show the experimental outcomes for the four selected molecules, and panel (f) summarizes the experimental results. Test conditions: scan rate of 10 mV/10 ms, AM1.5G illumination (100 mW/cm²). The open-circuit voltage (VOC…
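For readers less familiar with J–V measurements, the standard relation between the quantities read off such curves is PCE = (V_OC × J_SC × FF) / P_in. The sketch below applies it with the 100 mW/cm² illumination stated in the caption. The function name and the input values are illustrative assumptions and are not results from the paper.

```python
def power_conversion_efficiency(v_oc, j_sc, fill_factor, p_in=100.0):
    """Return the power conversion efficiency in percent.

    v_oc: open-circuit voltage in V
    j_sc: short-circuit current density in mA/cm^2
    fill_factor: dimensionless, between 0 and 1
    p_in: incident power density in mW/cm^2 (100 for AM1.5G)
    """
    if not 0.0 <= fill_factor <= 1.0:
        raise ValueError("fill_factor must lie in [0, 1]")
    if p_in <= 0:
        raise ValueError("p_in must be positive")
    # V * mA/cm^2 gives mW/cm^2, so the ratio to p_in is dimensionless.
    return 100.0 * v_oc * j_sc * fill_factor / p_in


# Illustrative inputs only, not measurements from the paper.
pce = power_conversion_efficiency(v_oc=1.0, j_sc=20.0, fill_factor=0.8)
print(f"PCE = {pce:.1f}%")
```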

**Figure 5.** The word cloud and t-SNE visualization of the instruction dataset.

Original abstract

Perovskite solar cells (PSCs) have rapidly emerged as a leading contender in next-generation photovoltaic technologies, owing to their exceptional power conversion efficiencies and advantageous material properties. Despite these advances, challenges such as long-term stability, environmental sustainability, and scalable manufacturing continue to hinder their commercialization. Precursor additive engineering has shown promise in addressing these issues by enhancing both the performance and durability of PSCs. However, the explosive growth of scientific literature and the complex interplay of materials, processes, and device architectures make it increasingly difficult for researchers to efficiently access, organize, and utilize domain knowledge in this rapidly evolving field. To address this gap, we introduce Perovskite-R1, a specialized large language model (LLM) with advanced reasoning capabilities tailored for the discovery and design of PSC precursor additives. By systematically mining and curating 1,232 high-quality scientific publications and integrating a comprehensive library of 33,269 candidate materials, we constructed a domain-specific instruction-tuning dataset using automated question-answer generation and chain-of-thought reasoning. Fine-tuning the QwQ-32B model on this dataset resulted in Perovskite-R1, which can intelligently synthesize literature insights and generate innovative and practical solutions for defect passivation and the selection of precursor additives. Experimental validation of several model-proposed strategies confirms their effectiveness in improving material stability and performance. Our work demonstrates the potential of domain-adapted LLMs in accelerating materials discovery and provides a closed-loop framework for intelligent, data-driven advancements in perovskite photovoltaic research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Perovskite-R1 fine-tunes a 32B model on 1,232 papers to suggest additives and runs some lab checks, but the automated dataset step has no reported quality controls.


The main point is that the authors built a domain-specific LLM called Perovskite-R1 by curating 1,232 perovskite papers, generating an instruction dataset through automated QA and chain-of-thought, fine-tuning QwQ-32B, and then testing a few of its additive suggestions in actual devices. They also pulled in a library of over 33,000 candidate materials to keep suggestions practical. That combination of literature mining plus real experiments is what gives the work its modest edge over earlier text-only efforts in the area. The curation itself and the closed-loop idea are straightforward and useful steps for this specific materials problem. The experimental confirmation that some proposals improved stability is the part that moves it beyond pure modeling claims.

The soft spot sits in the training data pipeline. Automated generation of questions, answers, and reasoning chains from papers can easily bake in selection biases or miss negative results, yet the abstract gives no accuracy checks, expert review rates, or error analysis on those examples. Without that, it is difficult to tell whether the model is surfacing genuinely new insights or simply repackaging well-known additives. The experimental section is also light on numbers: no sample sizes, controls, or quantitative effect sizes are mentioned, so the validation remains preliminary.

This paper is mainly for perovskite device researchers who already follow the additive literature and want an AI tool to generate starting ideas quickly. A reader in that niche could extract some concrete suggestions and the overall framework. It is grounded enough, with new artifacts and lab tests, to deserve a serious referee rather than a desk reject. I would send it for review but ask the authors to add details on dataset validation and full experimental protocols.

Referee Report

2 major / 2 minor

Summary. The paper introduces Perovskite-R1, a domain-specialized LLM obtained by fine-tuning QwQ-32B on an instruction-tuning dataset constructed from 1,232 curated publications on perovskite solar cells via automated question-answer generation and chain-of-thought reasoning, together with a library of 33,269 candidate materials. The model is claimed to synthesize literature insights to propose precursor additives for defect passivation and performance improvement in PSCs. The authors report that experimental validation of several model-proposed strategies confirms their effectiveness in enhancing material stability and performance, presenting a closed-loop framework for LLM-assisted materials discovery.

Significance. If the generated dataset faithfully encodes literature relationships and the reported experimental improvements are reproducible with appropriate controls, the work would demonstrate a practical route for domain-adapted LLMs to accelerate knowledge synthesis and experimental design in a high-volume literature field such as perovskite photovoltaics.

major comments (2)

[Dataset construction] Dataset construction section: the automated question-answer generation and chain-of-thought reasoning applied to the 1,232 papers is presented without any reported accuracy metrics, expert validation rate, bias audit, or error analysis. This is load-bearing for the central claim because the downstream experimental improvements are attributed to the model's reasoning, which in turn rests on the fidelity of the instruction-tuning examples.
[Experimental validation] Experimental validation section: the abstract states that experimental validation confirmed effectiveness, yet provides no quantitative results, controls, sample sizes, statistical analysis, or details on how the proposed strategies were selected and tested. This leaves the central experimental claim only partially supported.
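As an illustration of the statistical reporting asked for in the second major comment, the sketch below compares a control group with an additive-treated group using Welch's t statistic, which does not assume equal variances. The choice of test, the function name, and the PCE values are assumptions made for illustration; none of the numbers come from the paper.

```python
import math
import statistics


def welch_t(control, treated):
    """Welch's t statistic and approximate degrees of freedom
    for two independent samples with possibly unequal variances."""
    n1, n2 = len(control), len(treated)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two devices")
    m1, m2 = statistics.mean(control), statistics.mean(treated)
    v1, v2 = statistics.variance(control), statistics.variance(treated)
    se2 = v1 / n1 + v2 / n2  # squared standard error of the mean difference
    if se2 == 0:
        raise ValueError("both groups have zero variance")
    t = (m2 - m1) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Placeholder PCE values in percent, for illustration only.
control_pce = [18.0, 18.5, 19.0, 18.5]
additive_pce = [19.5, 20.0, 20.5, 20.0]

t_stat, dof = welch_t(control_pce, additive_pce)
gain = statistics.mean(additive_pce) - statistics.mean(control_pce)
print(f"n = {len(control_pce)} vs {len(additive_pce)}, mean gain = {gain:.2f} points")
print(f"Welch t = {t_stat:.2f}, df = {dof:.1f}")
```

Reporting the group sizes and the mean difference alongside the test statistic would cover the sample-size and effect-size gaps the comment identifies.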

minor comments (2)

[Abstract and Methods] The numbers 1,232 papers and 33,269 candidate materials should be cross-checked for consistency between abstract, methods, and any supplementary tables.
[Methods] Clarify whether the 33,269-material library was used only for candidate generation or also for additional filtering steps in the experimental design workflow.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive suggestions. We have addressed each of the major comments in detail below, and the manuscript has been revised to incorporate additional information and clarifications where appropriate.


Referee: [Dataset construction] Dataset construction section: the automated question-answer generation and chain-of-thought reasoning applied to the 1,232 papers is presented without any reported accuracy metrics, expert validation rate, bias audit, or error analysis. This is load-bearing for the central claim because the downstream experimental improvements are attributed to the model's reasoning, which in turn rests on the fidelity of the instruction-tuning examples.

Authors: We agree that explicit validation of the dataset construction process is important to substantiate the model's capabilities. Although the generation process was designed with domain-specific templates and iterative refinement, the original manuscript did not include quantitative metrics. In the revised manuscript, we have added a new subsection on dataset validation, including accuracy metrics from expert review of a sample of generated examples, inter-rater reliability, and a summary of identified errors and biases. This revision directly addresses the concern regarding the fidelity of the instruction-tuning data. revision: yes
Referee: [Experimental validation] Experimental validation section: the abstract states that experimental validation confirmed effectiveness, yet provides no quantitative results, controls, sample sizes, statistical analysis, or details on how the proposed strategies were selected and tested. This leaves the central experimental claim only partially supported.

Authors: We thank the referee for this observation. The experimental results are detailed in the main text and supplementary materials, but we recognize that the presentation could be more comprehensive. We have revised the experimental validation section to include specific quantitative results (such as measured improvements in power conversion efficiency and stability metrics), descriptions of control experiments, sample sizes used, statistical analyses performed, and a clear explanation of the strategy selection process based on model rankings. These additions provide stronger support for the effectiveness of the proposed strategies. revision: yes
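The first response mentions inter-rater reliability without naming a metric. One common choice for two reviewers assigning categorical labels is Cohen's kappa, sketched here under that assumption. The function name, the label set, and the review labels are hypothetical and do not come from the paper or the rebuttal.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(labels_a)
    # Fraction of items on which the raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical expert judgements on six generated QA examples.
reviewer_1 = ["ok", "ok", "error", "ok", "error", "ok"]
reviewer_2 = ["ok", "ok", "error", "error", "error", "ok"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"Cohen's kappa = {kappa:.2f}")
```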

Circularity Check

0 steps flagged

No significant circularity detected


The paper's core process—curating 1,232 external publications, building a 33,269-material library, generating an instruction-tuning dataset via automated QA/CoT, fine-tuning QwQ-32B, and confirming model-proposed strategies through new laboratory experiments—relies on independent external sources and empirical validation. No equations, fitted parameters, or self-citations reduce any central claim to its inputs by construction. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim depends on the representativeness of the automatically generated training data and the assumption that standard LLM fine-tuning transfers useful domain knowledge without introducing systematic errors from the curation pipeline.

free parameters (2)

Curated publication count
The specific number of 1,232 papers was chosen as the source corpus for dataset construction.
Candidate materials library size
The library of 33,269 materials was integrated as the pool for additive suggestions.

axioms (1)

Domain assumption: Automated question-answer generation from scientific text produces training data of sufficient quality for effective domain adaptation of LLMs.
This premise is invoked when the authors describe constructing the instruction-tuning dataset from the mined publications.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


we constructed a domain-specific instruction-tuning dataset using automated question-answer generation and chain-of-thought reasoning. Fine-tuning the QwQ-32B model on this dataset resulted in Perovskite-R1

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · 8 internal anchors

[1]

Organometal halide perovskites as visible-light sensitizers for photovoltaic cells.Journal of the american chemical society, 131(17):6050–6051, 2009

Akihiro Kojima, Kenjiro Teshima, Yasuo Shirai, and Tsutomu Miyasaka. Organometal halide perovskites as visible-light sensitizers for photovoltaic cells.Journal of the american chemical society, 131(17):6050–6051, 2009

work page 2009
[2]

Interactive Best Research-Cell Efficiency Chart, June

National Renewable Energy Laboratory. Interactive Best Research-Cell Efficiency Chart, June

work page
[3]

Removal of residual additive enabling per- fect crystallization of photovoltaic perovskites

Ze-Kai Bian, Zhenhuang Su, Yan-Hui Lou, Jing Chen, Run-Jun Jin, Chun-Hao Chen, Yu Xia, Lei Huang, Kai-Li Wang, Xingyu Gao, et al. Removal of residual additive enabling per- fect crystallization of photovoltaic perovskites. Angewandte Chemie International Edition, 64(4):e202416887, 2025

work page 2025
[4]

Divalent cation replacement strategy stabilizes wide-bandgap perovskite for Cu (In, Ga) Se2 tandem solar cells.Nature Photonics, 19:479–485, 2025

Liuwen Tian, Enbing Bi, Ilhan Yavuz, Caner Deger, Yuan Tian, Jingjing Zhou, Shaochen Zhang, Qingqing Liu, Jiahui Shen, Libing Yao, et al. Divalent cation replacement strategy stabilizes wide-bandgap perovskite for Cu (In, Ga) Se2 tandem solar cells.Nature Photonics, 19:479–485, 2025

work page 2025
[5]

Recent defect passivation drifts and role of additive engineering in perovskite photovoltaics.Nano Energy, 101:107579, 2022

Ali Hassan, Zhijie Wang, Yeong Hwan Ahn, Muhammad Azam, Abbas Ahmad Khan, Umar Farooq, Muhammad Zubair, and Yu Cao. Recent defect passivation drifts and role of additive engineering in perovskite photovoltaics.Nano Energy, 101:107579, 2022

work page 2022
[6]

Perovskite solar cells: Progress, challenges, and future avenues to clean energy.Solar Energy, 287:113205, 2025

Mohsin Afroz, Ratneshwar Kumar Ratnesh, Swapnil Srivastava, and Jay Singh. Perovskite solar cells: Progress, challenges, and future avenues to clean energy.Solar Energy, 287:113205, 2025

work page 2025
[7]

Discovering novel halide perovskite alloys using multi-fidelity machine learning and genetic algorithm.The Journal of Chemical Physics, 160(6), 2024

Jiaqi Yang, Panayotis Manganaris, and Arun Mannodi-Kanakkithodi. Discovering novel halide perovskite alloys using multi-fidelity machine learning and genetic algorithm.The Journal of Chemical Physics, 160(6), 2024

work page 2024
[8]

Feature selection in machine learning for perovskite materials design and discovery.Materials, 16(8):3134, 2023

Junya Wang, Pengcheng Xu, Xiaobo Ji, Minjie Li, and Wencong Lu. Feature selection in machine learning for perovskite materials design and discovery.Materials, 16(8):3134, 2023

work page 2023
[9]

A machine learning approach for in silico prediction of the photovoltaic properties of perovskite solar cells based on dopant-free hole-transport materials

Islam M Abdellah and Ahmed El-Shafei. A machine learning approach for in silico prediction of the photovoltaic properties of perovskite solar cells based on dopant-free hole-transport materials. New Journal of Chemistry, 48(44):18666–18682, 2024

work page 2024
[10]

Perovskite-llm: Knowledge-enhanced large language models for perovskite solar cell research.arXiv preprint arXiv:2502.12669, 2025

Xiang Liu, Penglei Sun, Shuyan Chen, Longhan Zhang, Peijie Dong, Huajie You, Yongqi Zhang, Chang Yan, Xiaowen Chu, and Tong-yi Zhang. Perovskite-llm: Knowledge-enhanced large language models for perovskite solar cell research.arXiv preprint arXiv:2502.12669, 2025. 14

work page arXiv 2025
[11]

Explainable synthesizability prediction of inorganic crystal polymorphs using large language models.Angewandte Chemie International Edition, 64(19):e202423950, 2025

Seongmin Kim, Joshua Schrier, and Yousung Jung. Explainable synthesizability prediction of inorganic crystal polymorphs using large language models.Angewandte Chemie International Edition, 64(19):e202423950, 2025

work page 2025
[12]

Perovskite solar cells.Nature Reviews Methods Primers, 5(1):3, 2025

Jiye Han, Keonwoo Park, Shaun Tan, Yana Vaynzof, Jingjing Xue, Eric Wei-Guang Diau, Moungi G Bawendi, Jin-Wook Lee, and Il Jeon. Perovskite solar cells.Nature Reviews Methods Primers, 5(1):3, 2025

work page 2025
[13]

GPT-4o System Card

Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. Gpt-4o system card.arXiv preprint arXiv:2410.21276, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[14]

Gemini: A Family of Highly Capable Multimodal Models

Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Sori- cut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. Gemini: a family of highly capable multimodal models.arXiv preprint arXiv:2312.11805, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[15]

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timo- thée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models.arXiv preprint arXiv:2302.13971, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[16]

DeepSeek-V3 Technical Report

Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. Deepseek-v3 technical report.arXiv preprint arXiv:2412.19437, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[17]

Qwen Technical Report

Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, et al. Qwen technical report.arXiv preprint arXiv:2309.16609, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[18]

Chain-of-thought prompting elicits reasoning in large language models

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35:24824–24837, 2022

work page 2022
[19]

OpenAI o1 System Card

Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, et al. Openai o1 system card.arXiv preprint arXiv:2412.16720, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[20]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[21]

Qwen3 Technical Report

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[22]

Biomedgpt: Open multimodal generative pre-trained transformer for biomedicine

Yizhen Luo, Jiahuan Zhang, Siqi Fan, Kai Yang, Yushuai Wu, Mu Qiao, and Zaiqing Nie. Biomedgpt: Open multimodal generative pre-trained transformer for biomedicine. arXiv preprint arXiv:2308.09442, 2023

work page arXiv 2023
[23]

Medbiolm: Optimizing medical and biological qa with fine-tuned large language models and retrieval-augmented generation.arXiv preprint arXiv:2502.03004, 2025

Seonok Kim. Medbiolm: Optimizing medical and biological qa with fine-tuned large language models and retrieval-augmented generation.arXiv preprint arXiv:2502.03004, 2025

work page arXiv 2025
[24]

Pharmagpt: Domain-specific large language models for bio-pharmaceutical and chemistry.arXiv preprint arXiv:2406.18045, 2024

Linqing Chen, Weilei Wang, Zilong Bai, Peng Xu, Yan Fang, Jie Fang, Wentao Wu, Lizhi Zhou, Ruiji Zhang, Yubin Xia, et al. Pharmagpt: Domain-specific large language models for bio-pharmaceutical and chemistry.arXiv preprint arXiv:2406.18045, 2024. 15

work page arXiv 2024
[25]

Crystal structure generation with autoregressive large language modeling.Nature Communications, 15(1):1–16, 2024

Luis M Antunes, Keith T Butler, and Ricardo Grau-Crespo. Crystal structure generation with autoregressive large language modeling.Nature Communications, 15(1):1–16, 2024

work page 2024
[26]

Fine-tuned language models generate stable inorganic materials as text

Nate Gruver, Anuroop Sriram, Andrea Madotto, Andrew Gordon Wilson, C Lawrence Zitnick, and Zachary Ulissi. Fine-tuned language models generate stable inorganic materials as text. arXiv preprint arXiv:2402.04379, 2024

work page arXiv 2024
[27]

Flowllm: Flow match- ingformaterialgenerationwithlargelanguagemodelsasbasedistributions

Anuroop Sriram, Benjamin Miller, Ricky TQ Chen, and Brandon Wood. Flowllm: Flow match- ingformaterialgenerationwithlargelanguagemodelsasbasedistributions. Advances in Neural Information Processing Systems, 37:46025–46046, 2024

work page 2024
[28]

Foundational large language models for materials research.arXiv preprint arXiv:2412.09560, 2024

Vaibhav Mishra, Somaditya Singh, Dhruv Ahlawat, Mohd Zaki, Vaibhav Bihani, Hargun Singh Grover, Biswajit Mishra, Santiago Miret, NM Krishnan, et al. Foundational large language models for materials research.arXiv preprint arXiv:2412.09560, 2024

work page arXiv 2024
[29]

Coursegpt-zh: An educational large language model based on knowledge distillation incorporating prompt optimization

Zheyan Qu, Lu Yin, Zitong Yu, Wenbo Wang, et al. Coursegpt-zh: An educational large language model based on knowledge distillation incorporating prompt optimization. arXiv preprint arXiv:2405.04781, 2024

work page arXiv 2024
[30]

Beyond answers: Large language model-powered tu- toring system in physics education for deep learning and precise understanding.arXiv preprint arXiv:2406.10934, 2024

Zhoumingju Jiang and Mengjun Jiang. Beyond answers: Large language model-powered tu- toring system in physics education for deep learning and precise understanding.arXiv preprint arXiv:2406.10934, 2024

work page arXiv 2024
[31]

Investlm: A large langu age model for investment using ﬁnancial domain instruction tuning

Yi Yang, Yixuan Tang, and Kar Yan Tam. Investlm: A large language model for investment using financial domain instruction tuning.arXiv preprint arXiv:2309.13064, 2023

work page arXiv 2023
[32]

Financial knowledge large language model.arXiv preprint arXiv:2407.00365, 2024

Cehao Yang, Chengjin Xu, and Yiyan Qi. Financial knowledge large language model.arXiv preprint arXiv:2407.00365, 2024

work page arXiv 2024
[33]

Fin-R1: A large language model for financial reasoning through reinforcement learning.arXiv preprint arXiv:2503.16252, 2025

Zhaowei Liu, Xin Guo, Fangqi Lou, Lingfeng Zeng, Jinyi Niu, Zixuan Wang, Jiajie Xu, Weige Cai, Ziwei Yang, Xueqian Zhao, et al. Fin-r1: A large language model for financial reasoning through reinforcement learning.arXiv preprint arXiv:2503.16252, 2025

work page arXiv 2025
[34]

Qwq-32b: Embracing the power of reinforcement learning, March 2025

Qwen Team. Qwq-32b: Embracing the power of reinforcement learning, March 2025

work page 2025
[35]

Ai-driven inverse design of materials: Past, present and future.Chinese Physics Letters, 2024

Xiao-Qi Han, Xin-De Wang, Meng-Yuan Xu, Zhen Feng, Bo-Wen Yao, Peng-Jie Guo, Ze-Feng Gao, and Zhong-Yi Lu. Ai-driven inverse design of materials: Past, present and future.Chinese Physics Letters, 2024

work page 2024
[36]

Materials generation in the era of artificial intelligence: A comprehensive survey

ZhixunLi, BinCao, RuiJiao, LiangWang, DingWang, YangLiu, DingshuoChen, JiaLi, Qiang Liu, Yu Rong, et al. Materials generation in the era of artificial intelligence: A comprehensive survey. arXiv preprint arXiv:2505.16379, 2025

work page arXiv 2025
[37]

Ai-driven materials design: a mini-review

Mouyang Cheng, Chu-Liang Fu, Ryotaro Okabe, Abhijatmedhi Chotrattanapituk, Artittaya Boonkird, Nguyen Tuan Hung, and Mingda Li. Ai-driven materials design: a mini-review. arXiv preprint arXiv:2502.02905, 2025

work page arXiv 2025
[38]

Atomistic line graph neural network for improved ma- terials property predictions.npj Computational Materials, 7(1):185, 2021

Kamal Choudhary and Brian DeCost. Atomistic line graph neural network for improved ma- terials property predictions.npj Computational Materials, 7(1):185, 2021

work page 2021
[39]

Graph neural network prediction of nonlinear optical properties.arXiv preprint arXiv:2504.19987, 2025

Yomn Alkabakibi, Congwei Xie, and Artem R Oganov. Graph neural network prediction of nonlinear optical properties.arXiv preprint arXiv:2504.19987, 2025. 16

work page arXiv 2025
[40]

Advancing 2d material predictions: superior work function estimation with atomistic line graph neural networks.RSC advances, 14(51):38070–38078, 2024

Harikrishnan Sibi, Jovita Biju, and Chandra Chowdhury. Advancing 2d material predictions: superior work function estimation with atomistic line graph neural networks.RSC advances, 14(51):38070–38078, 2024

work page 2024
[41]

Joshua Ojih, Mohammed Al-Fahdi, Yagang Yao, Jianjun Hu, and Ming Hu. Graph theory and graph neural network assisted high-throughput crystal structure prediction and screening for energy conversion and storage.Journal of Materials Chemistry A, 12(14):8502–8515, 2024

work page 2024
[42]

Rapid prediction of phonon structureandpropertiesusingtheatomisticlinegraphneuralnetwork(alignn)

Ramya Gurunathan, Kamal Choudhary, and Francesca Tavazza. Rapid prediction of phonon structureandpropertiesusingtheatomisticlinegraphneuralnetwork(alignn). Physical Review Materials, 7(2):023803, 2023

work page 2023
[43]

Ctgnn: Crystal transformer graph neural network for crystal material property prediction

Zijian Du, Luozhijie Jin, Le Shu, Yan Cen, Yuanfeng Xu, Yongfeng Mei, and Hao Zhang. Ctgnn: Crystal transformer graph neural network for crystal material property prediction. arXiv preprint arXiv:2405.11502, 2024

work page arXiv 2024
[44]

An equivariant graph neural network for the elasticity tensors of all seven crystal systems.Digital Discovery, 3(5):869–882, 2024

Mingjian Wen, Matthew K Horton, Jason M Munro, Patrick Huck, and Kristin A Persson. An equivariant graph neural network for the elasticity tensors of all seven crystal systems.Digital Discovery, 3(5):869–882, 2024

work page 2024
[45]

Explainableai for material property prediction based on energy cloud: a shapley-driven approach.Materials, 16(23):7322, 2023

FaizaQayyum, MuradAliKhan, Do-HyeunKim, HyunseokKo, andGa-AeRyu. Explainableai for material property prediction based on energy cloud: a shapley-driven approach.Materials, 16(23):7322, 2023

work page 2023
[46]

Crystal diffusion variational autoencoder for periodic material generation.arXiv preprint arXiv:2110.06197,

Tian Xie, Xiang Fu, Octavian-Eugen Ganea, Regina Barzilay, and Tommi Jaakkola. Crys- tal diffusion variational autoencoder for periodic material generation. arXiv preprint arXiv:2110.06197, 2021

work page arXiv 2021
[47]

Rui Jiao, Wenbing Huang, Peijia Lin, Jiaqi Han, Pin Chen, Yutong Lu, and Yang Liu. Crystal structure prediction by joint equivariant diffusion. Advances in Neural Information Processing Systems, 36:17464–17497, 2023

[48]

Amil Merchant, Simon Batzner, Samuel S Schoenholz, Muratahan Aykol, Gowoon Cheon, and Ekin Dogus Cubuk. Scaling deep learning for materials discovery. Nature, 624(7990):80–85, 2023

[49]

Yingheng Tang, Wenbin Xu, Jie Cao, Weilu Gao, Steve Farrell, Benjamin Erichson, Michael W Mahoney, Andy Nonaka, and Zhi Yao. Matterchat: A multi-modal llm for material science. arXiv preprint arXiv:2502.13107, 2025

[50]

Huan Zhang, Yu Song, Ziyu Hou, Santiago Miret, and Bang Liu. Honeycomb: A flexible llm-based agent system for materials science. arXiv preprint arXiv:2409.00135, 2024

[51]

Yuan Chiang, Elvis Hsieh, Chia-Hong Chou, and Janosh Riebesell. Llamp: Large language model made powerful for high-fidelity materials knowledge retrieval and distillation. arXiv preprint arXiv:2401.17244, 2024

[52]

Yaowei Zheng, Richong Zhang, Junhao Zhang, Yanhan Ye, Zheyan Luo, Zhangchi Feng, and Yongqiang Ma. Llamafactory: Unified efficient fine-tuning of 100+ language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), Bangkok, Thailand, 2024. Association for Computational Linguis...

Core Design Goal

[53]

Control crystallization kinetics and film morphology

[54]

Optimize energy-level alignment. Key Design Principles & Strategies:

[55]

Lewis acid-base interactions

[56]

Crystallization modulators

[57]

Multifunctional additive design. Critical Design Considerations:

[58]

Solubility in DMF/DMSO

[59]

Compatibility with perovskite chemistry and subsequent layers

[60]

Prefer commercially available or natural/food-grade compounds. Output Requirements: – Analyze functional groups and mechanistic role. – Evaluate processing compatibility. – Rank 3–5 top candidate additives by expected PV efficiency improvement. – For each: chemical name + CAS number; rationale/mechanism; primary function(s). Additional Filtering Rule: – *...

[61]

**Analysis of each compound**: - **Dimethylphosphine oxide**: - **Solubility**: It is likely to be soluble in DMF/DMSO as it has a phosphorus-oxygen bond which can interact with the polar solvents. It has a Lewis basic phosphorus atom that could potentially participate in Lewis acid-base interactions with perovskite precursors. It could act as a ligan...

[62]

**Final assessment**: - **Top candidates**: - Dimethylphosphine oxide (CAS: 7211-39-4): It has a phosphorus-oxygen bond that can interact with perovskite precursors through Lewis acid-base interactions, potentially controlling crystallization kinetics and passivating defects. - 5-Hydroxy-2-Methyl-Benzoic Acid (CAS: 578-22-3): The ca...
