GENIUS: An Agentic AI Framework for Autonomous Design and Execution of Simulation Protocols

Alexandre C. Dias; Celso Ricardo Caldeira Rêgo; Diego Guedes-Sobrinho; Maurício J. Piotrowski; Mohammad Soleymanibrojeni; Roland Aydin; Wolfgang Wenzel

arXiv: 2512.06404 · v1 · pith:MUSMI5IY · submitted 2025-12-06 · 💻 cs.AI · cond-mat.mtrl-sci · physics.chem-ph


Mohammad Soleymanibrojeni, Roland Aydin, Diego Guedes-Sobrinho, Alexandre C. Dias, Maurício J. Piotrowski, Wolfgang Wenzel, Celso Ricardo Caldeira Rêgo

Pith reviewed 2026-05-25 07:49 UTC · model grok-4.3

classification 💻 cs.AI · cond-mat.mtrl-sci · physics.chem-ph

keywords agentic AI · Quantum ESPRESSO · DFT simulations · autonomous workflow · error recovery · materials discovery · large language models · knowledge graph


The pith

GENIUS converts free-form user prompts into validated Quantum ESPRESSO inputs that run to completion on roughly 80 percent of 295 benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GENIUS as an agentic AI workflow that automates the setup of atomistic simulations for materials discovery. It fuses a Quantum ESPRESSO knowledge graph with tiered large language models under the control of a finite-state error-recovery machine. The system turns ordinary human instructions into simulation input files without requiring computer specialists. On 295 test cases it reaches a completion rate of about 80 percent, with 76 percent of successes coming from autonomous repairs, while halving inference costs and virtually eliminating hallucinations relative to plain LLM baselines.
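For scale, the headline percentages can be turned into approximate case counts. This back-of-envelope sketch treats the "≈80%" figure as exactly 0.80, and it computes both readings of the 76% figure (a share of the successful runs, as in the paragraph above, or a share of all 295 cases), because the abstract's wording admits either.

```python
# Back-of-envelope conversion of the reported percentages into case counts.
# Assumption: the "≈80%" completion rate is treated as exactly 0.80.
TOTAL_CASES = 295
COMPLETION_RATE = 0.80
AUTONOMOUS_SHARE = 0.76

completed = round(TOTAL_CASES * COMPLETION_RATE)  # runs that finished

# Reading A: 76% of the completed runs were autonomously repaired.
repaired_if_share_of_successes = round(completed * AUTONOMOUS_SHARE)

# Reading B: 76% of all benchmark cases were autonomously repaired.
repaired_if_share_of_all = round(TOTAL_CASES * AUTONOMOUS_SHARE)

print(completed, repaired_if_share_of_successes, repaired_if_share_of_all)  # 236 179 224
```

Either way the counts are only approximate, since the source reports the completion rate as ≈80%.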

Core claim

GENIUS translates free-form human-generated prompts into validated input files that run to completion on ≈80% of 295 diverse benchmarks, where 76% are autonomously repaired, with success decaying exponentially to a 7% baseline. Compared with LLM-only baselines, GENIUS halves inference costs and virtually eliminates hallucinations. The framework democratizes electronic-structure DFT simulations by intelligently automating protocol generation, validation, and repair.
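The phrase "success decaying exponentially to a 7% baseline" is most naturally read as an exponential approach to an asymptote. The form below is an illustrative assumption, not the paper's fitted model: k is assumed to index successive repair attempts (or some difficulty scale), p_0 and λ are unspecified parameters, and only the asymptote p_∞ ≈ 0.07 comes from the text.

```latex
p(k) = p_\infty + \left(p_0 - p_\infty\right) e^{-\lambda k},
\qquad p_\infty \approx 0.07, \quad \lambda > 0
```

Under this form p(0) = p_0, and for p_0 > 0.07 the success probability falls monotonically toward 0.07 as k grows.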

What carries the argument

A finite-state error-recovery machine supervising a tiered hierarchy of large language models connected to a Quantum ESPRESSO knowledge graph.
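A finite-state error-recovery loop of this kind can be sketched as follows. This is a minimal illustration, not the GENIUS implementation: the state names, the retry budget, and the `generate`, `validate`, `run` and `repair` callables are hypothetical stand-ins for the LLM calls, knowledge-graph checks and Quantum ESPRESSO execution the paper describes.

```python
from enum import Enum, auto
from typing import Callable, Optional


class State(Enum):
    GENERATE = auto()
    VALIDATE = auto()
    RUN = auto()
    REPAIR = auto()
    DONE = auto()
    FAILED = auto()


def recover(
    prompt: str,
    generate: Callable[[str], str],
    validate: Callable[[str], Optional[str]],
    run: Callable[[str], Optional[str]],
    repair: Callable[[str, str], str],
    max_repairs: int = 5,
) -> tuple[State, str, int]:
    """Drive a generate -> validate -> run -> repair loop until DONE or FAILED.

    `validate` and `run` return None on success, or an error message on failure.
    Returns the terminal state, the last input text, and the number of repairs.
    """
    state = State.GENERATE
    text, error, repairs = "", "", 0
    while state not in (State.DONE, State.FAILED):
        if state is State.GENERATE:
            text = generate(prompt)
            state = State.VALIDATE
        elif state is State.VALIDATE:
            error = validate(text)
            state = State.RUN if error is None else State.REPAIR
        elif state is State.RUN:
            error = run(text)
            state = State.DONE if error is None else State.REPAIR
        elif state is State.REPAIR:
            if repairs >= max_repairs:
                state = State.FAILED  # retry budget exhausted
            else:
                text = repair(text, error)
                repairs += 1
                state = State.VALIDATE  # every repaired input is re-validated
    return state, text, repairs


# Toy stand-ins: the generated input lacks ecutwfc until one repair adds it.
state, text, n = recover(
    "scf calculation for silicon",
    generate=lambda p: "calculation = 'scf'",
    validate=lambda t: None if "ecutwfc" in t else "missing ecutwfc",
    run=lambda t: None,
    repair=lambda t, err: t + "\necutwfc = 30.0",
)
print(state, n)  # State.DONE 1
```

Routing every repaired input back through validation, and capping the number of repairs, is what keeps such a loop from cycling indefinitely.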

If this is right

Non-specialists can generate and execute DFT simulation protocols without coding expertise.
Protocol generation, validation, and repair become automated, enabling broader adoption of integrated computational materials engineering.
Inference costs drop by half relative to direct large-language-model use.
Hallucinations in generated simulation inputs are virtually eliminated.
Large-scale screening and accelerated design loops become feasible across academia and industry.
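One way a knowledge graph can suppress hallucinated inputs is to reject any keyword it does not know for a given namelist. The sketch below assumes the graph can be reduced to a mapping from pw.x namelists to allowed keywords. The small `ALLOWED` table is a hand-picked subset of real pw.x parameters chosen for illustration, not the GENIUS graph, and the parser handles only simple `key = value` lines. `smearing_width` stands in for a plausible-looking keyword that pw.x does not define (its smearing controls are `occupations`, `smearing` and `degauss`).

```python
import re

# Hypothetical, hand-picked subset of pw.x keywords per namelist.
ALLOWED = {
    "control": {"calculation", "prefix", "outdir", "pseudo_dir"},
    "system": {"ibrav", "celldm", "nat", "ntyp", "ecutwfc", "ecutrho"},
    "electrons": {"conv_thr", "mixing_beta"},
}


def unknown_keywords(pw_input: str) -> list[tuple[str, str]]:
    """Return (namelist, keyword) pairs that are not listed in ALLOWED."""
    problems = []
    namelist = None
    for raw in pw_input.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop Fortran-style comments
        if line.startswith("&"):
            namelist = line[1:].strip().lower()
            continue
        if line == "/":
            namelist = None  # end of the current namelist
            continue
        if namelist is None or "=" not in line:
            continue  # cards such as ATOMIC_SPECIES, or non-assignments
        for assignment in line.split(","):
            if "=" not in assignment:
                continue  # e.g. the empty piece after a trailing comma
            key = assignment.split("=", 1)[0].strip().lower()
            key = re.sub(r"\(.*\)$", "", key)  # celldm(1) -> celldm
            if key not in ALLOWED.get(namelist, set()):
                problems.append((namelist, key))
    return problems


# Abbreviated input (ATOMIC_POSITIONS and K_POINTS omitted).
SAMPLE = """
&CONTROL
  calculation = 'scf'
  prefix = 'si'
/
&SYSTEM
  ibrav = 2, celldm(1) = 10.26, nat = 2, ntyp = 1
  ecutwfc = 30.0
  smearing_width = 0.01
/
&ELECTRONS
  conv_thr = 1.0d-8
/
ATOMIC_SPECIES
  Si 28.086 Si.pz-vbc.UPF
"""

print(unknown_keywords(SAMPLE))  # [('system', 'smearing_width')]
```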

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same structure could be adapted to other simulation codes if equivalent knowledge graphs are developed.
Performance on truly novel prompts may require periodic updates to the knowledge graph as codes evolve.
Linking the workflow to experimental results could support closed-loop materials discovery systems.
Success rates might improve with larger or more varied benchmark sets drawn from real user logs.

Load-bearing premise

The 295 benchmarks capture the range and difficulty of prompts that non-expert users would actually issue in practice.

What would settle it

Running GENIUS on a new collection of 100 prompts written by actual non-expert materials researchers and measuring the fraction that produce complete runs without any manual fixes.
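Such a study yields a binomial proportion, so it should be reported with an uncertainty. The minimal sketch below uses the Wilson score interval, a standard choice for proportions (the paper's own statistical method is not described here), to show how wide the interval is for 100 prompts. The 80-of-100 outcome is purely illustrative.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


low, high = wilson_interval(80, 100)
print(f"{low:.3f} to {high:.3f}")
```

For 80 successes out of 100 the interval runs from roughly 0.71 to 0.87, so a 100-prompt study could distinguish large effects but not a difference of a few percentage points.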

Original abstract

Predictive atomistic simulations have propelled materials discovery, yet routine setup and debugging still demand computer specialists. This know-how gap limits Integrated Computational Materials Engineering (ICME), where state-of-the-art codes exist but remain cumbersome for non-experts. We address this bottleneck with GENIUS, an AI-agentic workflow that fuses a smart Quantum ESPRESSO knowledge graph with a tiered hierarchy of large language models supervised by a finite-state error-recovery machine. Here we show that GENIUS translates free-form human-generated prompts into validated input files that run to completion on ≈80% of 295 diverse benchmarks, where 76% are autonomously repaired, with success decaying exponentially to a 7% baseline. Compared with LLM-only baselines, GENIUS halves inference costs and virtually eliminates hallucinations. The framework democratizes electronic-structure DFT simulations by intelligently automating protocol generation, validation, and repair, opening large-scale screening and accelerating ICME design loops across academia and industry worldwide.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

GENIUS wires a QE knowledge graph to tiered LLMs and a finite-state repair machine to turn prompts into running DFT jobs, succeeding on ~80% of its 295 cases, but the test-set construction is not described well enough to judge real-world robustness.


The paper's main deliverable is a concrete agentic pipeline that combines a domain knowledge graph for Quantum ESPRESSO with a supervised stack of LLMs and a finite-state machine (FSM) that catches and fixes input errors. On the reported 295 benchmarks it reaches roughly 80% completion, with 76% reported as autonomously repaired, and it reports lower inference cost and fewer hallucinations than plain LLM baselines. The exponential decay of success to a 7% baseline is a clean way to quantify what the FSM adds, provided that baseline reflects runs without the repair layer. The integration is the real engineering step forward: not a new theoretical principle, but a practical assembly that targets a known pain point in materials workflows. The knowledge graph supplies the rules that keep the LLMs from drifting, and the tiered models let cheaper calls handle routine checks while reserving heavier models for harder cases. This is the sort of system that could let non-specialists run routine DFT without constant hand-holding.

The soft spot is the evaluation. The abstract and the stress-test note both leave the 295 cases opaque: there is no account of how they were sampled, whether they match the distribution of real non-expert prompts, or whether any were held out after the recovery logic was written. Without that, the 80% figure and the cost and hallucination gains could be inflated by test-set construction. The paper would benefit from a clearer description of prompt statistics and at least one external validation set.

This work is aimed at computational materials groups that already use QE and want to reduce setup time. Readers building agentic tools for scientific codes will find the architecture worth examining. It is solid enough on the engineering side to deserve a serious referee who can press on the benchmark details and ask for the missing failure-mode analysis.
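The cost argument behind a tiered hierarchy can be made concrete with a toy router: try the cheapest model first and escalate only when its output fails a check. The tier names, per-call costs and the acceptance check below are hypothetical placeholders; the sketch does not reproduce the GENIUS tiers or pricing.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Tier:
    name: str
    cost_per_call: float  # arbitrary, hypothetical cost units
    call: Callable[[str], str]


def route(
    task: str, tiers: list[Tier], check: Callable[[str], bool]
) -> tuple[Optional[str], float]:
    """Try tiers from cheapest to most expensive; stop at the first accepted answer."""
    spent = 0.0
    for tier in sorted(tiers, key=lambda t: t.cost_per_call):
        spent += tier.cost_per_call
        answer = tier.call(task)
        if check(answer):
            return answer, spent
    return None, spent  # every tier was tried and rejected


def accept(answer: str) -> bool:
    # Placeholder check; a real system would run a validator, not a string test.
    return answer != "unsure"


tiers = [
    Tier("large", 10.0, lambda t: f"careful answer to {t}"),
    Tier("small", 1.0, lambda t: "ok" if "routine" in t else "unsure"),
]

print(route("routine check", tiers, accept))  # ('ok', 1.0)
print(route("hard case", tiers, accept))  # ('careful answer to hard case', 11.0)
```

With these toy prices, a workload in which 9 of 10 requests are routine averages (9 × 1 + 1 × 11) / 10 = 2 units per request, against 10 for always calling the large tier; the actual savings depend entirely on the escalation rate.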

Referee Report

2 major / 1 minor

Summary. The manuscript introduces GENIUS, an agentic AI framework that combines a Quantum ESPRESSO knowledge graph, a tiered hierarchy of LLMs, and a finite-state error-recovery machine to translate free-form human prompts into validated, executable simulation input files. It reports that the system achieves ≈80% completion on 295 diverse benchmarks (with 76% of cases autonomously repaired), success decaying exponentially to a 7% baseline, and that it halves inference costs while virtually eliminating hallucinations relative to LLM-only baselines.

Significance. If the empirical results are reproducible and the benchmarks are representative, the work could meaningfully lower the expertise barrier for routine DFT simulations, supporting broader adoption of ICME workflows. The integration of a domain knowledge graph with supervised agentic recovery offers a concrete, deployable pattern for automating scientific computing protocols.

major comments (2)

[Abstract] The headline performance figures (≈80% completion, 76% autonomous repair on 295 benchmarks, exponential decay to 7% baseline, halved costs, elimination of hallucinations) are stated without any description of benchmark selection criteria, baseline implementation details, error-bar or statistical methodology, or failure-mode categorization. These omissions make it impossible to assess whether the reported metrics are load-bearing evidence for the claimed robustness and generalization.
[Abstract] The central claim that the finite-state recovery machine plus knowledge graph will continue to function on unseen real-user prompts rests on the unvalidated assumption that the 295 benchmarks match the distribution of length, terminology, implicit assumptions, and error types that non-experts actually produce; no evidence is supplied that the set was constructed independently of the recovery logic or evaluated on a truly held-out collection of user-generated cases.
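A common safeguard against the issue raised here is to freeze a held-out split before any recovery logic is tuned, and to assign it deterministically so it cannot drift between experiments. The sketch below hashes each prompt's text into a bucket; the 20% fraction and the function name are illustrative assumptions, not a procedure described in the paper.

```python
import hashlib


def is_held_out(prompt: str, fraction: float = 0.2) -> bool:
    """Deterministically assign a prompt to the held-out set by hashing its text."""
    digest = hashlib.sha256(prompt.strip().encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2**48  # 48-bit value in [0, 1)
    return bucket < fraction


prompts = [f"benchmark prompt {i}" for i in range(1000)]
held_out = [p for p in prompts if is_held_out(p)]
development = [p for p in prompts if not is_held_out(p)]
print(len(held_out), len(development))
```

Because membership depends only on the prompt text, the same prompt always lands in the same split, and about a fifth of the prompts end up held out without anyone choosing them by hand.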

minor comments (1)

[Abstract] The phrase 'diverse benchmarks' is used without enumerating the scientific domains, code versions, or complexity strata represented.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these focused comments on the abstract. We will revise the manuscript to address the concerns about missing context and generalizability assumptions, while preserving the abstract's brevity. Details follow point by point.


Referee: [Abstract] The headline performance figures (≈80% completion, 76% autonomous repair on 295 benchmarks, exponential decay to 7% baseline, halved costs, elimination of hallucinations) are stated without any description of benchmark selection criteria, baseline implementation details, error-bar or statistical methodology, or failure-mode categorization. These omissions make it impossible to assess whether the reported metrics are load-bearing evidence for the claimed robustness and generalization.

Authors: We agree that the abstract would be strengthened by brief contextual phrases. In revision we will insert concise qualifiers noting that the 295 benchmarks span prompt complexities and common DFT error types (full selection criteria in Section 3.1), that baselines follow standard zero-shot and few-shot LLM prompting (Methods), and that error bars, statistical tests, and failure-mode breakdowns are provided in the Results section and Supplementary Information. This keeps the abstract within length limits while directing readers to the supporting evidence. revision: yes
Referee: [Abstract] The central claim that the finite-state recovery machine plus knowledge graph will continue to function on unseen real-user prompts rests on the unvalidated assumption that the 295 benchmarks match the distribution of length, terminology, implicit assumptions, and error types that non-experts actually produce; no evidence is supplied that the set was constructed independently of the recovery logic or evaluated on a truly held-out collection of user-generated cases.

Authors: The benchmarks were assembled by domain experts to cover a broad range of prompt styles, lengths, and error categories drawn from typical DFT workflows, with construction performed separately from the recovery-machine implementation details. The manuscript presents the exponential decay to the 7% baseline as empirical support for robustness rather than a distributional proof. We acknowledge the absence of a held-out corpus of actual non-expert user prompts. In revision we will add a clarifying sentence in the abstract and a short limitations paragraph in the Discussion that states this scope and identifies real-user validation as future work. revision: partial

Circularity Check

0 steps flagged

No circularity detected; empirical benchmark results only


The paper reports empirical success rates (≈80% completion, 76% autonomous repair on 295 benchmarks) from running the GENIUS agentic workflow against a fixed benchmark set. No derivation chain, equations, fitted parameters, or predictions appear in the abstract or described structure. Performance metrics are direct observations, not quantities that reduce by construction to inputs or prior self-citations. The framework description invokes no uniqueness theorems or ansatzes smuggled via citation. This is a standard non-circular empirical systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied software-engineering paper; the central claim rests on empirical performance of an implemented workflow rather than on mathematical axioms, fitted constants, or newly postulated physical entities.

pith-pipeline@v0.9.0 · 5741 in / 1135 out tokens · 51607 ms · 2026-05-25T07:49:39.466993+00:00


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

El Agente Quntur: A research collaborator agent for quantum chemistry
physics.chem-ph 2026-02 unverdicted novelty 7.0

El Agente Quntur is a new multi-agent system that uses reasoning over literature and software documentation to autonomously handle the full workflow of quantum chemistry experiments in ORCA.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · cited by 1 Pith paper · 3 internal anchors


[1] [1]

Toward computational materials design: the impact of density functional theory on materials research.MRS bulletin, 31(9):659–668, 2006

Jürgen Hafner, Christopher Wolverton, and Gerbrand Ceder. Toward computational materials design: the impact of density functional theory on materials research.MRS bulletin, 31(9):659–668, 2006. doi: doi:10.1557/mrs2006.174

work page doi:10.1557/mrs2006.174 2006

[2] [3]

A review of the application of machine learning and data mining approaches in continuum materials mechanics.Frontiers in Materials, 6:110, 2019

Frederic E Bock, Roland C Aydin, Christian J Cyron, Norbert Huber, Surya R Kalidindi, and Benjamin Klusemann. A review of the application of machine learning and data mining approaches in continuum materials mechanics.Frontiers in Materials, 6:110, 2019. doi: 10.1021/acs.chemrev.2c00479

work page doi:10.1021/acs.chemrev.2c00479 2019

[3] [4]

SimStack: An intuitive workflow frame- work.Frontiers in Materials, 9, may 2022

Celso Ricardo Caldeira Rego, Jörg Schaarschmidt, Tobias Schlöder, Montserrat Penaloza-Amion, Saien- tan Bag, Tobias Neumann, Timo Strunk, and Wolfgang Wenzel. SimStack: An intuitive workflow frame- work.Frontiers in Materials, 9, may 2022. doi: https://doi.org/10.3389/fmats.2022.877597

work page doi:10.3389/fmats.2022.877597 2022

[4] [5]

Zhuo Yu, Baltej Singh, Yue Yu, and Linda F. Nazar. Suppressing argyrodite oxidation by tuning the host structure for high-areal-capacity all-solid-state lithium–sulfur batteries.Nature Materials, May 2025. ISSN 1476-4660. doi: 10.1038/s41563-025-02238-2. URLhttp://dx.doi.org/10.1038/s41563-0 25-02238-2

work page doi:10.1038/s41563-025-02238-2 2025

[5] [6]

A family of dual-anion-based sodium superionic conduc- tors for all-solid-state sodium-ion batteries.Nature Materials, 24(1):83–91, Oct

Xiaoting Lin, Shumin Zhang, Menghao Yang, Biwei Xiao, Yang Zhao, Jing Luo, Jiamin Fu, Changhong Wang, Xiaona Li, Weihan Li, Feipeng Yang, Hui Duan, Jianwen Liang, Bolin Fu, Hamidreza Abdolvand, Jinghua Guo, Graham King, and Xueliang Sun. A family of dual-anion-based sodium superionic conduc- tors for all-solid-state sodium-ion batteries.Nature Materials, ...

work page doi:10.1038/s41563-024-02011-x 2024

[6] [7]

Warzecha, Marshall S

Xuexiang Han, Mohamad-Gabriel Alameh, Ningqiang Gong, Lulu Xue, Majed Ghattas, Goutham Bojja, Junchao Xu, Gan Zhao, Claude C. Warzecha, Marshall S. Padilla, Rakan El-Mayta, Garima Dwivedi, Ying Xu, Andrew E. Vaughan, James M. Wilson, Drew Weissman, and Michael J. Mitchell. Fast and facile synthesis of amidine-incorporated degradable lipids for versatile m...

work page doi:10.1038/s41557-024-01557-2 2024

[7] [8]

Four ways to power-up ai for drug discovery.Nature, Feb

Anthony King. Four ways to power-up ai for drug discovery.Nature, Feb. 2025. ISSN 1476-4687. doi: 10.1038/d41586-025-00602-5. URLhttp://dx.doi.org/10.1038/d41586-025-00602-5

work page doi:10.1038/d41586-025-00602-5 2025

[8] [9]

Huber, Giovanni Pizzi, Leonid Kahle, Felix T

Joerg Schaarschmidt, Jie Yuan, Timo Strunk, Ivan Kondov, Sebastiaan P. Huber, Giovanni Pizzi, Leonid Kahle, Felix T. Bölle, Ivano E. Castelli, Tejs Vegge, Felix Hanke, Tilmann Hickel, Jörg Neugebauer, Celso R. C. Rêgo, and Wolfgang Wenzel. Workflow engineering in materials design within the battery 2030+ project. Advanced Energy Materials, 12(17), Dec. 202... doi: 10.1002/aenm.202102638. URL http://dx.doi.org/10.1002/aenm.202102638

John Allison, Dan Backman, and Leo Christodoulou. Integrated computational materials engineering: a new paradigm for the global materials profession. JOM, 58:25–27, 2006. doi: 10.1007/s11837-006-0223-5

Christopher D Taylor, Pin Lu, James Saal, GS Frankel, and JR Scully. Integrated computational materials engineering of corrosion resistant alloys. npj Materials Degradation, 2(1):6, 2018. doi: 10.1038/s41529-018-0027-4

Kurt Lejaeghere, Gustav Bihlmayer, Torbjörn Björkman, Peter Blaha, Stefan Blügel, Volker Blum, Damien Caliste, Ivano E. Castelli, Stewart J. Clark, Andrea Dal Corso, Stefano de Gironcoli, Thierry Deutsch, John Kay Dewhurst, Igor Di Marco, Claudia Draxl, Marcin Dułak, Olle Eriksson, José A. Flores-Livas, Kevin F. Garrity, Luigi Genovese, Paolo Giannozzi, ... 2016. doi: 10.1126/science.aad3000

Sebastiaan P. Huber, Emanuele Bosoni, Marnik Bercx, Jens Bröder, Augustin Degomme, Vladimir Dikan, Kristjan Eimre, Espen Flage-Larsen, Alberto Garcia, Luigi Genovese, Dominik Gresch, Conrad Johnston, Guido Petretto, Samuel Poncé, Gian-Marco Rignanese, Christopher J. Sewell, Berend Smit, Vasily Tseplyaev, Martin Uhrin, Daniel Wortmann, Aliaksandr V. Yak... 2021. doi: 10.1038/s41524-021-00594-6

Luis Octavio de Araujo, Celso Ricardo Caldeira Rego, Wolfgang Wenzel, Maurício Jeomar Piotrowski, Alexandre Cavalheiro Dias, and Diego Guedes-Sobrinho. Automated workflow for analyzing thermodynamic stability in polymorphic perovskite alloys. npj Computational Materials, 10(1), July 2024. ISSN 2057-3960. doi: 10.1038/s41524-024-01320-8. URL http://dx.doi....

Miki Bonacci, Junfeng Qiao, Nicola Spallanzani, Antimo Marrazzo, Giovanni Pizzi, Elisa Molinari, Daniele Varsano, Andrea Ferretti, and Deborah Prezzi. Towards high-throughput many-body perturbation theory: efficient algorithms and automated workflows. npj Computational Materials, 9(1), May 2023. ISSN 2057-3960. doi: 10.1038/s41524-023-01027-2. URL http://dx...

Mohammad Soleymanibrojeni, Celso Ricardo Caldeira Rego, Meysam Esmaeilpour, and Wolfgang Wenzel. An active learning approach to model solid-electrolyte interphase formation in Li-ion batteries. Journal of Materials Chemistry A, 12(4):2249–2266, 2024. ISSN 2050-7496. doi: 10.1039/d3ta06054c. URL http://dx.doi.org/10.1039/D3TA06054C

Douglas A Luke, Byron J Powell, and Alejandra Paniagua-Avila. Bridges and mechanisms: integrating systems science thinking into implementation research. Annual Review of Public Health, 45, 2024. doi: 10.1146/annurev-publhealth-060922-040205

Aidan Toner-Rodgers. Artificial intelligence, scientific discovery, and product innovation. arXiv preprint arXiv:2412.17866, 2024. doi: 10.48550/arXiv.2412.17866

Kevin Maik Jablonka, Qianxiang Ai, Alexander Al-Feghali, Shruti Badhwar, Joshua D Bocarsly, Andres M Bran, Stefan Bringuier, L Catherine Brinson, Kamal Choudhary, Defne Circi, et al. 14 examples of how LLMs can transform materials science and chemistry: a reflection on a large language model hackathon. Digital Discovery, 2(5):1233–1250, 2023. doi: 10.1039/d3dd00113j

Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering, 29(12):2724–2743, 2017. doi: 10.1109/TKDE.2017.2754499

Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Shengyi Huang, Hamish Ivison, Faeze Brahman, Lester James V. Miranda, Alisa Liu, Nouha Dziri, Shane Lyu, Yuling Gu, Saumya Malik, Victoria Graf, Jena D. Hwang, Jiangjiang Yang, Ronan Le Bras, Oyvind Tafjord, Chris Wilhelm, Luca Soldaini, Noah A. Smith, Yizhong Wang, Pradeep Dasigi, and Hannaneh Hajishirz... Tulu 3: Pushing Frontiers in Open Language Model Post-Training. arXiv, 2024.

DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai D... arXiv, 2025.

Kimi Team, Angang Du, Bofei Gao, Bowei Xing, Changjiu Jiang, Cheng Chen, Cheng Li, Chenjun Xiao, Chenzhuang Du, Chonghua Liao, Chuning Tang, Congcong Wang, Dehao Zhang, Enming Yuan, Enzhe Lu, Fengxiang Tang, Flood Sung, Guangda Wei, Guokun Lai, Haiqing Guo, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, Haotian Yao, Haotian Zhao, Haoyu Lu, Haoze Li, Haoz... arXiv, 2025.

Paolo Giannozzi, Stefano Baroni, Nicola Bonini, Matteo Calandra, Roberto Car, Carlo Cavazzoni, Davide Ceresoli, Guido L Chiarotti, Matteo Cococcioni, Ismaila Dabo, Andrea Dal Corso, Stefano de Gironcoli, Stefano Fabris, Guido Fratesi, Ralph Gebauer, Uwe Gerstmann, Christos Gougoussis, Anton Kokalj, Michele Lazzeri, Layla Martin-Samos, Nicola Marzari, Fran... Quantum ESPRESSO: a modular and open-source software project for quantum simulations of materials. Journal of Physics: Condensed Matter, 21(39):395502, Sept. 2009. doi: 10.1088/0953-8984/21/39/395502. URL http://dx.doi.org/10.1088/0953-8984/21/39/395502

Chiara Michelutti, Jens Eckert, Milko Monecke, Julian Klein, and Sabine Glesner. A systematic study on the potentials and limitations of LLM-assisted software development. In 2024 2nd International Conference on Foundation and Large Language Models (FLLM), pages 330–338. IEEE, 2024. doi: 10.1109/FLLM63129.2024.10852455

Grant Ledger and Rafael Mancinni. Detecting LLM hallucinations using Monte Carlo simulations on token probabilities. Authorea Preprints, 2024. doi: 10.36227/techrxiv.171822396.61518693/v1

Odd Erik Gundersen. The fundamental principles of reproducibility. Philosophical Transactions of the Royal Society A, 379(2197):20200210, 2021. doi: 10.1098/rsta.2020.0210

J Britt Holbrook. Open science, open access, and the democratization of knowledge. Issues in Science and Technology, 35(3):26–28, 2019

Claude 3.5 Sonnet. https://www.anthropic.com/news/claude-3-5-sonnet, 2024. Accessed: February, 2025

Harald Steck, Chaitanya Ekanadham, and Nathan Kallus. Is cosine-similarity of embeddings really about similarity? 2024. doi: 10.48550/ARXIV.2403.05440. URL https://arxiv.org/abs/2403.05440

Mistral AI. Mixtral-8x22B Instruct. https://mistral.ai/news/mixtral-8x22b, 2024. Accessed: February, 2025

Databricks Inc. DBRX. https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm, 2024. Accessed: February, 2025

Meta AI. Meta Llama 3.1. https://ai.meta.com/blog/meta-llama-3-1/, 2024. Accessed: February, 2025

Google AI. Gemini 2.0 Flash. https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/, 2024. Accessed: February, 2025

S Huber, M Bercx, N Hörmann, M Uhrin, G Pizzi, and N Marzari. Materials Cloud three-dimensional crystals database (MC3D). Materials Cloud Archive 2022.38, 2022. doi: 10.24435/materialscloud:rw-t0

Davide Campi, Nicolas Mounet, Marco Gibertini, Giovanni Pizzi, and Nicola Marzari. Expansion of the Materials Cloud 2D database. ACS Nano, 17(12):11268–11278, 2023. doi: 10.1021/acsnano.2c11510

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020

Mohammad Soleymanibrojeni and Celso Ricardo Caldeira Rego. agentic-workflow-framework: AI-driven agentic framework for autonomous simulation protocol generation and execution. https://github.com/KIT-Workflows/agentic-workflow-framework, 2025. GitHub repository

Quantum ESPRESSO Group. User’s Guide for Quantum ESPRESSO (pw.x). Quantum ESPRESSO Foundation. https://www.quantum-espresso.org/Doc/pw_user_guide/, 2023. Accessed: February, 2025

Teuvo Kohonen. The self-organizing map. Proceedings of the IEEE, 78(9):1464–1480, 1990. doi: 10.1109/5.58325
