pith. machine review for the scientific record.

arxiv: 2605.03205 · v1 · submitted 2026-05-04 · ❄️ cond-mat.mtrl-sci · cs.AI


From Knowledge to Action: Outcomes of the 2025 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry

Aritra Roy , Kevin Shen , Andrew MacBride , Awwal Oladipupo , Mudassra Taskeen , Wojtek Treyde , Ruaa A. E. A. Abakar , Ahmad D. Abbas
and 345 more authors:
Elsayed Abdelfatah Abbas A. Abdullahi Seham S. Abyah Chahd Rahyl Adjmi Fariha Agbere Savyasanchi Aggarwal Muhammad Ahmed Tasnim Ahmed Motasem Ajlouni Mattias Akke Hussein AlAdwan Anwaar S. Alazani Zahra A. Alharbi Wajd A. Aljulyhi Mohammed A. AlKubaish Fatima A. Almahri Sayed A. Almohri David Obeh Alobo Mohammed Alouni Azizah S. Alqahtani Omar Alsaigh Husain Althagafi Md. Aqib Aman Lena Ara Arifin Ignacio Arretche Abdulaziz Ashy Syeda A. Asim Amro Aswad Adeel Atta S\"oren Auer Abdullah al Azmi Toheeb Balogun Suvo Banik Viktoriia Baibakova Shakira A. Baksh Neus G. Bast\'us Christina J. Bayard Adib Bazgir Louis Beal Lejla Biberi\'c Wahid Billah Ankita Biswas Joshua Bocarsly Montassar T. Bouzidi Esma B. Boydas Youssef Briki Cailin Buchanan Mauricio Cafiero Damien Caliste Yi Cao Rafael E. Casta\~neda Sruthy K. Chandy Benjamin Charmes Shayantan Chaudhuri Yiming Chen Alexander Chen Jieneng Chen Min-Hsueh Chiu Defne Circi Cinthya H. Contreras Yoann Cure Nathan Daelman Roshini Dantuluri Thomas Davy William Dawson Leonid Didukh Rui Ding Aminu R. Doguwa Claudia Draxl Sathya Edamadaka Oulaya Elargab Christina Ertural Matthew L. Evans Edvin Fako Hossam Farag Nur A. Fathurrahman Merve Fedai Rodrigo P. Ferreira Giuseppe Fisicaro Thomas Frank Sasi K. Gaddipati Abhijeet Gangan Jennifer Garland James Garrick Luigi Genovese Maryam Ghadrdran Sandip Giri Maxime Goulet Jeremy Goumaz Sara U. Gracia Jacob Graham Gabriel Graves Kevin P. Greenman Tim Greitemeier Cameron Gruich Sophie Gu Salom\'e Guilbert Hans Gundlach Muriel F. Gusta Mourad El Haddaoui Alexander J. Haibel Anubhab Haldar Vehaan Handa Hassan Harb Nathan D. Harms Abdullah Al Hasan Abir Hassan Qiyao He Andr\'es Henao-Aristiz\'abal Bram Hoex Sungil Hong Alexander J. Horvath Md. Shaib Hossain Yanqi Huang Yuqing Huang Kostiantyn Hubaiev Donald Intal Katherine Inzani Kevin Ishimwe Tugba Isik Gopal R. Iyer Katharina Jager Jan Janssen Hyewon Jeong Michael Jirasek Tyler R. Josephson Nisarg Joshi Yassir Ben Kacem Remya A. M. 
Kalapurakal Rakesh R. Kamath Sugan Kanagasenthinathan Dohun Kang Jason Kantorow K\"ubra Kaygisiz Murat Keceli Farhana Keya Muhammad U. Khan Sartaaj Takrim Khan Hyungjun Kim Alexander Kister Sascha Klawohn Collin Kovacs Pranav Krishnan Maurycy Kryzanowski Ritesh Kumar Suman Kumari Gourav Kumbhojkar Ryo Kuroki Shashank Kushwaha Magdalena Lederbauer Jaejun Lee Seunghan Lee Jeonghwan Lee Bingcan Li Calvin Li Zhanzhao Li Shi Li Shicheng Li Chengyan Liu Hao Liu Tung Yan Liu Yutong Liu Lucia Vina-Lopez Chayaphol Lortaraparsert Andre K.Y. Low Saffron Luxford Carlos Madariaga Rishikesh Magar Piyush R. Maharana Rahul Mallela Shoaib Mahmud Natesan Mani Umair Mansoor Omar B. Mansour Cassandra Masschelein Kinga O. Mastej Ankit Mathanker Jeffrey Meng Omran Mezghani Yidong Ming Rishav Mitra Michail Mitsakis Matthew Miyagishima Ravikumar Mohan Naveen R. Mohanraj Trupti Mohanty Bernadette Mohr Francisco A. Molina-Bakhos Jeremy Monat Seyed Mohamad Moosavi Shayan Mousavi Arman Moussavi Rubel Mozumber Muhammad J. Mufti Diyana Muhammed Ram Munde Mrigi Munjal Jos\'e A. M\'arquez Shankha Nag Giacomo Nagaro Juno Nam Jose M. Napoles-Duarte Ry Nduma Xuan-Vu Nguyen Ebrahim Norouzi Oluwatosin Ohiro Ryotaro Okabe Viejay Ordillo Shuichiro Ozawa Sebastian Pagel Daniel Palmer Angela Pan Akash Pandey Vivek Pandit Prakul Pandit Chiku Parida Jaehee Park Hyunsoo Park Hemangi Patel Shakul Pathak Taradutt Pattnaik Elena Patyukova Noah Paulson Deepak S. Pendyala Erick S. Pepek Martin H. Petersen Thang D. Pham Aniket Phutane Sabila K. Pinky \'Etienne Polack Alison Polasik Maria Politi Tim Pongratz Akhila Ponugoti Fabio Priante Thomas Michael Pruyn Sai S. Puppala Mohammad A. Qazi Heike Quosdorf Gollam Rabby Mohammad J. Raei Md. Habibur Rahman A.B.M. Ashikur Rahman Subhashree Rajasekaran Tawfiqur Rakib Hemanth N. Ramesh Vrushali Ranadive Karnamohit Ranka Bojana Rankovic Adwaith Ravichandran Ilija Ra\v{s}ovi\'c Sergei Rigin Tatem Rios Varun Rishi Victor Naden Robinson Lucas S. 
Rodrigues Oswaldo Rodriguez Mahule Roy Diptendu Roy Subhas Roy Arokia Anto Royan M Joseph F. Rudzinski Muhammad Sabih Subramanyam Sahoo Srusti Bheem Sain Thahira Saliya Vignesh Sampath Jesus Diaz Sanchez Arthur S. S. Santos Muliady Satria Hasan M. Sayeed J\"org Schaarschmidt Philippe Schwaller Nofit Segal Abhishec Senthilvel Sherjeel Shabih Devanshu Shah Faezeh Shahmoradi Samiha Sharlin Killian Sheriff Qiuyu Shi Abubakar D. Shuaibu Ayesha Siddiqua M.A. Shadab Siddiqui Darian Smalley Benjamin Smith Taylor D. Sparks Daniel T. Speckhard Elena Stojanovska Akshay Subramanian Jiwon Sun Yunkai Sun Abdul W. Syed Souvik Ta Izumi Takahara Kelly Tallau Guannan Tang Ans B. Tariq Sui X. Tay Nurlybek Temirbay Surya P. Tiwari Febin Tom Tajah Trapier Kasidet J. Trerayapiwat Samanvya Tripathi Hawra H. Tuhaifa Mustafa Unal Mohammad Uzair Vallabh Vasudevan Estefania Vazquez Victor Venturi Rahul Verma Ashwini Verma Alvaro Vazquez-Mayagoitia Nicholas Wagner Araki Wakiuchi Hao Wan Liaoyaqi Wang Wolfgang Wenzel Alexander Wieczorek Sze H. Wong Yue Wu Tong Xie Andrew Yi Ziqi Yin Jodie A. Yuwono Nahed A. Zaid Mohd Zaki Shehtab Zaman Maimuna U. Zarewa Mahtab Zehtab Baosen Zhang Wenyu Zhang Melody Zhang Yangfan Zhang Yuwen Zhang Runze Zhang Zongmin Zhang Huanhuan Zhao Yuanlong Bill Zheng Ramzi Zidani Xue Zong Ian Foster Ben Blaiszik

Pith reviewed 2026-05-08 17:33 UTC · model grok-4.3

classification ❄️ cond-mat.mtrl-sci cs.AI
keywords large language models · materials science · chemistry · multi-agent systems · retrieval-augmented generation · scientific workflows · hackathon outcomes

The pith

LLM applications in materials science are shifting from standalone assistants to integrated multi-agent systems that organize knowledge and execute scientific actions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews outcomes from a 2025 hackathon in which teams built LLM tools for materials science and chemistry. It divides the projects into knowledge-infrastructure tools, which retrieve, synthesize, and validate information, and action systems, which run computations or coordinate experiments. Analysis of the submissions identifies a move toward multi-agent setups that link retrieval with reasoning and validation steps. This pattern suggests that LLMs can serve as building blocks for full research pipelines rather than isolated helpers. The work matters to readers in the field because it maps concrete examples of how these models might accelerate discovery cycles in the physical sciences.

Core claim

The submissions reveal a shift from single-purpose LLM tools toward integrated, multi-agent workflows that combine retrieval, reasoning, tool use, and domain-specific validation. Prominent themes include retrieval-augmented generation as grounding infrastructure, persistent structured knowledge representations, multimodal and multilingual scientific inputs, and early progress toward laboratory-integrated closed-loop systems. Together, these results suggest that LLMs are evolving from general-purpose assistants into composable infrastructure for scientific reasoning and action.

What carries the argument

The paper's taxonomy divides projects into Knowledge Infrastructure systems (for structuring and validating scientific information) and Action Systems (for executing and automating work in computational or experimental environments); this split is what tracks the shift toward combined workflows.
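As an illustration of how such a taxonomy could be operationalized, the sketch below classifies a project by its declared capabilities. The capability labels and the classification rule are hypothetical assumptions for illustration, not a schema defined in the paper.

```python
from dataclasses import dataclass, field

# Hypothetical capability labels; the paper does not define a formal schema.
ACTION = {"run_simulation", "drive_instrument", "execute_code"}
KNOWLEDGE = {"retrieve_literature", "summarize", "validate_claims", "build_knowledge_graph"}

@dataclass
class Project:
    name: str
    capabilities: set = field(default_factory=set)

def classify(project: Project) -> str:
    """Place a project in the review's two-part taxonomy by capability overlap."""
    acts = bool(project.capabilities & ACTION)
    knows = bool(project.capabilities & KNOWLEDGE)
    if acts and knows:
        return "Both"
    if acts:
        return "Action System"
    if knows:
        return "Knowledge Infrastructure"
    return "Unclassified"

print(classify(Project("RAGBot", {"retrieve_literature", "summarize"})))
# Knowledge Infrastructure
```

Projects that both structure knowledge and execute work land in the overlap, which is where the review locates the emerging multi-agent systems.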

If this is right

  • Retrieval-augmented generation becomes standard grounding for reliable scientific outputs.
  • Persistent structured knowledge bases enable better synthesis across papers and data.
  • Multimodal inputs allow LLMs to process images, spectra, and text together in one workflow.
  • Early closed-loop systems link computation directly to lab execution.
  • A shared taxonomy helps researchers design and compare future LLM-enabled scientific tools.
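The grounding role the first bullet assigns to retrieval-augmented generation can be sketched with a minimal stdlib-only retriever that attaches source identifiers to every answer. The corpus, the bag-of-words cosine scoring, and the citation format are illustrative assumptions, not the method of any hackathon project.

```python
from collections import Counter
import math

def vectorize(text: str) -> Counter:
    # Bag-of-words term counts; a real system would use learned embeddings.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy corpus standing in for an indexed literature store.
CORPUS = {
    "doc1": "perovskite band gap measured at 1.6 eV by UV-Vis spectroscopy",
    "doc2": "cuprate superconductors show critical temperatures above 90 K",
}

def retrieve(query: str, k: int = 1) -> list:
    qv = vectorize(query)
    ranked = sorted(CORPUS, key=lambda d: cosine(qv, vectorize(CORPUS[d])), reverse=True)
    return ranked[:k]

def grounded_answer(query: str) -> str:
    # Every answer carries the identifiers of the passages it was built from,
    # which is the "grounding" property RAG provides.
    sources = retrieve(query)
    evidence = " ".join(CORPUS[s] for s in sources)
    return f"{evidence} [sources: {', '.join(sources)}]"
```

In a full pipeline the `grounded_answer` output would be passed to a generator model as context; the point of the sketch is only that retrieval makes every claim traceable to a source record.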

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the pattern holds, research groups could build custom agents that move from literature review to experiment design without manual handoffs.
  • Similar hackathons in other domains like biology or physics might produce parallel taxonomies for cross-field comparison.
  • A testable next step is piloting one multi-agent system in an actual lab and tracking time saved from idea to result.

Load-bearing premise

The hackathon submissions represent the main trends and future directions for LLM use across materials science and chemistry.

What would settle it

A survey of LLM tools published in the field over the next 18 months that shows most remain single-purpose rather than integrated multi-agent systems would undermine the claimed transition.

Figures

Figures reproduced from arXiv: 2605.03205 by the authors listed above.

Figure 1: Venn diagram showing the classification of 88 submitted hackathon projects from the third LLM …
Figure 2: The blue bars represent the percentage of molecules for which the given AI model is able to …
Figure 3: Ligand in the MR binding site; pose found using UMADock.
Figure 4: A schematic representation of the complete workflow from database generation using Forja de …
Figure 5: Schematic of the workflow used for the LLM-driven generation of new cuprate materials.
Figure 6: Crystal structures of two candidate cuprate superconductors generated by Chemeleon using the …
Figure 7: Workflow of the NeuroSymbolic hypothesis generation pipeline.
Figure 8: Real-time system analytics dashboard for NSHE.
Figure 9: The ARIA framework for bidirectional materials reasoning. (a) Bidirectional materials discovery enables both forward property prediction from synthesis parameters and inverse design of synthesis protocols from target properties, with complete causal traceability through the knowledge graph. (b) Automated knowledge graph construction extracts causal relationships from scientific literature using LLM-powered …
Figure 10: The LARA-HPC architecture. Workflows are orchestrated by a set of autonomous agents and incorporate a human-in-the-loop (HITL) component to ensure scientific validity and computational safety. A Generator Agent receives a natural language scientific query (e.g., “Calculate the atomization energy of HCN”) and, through the Ontoflow RAG pipeline, retrieves domain knowledge from software documentation such as …
Figure 11: Overview of MixSense. The system ingests 1H-NMR spectra and routes them to task-specific modules for product hypothesis generation, spectral deconvolution, time-series quantification, and SMILES-based chemosensory property prediction. Demo and Interface: the MixSense team demonstrated all three capabilities through an interactive Gradio interface [47]: mixture identification and quantification, time-serie…
Figure 12: The AgentLearn workflow for LLM-driven active learning and iterative dataset expansion.
Figure 13: (A) Overall workflow of The Fullerene Factory multi-agent framework. (B) Examples of MLIP-optimized functionalized fullerene structures generated by the system. Inspired by Thought Anchors [56], the Fordham Reasoning Team began exploring reasoning-trace knowledge extraction and interpretability in LLMs for chemistry problems, which is a new direction toward understanding their inner workings. This study …
Figure 14: (A) Closeness centrality circular graph (left) and PageRank centrality analysis (right) of Ether0- …
Figure 15: High-level conceptual architecture of the ChemBot system, showing the sequential stages from …
Figure 16: Graphs displaying the accuracy of the ChemBot LLM.
Figure 17: OntoKG pipeline: seed knowledge graph (Neo4j) …
Figure 18: Overview of the MINT LLM web application, showing its five core modules: Simulation Assess…
Figure 19: Node embedding performed on an ontology …
Figure 20: Influence of context enhancement on ontology matching.
Figure 21: (A) Schematic illustration of AtomBridge workflow. (B) AtomBridge allows users to select the region of interest (ROI) in electron microscopy (EM) images (top) and detects lattice vectors from selected EM images (bottom). (C) AtomBridge is tested for materials with multi-dimensions. (D) AtomBridge extracts structural information in CIF form from journals [72, 74] and is equipped with an internal structure …
Figure 22: a) GAINS workflow. b) Two-dimensional UMAP projection of molecular fingerprints showing …
Figure 23: ChemGraph-IR workflow: from a natural-language request to optimized structure, vibrational …
Figure 24: CLUE workflow. (A) Counterfactual generation: property prediction for the sample and iden…
Figure 25: NEDD user interface for data-driven experiment planning. A) Data upload and tabular view …
Figure 26: Overview of the relaxation training pipeline. The input structure from the LeMatTraj dataset …
Figure 27: Workflow of the MaterialSim AI Agent. Future Work: Future development will focus on evolving the MaterialSim AI Agent into a fully autonomous computational framework. The long-term goal is to enable researchers to initiate simulations, monitor execution, analyze results, and generate predictive insights entirely through natural-language interaction. By allowing complex …
Figure 28: Performance comparison of SCARA against a general-purpose LLM and supervised ML ap…
Figure 29: SCARA (Steel Corrosion Agent for Risk Assessment) workflow.
Figure 30: User interface of ChromatographyMiner, illustrating the workflow for uploading and analyzing two-dimensional gas chromatography–mass spectrometry (GC×GC–MS) data in NetCDF format. The platform supports drag-and-drop input of .CDF files, chromatogram visualization, automatic mass spectrum extraction, and compound identification using spectral libraries such as MassBank and MassBank of North America. Resul…
Figure 31: Workflow cycle of guillemot. The AI agent interacts with users, crystal structure databases, and TOPAS to automate a human-like Rietveld refinement process. Future Work: This study focused on a specific analysis task, Rietveld refinement using TOPAS Academic, a widely used proprietary software for advanced PXRD analysis. Future work will extend this approach to additional refinement tools, including open-s…
Figure 32: Workflow of XAScribe. Results: XAScribe is an AI-assisted platform developed to automate the analysis and interpretation of Ni K-edge X-ray Absorption Spectroscopy data (…
Figure 33: Results of the random forest models: (a) test predictions for Ni–O bond length, (b) test predictions …
Figure 34: Overview of the BAKER framework. The system comprises a Builder module that automatically designs, implements, and reviews specialized research assistants, and an Assistant module that interacts with the user and manages execution. Each assistant is initialized with a predefined Data Manager node that oversees shared databases and documentation, while the Builder autonomously spawns all additional special…
Figure 35: Overview of the PolyPredictor workflow. To construct a chemical description, a LangChain agent powered by a commercial Gemini 2.5 Pro LLM is employed. The agent is guided by detailed system prompts and few-shot examples to produce structured, natural-language descriptions of the polymer repeat unit. These descriptions are converted into vector embeddings using OpenAI’s text-embedding-3-large model, with m…
Figure 36: Closed-loop optimization enabled through MCP–IvoryOS integration. An LLM issues natural …
Figure 37: Interoperability architecture illustrating how an LLM communicates through the IvoryOS MCP …
Figure 38: FADE workflow for natural language-driven drug candidate discovery. User queries describing targets of interest are processed through three sequential stages: (1) hierarchical database search for structural data or sequences; (2) binding site identification; and (3) computational generation and ranking of drug-like molecules using QSAR and binding affinity metrics to identify hit compounds. Future Work: F…
Figure 39: MatFOMGen workflow. MatFOMGen was implemented using the Anthropic API and Streamlit. Future Work: A primary limitation of the current MatFOMGen pipeline is the lack of formal validation for LLM-generated ASE functions. Future work will explore additional LLM-based reflection and refinement stages to improve code reliability. Another promising extension is the use of fine-tuned LLMs to achieve higher-accura…
Figure 40: Schematic workflow for DFTPilot. The user specifies the target property and material system.
Figure 41: Schematic depiction of the dual-mode design in Parse Patrol. Lower branch: Discovery Mode provides a single MCP interface to multiple parser and database servers, enabling agents to iteratively design parsers that conform to user-defined specifications. Upper branch: Direct Import Mode exposes the same tools as Python modules for frictionless integration into production code. Both branches are unified und…
Figure 42: System architecture of Catalyst Assistant.
Figure 43: Data extraction methods and model process flow in ThinFilm.ai.
Figure 44: SCALE workflow diagram. Future Work: Future extensions of SCALE will focus on enhancing both chemical accuracy and autonomy by integrating higher-fidelity physics and adaptive learning. Incorporating semiempirical or DFT-level calculations (e.g., GFN2-xTB, ωB97X-D) into the surrogate model would improve the reliability of property predictions beyond empirical descriptors, while active learning loops could …
Figure 45: Architecture and workflow of L.A.R.A. The fine-tuned model determines whether to respond using …
Figure 46: Workflow of ODE Forge showing (left) the two-phase agentic pipeline for research and model con…
Figure 47: User interface to perform Materials Project queries.
Figure 48: Benchmark results for various model choices. Tool-augmented methods are generally more …
Figure 49: Predictions for four materials, comparing different prompting methods. A table of summary …
Figure 50: Model error distributions for single-property and multi-property prediction.
Figure 51: CaMEL-RAG framework for catalysis prediction. Dataset [181], which contains structured records describing the slab, surface site, adsorbate, and corresponding adsorption energy. Since the dataset lacks intrinsic hierarchy, a flat vector representation was employed instead of CHORUS’s multi-level memory. Each structured record was converted into a natural-language description retaining complete system inf…
Figure 52: Performance comparison of baseline LLMs and CaMEL-RAG-enhanced models for adsorption …
Figure 53: The SuperconLLM fully automated workflow, from arXiv papers to JSON records.
Figure 54: End-to-end architecture of Catalyze, illustrating agent orchestration from user query to validated …
Figure 55: CAMEL workflow. Open-access papers are collected via OpenAlex and Unpaywall, then parsed …
Figure 56: ZeroMAT framework architecture. Experimental evaluation using bandgap data from the Materials Project [120] demonstrates that ZeroMAT delivers substantial improvements in both accuracy and efficiency (…
Figure 57: Workflow of MuMMIE model pipeline. Results: In the multilingual patent corpus spanning Chinese, Russian, French, Japanese, Korean, and English, the team observed that while chemical compound names often remain consistent across languages, the associated property labels vary widely. This inconsistency makes it difficult to build unified, machine-readable datasets. The primary objective of MuMMIE is to lever…
Figure 58: Overview of the automated electrolyte discovery system via offline reinforcement learning.
Figure 59: Bayesian probability heatmap …
Figure 61: Vector database construction and retrieval-augmented generation (RAG) workflow for Sol-Agent.
Figure 62: Workflow of the AutoFeaSci multi-agent featurization system. Literature, metadata, and tabular …
Figure 63: MAGE workflow. The agent interacts with the user and invokes the appropriate function based on …
Figure 64: Architecture diagram of BASIS.
Figure 65: Overview of the DFT workflow performed by …
Figure 66: Titanarium working prototype showing multi-agent scientist-persona debate.
Figure 67: Evaluation of large language models (LLMs) for concrete property prediction. (a) Three evaluation …
Figure 68: Complete nanoparticle analysis and LLM-driven insight generation workflow. The pipeline in…
Figure 69: Overview of the DynaAgent architecture. The PrepAgent constructs a context-aware simulation plan, the MDAgent executes the plan with error-corrective reasoning, and the Analyser interprets the resulting trajectories. Available tools are shown in the action space. … reflecting how effectively the agent minimized unnecessary iterations. Accuracy was defined as the ratio of successfully completed tasks to the …
Figure 70: Comparison of efficiency and accuracy across different LLM backends.
Figure 71: Workflow of CrysTalk. Given an input structure file and a user prompt, the agent performs …
Figure 72: SpectroBot workflow. A user uploads a CSV file; the FTIR or UV–Vis analyzer generates …
Figure 73
Figure 73. Figure 73: Workflow of the personalized agents in MindMesh, illustrating the generation of user-specific view at source ↗
Figure 74
Figure 74. Figure 74: SyntheSeek two-stage synthesis recipe generation workflow. view at source ↗
Figure 75
Figure 75. Figure 75: Overview of the V-RAPIDS workflow, illustrating UMA-based structure optimization followed view at source ↗
Figure 76
Figure 76. Figure 76: Representative V-RAPIDS output for the water–graphene system, including optimized geometries view at source ↗
Figure 77
Figure 77. Figure 77: NOMAD RAGBOT workflow. The system performs (1) offline indexing with context-aware view at source ↗
Figure 78
Figure 78. Figure 78: The conceptual diagram of Language Controlled Molecular Design and Analysis. view at source ↗
Figure 79
Figure 79. Figure 79: Conceptual overview of (a) the molecule–text description dataset, (b) text conditioning, and (c) view at source ↗
Figure 80
Figure 80. Figure 80: Conceptual overview of AdsKRK. Results In its original implementation, the LIAC-AdsKRK team employed the CodeAct [263] framework to enable a flexible trial-and-error workflow. Within CodeAct, the agent incrementally generates executable code that follows the instructions specified in the prompt, while the code-execution node returns the corresponding outputs. By iteratively repeating this generate–execute… view at source ↗
Figure 81: AssemblAI’s workflow diagram. Users provide text input to generate a peptide self-assembly …
Figure 82: Transmission electron microscopy images of the peptide KFKFQF after self-assembly experiments.
Figure 83: Summarized agent outputs describing the self-assembly protocol of the peptide KFKFQF into …
Figure 84: Benchmark performance of AssemblAI on a withheld test set (N=198). The plot shows the …
Figure 85: MaterialMind system architecture combining retrieval, reasoning, and scoring components.
Figure 86: Workflow of ChemTutor AI and its future work perspectives.
Figure 87: Overview of the CrystaLenz agentic XRD analysis workflow, including data loading, preprocessing, …
Figure 88: Overview of the closed-loop discovery platform (ACME).
Figure 89: Workflow of HEAQuery. … MatSciBERT model [283], generating vector embeddings. These embeddings were stored in a FAISS index, enabling rapid semantic search across the literature. Simultaneously, the team curated and cleaned three public HEA datasets [284, 285, 286], covering mechanical properties, thermodynamic descriptors, and synthesis routes. The datasets were harmonized by standardizing column names, no…
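The embed-then-index pattern behind HEAQuery can be illustrated with a brute-force stand-in: numpy takes the place of a FAISS `IndexFlatL2` (same squared-L2 ranking, without the indexing machinery), and the tiny two-dimensional vectors stand in for MatSciBERT embeddings. Everything here is a sketch, not HEAQuery's actual code.

```python
# Brute-force nearest-neighbor search, mimicking FAISS IndexFlatL2 semantics.
import numpy as np

def build_index(embeddings: np.ndarray) -> np.ndarray:
    # IndexFlatL2 stores raw vectors; here we simply keep the float32 array.
    return embeddings.astype(np.float32)

def search(index: np.ndarray, query: np.ndarray, k: int = 2) -> list[int]:
    # Squared L2 distance from the query to every stored vector, smallest first.
    dists = np.sum((index - query.astype(np.float32)) ** 2, axis=1)
    return [int(i) for i in np.argsort(dists)[:k]]

# Toy "document embeddings" standing in for MatSciBERT outputs.
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
index = build_index(docs)
hits = search(index, np.array([1.0, 0.0]))  # nearest documents to the query
```

Swapping the brute-force scan for a real FAISS index changes only the storage and lookup calls; the embed-query-rank flow is the same.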
Figure 90: Automated workflow of PackSynth. The user provides an input (SMILES/name), which the agent uses to fetch data from databases like the Materials Project. The system then uses RDKit [288] to generate a 3D model, automatically prepares and runs the LAMMPS simulation, performs analysis (energy, RMSD), and provides an interactive 3D visualization. The workflow begins with Input Processing and Database Integrat…
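As a flavor of the simulation-preparation step, here is a minimal sketch of templating a LAMMPS input deck. The function name and defaults are illustrative, not PackSynth's actual interface, and a production deck would also need force-field coefficients and output settings.

```python
# Hypothetical helper that templates a minimal LAMMPS input script.
def lammps_input(data_file: str, steps: int = 1000, temp: float = 300.0) -> str:
    # Each entry is a real LAMMPS command; values are placeholders.
    lines = [
        "units real",
        "atom_style full",
        f"read_data {data_file}",
        "pair_style lj/cut 10.0",
        f"fix 1 all nvt temp {temp} {temp} 100.0",
        f"run {steps}",
    ]
    return "\n".join(lines)

deck = lammps_input("molecule.data", steps=500)
```

An agent would write this string to a file and invoke the `lmp` binary on it; the analysis stage (energy, RMSD) then parses the resulting log and trajectory files.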
Figure 91: Workflow for data extraction and standardization.
Figure 92: Example of the final standardized dataset produced by the automated LLM workflow.
Figure 93: (A) Overall workflow of the QSPHAgent framework for interpretable prediction of electronic …
Figure 94: Overview of the GPT-OSS-based materials generation framework. Starting from a database …
Figure 95: Table 1 — Method comparison (synthetic test set).
Figure 96: VERA workflow: from lab data upload to compliance validation and PDF report export.
Figure 97: The MaterEase framework architecture. The complete workflow from natural language query to materials discovery and visualization. … knowledge bases and property schemas. Real-time Knowledge Updates: implementing dynamic ontology learning to incorporate new research findings and maintain up-to-date knowledge bases continuously. Enhanced Reasoning: integrating causal knowledge graphs to enable multi-step cau…
Figure 98: The end-to-end workflow of MatSciAgent for scientific code generation.
Figure 99: Comparison of efficiency and accuracy across different LLM systems against our proposed coding …
Figure 100: Illustration of using the MOF-ChemUnity knowledge graph as long-term memory for AI agents.
Figure 101: An example workflow for modeling diffusivity of an organic molecule in water.
Figure 102: An example workflow for an agent request from the MATLAB CLI prompt.
Figure 103: (A) Workflow of the instrument action database agent. (B) Example prompt and agent response.
Figure 104: Workflow of AIssistant with MC-NEST and ChemCrow tools. The AIssistant framework integrates specialized tools like MC-NEST [331] for hypothesis generation and ChemCrow [332] for interactive refinement, enabling iterative cycles of AI-suggested hypotheses and human validation. Quantitative evaluation metrics were utilized to assess the alignment of AI-assisted outcomes with human reasoning. The hi…
Figure 105: Overview of the SKY workflow for materials synthesis planning.
Figure 106: Participants collaborating at various physical hub locations during the 2025 LLM Hackathon …
Figure 107: Hybrid nature and the sponsors of the 2025 LLM Hackathon for Applications in Materials …
Abstract

Large language models (LLMs) are rapidly changing how researchers in materials science and chemistry discover, organize, and act on scientific knowledge. This paper analyzes a broad set of community-developed LLM applications in an effort to identify emerging patterns in how these systems can be used across the scientific research lifecycle. We organize the projects into two complementary categories: Knowledge Infrastructure, systems that structure, retrieve, synthesize, and validate scientific information; and Action Systems, systems that execute, coordinate, or automate scientific work across computational and experimental environments. The submissions reveal a shift from single-purpose LLM tools toward integrated, multi-agent workflows that combine retrieval, reasoning, tool use, and domain-specific validation. Prominent themes include retrieval-augmented generation as grounding infrastructure, persistent structured knowledge representations, multimodal and multilingual scientific inputs, and early progress toward laboratory-integrated closed-loop systems. Together, these results suggest that LLMs are evolving from general-purpose assistants into composable infrastructure for scientific reasoning and action. This work provides a community snapshot of that transition and a practical taxonomy for understanding emerging LLM-enabled workflows in materials science and chemistry.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript summarizes outcomes from the 2025 LLM Hackathon focused on materials science and chemistry. It partitions the submitted projects into two categories—Knowledge Infrastructure (systems for structuring, retrieving, synthesizing, and validating scientific information) and Action Systems (systems for executing, coordinating, or automating scientific tasks)—and extracts recurring themes including retrieval-augmented generation, persistent structured knowledge representations, multimodal/multilingual inputs, and early closed-loop laboratory integrations. The central interpretive claim is that these patterns indicate LLMs are transitioning from general-purpose assistants to composable infrastructure for scientific reasoning and action.

Significance. If the reported patterns accurately capture the hackathon submissions, the work supplies a practical taxonomy and community snapshot that could help researchers navigate emerging LLM workflows. The two-category organization is internally consistent with the described themes and provides a clear organizing lens. However, the manuscript contains no quantitative metrics, error bars, or comparative benchmarks against the wider literature, limiting its ability to support stronger claims about field-wide evolution.

major comments (1)
  1. [Abstract] Abstract and concluding section: the statement that the submissions 'suggest that LLMs are evolving from general-purpose assistants into composable infrastructure' rests on self-selected, short-timeline hackathon prototypes. No comparison is provided to non-hackathon deployments or the broader literature on LLM use in materials science, so the inference to a general trajectory is not load-bearing on the data presented.
minor comments (3)
  1. The manuscript would benefit from an explicit limitations subsection that quantifies the number of projects per category, notes the self-selection bias, and discusses how hackathon constraints (e.g., reliance on LangChain/AutoGen) may shape the observed themes.
  2. Project descriptions should include direct links or DOIs to the original submissions or code repositories to enable reproducibility and follow-up by readers.
  3. Figure captions and table headings could be expanded to clarify how individual projects map onto the two-category taxonomy.

Simulated Authors' Rebuttal

1 responses · 0 unresolved

We thank the referee for their constructive review of our manuscript. We agree that the central interpretive claim requires more cautious framing and have revised the abstract and conclusion accordingly.

read point-by-point responses
  1. Referee: [Abstract] Abstract and concluding section: the statement that the submissions 'suggest that LLMs are evolving from general-purpose assistants into composable infrastructure' rests on self-selected, short-timeline hackathon prototypes. No comparison is provided to non-hackathon deployments or the broader literature on LLM use in materials science, so the inference to a general trajectory is not load-bearing on the data presented.

    Authors: We agree that the claim as originally phrased overreaches the scope of the hackathon data. The manuscript is a community snapshot of submitted projects rather than a field-wide survey. In the revised version we have changed the abstract and conclusion to state that the observed patterns 'illustrate emerging trends in the hackathon submissions toward composable multi-agent systems,' explicitly noting the self-selected and prototype nature of the entries. We have also added citations to recent reviews on LLM applications in materials science and chemistry to situate the hackathon observations within the broader literature. revision: yes

Circularity Check

0 steps flagged

No circularity: descriptive summary of external hackathon submissions

full rationale

The paper is a post-hoc community report summarizing submitted hackathon projects into Knowledge Infrastructure and Action Systems categories. It contains no derivations, equations, predictions, fitted parameters, or mathematical claims. The central inference about LLMs evolving into composable infrastructure is drawn from observed patterns in external submissions rather than from any self-referential fitting or self-citation chain. No load-bearing steps reduce to inputs by construction, and the analysis is self-contained against external benchmarks with no ansatz smuggling or renaming of known results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The report rests on the domain assumption that hackathon submissions reflect genuine emerging patterns in LLM use; no free parameters, new entities, or additional axioms are introduced beyond standard descriptive analysis.

pith-pipeline@v0.9.0 · 7436 in / 999 out tokens · 34539 ms · 2026-05-08T17:33:39.992374+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

299 extracted references · 32 canonical work pages · 5 internal anchors

  1. [1]

    Enabling large language models for real-world materials discovery,

    S. Miret and N. A. Krishnan, “Enabling large language models for real-world materials discovery,” Nature Machine Intelligence, vol. 7, no. 7, pp. 991–998, 2025

  2. [2]

    An automatic end-to-end chemical synthesis development platform powered by large language models,

    Y. Ruan, C. Lu, N. Xu, Y. He, Y. Chen, J. Zhang, J. Xuan, J. Pan, Q. Fang, H. Gao,et al., “An automatic end-to-end chemical synthesis development platform powered by large language models,” Nature communications, vol. 15, no. 1, p. 10160, 2024

  3. [3]

    Comproscanner: a multi-agent based framework for composition-property structured data extraction from scientific literature,

    A. Roy, E. Grisan, J. Buckeridge, and C. Gattinoni, “Comproscanner: a multi-agent based framework for composition-property structured data extraction from scientific literature,” Digital Discovery, vol. 5, pp. 1794–1808, 2026

  4. [4]

    Chemnlp: a natural language-processing-based library for materials chemistry text data,

    K. Choudhary and M. L. Kelley, “Chemnlp: a natural language-processing-based library for materials chemistry text data,”The Journal of Physical Chemistry C, vol. 127, no. 35, pp. 17545–17555, 2023

  5. [5]

    Language models enable data- augmented synthesis planning for inorganic materials,

    T. Prein, E. Pan, J. Jehkul, S. Weinmann, E. Olivetti, and J. L. Rupp, “Language models enable data- augmented synthesis planning for inorganic materials,”ACS Applied Materials & Interfaces, vol. 17, no. 51, pp. 69221–69233, 2025

  6. [6]

    Large language models for reticular chemistry,

    Z. Zheng, N. Rampal, T. J. Inizan, C. Borgs, J. T. Chayes, and O. M. Yaghi, “Large language models for reticular chemistry,”Nature Reviews Materials, vol. 10, no. 5, pp. 369–381, 2025

  7. [7]

    Towards foundation models for materials science: The open matsci ml toolkit,

    K. L. K. Lee, C. Gonzales, M. Spellings, M. Galkin, S. Miret, and N. Kumar, “Towards foundation models for materials science: The open matsci ml toolkit,” inProceedings of the SC’23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 51–59, 2023

  8. [8]

    Understanding hackathons for science: Collaboration, affordances, and outcomes,

    E. P. P. Pe-Than and J. D. Herbsleb, “Understanding hackathons for science: Collaboration, affordances, and outcomes,” in International Conference on Information, pp. 27–37, Springer, 2019

  9. [9]

    How to support newcomers in scientific hackathons-an action research study on expert mentoring,

    A. Nolte, L. B. Hayden, and J. D. Herbsleb, “How to support newcomers in scientific hackathons-an action research study on expert mentoring,”Proceedings of the ACM on Human-Computer Interaction, vol. 4, no. CSCW1, pp. 1–23, 2020

  10. [10]

    Hack your organizational innovation: literature review and integrative model for running hackathons,

    B. Heller, A. Amir, R. Waxman, and Y. Maaravi, “Hack your organizational innovation: literature review and integrative model for running hackathons,”Journal of Innovation and Entrepreneurship, vol. 12, no. 1, p. 6, 2023

  11. [11]

    Organizing across disciplines to tackle shared computational challenges,

    W. Treyde, A. Kwiatkowski, J. Achterberg, D. Akarca, M. Buttenschoen, R. T. Byrne, K. Didi, K. Kordova, J. Lála, J. Langford,et al., “Organizing across disciplines to tackle shared computational challenges,”Patterns, vol. 7, no. 4, 2026

  12. [12]

    14 examples of how llms can transform materials science and chemistry: a reflection on a large language model hackathon,

    K. M. Jablonka, Q. Ai, A. Al-Feghali, S. Badhwar, J. D. Bocarsly, A. M. Bran, S. Bringuier, L. C. Brinson, K. Choudhary, D. Circi, et al., “14 examples of how llms can transform materials science and chemistry: a reflection on a large language model hackathon,” Digital Discovery, vol. 2, no. 5, pp. 1233–1250, 2023

  13. [13]

    Reflections from the 2024 large language model (llm) hackathon for applications in materials science and chemistry,

    Y. Zimmermann, A. Bazgir, Z. Afzal, F. Agbere, Q. Ai, N. Alampara, A. Al-Feghali, M. Ansari, D. An- typov, A. Aswad, J. Bai, V. Baibakova, D. D. Biswajeet, E. Bitzek, J. D. Bocarsly, A. Borisova, A. M. Bran, L. C. Brinson, M. M. Calderon, A. Canalicchio, V. Chen, Y. Chiang, D. Circi, B. Charmes, V. Chaudhary, Z. Chen, M.-H. Chiu, J. Clymo, K. Dabhadkar, N...

  14. [14]

    Large language models for chemistry robotics,

    N. Yoshikawa, M. Skreta, K. Darvish, S. Arellano-Rubach, Z. Ji, L. Bjørn Kristensen, A. Z. Li, Y. Zhao, H. Xu, A. Kuramshin,et al., “Large language models for chemistry robotics,”Autonomous Robots, vol. 47, no. 8, pp. 1057–1086, 2023

  15. [15]

    Autonomous materials synthesis laboratories: Integrating artificial intelligence with advanced robotics for accelerated discovery,

    L. Duo, Y. Hao, and J. He, “Autonomous materials synthesis laboratories: Integrating artificial intelligence with advanced robotics for accelerated discovery,” ChemRxiv preprint, 2025

  16. [16]

    Agents for self-driving laboratories applied to quantum computing,

    S. Cao, Z. Zhang, M. Alghadeer, S. D. Fasciati, M. Piscitelli, M. Bakr, P. Leek, and A. Aspuru-Guzik, “Agents for self-driving laboratories applied to quantum computing,”arXiv preprint arXiv:2412.07978, 2024

  17. [17]

    Benchmarks and metrics for evaluations of code generation: A critical review,

    D. G. Paul, H. Zhu, and I. Bayley, “Benchmarks and metrics for evaluations of code generation: A critical review,” in2024 IEEE International Conference on Artificial Intelligence Testing (AITest), pp. 87–94, IEEE, 2024

  18. [18]

    Are large language models superhuman chemists?,

    A. Mirza, N. Alampara, S. Kunchapu, M. Ríos-García, B. Emoekabu, A. Krishnan, T. Gupta, M. Schilling-Wilhelmi, M. Okereke, A. Aneesh,et al., “Are large language models superhuman chemists?,”arXiv preprint arXiv:2404.01475, 2024

  19. [19]

    Rational design of high-entropy ceramics based on machine learning – a critical review,

    J. Zhang, X. Xiang, B. Xu, S. Huang, Y. Xiong, S. Ma, H. Fu, Y. Ma, H. Chen, Z. Wu, and S. Zhao, “Rational design of high-entropy ceramics based on machine learning – a critical review,”Current Opinion in Solid State and Materials Science, vol. 27, p. 101057, 4 2023

  20. [20]

    Web of science

    Clarivate Analytics, “Web of science.”https://www.webofscience.com, 2025. Accessed: 2025-11-05

  21. [21]

    Mistral-large:123b-instruct-2407-q4_0

    Mistral AI, “Mistral-large:123b-instruct-2407-q4_0.”https://mistral.ai/news/mistral-large/,

  22. [22]

    Large Language Model by Mistral AI

  23. [23]

    Gpt-oss:120b

    Open Source Science (OSS), “Gpt-oss:120b.”https://huggingface.co/oss/gpt-oss-120b, 2024. Open large language model for scientific applications

  24. [24]

    Mendeleev – a python resource for properties of chemical elements, ions and isotopes,

    M. Szymański, R. V. Vlasov,et al., “Mendeleev – a python resource for properties of chemical elements, ions and isotopes,”Journal of Open Source Software, vol. 3, no. 32, p. 1113, 2018

  25. [25]

    The nomad laboratory – fair data infrastructure for materials science

    NOMAD Laboratory Consortium, “The nomad laboratory – fair data infrastructure for materials science.”https://nomad-lab.eu, 2023. FAIR data platform for materials science

  26. [26]

    Factsage thermochemical software and databases

    Thermfact/CRCT and GTT-Technologies, “Factsage thermochemical software and databases.” https://www.factsage.com, 2022. Thermochemical calculation and database system

  27. [27]

    Synthesis and neutron powder diffraction study of the superconductor HgBa2Ca2Cu3O8+δ by Tl substitution,

    P. Dai, B. C. Chakoumakos, G. F. Sun, K. W. Wong, Y. Xin, and D. F. Lu, “Synthesis and neutron powder diffraction study of the superconductor HgBa2Ca2Cu3O8+δ by Tl substitution,” Physica C: Superconductivity, vol. 243, pp. 201–206, Mar. 1995

  28. [28]

    Gemini: A Family of Highly Capable Multimodal Models

    Gemini Team, Google, “Gemini: A Family of Highly Capable Multimodal Models,”arXiv preprint arXiv:2312.11805v5, 2025

  29. [29]

    Exploration of crystal chemical space using text-guided generative artificial intelligence,

    H. Park, A. Onwuli, and A. Walsh, “Exploration of crystal chemical space using text-guided generative artificial intelligence,”Nature Communications, vol. 16, p. 4379, 2025

  30. [30]

    The ai revolution in science,

    S. Fortunato, C. T. Bergstrom, K. Börner, J. A. Evans, D. Helbing, S. Milojević, and et al., “The ai revolution in science,”Science, vol. 359, no. 6379, p. eaao0185, 2018

  31. [31]

    Language models are few-shot learners,

    T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell,et al., “Language models are few-shot learners,”Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020

  32. [32]

    On the dangers of stochastic parrots: Can language models be too big?,

    E. M. Bender, T. Gebru, A. McMillan-Major, and S. Shmitchell, “On the dangers of stochastic parrots: Can language models be too big?,” in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 610–623, 2021

  33. [33]

    Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning,

    A. Garcez, M. Gori, L. C. Lamb, L. Serafini, M. Spranger, and S. N. Tran, “Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning,”Journal of Artificial Intelligence Research, vol. 74, pp. 895–946, 2022

  34. [35]

    The dawn after the dark: An empirical study on factuality hallucination in large language models,

    J. Li, J. Chen, R. Ren, X. Cheng, W. X. Zhao, J.-Y. Nie, and J.-R. Wen, “The dawn after the dark: An empirical study on factuality hallucination in large language models,”arXiv preprint arXiv:2401.03205, 2024

  35. [36]

    Grounding llm reasoning with knowledge graphs,

    A. Amayuelas, J. Sain, S. Kaur, and C. Smiley, “Grounding llm reasoning with knowledge graphs,” 2025

  36. [37]

    Making retrieval-augmented language models robust to irrelevant context,

    O. Yoran, T. Wolfson, O. Ram, and J. Berant, “Making retrieval-augmented language models robust to irrelevant context,” inThe Twelfth International Conference on Learning Representations, 2024

  37. [38]

    Roadmap on electronic structure codes in the exascale era,

    V. Gavini, S. Baroni, V. Blum, D. R. Bowler, A. Buccheri, J. R. Chelikowsky, S. Das, W. Dawson, P. Delugas, M. Dogan, C. Draxl, G. Galli, L. Genovese, P. Giannozzi, M. Giantomassi, X. Gonze, M. Govoni, F. Gygi, A. Gulans, J. M. Herbert, S. Kokott, T. D. Kühne, K.-H. Liou, T. Miyazaki, P. Motamarri, A. Nakata, J. E. Pask, C. Plessl, L. E. Ratcliff, R. M. R...

  38. [39]

    Flexibilities of wavelets as a computational basis set for large-scale electronic structure calculations,

    L. E. Ratcliff, W. Dawson, G. Fisicaro, D. Caliste, S. Mohr, A. Degomme, B. Videau, V. Cristiglio, M. Stella, M. D’Alessandro, S. Goedecker, T. Nakajima, T. Deutsch, and L. Genovese, “Flexibilities of wavelets as a computational basis set for large-scale electronic structure calculations,”The Journal of Chemical Physics, vol. 152, p. 194110, 05 2020

  39. [40]

    BigDFT software package

    BigDFT developers, “BigDFT software package.”https://l_sim.gitlab.io/bigdft-suite, 2018. A wavelet-based Density Functional Theory code. Accessed: October 2025

  40. [41]

    Exploratory data science on supercomputers for quantum mechanical calculations,

    W. Dawson, L. Beal, L. E. Ratcliff, M. Stella, T. Nakajima, and L. Genovese, “Exploratory data science on supercomputers for quantum mechanical calculations,” Electronic Structure, vol. 6, p. 027003, Jun 2024

  41. [42]

    remotemanager

    remotemanager developers, “remotemanager.”https://gitlab.com/l_sim/remotemanager, 2023. Modular serialisation and management package for handling the running of functions on remote ma- chines. Accessed: October 2025

  42. [43]

    A chemical language model for molecular taste prediction,

    Y. Zimmermann, L. Sieben, H. Seng, P. Pestlin, and F. Görlich, “A chemical language model for molecular taste prediction,”Npj Sci. Food, vol. 9, p. 122, July 2025

  43. [44]

    Magnetstein: An open-source tool for quantitative nmr mixture analysis robust to low resolution, distorted lineshapes, and peak shifts,

    B. Domżał, E. K. Nawrocka, D. Gołowicz, M. A. Ciach, B. Miasojedow, K. Kazimierczuk, and A. Gambin, “Magnetstein: An open-source tool for quantitative nmr mixture analysis robust to low resolution, distorted lineshapes, and peak shifts,” Analytical Chemistry, vol. 96, no. 1, pp. 188–196, 2024

  44. [45]

    Twenty years of nmrshiftdb2: A case study of an open database for analytical chemistry,

    S. Kuhn, H. Kolshorn, C. Steinbeck, and N. Schlörer, “Twenty years of nmrshiftdb2: A case study of an open database for analytical chemistry,”Magnetic Resonance in Chemistry, vol. 62, no. 2, pp. 74–83, 2024

  45. [46]

    Nmrextractor: Leveraging large language models to construct an experimental nmr database from open-source scientific publications,

    Q. Wang, W. Zhang, M. Chen, X. Li, Z. Xiong, J. Xiong, Z. Fu, and M. Zheng, “Nmrextractor: Leveraging large language models to construct an experimental nmr database from open-source scientific publications,” Chemical Science, 2025

  46. [47]

    ReactionT5: A pre-trained transformer model for accurate chemical reaction prediction with limited data,

    T. Sagawa and R. Kojima, “ReactionT5: A pre-trained transformer model for accurate chemical reaction prediction with limited data,” Journal of Cheminformatics, vol. 17, p. 126, 2025

  47. [48]

    Gradio: Hassle-Free Sharing and Testing of ML Models in the Wild

    A. Abid, A. Abdalla, A. Abid, D. Khan, A. Alfozan, and J. Zou, “Gradio: Hassle-free sharing and testing of ML models in the wild,”arXiv preprint arXiv:1906.02569, June 2019

  48. [49]

    DeepSeek-V3 technical report,

    DeepSeek-AI, A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, C. Lu, C. Zhao, C. Deng, C. Zhang, C. Ruan, D. Dai, D. Guo, D. Yang, D. Chen, D. Ji, E. Li, F. Lin, F. Dai, F. Luo, G. Hao, G. Chen, G. Li, H. Zhang, H. Bao, H. Xu, H. Wang, H. Zhang, H. Ding, H. Xin, H. Gao, H. Li, H. Qu, J. L. Cai, J. Liang, J. Guo, J. Ni, J. Li, J. Wang, J. Chen, J. Chen, J. Yuan, J...

  49. [50]

    A survey on data collection for machine learning: A big data - ai integration perspective,

    Y. Roh, G. Heo, and S. E. Whang, “A survey on data collection for machine learning: A big data - ai integration perspective,”IEEE Transactions on Knowledge and Data Engineering, vol. 33, no. 4, pp. 1328–1347, 2021

  50. [51]

    Structural and optical properties of highly hydroxylated fullerenes: stability of molecular domains on the c60 surface,

    R. Guirado-López and M. Rincón, “Structural and optical properties of highly hydroxylated fullerenes: stability of molecular domains on the c60 surface,”The Journal of chemical physics, vol. 125, no. 15, 2006

  51. [52]

    Functionalized fullerene: a key driver for high performance inverted perovskite solar cell,

    X. Zhang, J. Zhang, D. Liu, and W. Zhang, “Functionalized fullerene: a key driver for high performance inverted perovskite solar cell,”Journal of Energy Chemistry, 2025

  52. [53]

    Uma: A family of universal models for atoms,

    B. M. Wood, M. Dzamba, X. Fu, M. Gao, M. Shuaibi, L. Barroso-Luque, K. Abdelmaqsoud, V. Gharakhanyan, J. R. Kitchin, D. S. Levine, K. Michel, A. Sriram, T. Cohen, A. Das, A. Rizvi, S. J. Sahoo, Z. W. Ulissi, and C. L. Zitnick, “Uma: A family of universal models for atoms,” 2025

  53. [54]

    crewai: Framework for orchestrating role-playing, autonomous ai agents,

    “crewai: Framework for orchestrating role-playing, autonomous ai agents,” 2025. Accessed: 2025-07-11

  54. [55]

    DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning,

    D. Guo, D. Yang, H. Zhang, J. Song, P. Wang, Q. Zhu, R. Xu, R. Zhang, S. Ma, X. Bi, X. Zhang, X. Yu, Y. Wu, Z. F. Wu, Z. Gou, Z. Shao, Z. Li, Z. Gao, A. Liu, B. Xue, B. Wang, B. Wu, B. Feng, C. Lu, C. Zhao, C. Deng, C. Ruan, D. Dai, D. Chen, D. Ji, E. Li, F. Lin, F. Dai, F. Luo, G. Hao, G. Chen, G. Li, H. Zhang, H. Xu, H. Ding, H. Gao, H. Qu, H. Li, J. Gu...

  55. [56]

    Training a scientific reasoning model for chemistry,

    S. M. Narayanan, J. D. Braza, R.-R. Griffiths, A. Bou, G. Wellawatte, M. C. Ramos, L. Mitchener, S. G. Rodriques, and A. D. White, “Training a scientific reasoning model for chemistry,” 2025

  56. [57]

    Thought anchors: Which llm reasoning steps matter?,

    P. C. Bogdan, U. Macar, N. Nanda, and A. Conmy, “Thought anchors: Which llm reasoning steps matter?,” 2025

  57. [58]

    Introductory tutorials for simulating protein dynamics with gromacs,

    J. A. Lemkul, “Introductory tutorials for simulating protein dynamics with gromacs,”The Journal of Physical Chemistry B, vol. 128, no. 39, pp. 9418–9435, 2024

  58. [59]

    Streamlit: The fastest way to build data apps,

    S. Inc., “Streamlit: The fastest way to build data apps,” 2025. Python library for creating interactive web apps

  59. [60]

    Lammps - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales,

    A. P. Thompson, H. M. Aktulga, R. Berger, D. S. Bolintineanu, W. M. Brown, P. S. Crozier, P. J. Doak, M. J. D’Evelyn, D. W. Engel, M. Feng, O. Gissinger, A. Hackl, H. Heinz, O. Homeyer, S. Hou, M. Ihm, G. Kresse, A. Kucukel, D. Lee, T. D. Li, Z. Y. Ma, D. M. Makarov, L. Martinez, D. M. Merz, J. A. Miller, K. A. Min, C. H. Moore, R. E. Moore, T. Müller, F....

  60. [61]

    Gromacs: A message-passing parallel molecular dynamics implementation,

    H. J. Berendsen, D. van der Spoel, and R. van Drunen, “Gromacs: A message-passing parallel molecular dynamics implementation,” Computer Physics Communications, vol. 91, no. 1-3, pp. 43–56, 1995

  61. [62]

    Amber 2025,

    D. A. Case, H. M. Aktulga, K. Belfon, I. Y. Ben-Shalom, J. T. Berryman, S. R. Brozell, F. S. Carvahol, D. S. Cerutti, T. E. Cheatham, G. A. Cisneros, V. W. D. Cruzeiro, T. A. Darden, N. Forouzesh, M. Ghazimirsaeed, G. Giambasu, T. Giese, M. K. Gilson, H. Gohlke, A. W. Goetz, J. Harris, Z. Huang, S. Izadi, S. A. Izmailov, K. Kasavajhala, M. C. Kaymak, I. K...

  62. [63]

    Chatgpt (openai api),

    OpenAI, “Chatgpt (openai api),” 2025. Large language model / AI service

  63. [64]

    Oxdna. org: a public webserver for coarse-grained simulations of dna and rna nanostructures,

    E. Poppleton, R. Romero, A. Mallya, L. Rovigatti, and P. Šulc, “Oxdna. org: a public webserver for coarse-grained simulations of dna and rna nanostructures,”Nucleic acids research, vol. 49, no. W1, pp. W491–W498, 2021

  64. [65]

    Hoomd-blue: A python package for high-performance molecular dynamics and hard particle monte carlo simulations,

    J. A. Anderson, J. Glaser, and S. C. Glotzer, “Hoomd-blue: A python package for high-performance molecular dynamics and hard particle monte carlo simulations,”Computational Materials Science, vol. 173, p. 109363, 2020

  65. [66]

    Concepts for a semantically accessible materials data space: Overview over specific implementations in materials science,

    B. Bayerlein, J. Waitelonis, H. Birkholz, M. Jung, M. Schilling, P. v. Hartrott, M. Bruns, J. Schaarschmidt, K. Beilke, M. Mutz, V. Nebel, V. Königer, L. Beran, T. Kraus, A. Vyas, L. Vogt, M. Blum, B. Ell, Y.-F. Chen, T. Waurischk, A. Thomas, A. R. Durmaz, S. Ben Hassine, C. Fresemann, G. Dziwis, H. Beygi Nasrabadi, T. Hanke, M. Telong, S. Pirskawetz, M. ...

  66. [67]

    Seamless science: Lifting experimental mechanical testing lab data to an interoperable semantic representation,

    M. Schilling, S. Bruns, B. Bayerlein, J. Kryeziu, J. Schaarschmidt, J. Waitelonis, P. Dolabella Portella, and K. Durst, “Seamless science: Lifting experimental mechanical testing lab data to an interoperable semantic representation,”Advanced Engineering Materials, vol. 27, no. 8, p. 2401527, 2025

  67. [68]

    Mulms: A multi-layer annotated text corpus for information extraction in the materials science domain,

    T. P. Schrader, M. Finco, S. Grünewald, F. Hildebrand, and A. Friedrich, “Mulms: A multi-layer annotated text corpus for information extraction in the materials science domain,”arXiv preprint arXiv:2310.15569, 2023

  68. [69]

    Pmd core ontology: Achieving semantic interoperability in materials science,

    B. Bayerlein, M. Schilling, H. Birkholz, M. Jung, J. Waitelonis, L. Mädler, and H. Sack, “Pmd core ontology: Achieving semantic interoperability in materials science,”Materials & Design, vol. 237, p. 112603, 2024

  [70] A. Ghosh, M. Ziatdinov, O. Dyck, B. G. Sumpter, and S. V. Kalinin, "Bridging microscopy with molecular dynamics and quantum simulations: an AtomAI based pipeline," npj Computational Materials, vol. 8, no. 1, p. 74, 2022.

  [71] M. Ziatdinov, A. Ghosh, T. Wong, and S. V. Kalinin, "AtomAI: a deep learning framework for analysis of image and spectroscopy data in (scanning) transmission electron microscopy and beyond," arXiv preprint arXiv:2105.07485, 2021.

  [72] H. Eliasson and R. Erni, "Localization and segmentation of atomic columns in supported nanoparticles for fast scanning transmission electron microscopy," npj Computational Materials, vol. 10, no. 1, p. 168, 2024.

  [73] H. Tan, S. Takeuchi, K. K. Bharathi, I. Takeuchi, and L. A. Bendersky, "Microscopy study of structural evolution in epitaxial LiCoO2 positive electrode films during electrochemical cycling," ACS Applied Materials & Interfaces, vol. 8, no. 10, pp. 6727–6735, 2016.

  [74] C. Lee, A. Khan, D. Luo, T. P. Santos, C. Shi, B. E. Janicek, S. Kang, W. Zhu, N. A. Sobh, A. Schleife, B. K. Clark, and P. Huang, "Deep learning enabled strain mapping of single-atom defects in two-dimensional transition metal dichalcogenides with sub-picometer precision," Nano Letters, vol. 20, no. 5, pp. 3369–3377, 2020.

  [75] F. Fang, X. L. F. Xu, C. Chen, N. Feng, Y. Jiang, and J. Huang, "Mechanistic insights into potassium-assistant thermal-catalytic oxidation of soot over single-crystalline SrTiO3 nanotubes with ordered mesopores," ACS Catalysis, vol. 15, no. 2, pp. 789–799, 2025.

  [76] I. Batatia, P. Benner, Y. Chiang, A. M. Elena, D. P. Kovács, J. Riebesell, X. R. Advincula, M. Asta, M. Avaylon, W. J. Baldwin, et al., "A foundation model for atomistic materials chemistry," arXiv preprint arXiv:2401.00096, 2023.

  [77] J. B. Baell and G. A. Holloway, "New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays," Journal of Medicinal Chemistry, vol. 53, no. 7, pp. 2719–2740, 2010.

  [78] J. B. Baell and M. Walters, "Chemistry: Chemical con artists foil drug discovery," Nature, vol. 513, pp. 481–483, 2014.

  [79] S. Chithrananda, G. Grand, and B. Ramsundar, "ChemBERTa: Large-scale self-supervised pretraining for molecular property prediction," Oct. 2020.

  [80] N. Frey, R. Soklaski, S. Axelrod, S. Samsi, R. Gómez-Bombarelli, C. Coley, and V. Gadepally, "Neural scaling of deep chemical models," Nature Machine Intelligence, vol. 5, pp. 1–9, Oct. 2023.

  [81] E. Wang, S. Schmidgall, P. F. Jaeger, F. Zhang, R. Pilgrim, Y. Matias, J. Barral, D. Fleet, and S. Azizi, "TxGemma: Efficient and agentic LLMs for therapeutics," arXiv preprint arXiv:2504.06196, 2025.
