pith. machine review for the scientific record.

arxiv: 2605.03205 · v1 · submitted 2026-05-04 · ❄️ cond-mat.mtrl-sci · cs.AI


From Knowledge to Action: Outcomes of the 2025 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry

Aritra Roy , Kevin Shen , Andrew MacBride , Awwal Oladipupo , Mudassra Taskeen , Wojtek Treyde , Ruaa A. E. A. Abakar , Ahmad D. Abbas
and 345 more authors:
Elsayed Abdelfatah Abbas A. Abdullahi Seham S. Abyah Chahd Rahyl Adjmi Fariha Agbere Savyasanchi Aggarwal Muhammad Ahmed Tasnim Ahmed Motasem Ajlouni Mattias Akke Hussein AlAdwan Anwaar S. Alazani Zahra A. Alharbi Wajd A. Aljulyhi Mohammed A. AlKubaish Fatima A. Almahri Sayed A. Almohri David Obeh Alobo Mohammed Alouni Azizah S. Alqahtani Omar Alsaigh Husain Althagafi Md. Aqib Aman Lena Ara Arifin Ignacio Arretche Abdulaziz Ashy Syeda A. Asim Amro Aswad Adeel Atta S\"oren Auer Abdullah al Azmi Toheeb Balogun Suvo Banik Viktoriia Baibakova Shakira A. Baksh Neus G. Bast\'us Christina J. Bayard Adib Bazgir Louis Beal Lejla Biberi\'c Wahid Billah Ankita Biswas Joshua Bocarsly Montassar T. Bouzidi Esma B. Boydas Youssef Briki Cailin Buchanan Mauricio Cafiero Damien Caliste Yi Cao Rafael E. Casta\~neda Sruthy K. Chandy Benjamin Charmes Shayantan Chaudhuri Yiming Chen Alexander Chen Jieneng Chen Min-Hsueh Chiu Defne Circi Cinthya H. Contreras Yoann Cure Nathan Daelman Roshini Dantuluri Thomas Davy William Dawson Leonid Didukh Rui Ding Aminu R. Doguwa Claudia Draxl Sathya Edamadaka Oulaya Elargab Christina Ertural Matthew L. Evans Edvin Fako Hossam Farag Nur A. Fathurrahman Merve Fedai Rodrigo P. Ferreira Giuseppe Fisicaro Thomas Frank Sasi K. Gaddipati Abhijeet Gangan Jennifer Garland James Garrick Luigi Genovese Maryam Ghadrdran Sandip Giri Maxime Goulet Jeremy Goumaz Sara U. Gracia Jacob Graham Gabriel Graves Kevin P. Greenman Tim Greitemeier Cameron Gruich Sophie Gu Salom\'e Guilbert Hans Gundlach Muriel F. Gusta Mourad El Haddaoui Alexander J. Haibel Anubhab Haldar Vehaan Handa Hassan Harb Nathan D. Harms Abdullah Al Hasan Abir Hassan Qiyao He Andr\'es Henao-Aristiz\'abal Bram Hoex Sungil Hong Alexander J. Horvath Md. Shaib Hossain Yanqi Huang Yuqing Huang Kostiantyn Hubaiev Donald Intal Katherine Inzani Kevin Ishimwe Tugba Isik Gopal R. Iyer Katharina Jager Jan Janssen Hyewon Jeong Michael Jirasek Tyler R. Josephson Nisarg Joshi Yassir Ben Kacem Remya A. M. 
Kalapurakal Rakesh R. Kamath Sugan Kanagasenthinathan Dohun Kang Jason Kantorow K\"ubra Kaygisiz Murat Keceli Farhana Keya Muhammad U. Khan Sartaaj Takrim Khan Hyungjun Kim Alexander Kister Sascha Klawohn Collin Kovacs Pranav Krishnan Maurycy Kryzanowski Ritesh Kumar Suman Kumari Gourav Kumbhojkar Ryo Kuroki Shashank Kushwaha Magdalena Lederbauer Jaejun Lee Seunghan Lee Jeonghwan Lee Bingcan Li Calvin Li Zhanzhao Li Shi Li Shicheng Li Chengyan Liu Hao Liu Tung Yan Liu Yutong Liu Lucia Vina-Lopez Chayaphol Lortaraparsert Andre K.Y. Low Saffron Luxford Carlos Madariaga Rishikesh Magar Piyush R. Maharana Rahul Mallela Shoaib Mahmud Natesan Mani Umair Mansoor Omar B. Mansour Cassandra Masschelein Kinga O. Mastej Ankit Mathanker Jeffrey Meng Omran Mezghani Yidong Ming Rishav Mitra Michail Mitsakis Matthew Miyagishima Ravikumar Mohan Naveen R. Mohanraj Trupti Mohanty Bernadette Mohr Francisco A. Molina-Bakhos Jeremy Monat Seyed Mohamad Moosavi Shayan Mousavi Arman Moussavi Rubel Mozumber Muhammad J. Mufti Diyana Muhammed Ram Munde Mrigi Munjal Jos\'e A. M\'arquez Shankha Nag Giacomo Nagaro Juno Nam Jose M. Napoles-Duarte Ry Nduma Xuan-Vu Nguyen Ebrahim Norouzi Oluwatosin Ohiro Ryotaro Okabe Viejay Ordillo Shuichiro Ozawa Sebastian Pagel Daniel Palmer Angela Pan Akash Pandey Vivek Pandit Prakul Pandit Chiku Parida Jaehee Park Hyunsoo Park Hemangi Patel Shakul Pathak Taradutt Pattnaik Elena Patyukova Noah Paulson Deepak S. Pendyala Erick S. Pepek Martin H. Petersen Thang D. Pham Aniket Phutane Sabila K. Pinky \'Etienne Polack Alison Polasik Maria Politi Tim Pongratz Akhila Ponugoti Fabio Priante Thomas Michael Pruyn Sai S. Puppala Mohammad A. Qazi Heike Quosdorf Gollam Rabby Mohammad J. Raei Md. Habibur Rahman A.B.M. Ashikur Rahman Subhashree Rajasekaran Tawfiqur Rakib Hemanth N. Ramesh Vrushali Ranadive Karnamohit Ranka Bojana Rankovic Adwaith Ravichandran Ilija Ra\v{s}ovi\'c Sergei Rigin Tatem Rios Varun Rishi Victor Naden Robinson Lucas S. 
Rodrigues Oswaldo Rodriguez Mahule Roy Diptendu Roy Subhas Roy Arokia Anto Royan M Joseph F. Rudzinski Muhammad Sabih Subramanyam Sahoo Srusti Bheem Sain Thahira Saliya Vignesh Sampath Jesus Diaz Sanchez Arthur S. S. Santos Muliady Satria Hasan M. Sayeed J\"org Schaarschmidt Philippe Schwaller Nofit Segal Abhishec Senthilvel Sherjeel Shabih Devanshu Shah Faezeh Shahmoradi Samiha Sharlin Killian Sheriff Qiuyu Shi Abubakar D. Shuaibu Ayesha Siddiqua M.A. Shadab Siddiqui Darian Smalley Benjamin Smith Taylor D. Sparks Daniel T. Speckhard Elena Stojanovska Akshay Subramanian Jiwon Sun Yunkai Sun Abdul W. Syed Souvik Ta Izumi Takahara Kelly Tallau Guannan Tang Ans B. Tariq Sui X. Tay Nurlybek Temirbay Surya P. Tiwari Febin Tom Tajah Trapier Kasidet J. Trerayapiwat Samanvya Tripathi Hawra H. Tuhaifa Mustafa Unal Mohammad Uzair Vallabh Vasudevan Estefania Vazquez Victor Venturi Rahul Verma Ashwini Verma Alvaro Vazquez-Mayagoitia Nicholas Wagner Araki Wakiuchi Hao Wan Liaoyaqi Wang Wolfgang Wenzel Alexander Wieczorek Sze H. Wong Yue Wu Tong Xie Andrew Yi Ziqi Yin Jodie A. Yuwono Nahed A. Zaid Mohd Zaki Shehtab Zaman Maimuna U. Zarewa Mahtab Zehtab Baosen Zhang Wenyu Zhang Melody Zhang Yangfan Zhang Yuwen Zhang Runze Zhang Zongmin Zhang Huanhuan Zhao Yuanlong Bill Zheng Ramzi Zidani Xue Zong Ian Foster Ben Blaiszik

Pith reviewed 2026-05-08 17:33 UTC · model grok-4.3

classification ❄️ cond-mat.mtrl-sci cs.AI
keywords large language models · materials science · chemistry · multi-agent systems · retrieval-augmented generation · scientific workflows · hackathon outcomes

The pith

LLM applications in materials science are shifting from standalone assistants to integrated multi-agent systems that organize knowledge and execute scientific actions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews outcomes from a 2025 hackathon in which teams built LLM tools for materials science and chemistry. It divides the projects into knowledge-infrastructure tools, which retrieve, synthesize, and validate information, and action systems, which run computations or coordinate experiments. Analysis of the submissions identifies a move toward multi-agent setups that link retrieval with reasoning and validation steps. This pattern suggests that LLMs can serve as building blocks for full research pipelines rather than isolated helpers. The work matters to readers in the field because it maps concrete examples of how these models might accelerate discovery cycles in the physical sciences.

Core claim

The submissions reveal a shift from single-purpose LLM tools toward integrated, multi-agent workflows that combine retrieval, reasoning, tool use, and domain-specific validation. Prominent themes include retrieval-augmented generation as grounding infrastructure, persistent structured knowledge representations, multimodal and multilingual scientific inputs, and early progress toward laboratory-integrated closed-loop systems. Together, these results suggest that LLMs are evolving from general-purpose assistants into composable infrastructure for scientific reasoning and action.

What carries the argument

The paper's taxonomy divides projects into Knowledge Infrastructure systems (for structuring and validating scientific information) and Action Systems (for executing and automating work in computational or experimental environments); this split is what tracks the shift toward combined workflows.
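As an illustration of how such a taxonomy could be operationalized, the sketch below classifies a project by its declared capabilities. The capability labels and the classification rule are hypothetical assumptions for illustration, not a schema defined in the paper.

```python
from dataclasses import dataclass, field

# Hypothetical capability labels; the paper does not define a formal schema.
ACTION = {"run_simulation", "drive_instrument", "execute_code"}
KNOWLEDGE = {"retrieve_literature", "summarize", "validate_claims", "build_knowledge_graph"}

@dataclass
class Project:
    name: str
    capabilities: set = field(default_factory=set)

def classify(project: Project) -> str:
    """Place a project in the review's two-part taxonomy by capability overlap."""
    acts = bool(project.capabilities & ACTION)
    knows = bool(project.capabilities & KNOWLEDGE)
    if acts and knows:
        return "Both"
    if acts:
        return "Action System"
    if knows:
        return "Knowledge Infrastructure"
    return "Unclassified"

print(classify(Project("RAGBot", {"retrieve_literature", "summarize"})))
# Knowledge Infrastructure
```

Projects that both structure knowledge and execute work land in the overlap, which is where the review locates the emerging multi-agent systems.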

If this is right

  • Retrieval-augmented generation becomes standard grounding for reliable scientific outputs.
  • Persistent structured knowledge bases enable better synthesis across papers and data.
  • Multimodal inputs allow LLMs to process images, spectra, and text together in one workflow.
  • Early closed-loop systems link computation directly to lab execution.
  • A shared taxonomy helps researchers design and compare future LLM-enabled scientific tools.
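The grounding role the first bullet assigns to retrieval-augmented generation can be sketched with a minimal stdlib-only retriever that attaches source identifiers to every answer. The corpus, the bag-of-words cosine scoring, and the citation format are illustrative assumptions, not the method of any hackathon project.

```python
from collections import Counter
import math

def vectorize(text: str) -> Counter:
    # Bag-of-words term counts; a real system would use learned embeddings.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy corpus standing in for an indexed literature store.
CORPUS = {
    "doc1": "perovskite band gap measured at 1.6 eV by UV-Vis spectroscopy",
    "doc2": "cuprate superconductors show critical temperatures above 90 K",
}

def retrieve(query: str, k: int = 1) -> list:
    qv = vectorize(query)
    ranked = sorted(CORPUS, key=lambda d: cosine(qv, vectorize(CORPUS[d])), reverse=True)
    return ranked[:k]

def grounded_answer(query: str) -> str:
    # Every answer carries the identifiers of the passages it was built from,
    # which is the "grounding" property RAG provides.
    sources = retrieve(query)
    evidence = " ".join(CORPUS[s] for s in sources)
    return f"{evidence} [sources: {', '.join(sources)}]"
```

In a full pipeline the `grounded_answer` output would be passed to a generator model as context; the point of the sketch is only that retrieval makes every claim traceable to a source record.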

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the pattern holds, research groups could build custom agents that move from literature review to experiment design without manual handoffs.
  • Similar hackathons in other domains like biology or physics might produce parallel taxonomies for cross-field comparison.
  • A testable next step is piloting one multi-agent system in an actual lab and tracking time saved from idea to result.

Load-bearing premise

The hackathon submissions represent the main trends and future directions for LLM use across materials science and chemistry.

What would settle it

A survey of LLM tools published in the field over the next 18 months that shows most remain single-purpose rather than integrated multi-agent systems would undermine the claimed transition.

Figures

Figures reproduced from arXiv: 2605.03205 by the authors listed above.

Figure 1: Venn diagram showing the classification of 88 submitted hackathon projects from the third LLM …
Figure 2: The blue bars represent the percentage of molecules for which the given AI model is able to …
Figure 3: Ligand in the MR binding site; pose found using UMADock.
Figure 4: A schematic representation of the complete workflow from database generation using Forja de …
Figure 5: Schematic of the workflow used for the LLM-driven generation of new cuprate materials.
Figure 6: Crystal structures of two candidate cuprate superconductors generated by Chemeleon using the …
Figure 7: Workflow of the NeuroSymbolic hypothesis generation pipeline.
Figure 8: Real-time system analytics dashboard for NSHE.
Figure 9: The ARIA framework for bidirectional materials reasoning. (a) Bidirectional materials discovery enables both forward property prediction from synthesis parameters and inverse design of synthesis protocols from target properties, with complete causal traceability through the knowledge graph. (b) Automated knowledge graph construction extracts causal relationships from scientific literature using LLM-powered …
Figure 10: The LARA-HPC architecture. Workflows are orchestrated by a set of autonomous agents and incorporate a human-in-the-loop (HITL) component to ensure scientific validity and computational safety. A Generator Agent receives a natural language scientific query (e.g., “Calculate the atomization energy of HCN”) and, through the Ontoflow RAG pipeline, retrieves domain knowledge from software documentation such as …
Figure 11: Overview of MixSense. The system ingests 1H-NMR spectra and routes them to task-specific modules for product hypothesis generation, spectral deconvolution, time-series quantification, and SMILES-based chemosensory property prediction. Demo and Interface: the MixSense team demonstrated all three capabilities through an interactive Gradio interface [47]: mixture identification and quantification, time-serie…
Figure 12: The AgentLearn workflow for LLM-driven active learning and iterative dataset expansion.
Figure 13: (A) Overall workflow of The Fullerene Factory multi-agent framework. (B) Examples of MLIP-optimized functionalized fullerene structures generated by the system. Inspired by Thought Anchors [56], the Fordham Reasoning Team began exploring reasoning-trace knowledge extraction and interpretability in LLMs for chemistry problems, which is a new direction toward understanding their inner workings. This study …
Figure 14: (A) Closeness centrality circular graph (left) and PageRank centrality analysis (right) of Ether0- …
Figure 15: High-level conceptual architecture of the ChemBot system, showing the sequential stages from …
Figure 16: Graphs displaying the accuracy of the ChemBot LLM.
Figure 17: OntoKG pipeline: seed knowledge graph (Neo4j) …
Figure 18: Overview of the MINT LLM web application, showing its five core modules: Simulation Assess…
Figure 19: Node embedding performed on an ontology …
Figure 20: Influence of context enhancement on ontology matching.
Figure 21: (A) Schematic illustration of AtomBridge workflow. (B) AtomBridge allows users to select the region of interest (ROI) in electron microscopy (EM) images (top) and detects lattice vectors from selected EM images (bottom). (C) AtomBridge is tested for materials with multi-dimensions. (D) AtomBridge extracts structural information in CIF form from journals [72, 74] and is equipped with an internal structure …
Figure 22: a) GAINS workflow. b) Two-dimensional UMAP projection of molecular fingerprints showing …
Figure 23: ChemGraph-IR workflow: from a natural-language request to optimized structure, vibrational …
Figure 24: CLUE workflow. (A) Counterfactual generation: property prediction for the sample and iden…
Figure 25: NEDD user interface for data-driven experiment planning. A) Data upload and tabular view …
Figure 26: Overview of the relaxation training pipeline. The input structure from the LeMatTraj dataset …
Figure 27: Workflow of the MaterialSim AI Agent. Future Work: Future development will focus on evolving the MaterialSim AI Agent into a fully autonomous computational framework. The long-term goal is to enable researchers to initiate simulations, monitor execution, analyze results, and generate predictive insights entirely through natural-language interaction. By allowing complex …
Figure 28: Performance comparison of SCARA against a general-purpose LLM and supervised ML ap…
Figure 29: SCARA (Steel Corrosion Agent for Risk Assessment) workflow.
Figure 30: User interface of ChromatographyMiner, illustrating the workflow for uploading and analyzing two-dimensional gas chromatography–mass spectrometry (GC×GC–MS) data in NetCDF format. The platform supports drag-and-drop input of .CDF files, chromatogram visualization, automatic mass spectrum extraction, and compound identification using spectral libraries such as MassBank and MassBank of North America. Resul…
Figure 31: Workflow cycle of guillemot. The AI agent interacts with users, crystal structure databases, and TOPAS to automate a human-like Rietveld refinement process. Future Work: This study focused on a specific analysis task, Rietveld refinement using TOPAS Academic, a widely used proprietary software for advanced PXRD analysis. Future work will extend this approach to additional refinement tools, including open-s…
Figure 32: Workflow of XAScribe. Results: XAScribe is an AI-assisted platform developed to automate the analysis and interpretation of Ni K-edge X-ray Absorption Spectroscopy data (…
Figure 33: Results of the random forest models: (a) test predictions for Ni–O bond length, (b) test predictions …
Figure 34: Overview of the BAKER framework. The system comprises a Builder module that automatically designs, implements, and reviews specialized research assistants, and an Assistant module that interacts with the user and manages execution. Each assistant is initialized with a predefined Data Manager node that oversees shared databases and documentation, while the Builder autonomously spawns all additional special…
Figure 35: Overview of the PolyPredictor workflow. To construct a chemical description, a LangChain agent powered by a commercial Gemini 2.5 Pro LLM is employed. The agent is guided by detailed system prompts and few-shot examples to produce structured, natural-language descriptions of the polymer repeat unit. These descriptions are converted into vector embeddings using OpenAI’s text-embedding-3-large model, with m…
Figure 36: Closed-loop optimization enabled through MCP–IvoryOS integration. An LLM issues natural …
Figure 37: Interoperability architecture illustrating how an LLM communicates through the IvoryOS MCP …
Figure 38: FADE workflow for natural language-driven drug candidate discovery. User queries describing targets of interest are processed through three sequential stages: (1) hierarchical database search for structural data or sequences; (2) binding site identification; and (3) computational generation and ranking of drug-like molecules using QSAR and binding affinity metrics to identify hit compounds. Future Work: F…
Figure 39: MatFOMGen workflow. MatFOMGen was implemented using the Anthropic API and Streamlit. Future Work: A primary limitation of the current MatFOMGen pipeline is the lack of formal validation for LLM-generated ASE functions. Future work will explore additional LLM-based reflection and refinement stages to improve code reliability. Another promising extension is the use of fine-tuned LLMs to achieve higher-accura…
Figure 40: Schematic workflow for DFTPilot. The user specifies the target property and material system.
Figure 41: Schematic depiction of the dual-mode design in Parse Patrol. Lower branch: Discovery Mode provides a single MCP interface to multiple parser and database servers, enabling agents to iteratively design parsers that conform to user-defined specifications. Upper branch: Direct Import Mode exposes the same tools as Python modules for frictionless integration into production code. Both branches are unified und…
Figure 42: System architecture of Catalyst Assistant.
Figure 43: Data extraction methods and model process flow in ThinFilm.ai.
Figure 44: SCALE workflow diagram. Future Work: Future extensions of SCALE will focus on enhancing both chemical accuracy and autonomy by integrating higher-fidelity physics and adaptive learning. Incorporating semiempirical or DFT-level calculations (e.g., GFN2-xTB, ωB97X-D) into the surrogate model would improve the reliability of property predictions beyond empirical descriptors, while active learning loops could …
Figure 45: Architecture and workflow of L.A.R.A. The fine-tuned model determines whether to respond using …
Figure 46: Workflow of ODE Forge showing (left) the two-phase agentic pipeline for research and model con…
Figure 47: User interface to perform Materials Project queries.
Figure 48: Benchmark results for various model choices. Tool-augmented methods are generally more …
Figure 49: Predictions for four materials, comparing different prompting methods. A table of summary …
Figure 50: Model error distributions for single-property and multi-property prediction.
Figure 51: CaMEL-RAG framework for catalysis prediction. Dataset [181], which contains structured records describing the slab, surface site, adsorbate, and corresponding adsorption energy. Since the dataset lacks intrinsic hierarchy, a flat vector representation was employed instead of CHORUS’s multi-level memory. Each structured record was converted into a natural-language description retaining complete system inf…
Figure 52: Performance comparison of baseline LLMs and CaMEL-RAG-enhanced models for adsorption …
Figure 53: The SuperconLLM fully automated workflow, from arXiv papers to JSON records.
Figure 54: End-to-end architecture of Catalyze, illustrating agent orchestration from user query to validated …
Figure 55: CAMEL workflow. Open-access papers are collected via OpenAlex and Unpaywall, then parsed …
Figure 56: ZeroMAT framework architecture. Experimental evaluation using bandgap data from the Materials Project [120] demonstrates that ZeroMAT delivers substantial improvements in both accuracy and efficiency (…
Figure 57: Workflow of MuMMIE model pipeline. Results: In the multilingual patent corpus spanning Chinese, Russian, French, Japanese, Korean, and English, the team observed that while chemical compound names often remain consistent across languages, the associated property labels vary widely. This inconsistency makes it difficult to build unified, machine-readable datasets. The primary objective of MuMMIE is to lever…
Figure 58: Overview of the automated electrolyte discovery system via offline reinforcement learning.
Figure 59: Bayesian probability heatmap …
Figure 61: Vector database construction and retrieval-augmented generation (RAG) workflow for Sol-Agent.
Figure 62: Workflow of the AutoFeaSci multi-agent featurization system. Literature, metadata, and tabular …
Figure 63: MAGE workflow. The agent interacts with the user and invokes the appropriate function based on …
Figure 64: Architecture diagram of BASIS.
Figure 65: Overview of the DFT workflow performed by …
Figure 66: Titanarium working prototype showing multi-agent scientist-persona debate.
Figure 67: Evaluation of large language models (LLMs) for concrete property prediction. (a) Three evaluation …
Figure 68: Complete nanoparticle analysis and LLM-driven insight generation workflow. The pipeline in…
Figure 69: Overview of the DynaAgent architecture. The PrepAgent constructs a context-aware simulation plan, the MDAgent executes the plan with error-corrective reasoning, and the Analyser interprets the resulting trajectories. Available tools are shown in the action space. … reflecting how effectively the agent minimized unnecessary iterations. Accuracy was defined as the ratio of successfully completed tasks to the …
Figure 70: Comparison of efficiency and accuracy across different LLM backends.
Figure 71: Workflow of CrysTalk. Given an input structure file and a user prompt, the agent performs …
Figure 72: SpectroBot workflow. A user uploads a CSV file; the FTIR or UV–Vis analyzer generates …
Figure 73
Figure 73. Figure 73: Workflow of the personalized agents in MindMesh, illustrating the generation of user-specific view at source ↗
Figure 74
Figure 74. Figure 74: SyntheSeek two-stage synthesis recipe generation workflow. view at source ↗
Figure 75
Figure 75. Figure 75: Overview of the V-RAPIDS workflow, illustrating UMA-based structure optimization followed view at source ↗
Figure 76
Figure 76. Figure 76: Representative V-RAPIDS output for the water–graphene system, including optimized geometries view at source ↗
Figure 77
Figure 77. Figure 77: NOMAD RAGBOT workflow. The system performs (1) offline indexing with context-aware view at source ↗
Figure 78
Figure 78. Figure 78: The conceptual diagram of Language Controlled Molecular Design and Analysis. view at source ↗
Figure 79
Figure 79. Figure 79: Conceptual overview of (a) the molecule–text description dataset, (b) text conditioning, and (c) view at source ↗
Figure 80
Figure 80. Figure 80: Conceptual overview of AdsKRK. Results In its original implementation, the LIAC-AdsKRK team employed the CodeAct [263] framework to enable a flexible trial-and-error workflow. Within CodeAct, the agent incrementally generates executable code that follows the instructions specified in the prompt, while the code-execution node returns the corresponding outputs. By iteratively repeating this generate–execute… view at source ↗
Figure 81: AssemblAI’s workflow diagram. Users provide text input to generate a peptide self-assembly …
Figure 82: Transmission electron microscopy images of the peptide KFKFQF after self-assembly experiments.
Figure 83: Summarized agent outputs describing the self-assembly protocol of the peptide KFKFQF into …
Figure 84: Benchmark performance of AssemblAI on a withheld test set (N=198). The plot shows the …
Figure 85: MaterialMind system architecture combining retrieval, reasoning, and scoring components.
Figure 86: Workflow of ChemTutor AI and its future work perspectives.
Figure 87: Overview of the CrystaLenz agentic XRD analysis workflow, including data loading, preprocessing, …
Figure 88: Overview of the closed-loop discovery platform (ACME).
Figure 89: Workflow of HEAQuery. … MatSciBERT model [283], generating vector embeddings. These embeddings were stored in a FAISS index, enabling rapid semantic search across the literature. Simultaneously, the team curated and cleaned three public HEA datasets [284, 285, 286], covering mechanical properties, thermodynamic descriptors, and synthesis routes. The datasets were harmonized by standardizing column names, no…
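The embed-then-index pattern behind HEAQuery can be illustrated with a brute-force stand-in: numpy takes the place of a FAISS `IndexFlatL2` (same squared-L2 ranking, without the indexing machinery), and the tiny two-dimensional vectors stand in for MatSciBERT embeddings. Everything here is a sketch, not HEAQuery's actual code.

```python
# Brute-force nearest-neighbor search, mimicking FAISS IndexFlatL2 semantics.
import numpy as np

def build_index(embeddings: np.ndarray) -> np.ndarray:
    # IndexFlatL2 stores raw vectors; here we simply keep the float32 array.
    return embeddings.astype(np.float32)

def search(index: np.ndarray, query: np.ndarray, k: int = 2) -> list[int]:
    # Squared L2 distance from the query to every stored vector, smallest first.
    dists = np.sum((index - query.astype(np.float32)) ** 2, axis=1)
    return [int(i) for i in np.argsort(dists)[:k]]

# Toy "document embeddings" standing in for MatSciBERT outputs.
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
index = build_index(docs)
hits = search(index, np.array([1.0, 0.0]))  # nearest documents to the query
```

Swapping the brute-force scan for a real FAISS index changes only the storage and lookup calls; the embed-query-rank flow is the same.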
Figure 90: Automated workflow of PackSynth. The user provides an input (SMILES/name), which the agent uses to fetch data from databases like the Materials Project. The system then uses RDKit [288] to generate a 3D model, automatically prepares and runs the LAMMPS simulation, performs analysis (energy, RMSD), and provides an interactive 3D visualization. The workflow begins with Input Processing and Database Integrat…
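As a flavor of the simulation-preparation step, here is a minimal sketch of templating a LAMMPS input deck. The function name and defaults are illustrative, not PackSynth's actual interface, and a production deck would also need force-field coefficients and output settings.

```python
# Hypothetical helper that templates a minimal LAMMPS input script.
def lammps_input(data_file: str, steps: int = 1000, temp: float = 300.0) -> str:
    # Each entry is a real LAMMPS command; values are placeholders.
    lines = [
        "units real",
        "atom_style full",
        f"read_data {data_file}",
        "pair_style lj/cut 10.0",
        f"fix 1 all nvt temp {temp} {temp} 100.0",
        f"run {steps}",
    ]
    return "\n".join(lines)

deck = lammps_input("molecule.data", steps=500)
```

An agent would write this string to a file and invoke the `lmp` binary on it; the analysis stage (energy, RMSD) then parses the resulting log and trajectory files.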
Figure 91: Workflow for data extraction and standardization.
Figure 92: Example of the final standardized dataset produced by the automated LLM workflow.
Figure 93: (A) Overall workflow of the QSPHAgent framework for interpretable prediction of electronic …
Figure 94: Overview of the GPT-OSS-based materials generation framework. Starting from a database …
Figure 95: Table 1 — Method comparison (synthetic test set).
Figure 96: VERA workflow: from lab data upload to compliance validation and PDF report export.
Figure 97: The MaterEase framework architecture. The complete workflow from natural language query to materials discovery and visualization. … knowledge bases and property schemas. Real-time Knowledge Updates: implementing dynamic ontology learning to incorporate new research findings and maintain up-to-date knowledge bases continuously. Enhanced Reasoning: integrating causal knowledge graphs to enable multi-step cau…
Figure 98: The end-to-end workflow of MatSciAgent for scientific code generation.
Figure 99: Comparison of efficiency and accuracy across different LLM systems against our proposed coding …
Figure 100: Illustration of using the MOF-ChemUnity knowledge graph as long-term memory for AI agents.
Figure 101: An example workflow for modeling diffusivity of an organic molecule in water.
Figure 102: An example workflow for an agent request from the MATLAB CLI prompt.
Figure 103: (A) Workflow of the instrument action database agent. (B) Example prompt and agent response.
Figure 104: Workflow of AIssistant with MC-NEST and ChemCrow tools. The AIssistant framework integrates specialized tools like MC-NEST [331] for hypothesis generation and ChemCrow [332] for interactive refinement, enabling iterative cycles of AI-suggested hypotheses and human validation. Quantitative evaluation metrics were utilized to assess the alignment of AI-assisted outcomes with human reasoning. The hi…
Figure 105: Overview of the SKY workflow for materials synthesis planning.
Figure 106: Participants collaborating at various physical hub locations during the 2025 LLM Hackathon …
Figure 107: Hybrid nature and the sponsors of the 2025 LLM Hackathon for Applications in Materials …
Abstract

Large language models (LLMs) are rapidly changing how researchers in materials science and chemistry discover, organize, and act on scientific knowledge. This paper analyzes a broad set of community-developed LLM applications in an effort to identify emerging patterns in how these systems can be used across the scientific research lifecycle. We organize the projects into two complementary categories: Knowledge Infrastructure, systems that structure, retrieve, synthesize, and validate scientific information; and Action Systems, systems that execute, coordinate, or automate scientific work across computational and experimental environments. The submissions reveal a shift from single-purpose LLM tools toward integrated, multi-agent workflows that combine retrieval, reasoning, tool use, and domain-specific validation. Prominent themes include retrieval-augmented generation as grounding infrastructure, persistent structured knowledge representations, multimodal and multilingual scientific inputs, and early progress toward laboratory-integrated closed-loop systems. Together, these results suggest that LLMs are evolving from general-purpose assistants into composable infrastructure for scientific reasoning and action. This work provides a community snapshot of that transition and a practical taxonomy for understanding emerging LLM-enabled workflows in materials science and chemistry.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript summarizes outcomes from the 2025 LLM Hackathon focused on materials science and chemistry. It partitions the submitted projects into two categories—Knowledge Infrastructure (systems for structuring, retrieving, synthesizing, and validating scientific information) and Action Systems (systems for executing, coordinating, or automating scientific tasks)—and extracts recurring themes including retrieval-augmented generation, persistent structured knowledge representations, multimodal/multilingual inputs, and early closed-loop laboratory integrations. The central interpretive claim is that these patterns indicate LLMs are transitioning from general-purpose assistants to composable infrastructure for scientific reasoning and action.

Significance. If the reported patterns accurately capture the hackathon submissions, the work supplies a practical taxonomy and community snapshot that could help researchers navigate emerging LLM workflows. The two-category organization is internally consistent with the described themes and provides a clear organizing lens. However, the manuscript contains no quantitative metrics, error bars, or comparative benchmarks against the wider literature, limiting its ability to support stronger claims about field-wide evolution.

major comments (1)
  1. [Abstract] Abstract and concluding section: the statement that the submissions 'suggest that LLMs are evolving from general-purpose assistants into composable infrastructure' rests on self-selected, short-timeline hackathon prototypes. No comparison is provided to non-hackathon deployments or the broader literature on LLM use in materials science, so the inference to a general trajectory is not load-bearing on the data presented.
minor comments (3)
  1. The manuscript would benefit from an explicit limitations subsection that quantifies the number of projects per category, notes the self-selection bias, and discusses how hackathon constraints (e.g., reliance on LangChain/AutoGen) may shape the observed themes.
  2. Project descriptions should include direct links or DOIs to the original submissions or code repositories to enable reproducibility and follow-up by readers.
  3. Figure captions and table headings could be expanded to clarify how individual projects map onto the two-category taxonomy.

Simulated Authors' Rebuttal

1 responses · 0 unresolved

We thank the referee for their constructive review of our manuscript. We agree that the central interpretive claim requires more cautious framing and have revised the abstract and conclusion accordingly.

read point-by-point responses
  1. Referee: [Abstract] Abstract and concluding section: the statement that the submissions 'suggest that LLMs are evolving from general-purpose assistants into composable infrastructure' rests on self-selected, short-timeline hackathon prototypes. No comparison is provided to non-hackathon deployments or the broader literature on LLM use in materials science, so the inference to a general trajectory is not load-bearing on the data presented.

    Authors: We agree that the claim as originally phrased overreaches the scope of the hackathon data. The manuscript is a community snapshot of submitted projects rather than a field-wide survey. In the revised version we have changed the abstract and conclusion to state that the observed patterns 'illustrate emerging trends in the hackathon submissions toward composable multi-agent systems,' explicitly noting the self-selected and prototype nature of the entries. We have also added citations to recent reviews on LLM applications in materials science and chemistry to situate the hackathon observations within the broader literature. revision: yes

Circularity Check

0 steps flagged

No circularity: descriptive summary of external hackathon submissions

full rationale

The paper is a post-hoc community report summarizing submitted hackathon projects into Knowledge Infrastructure and Action Systems categories. It contains no derivations, equations, predictions, fitted parameters, or mathematical claims. The central inference about LLMs evolving into composable infrastructure is drawn from observed patterns in external submissions rather than from any self-referential fitting or self-citation chain. No load-bearing steps reduce to inputs by construction, and the analysis is self-contained against external benchmarks with no ansatz smuggling or renaming of known results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The report rests on the domain assumption that hackathon submissions reflect genuine emerging patterns in LLM use; no free parameters, new entities, or additional axioms are introduced beyond standard descriptive analysis.

pith-pipeline@v0.9.0 · 7436 in / 999 out tokens · 34539 ms · 2026-05-08T17:33:39.992374+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

299 extracted references · 32 canonical work pages · 5 internal anchors

  1. [1]

    Enabling large language models for real-world materials discovery,

    S. Miret and N. A. Krishnan, “Enabling large language models for real-world materials discovery,” Nature Machine Intelligence, vol. 7, no. 7, pp. 991–998, 2025

  2. [2]

    An automatic end-to-end chemical synthesis development platform powered by large language models,

    Y. Ruan, C. Lu, N. Xu, Y. He, Y. Chen, J. Zhang, J. Xuan, J. Pan, Q. Fang, H. Gao,et al., “An automatic end-to-end chemical synthesis development platform powered by large language models,” Nature communications, vol. 15, no. 1, p. 10160, 2024

  3. [3]

    Comproscanner: a multi-agent based framework for composition-property structured data extraction from scientific literature,

    A. Roy, E. Grisan, J. Buckeridge, and C. Gattinoni, “Comproscanner: a multi-agent based framework for composition-property structured data extraction from scientific literature,” Digital Discovery, vol. 5, pp. 1794–1808, 2026

  4. [4]

    Chemnlp: a natural language-processing-based library for materials chemistry text data,

    K. Choudhary and M. L. Kelley, “Chemnlp: a natural language-processing-based library for materials chemistry text data,”The Journal of Physical Chemistry C, vol. 127, no. 35, pp. 17545–17555, 2023

  5. [5]

    Language models enable data- augmented synthesis planning for inorganic materials,

    T. Prein, E. Pan, J. Jehkul, S. Weinmann, E. Olivetti, and J. L. Rupp, “Language models enable data- augmented synthesis planning for inorganic materials,”ACS Applied Materials & Interfaces, vol. 17, no. 51, pp. 69221–69233, 2025

  6. [6]

    Large language models for reticular chemistry,

    Z. Zheng, N. Rampal, T. J. Inizan, C. Borgs, J. T. Chayes, and O. M. Yaghi, “Large language models for reticular chemistry,”Nature Reviews Materials, vol. 10, no. 5, pp. 369–381, 2025

  7. [7]

    Towards foundation models for materials science: The open matsci ml toolkit,

    K. L. K. Lee, C. Gonzales, M. Spellings, M. Galkin, S. Miret, and N. Kumar, “Towards foundation models for materials science: The open matsci ml toolkit,” inProceedings of the SC’23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 51–59, 2023

  8. [8]

    Understanding hackathons for science: Collaboration, affordances, and outcomes,

    E. P. P. Pe-Than and J. D. Herbsleb, “Understanding hackathons for science: Collaboration, affordances, and outcomes,” in International Conference on Information, pp. 27–37, Springer, 2019

  9. [9]

    How to support newcomers in scientific hackathons-an action research study on expert mentoring,

    A. Nolte, L. B. Hayden, and J. D. Herbsleb, “How to support newcomers in scientific hackathons-an action research study on expert mentoring,”Proceedings of the ACM on Human-Computer Interaction, vol. 4, no. CSCW1, pp. 1–23, 2020

  10. [10]

    Hack your organizational innovation: literature review and integrative model for running hackathons,

    B. Heller, A. Amir, R. Waxman, and Y. Maaravi, “Hack your organizational innovation: literature review and integrative model for running hackathons,”Journal of Innovation and Entrepreneurship, vol. 12, no. 1, p. 6, 2023

  11. [11]

    Organizing across disciplines to tackle shared computational challenges,

    W. Treyde, A. Kwiatkowski, J. Achterberg, D. Akarca, M. Buttenschoen, R. T. Byrne, K. Didi, K. Kordova, J. Lála, J. Langford,et al., “Organizing across disciplines to tackle shared computational challenges,”Patterns, vol. 7, no. 4, 2026

  12. [12]

    14 examples of how llms can transform materials science and chemistry: a reflection on a large language model hackathon,

    K. M. Jablonka, Q. Ai, A. Al-Feghali, S. Badhwar, J. D. Bocarsly, A. M. Bran, S. Bringuier, L. C. Brinson, K. Choudhary, D. Circi, et al., “14 examples of how llms can transform materials science and chemistry: a reflection on a large language model hackathon,” Digital Discovery, vol. 2, no. 5, pp. 1233–1250, 2023

  13. [13]

    Reflections from the 2024 large language model (llm) hackathon for applications in materials science and chemistry,

    Y. Zimmermann, A. Bazgir, Z. Afzal, F. Agbere, Q. Ai, N. Alampara, A. Al-Feghali, M. Ansari, D. An- typov, A. Aswad, J. Bai, V. Baibakova, D. D. Biswajeet, E. Bitzek, J. D. Bocarsly, A. Borisova, A. M. Bran, L. C. Brinson, M. M. Calderon, A. Canalicchio, V. Chen, Y. Chiang, D. Circi, B. Charmes, V. Chaudhary, Z. Chen, M.-H. Chiu, J. Clymo, K. Dabhadkar, N...

  14. [14]

    Large language models for chemistry robotics,

    N. Yoshikawa, M. Skreta, K. Darvish, S. Arellano-Rubach, Z. Ji, L. Bjørn Kristensen, A. Z. Li, Y. Zhao, H. Xu, A. Kuramshin,et al., “Large language models for chemistry robotics,”Autonomous Robots, vol. 47, no. 8, pp. 1057–1086, 2023

  15. [15]

    Autonomous materials synthesis laboratories: Integrating artificial intelligence with advanced robotics for accelerated discovery,

    L. Duo, Y. Hao, and J. He, “Autonomous materials synthesis laboratories: Integrating artificial intelligence with advanced robotics for accelerated discovery,” ChemRxiv preprint, 2025

  16. [16]

    Agents for self-driving laboratories applied to quantum computing,

    S. Cao, Z. Zhang, M. Alghadeer, S. D. Fasciati, M. Piscitelli, M. Bakr, P. Leek, and A. Aspuru-Guzik, “Agents for self-driving laboratories applied to quantum computing,”arXiv preprint arXiv:2412.07978, 2024

  17. [17]

    Benchmarks and metrics for evaluations of code generation: A critical review,

    D. G. Paul, H. Zhu, and I. Bayley, “Benchmarks and metrics for evaluations of code generation: A critical review,” in2024 IEEE International Conference on Artificial Intelligence Testing (AITest), pp. 87–94, IEEE, 2024

  18. [18]

    Are large language models superhuman chemists?,

    A. Mirza, N. Alampara, S. Kunchapu, M. Ríos-García, B. Emoekabu, A. Krishnan, T. Gupta, M. Schilling-Wilhelmi, M. Okereke, A. Aneesh,et al., “Are large language models superhuman chemists?,”arXiv preprint arXiv:2404.01475, 2024

  19. [19]

    Rational design of high-entropy ceramics based on machine learning – a critical review,

    J. Zhang, X. Xiang, B. Xu, S. Huang, Y. Xiong, S. Ma, H. Fu, Y. Ma, H. Chen, Z. Wu, and S. Zhao, “Rational design of high-entropy ceramics based on machine learning – a critical review,”Current Opinion in Solid State and Materials Science, vol. 27, p. 101057, 4 2023

  20. [20]

    Web of science

    Clarivate Analytics, “Web of science.”https://www.webofscience.com, 2025. Accessed: 2025-11-05

  21. [21]

    Mistral-large:123b-instruct-2407-q4_0

    Mistral AI, “Mistral-large:123b-instruct-2407-q4_0.”https://mistral.ai/news/mistral-large/,

  22. [22]

    Large Language Model by Mistral AI

  23. [23]

    Gpt-oss:120b

    Open Source Science (OSS), “Gpt-oss:120b.”https://huggingface.co/oss/gpt-oss-120b, 2024. Open large language model for scientific applications

  24. [24]

    Mendeleev – a python resource for properties of chemical elements, ions and isotopes,

    M. Szymański, R. V. Vlasov,et al., “Mendeleev – a python resource for properties of chemical elements, ions and isotopes,”Journal of Open Source Software, vol. 3, no. 32, p. 1113, 2018

  25. [25]

    The nomad laboratory – fair data infrastructure for materials science

    NOMAD Laboratory Consortium, “The nomad laboratory – fair data infrastructure for materials science.”https://nomad-lab.eu, 2023. FAIR data platform for materials science

  26. [26]

    Factsage thermochemical software and databases

    Thermfact/CRCT and GTT-Technologies, “Factsage thermochemical software and databases.” https://www.factsage.com, 2022. Thermochemical calculation and database system

  27. [27]

    Synthesis and neutron powder diffraction study of the superconductor HgBa2Ca2Cu3O8+δ by Tl substitution,

    P. Dai, B. C. Chakoumakos, G. F. Sun, K. W. Wong, Y. Xin, and D. F. Lu, “Synthesis and neutron powder diffraction study of the superconductor HgBa2Ca2Cu3O8+δ by Tl substitution,” Physica C: Superconductivity, vol. 243, pp. 201–206, Mar. 1995

  28. [28]

    Gemini: A Family of Highly Capable Multimodal Models

    Gemini Team, Google, “Gemini: A Family of Highly Capable Multimodal Models,”arXiv preprint arXiv:2312.11805v5, 2025

  29. [29]

    Exploration of crystal chemical space using text-guided generative artificial intelligence,

    H. Park, A. Onwuli, and A. Walsh, “Exploration of crystal chemical space using text-guided generative artificial intelligence,”Nature Communications, vol. 16, p. 4379, 2025

  30. [30]

    The ai revolution in science,

    S. Fortunato, C. T. Bergstrom, K. Börner, J. A. Evans, D. Helbing, S. Milojević, and et al., “The ai revolution in science,”Science, vol. 359, no. 6379, p. eaao0185, 2018

  31. [31]

    Language models are few-shot learners,

    T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell,et al., “Language models are few-shot learners,”Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020

  32. [32]

    On the dangers of stochastic parrots: Can language models be too big?,

    E. M. Bender, T. Gebru, A. McMillan-Major, and S. Shmitchell, “On the dangers of stochastic parrots: Can language models be too big?,” in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 610–623, 2021

  33. [33]

    Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning,

    A. Garcez, M. Gori, L. C. Lamb, L. Serafini, M. Spranger, and S. N. Tran, “Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning,”Journal of Artificial Intelligence Research, vol. 74, pp. 895–946, 2022

  34. [35]

    The dawn after the dark: An empirical study on factuality hallucination in large language models,

    J. Li, J. Chen, R. Ren, X. Cheng, W. X. Zhao, J.-Y. Nie, and J.-R. Wen, “The dawn after the dark: An empirical study on factuality hallucination in large language models,”arXiv preprint arXiv:2401.03205, 2024

  35. [36]

    Grounding llm reasoning with knowledge graphs,

    A. Amayuelas, J. Sain, S. Kaur, and C. Smiley, “Grounding llm reasoning with knowledge graphs,” 2025

  36. [37]

    Making retrieval-augmented language models robust to irrelevant context,

    O. Yoran, T. Wolfson, O. Ram, and J. Berant, “Making retrieval-augmented language models robust to irrelevant context,” inThe Twelfth International Conference on Learning Representations, 2024

  37. [38]

    Roadmap on electronic structure codes in the exascale era,

    V. Gavini, S. Baroni, V. Blum, D. R. Bowler, A. Buccheri, J. R. Chelikowsky, S. Das, W. Dawson, P. Delugas, M. Dogan, C. Draxl, G. Galli, L. Genovese, P. Giannozzi, M. Giantomassi, X. Gonze, M. Govoni, F. Gygi, A. Gulans, J. M. Herbert, S. Kokott, T. D. Kühne, K.-H. Liou, T. Miyazaki, P. Motamarri, A. Nakata, J. E. Pask, C. Plessl, L. E. Ratcliff, R. M. R...

  38. [39]

    Flexibilities of wavelets as a computational basis set for large-scale electronic structure calculations,

    L. E. Ratcliff, W. Dawson, G. Fisicaro, D. Caliste, S. Mohr, A. Degomme, B. Videau, V. Cristiglio, M. Stella, M. D’Alessandro, S. Goedecker, T. Nakajima, T. Deutsch, and L. Genovese, “Flexibilities of wavelets as a computational basis set for large-scale electronic structure calculations,”The Journal of Chemical Physics, vol. 152, p. 194110, 05 2020

  39. [40]

    BigDFT software package

    BigDFT developers, “BigDFT software package.”https://l_sim.gitlab.io/bigdft-suite, 2018. A wavelet-based Density Functional Theory code. Accessed: October 2025

  40. [41]

    Exploratory data science on supercomputers for quantum mechanical calculations,

    W. Dawson, L. Beal, L. E. Ratcliff, M. Stella, T. Nakajima, and L. Genovese, “Exploratory data science on supercomputers for quantum mechanical calculations,” Electronic Structure, vol. 6, p. 027003, Jun 2024

  41. [42]

    remotemanager

    remotemanager developers, “remotemanager.”https://gitlab.com/l_sim/remotemanager, 2023. Modular serialisation and management package for handling the running of functions on remote ma- chines. Accessed: October 2025

  42. [43]

    A chemical language model for molecular taste prediction,

    Y. Zimmermann, L. Sieben, H. Seng, P. Pestlin, and F. Görlich, “A chemical language model for molecular taste prediction,”Npj Sci. Food, vol. 9, p. 122, July 2025

  43. [44]

    Magnetstein: An open-source tool for quantitative nmr mixture analysis robust to low resolution, distorted lineshapes, and peak shifts,

    B. Domżał, E. K. Nawrocka, D. Gołowicz, M. A. Ciach, B. Miasojedow, K. Kazimierczuk, and A. Gambin, “Magnetstein: An open-source tool for quantitative nmr mixture analysis robust to low resolution, distorted lineshapes, and peak shifts,” Analytical Chemistry, vol. 96, no. 1, pp. 188–196, 2024

  44. [45]

    Twenty years of nmrshiftdb2: A case study of an open database for analytical chemistry,

    S. Kuhn, H. Kolshorn, C. Steinbeck, and N. Schlörer, “Twenty years of nmrshiftdb2: A case study of an open database for analytical chemistry,”Magnetic Resonance in Chemistry, vol. 62, no. 2, pp. 74–83, 2024

  45. [46]

    Nmrextractor: Leveraging large language models to construct an experimental nmr database from open-source scientific publications,

    Q. Wang, W. Zhang, M. Chen, X. Li, Z. Xiong, J. Xiong, Z. Fu, and M. Zheng, “Nmrextractor: Leveraging large language models to construct an experimental nmr database from open-source scientific publications,” Chemical Science, 2025

  46. [47]

    ReactionT5: A pre-trained transformer model for accurate chemical reaction prediction with limited data,

    T. Sagawa and R. Kojima, “ReactionT5: A pre-trained transformer model for accurate chemical reaction prediction with limited data,” Journal of Cheminformatics, vol. 17, p. 126, 2025

  47. [48]

    Gradio: Hassle-Free Sharing and Testing of ML Models in the Wild

    A. Abid, A. Abdalla, A. Abid, D. Khan, A. Alfozan, and J. Zou, “Gradio: Hassle-free sharing and testing of ML models in the wild,”arXiv preprint arXiv:1906.02569, June 2019

  48. [49]

    DeepSeek-V3 technical report,

    DeepSeek-AI, A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, C. Lu, C. Zhao, C. Deng, C. Zhang, C. Ruan, D. Dai, D. Guo, D. Yang, D. Chen, D. Ji, E. Li, F. Lin, F. Dai, F. Luo, G. Hao, G. Chen, G. Li, H. Zhang, H. Bao, H. Xu, H. Wang, H. Zhang, H. Ding, H. Xin, H. Gao, H. Li, H. Qu, J. L. Cai, J. Liang, J. Guo, J. Ni, J. Li, J. Wang, J. Chen, J. Chen, J. Yuan, J...

  49. [50]

    A survey on data collection for machine learning: A big data - ai integration perspective,

    Y. Roh, G. Heo, and S. E. Whang, “A survey on data collection for machine learning: A big data - ai integration perspective,”IEEE Transactions on Knowledge and Data Engineering, vol. 33, no. 4, pp. 1328–1347, 2021

  50. [51]

    Structural and optical properties of highly hydroxylated fullerenes: stability of molecular domains on the c60 surface,

    R. Guirado-López and M. Rincón, “Structural and optical properties of highly hydroxylated fullerenes: stability of molecular domains on the c60 surface,”The Journal of chemical physics, vol. 125, no. 15, 2006

  51. [52]

    Functionalized fullerene: a key driver for high performance inverted perovskite solar cell,

    X. Zhang, J. Zhang, D. Liu, and W. Zhang, “Functionalized fullerene: a key driver for high performance inverted perovskite solar cell,”Journal of Energy Chemistry, 2025

  52. [53]

    Uma: A family of universal models for atoms,

    B. M. Wood, M. Dzamba, X. Fu, M. Gao, M. Shuaibi, L. Barroso-Luque, K. Abdelmaqsoud, V. Gharakhanyan, J. R. Kitchin, D. S. Levine, K. Michel, A. Sriram, T. Cohen, A. Das, A. Rizvi, S. J. Sahoo, Z. W. Ulissi, and C. L. Zitnick, “Uma: A family of universal models for atoms,” 2025

  53. [54]

    crewai: Framework for orchestrating role-playing, autonomous ai agents,

    “crewai: Framework for orchestrating role-playing, autonomous ai agents,” 2025. Accessed: 2025-07-11

  54. [55]

    DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning,

    D. Guo, D. Yang, H. Zhang, J. Song, P. Wang, Q. Zhu, R. Xu, R. Zhang, S. Ma, X. Bi, X. Zhang, X. Yu, Y. Wu, Z. F. Wu, Z. Gou, Z. Shao, Z. Li, Z. Gao, A. Liu, B. Xue, B. Wang, B. Wu, B. Feng, C. Lu, C. Zhao, C. Deng, C. Ruan, D. Dai, D. Chen, D. Ji, E. Li, F. Lin, F. Dai, F. Luo, G. Hao, G. Chen, G. Li, H. Zhang, H. Xu, H. Ding, H. Gao, H. Qu, H. Li, J. Gu...

  55. [56]

    Training a scientific reasoning model for chemistry,

    S. M. Narayanan, J. D. Braza, R.-R. Griffiths, A. Bou, G. Wellawatte, M. C. Ramos, L. Mitchener, S. G. Rodriques, and A. D. White, “Training a scientific reasoning model for chemistry,” 2025

  56. [57]

    Thought anchors: Which llm reasoning steps matter?,

    P. C. Bogdan, U. Macar, N. Nanda, and A. Conmy, “Thought anchors: Which llm reasoning steps matter?,” 2025

  57. [58]

    Introductory tutorials for simulating protein dynamics with gromacs,

    J. A. Lemkul, “Introductory tutorials for simulating protein dynamics with gromacs,”The Journal of Physical Chemistry B, vol. 128, no. 39, pp. 9418–9435, 2024

  58. [59]

    Streamlit: The fastest way to build data apps,

    S. Inc., “Streamlit: The fastest way to build data apps,” 2025. Python library for creating interactive web apps

  59. [60]

    Lammps - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales,

    A. P. Thompson, H. M. Aktulga, R. Berger, D. S. Bolintineanu, W. M. Brown, P. S. Crozier, P. J. Doak, M. J. D’Evelyn, D. W. Engel, M. Feng, O. Gissinger, A. Hackl, H. Heinz, O. Homeyer, S. Hou, M. Ihm, G. Kresse, A. Kucukel, D. Lee, T. D. Li, Z. Y. Ma, D. M. Makarov, L. Martinez, D. M. Merz, J. A. Miller, K. A. Min, C. H. Moore, R. E. Moore, T. Müller, F....

  60. [61]

    Gromacs: A message-passing parallel molecular dynamics implementation,

    H. J. Berendsen, D. van der Spoel, and R. van Drunen, “Gromacs: A message-passing parallel molecular dynamics implementation,” Computer Physics Communications, vol. 91, no. 1-3, pp. 43–56, 1995

  61. [62]

    Amber 2025,

    D. A. Case, H. M. Aktulga, K. Belfon, I. Y. Ben-Shalom, J. T. Berryman, S. R. Brozell, F. S. Carvahol, D. S. Cerutti, T. E. Cheatham, G. A. Cisneros, V. W. D. Cruzeiro, T. A. Darden, N. Forouzesh, M. Ghazimirsaeed, G. Giambasu, T. Giese, M. K. Gilson, H. Gohlke, A. W. Goetz, J. Harris, Z. Huang, S. Izadi, S. A. Izmailov, K. Kasavajhala, M. C. Kaymak, I. K...

  62. [63]

    Chatgpt (openai api),

    OpenAI, “Chatgpt (openai api),” 2025. Large language model / AI service

  63. [64]

    Oxdna. org: a public webserver for coarse-grained simulations of dna and rna nanostructures,

    E. Poppleton, R. Romero, A. Mallya, L. Rovigatti, and P. Šulc, “Oxdna. org: a public webserver for coarse-grained simulations of dna and rna nanostructures,”Nucleic acids research, vol. 49, no. W1, pp. W491–W498, 2021

  64. [65]

    Hoomd-blue: A python package for high-performance molecular dynamics and hard particle monte carlo simulations,

    J. A. Anderson, J. Glaser, and S. C. Glotzer, “Hoomd-blue: A python package for high-performance molecular dynamics and hard particle monte carlo simulations,”Computational Materials Science, vol. 173, p. 109363, 2020

  65. [66]

    Concepts for a semantically accessible materials data space: Overview over specific implementations in materials science,

    B. Bayerlein, J. Waitelonis, H. Birkholz, M. Jung, M. Schilling, P. v. Hartrott, M. Bruns, J. Schaarschmidt, K. Beilke, M. Mutz, V. Nebel, V. Königer, L. Beran, T. Kraus, A. Vyas, L. Vogt, M. Blum, B. Ell, Y.-F. Chen, T. Waurischk, A. Thomas, A. R. Durmaz, S. Ben Hassine, C. Fresemann, G. Dziwis, H. Beygi Nasrabadi, T. Hanke, M. Telong, S. Pirskawetz, M. ...

  66. [67]

    Seamless science: Lifting experimental mechanical testing lab data to an interoperable semantic representation,

    M. Schilling, S. Bruns, B. Bayerlein, J. Kryeziu, J. Schaarschmidt, J. Waitelonis, P. Dolabella Portella, and K. Durst, “Seamless science: Lifting experimental mechanical testing lab data to an interoperable semantic representation,”Advanced Engineering Materials, vol. 27, no. 8, p. 2401527, 2025

  67. [68]

    Mulms: A multi-layer annotated text corpus for information extraction in the materials science domain,

    T. P. Schrader, M. Finco, S. Grünewald, F. Hildebrand, and A. Friedrich, “Mulms: A multi-layer annotated text corpus for information extraction in the materials science domain,”arXiv preprint arXiv:2310.15569, 2023

  68. [69]

    Pmd core ontology: Achieving semantic interoperability in materials science,

    B. Bayerlein, M. Schilling, H. Birkholz, M. Jung, J. Waitelonis, L. Mädler, and H. Sack, “Pmd core ontology: Achieving semantic interoperability in materials science,”Materials & Design, vol. 237, p. 112603, 2024

  [70] A. Ghosh, M. Ziatdinov, O. Dyck, B. G. Sumpter, and S. V. Kalinin, "Bridging microscopy with molecular dynamics and quantum simulations: an AtomAI based pipeline," npj Computational Materials, vol. 8, no. 1, p. 74, 2022.

  [71] M. Ziatdinov, A. Ghosh, T. Wong, and S. V. Kalinin, "AtomAI: a deep learning framework for analysis of image and spectroscopy data in (scanning) transmission electron microscopy and beyond," arXiv preprint arXiv:2105.07485, 2021.

  [72] H. Eliasson and R. Erni, "Localization and segmentation of atomic columns in supported nanoparticles for fast scanning transmission electron microscopy," npj Computational Materials, vol. 10, no. 1, p. 168, 2024.

  [73] H. Tan, S. Takeuchi, K. K. Bharathi, I. Takeuchi, and L. A. Bendersky, "Microscopy study of structural evolution in epitaxial LiCoO2 positive electrode films during electrochemical cycling," ACS Applied Materials & Interfaces, vol. 8, no. 10, pp. 6727–6735, 2016.

  [74] C. Lee, A. Khan, D. Luo, T. P. Santos, C. Shi, B. E. Janicek, S. Kang, W. Zhu, N. A. Sobh, A. Schleife, B. K. Clark, and P. Huang, "Deep learning enabled strain mapping of single-atom defects in two-dimensional transition metal dichalcogenides with sub-picometer precision," Nano Letters, vol. 20, no. 5, pp. 3369–3377, 2020.

  [75] F. Fang, X. L. F. Xu, C. Chen, N. Feng, Y. Jiang, and J. Huang, "Mechanistic insights into potassium-assistant thermal-catalytic oxidation of soot over single-crystalline SrTiO3 nanotubes with ordered mesopores," ACS Catalysis, vol. 15, no. 2, pp. 789–799, 2025.

  [76] I. Batatia, P. Benner, Y. Chiang, A. M. Elena, D. P. Kovács, J. Riebesell, X. R. Advincula, M. Asta, M. Avaylon, W. J. Baldwin, et al., "A foundation model for atomistic materials chemistry," arXiv preprint arXiv:2401.00096, 2023.

  [77] J. B. Baell and G. A. Holloway, "New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays," Journal of Medicinal Chemistry, vol. 53, no. 7, pp. 2719–2740, 2010.

  [78] J. B. Baell and M. Walters, "Chemistry: Chemical con artists foil drug discovery," Nature, vol. 513, pp. 481–483, 2014.

  [79] S. Chithrananda, G. Grand, and B. Ramsundar, "ChemBERTa: Large-scale self-supervised pretraining for molecular property prediction," Oct. 2020.

  [80] N. Frey, R. Soklaski, S. Axelrod, S. Samsi, R. Gómez-Bombarelli, C. Coley, and V. Gadepally, "Neural scaling of deep chemical models," Nature Machine Intelligence, vol. 5, pp. 1–9, Oct. 2023.

  [81] E. Wang, S. Schmidgall, P. F. Jaeger, F. Zhang, R. Pilgrim, Y. Matias, J. Barral, D. Fleet, and S. Azizi, "TxGemma: Efficient and agentic LLMs for therapeutics," arXiv preprint arXiv:2504.06196, 2025.
