Toward Exascale AI for Science: A Scalable AI Skill for Autonomous Microkinetics Discovery
Pith reviewed 2026-06-30 07:59 UTC · model grok-4.3
The pith
An AI framework using agentic workflows and surrogate models automates microkinetics discovery while recovering from simulation failures.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present a scalable AI-driven framework that advances autonomous scientific discovery by combining agentic workflow automation, high-performance computing, and scientific surrogate models. Using microkinetics discovery as a testbed, the work demonstrates how AI can reduce expert intervention, recover from failed simulations, and systematically evaluate surrogate model reliability. This study shows how AI skills can transform complex domain workflows into robust, scalable capabilities for next-generation materials research.
What carries the argument
Agentic workflow automation integrated with surrogate models that stand in for full microkinetics simulations and allow the system to manage failures and assess its own substitutes.
If this is right
- Expert intervention drops in complex simulation workflows because the agentic system manages the steps end to end.
- Failed simulations are recovered from without manual restarts or adjustments.
- Surrogate model reliability receives systematic checks inside the automated loop.
- Complex domain workflows become robust, scalable capabilities suitable for large-scale materials research.
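The recovery behaviour described in the list above can be illustrated with a minimal sketch. It is not the paper's implementation: `SimulationFailure`, `run_with_recovery`, `toy_simulate`, and `halve_step` are hypothetical names, and the toy simulation and its numbers are placeholders chosen only to exercise the retry loop.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


class SimulationFailure(Exception):
    """Hypothetical error raised when a simulation run fails."""


@dataclass
class RecoveryLog:
    attempts: int = 0
    errors: list = field(default_factory=list)


def run_with_recovery(
    simulate: Callable[[dict], dict],
    params: dict,
    adjust: Callable[[dict, Exception], dict],
    max_attempts: int = 3,
) -> Tuple[Optional[dict], RecoveryLog]:
    """Retry a failing simulation, adjusting parameters between attempts."""
    log = RecoveryLog()
    current = dict(params)
    for _ in range(max_attempts):
        log.attempts += 1
        try:
            return simulate(current), log
        except SimulationFailure as exc:
            # Record the failure and let the adjust policy propose new inputs.
            log.errors.append(str(exc))
            current = adjust(current, exc)
    # All attempts failed: report this rather than hide it.
    return None, log


# Toy stand-ins (assumptions, not from the paper).
def toy_simulate(p: dict) -> dict:
    if p["step_size"] > 0.1:
        raise SimulationFailure(f"diverged at step_size={p['step_size']}")
    return {"converged": True, "step_size": p["step_size"]}


def halve_step(p: dict, exc: Exception) -> dict:
    return {**p, "step_size": p["step_size"] / 2}


result, log = run_with_recovery(toy_simulate, {"step_size": 0.4}, halve_step)
print(result, log.attempts, len(log.errors))
```

Returning the log alongside the result keeps every failed attempt visible, which is the kind of record a recovery success rate would be computed from.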
Where Pith is reading between the lines
- The same pattern of agentic automation plus surrogates could be tested on other simulation-heavy tasks such as molecular dynamics or quantum chemistry calculations.
- If the approach scales cleanly, it would remove human bottlenecks that currently limit how many candidate materials can be evaluated at exascale.
- New classes of failure could appear when the agent itself makes decisions about which surrogate to trust, requiring separate monitoring methods.
Load-bearing premise
Agentic workflow automation combined with surrogate models can reliably handle complex domain-specific simulations at exascale without introducing new failure modes or requiring substantial custom tuning.
What would settle it
A run of the framework on a microkinetics task where repeated simulation failures occur and the system either fails to recover autonomously or its surrogate reliability checks do not match results from direct full simulations.
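One way such a test could score the surrogate side is sketched below, under stated assumptions. The function name `surrogate_reliability`, the choice of mean and maximum absolute error as metrics, the tolerance, and the sample numbers are all illustrative and do not come from the paper.

```python
from statistics import mean


def surrogate_reliability(surrogate_values, reference_values, tolerance):
    """Compare surrogate predictions with direct-simulation reference values.

    Returns the mean absolute error, the worst-case absolute error, and a
    flag that is True only if every prediction is within the tolerance.
    """
    if not surrogate_values or len(surrogate_values) != len(reference_values):
        raise ValueError("need two non-empty sequences of equal length")
    abs_errors = [abs(s - r) for s, r in zip(surrogate_values, reference_values)]
    worst = max(abs_errors)
    return {
        "mae": mean(abs_errors),
        "max_abs_error": worst,
        "reliable": worst <= tolerance,
    }


# Placeholder numbers (assumed units, e.g. barrier heights in eV).
surrogate = [0.50, 1.00, 1.50]
reference = [0.50, 1.25, 1.50]
report = surrogate_reliability(surrogate, reference, tolerance=0.125)
print(report)
```

Gating on the worst-case error rather than the mean is a deliberate choice in this sketch: a single badly wrong prediction can mislead a kinetic model even when the average error looks small.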
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a scalable AI-driven framework combining agentic workflow automation, high-performance computing, and scientific surrogate models for autonomous microkinetics discovery. Using microkinetics as a testbed, it claims to demonstrate how AI reduces expert intervention, recovers from failed simulations, and systematically evaluates surrogate model reliability, thereby transforming complex domain workflows into robust capabilities for exascale materials research.
Significance. If the central claims are substantiated with quantitative evidence, the work could advance autonomous AI for science by automating complex simulation workflows and addressing reliability at scale. The integration of agentic systems with surrogates for failure recovery and reliability evaluation represents a potentially valuable direction for exascale computing in materials science.
major comments (2)
- [Abstract] The central claim that the framework reduces expert intervention, recovers from failed simulations, and evaluates surrogate reliability is not supported by any quantitative metrics, derivations, error analysis, or specific results (e.g., recovery success rates, intervention hours saved, or scaling behavior). This absence is load-bearing because the demonstration reduces to a framework description without evidence that the agentic + surrogate combination avoids new failure modes or requires no substantial custom tuning.
- [Abstract] The assumption that agentic workflow automation combined with surrogate models can reliably handle complex domain-specific simulations at exascale is presented without addressing potential new error modes such as incorrect surrogate invocation or simulation parameter drift. This is load-bearing for the reliability and scalability claims.
Simulated Author's Rebuttal
We thank the referee for their constructive comments. We agree that strengthening the quantitative support for our claims will improve the manuscript and address the concerns about evidence for the framework's effectiveness.
- Referee: [Abstract] The central claim that the framework reduces expert intervention, recovers from failed simulations, and evaluates surrogate reliability is not supported by any quantitative metrics, derivations, error analysis, or specific results (e.g., recovery success rates, intervention hours saved, or scaling behavior). This absence is load-bearing because the demonstration reduces to a framework description without evidence that the agentic + surrogate combination avoids new failure modes or requires no substantial custom tuning.
  Authors: We acknowledge this point. While the manuscript includes demonstrations through case studies in microkinetics discovery, specific quantitative metrics such as success rates for recovery and scaling behavior are not explicitly highlighted in the abstract. In the revision, we will update the abstract to include key quantitative results from our experiments and add a dedicated subsection on performance metrics to substantiate the claims.
  Revision: yes
- Referee: [Abstract] The assumption that agentic workflow automation combined with surrogate models can reliably handle complex domain-specific simulations at exascale is presented without addressing potential new error modes such as incorrect surrogate invocation or simulation parameter drift. This is load-bearing for the reliability and scalability claims.
  Authors: The manuscript describes mechanisms for handling failures and evaluating surrogate reliability, but we agree that explicit discussion of potential new error modes introduced by the AI components is important. We will revise the manuscript to include an analysis of these potential issues and how the framework mitigates them, drawing from our testbed experiments.
  Revision: yes
Circularity Check
No circularity: framework description contains no equations or self-referential derivations
The paper describes an AI-driven framework combining agentic automation, HPC, and surrogate models, tested on microkinetics discovery. The abstract and available text contain no equations, fitted parameters, predictions derived from inputs, or self-citations invoked as load-bearing uniqueness theorems. Claims about reducing expert intervention and recovering from failures are presented as demonstrations rather than mathematical derivations that reduce to their own inputs by construction. No load-bearing steps match any of the enumerated circularity patterns. This is the expected non-finding for a methods/framework paper without algebraic content.