pith.

arxiv: 2606.29100 · v1 · pith:E4PNZDBZ · submitted 2026-06-27 · 💻 cs.CE

Toward Exascale AI for Science: A Scalable AI Skill for Autonomous Microkinetics Discovery

Pith reviewed 2026-06-30 07:59 UTC · model grok-4.3

classification 💻 cs.CE
keywords autonomous scientific discovery · agentic workflows · surrogate models · microkinetics · high-performance computing · materials research · exascale computing · simulation automation

The pith

An AI framework using agentic workflows and surrogate models automates microkinetics discovery while recovering from simulation failures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a scalable framework that combines agentic workflow automation with high-performance computing and scientific surrogate models to enable autonomous discovery in materials science. Using microkinetics discovery as a testbed, it shows how the system can reduce the need for expert intervention, recover from failed simulations on its own, and check the reliability of the surrogate models that stand in for expensive calculations. A sympathetic reader would care because the approach turns complex, domain-specific simulation tasks into more robust and scalable processes that could support next-generation research at very large computing scales.

Core claim

We present a scalable AI-driven framework that advances autonomous scientific discovery by combining agentic workflow automation, high-performance computing, and scientific surrogate models. Using microkinetics discovery as a testbed, the work demonstrates how AI can reduce expert intervention, recover from failed simulations, and systematically evaluate surrogate model reliability. This study shows how AI skills can transform complex domain workflows into robust, scalable capabilities for next-generation materials research.

What carries the argument

Agentic workflow automation integrated with surrogate models that stand in for full microkinetics simulations, allowing the system to manage failures and assess the reliability of its own substitutes.
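The page does not describe how the framework's agent actually recovers from a failed run. As a minimal sketch, assuming a backend that signals failure by raising an exception and a list of parameter remedies to try in order, a recover-and-retry loop with an audit log could look like this. `SimulationError`, the `dt` parameter, `halve_timestep`, and `flaky_sim` are all hypothetical, and a fixed remedy list simplifies an agent that would choose its next step itself.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

Params = Dict[str, float]


class SimulationError(RuntimeError):
    """Raised by a (hypothetical) simulation backend when a run fails."""


@dataclass
class Attempt:
    params: Params
    ok: bool
    message: str


@dataclass
class RecoveryResult:
    success: bool
    result: Optional[Any]
    attempts: List[Attempt] = field(default_factory=list)


def run_with_recovery(
    simulate: Callable[[Params], Any],
    params: Params,
    remedies: List[Callable[[Params], Params]],
    max_attempts: int = 4,
) -> RecoveryResult:
    """Run `simulate`; after each failure apply the next remedy and retry.

    Every attempt is logged so the recovery path can be audited later.
    Gives up when attempts or remedies run out.
    """
    attempts: List[Attempt] = []
    current = dict(params)
    for i in range(max_attempts):
        try:
            result = simulate(current)
        except SimulationError as err:
            attempts.append(Attempt(dict(current), False, str(err)))
            if i >= len(remedies):
                break  # no remedy left to try
            current = remedies[i](current)
            continue
        attempts.append(Attempt(dict(current), True, "ok"))
        return RecoveryResult(True, result, attempts)
    return RecoveryResult(False, None, attempts)


# Hypothetical remedy and backend, used only to exercise the loop.
def halve_timestep(p: Params) -> Params:
    return {**p, "dt": p["dt"] / 2}


def flaky_sim(p: Params) -> Dict[str, float]:
    if p["dt"] > 0.25:
        raise SimulationError(f"diverged at dt={p['dt']}")
    return {"energy": -1.0}
```

Here `run_with_recovery(flaky_sim, {"dt": 1.0}, [halve_timestep] * 3)` fails at `dt=1.0` and `dt=0.5`, then succeeds at `dt=0.25` on the third attempt. Keeping every attempt in the log, not just the final result, is what would let a reviewer count how often recovery was needed.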

If this is right

  • Expert intervention drops in complex simulation workflows because the agentic system manages the steps end to end.
  • The system recovers from failed simulations without manual restarts or adjustments.
  • Surrogate model reliability receives systematic checks inside the automated loop.
  • Complex domain workflows become robust, scalable capabilities suitable for large-scale materials research.
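The third point, systematic surrogate checks, can be made concrete with a small sketch. Assuming the surrogate (for example a machine-learned interatomic potential) and the full simulation can both evaluate the same small validation set, the check below reports mean, root-mean-square, and worst-case error, and trusts the surrogate only if both the mean and the worst case stay under tolerance. The function name and tolerances are assumptions, not the paper's criteria.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ReliabilityReport:
    mae: float
    rmse: float
    max_abs_error: float
    trusted: bool


def check_surrogate(
    surrogate: Sequence[float],
    reference: Sequence[float],
    mae_tol: float,
    max_tol: float,
) -> ReliabilityReport:
    """Compare surrogate predictions with reference (full-simulation) values.

    The surrogate is trusted only if both the mean absolute error and the
    worst-case absolute error stay within tolerance.
    """
    if len(surrogate) != len(reference) or not surrogate:
        raise ValueError("need equal-length, non-empty samples")
    errors = [abs(s - r) for s, r in zip(surrogate, reference)]
    mae = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    worst = max(errors)
    return ReliabilityReport(mae, rmse, worst, mae <= mae_tol and worst <= max_tol)
```

Checking the worst case as well as the mean matters for microkinetics: rate constants depend exponentially on barriers, so one badly predicted barrier can dominate a model even when the average error looks small.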

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same pattern of agentic automation plus surrogates could be tested on other simulation-heavy tasks such as molecular dynamics or quantum chemistry calculations.
  • If the approach scales cleanly, it would remove human bottlenecks that currently limit how many candidate materials can be evaluated at exascale.
  • New classes of failure could appear when the agent itself makes decisions about which surrogate to trust, requiring separate monitoring methods.
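The last point calls for monitoring the agent's own trust decisions. A hedged sketch of one such method: re-run a reproducible random fraction of the results the agent accepted from a surrogate through the full simulation, and flag disagreements. The decision record format (`input`, `value`, `used_surrogate`) and both functions are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence

Decision = Dict[str, object]


def select_audit_sample(decisions: Sequence[Decision], fraction: float, seed: int = 0) -> List[Decision]:
    """Reproducibly pick a fraction of surrogate-trusted decisions to re-check."""
    trusted = [d for d in decisions if d["used_surrogate"]]
    if not trusted:
        return []
    k = min(len(trusted), max(1, round(fraction * len(trusted))))
    return random.Random(seed).sample(trusted, k)


def audit(sample: Sequence[Decision], full_simulation: Callable[[float], float], tol: float) -> List[Decision]:
    """Return the decisions whose surrogate value disagrees with a full recomputation."""
    flagged = []
    for d in sample:
        reference = full_simulation(d["input"])
        if abs(d["value"] - reference) > tol:
            flagged.append(d)
    return flagged
```

Fixing the seed makes the audit sample reproducible, so a flagged decision can be re-examined later; the fraction trades audit cost against the chance of catching a bad trust decision.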

Load-bearing premise

Agentic workflow automation combined with surrogate models can reliably handle complex domain-specific simulations at exascale without introducing new failure modes or requiring substantial custom tuning.

What would settle it

A run of the framework on a microkinetics task where repeated simulation failures occur and the system either fails to recover autonomously or its surrogate reliability checks do not match results from direct full simulations.
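The first half of this test, whether the system still finishes when simulations keep failing, can be run with fault injection. The sketch below wraps a simulation so that each call fails with a chosen probability and measures the fraction of tasks that still complete. Plain retries stand in for the agent's recovery policy, and all names are hypothetical.

```python
import random
from typing import Callable, Sequence


class InjectedFailure(RuntimeError):
    """Synthetic failure raised by the fault injector."""


def make_flaky(simulate: Callable[[float], float], fail_prob: float, rng: random.Random) -> Callable[[float], float]:
    """Wrap a simulation so that each call fails with probability `fail_prob`."""
    def wrapped(x: float) -> float:
        if rng.random() < fail_prob:
            raise InjectedFailure(f"injected failure on input {x}")
        return simulate(x)
    return wrapped


def with_retries(fn: Callable[[float], float], max_tries: int) -> Callable[[float], float]:
    """Naive stand-in for the agent's recovery policy: plain retries."""
    def run(x: float) -> float:
        for attempt in range(max_tries):
            try:
                return fn(x)
            except InjectedFailure:
                if attempt == max_tries - 1:
                    raise
        raise ValueError("max_tries must be >= 1")
    return run


def recovery_rate(system: Callable[[float], float], tasks: Sequence[float]) -> float:
    """Fraction of tasks that still return a result despite injected failures."""
    recovered = 0
    for x in tasks:
        try:
            system(x)
            recovered += 1
        except InjectedFailure:
            pass
    return recovered / len(tasks)
```

A recovery rate that stays close to the no-retry baseline as `fail_prob` rises would count as evidence against the recovery claim.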

Figures

Figures reproduced from arXiv: 2606.29100 by Aiichiro Nakano, Kai Ito, Ken-ichi Nomura, Nabankur Dasgupta, Taufeq Mohammed Razakh, Thomas Linker, William Dawson.

Figure 1: Design of NEB skill. (a) Directory and file organization of the five key steps. (b) Skill flowchart. A user initiates the …
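Figure 1 describes an agent skill built around the nudged elastic band (NEB) method, which locates minimum-energy paths and transition states between two known configurations. The skill's own code is not reproduced on this page, so the following is only a generic climbing-image NEB on a made-up two-dimensional potential, not the authors' implementation. In a real workflow the energies and forces would come from DFT or a machine-learned surrogate; the potential, image count, spring constant `k`, step size, and iteration counts here are arbitrary choices.

```python
import math
from typing import List, Sequence, Tuple

Point = List[float]


def potential(x: float, y: float) -> float:
    """Toy surface: minima at (-1, 0) and (1, 0), saddle at (0, 0.5) with V = 1."""
    u = y - 0.5 + 0.5 * x * x
    return (x * x - 1.0) ** 2 + 2.0 * u * u


def gradient(x: float, y: float) -> Tuple[float, float]:
    """Analytic gradient (dV/dx, dV/dy) of `potential`."""
    u = y - 0.5 + 0.5 * x * x
    return 4.0 * x * (x * x - 1.0) + 4.0 * x * u, 4.0 * u


def neb(start: Sequence[float], end: Sequence[float], n_images: int = 9,
        k: float = 1.0, step: float = 0.01, n_steps: int = 5000,
        climb_after: int = 1000) -> List[Point]:
    """Climbing-image NEB relaxed by plain steepest descent; endpoints stay fixed."""
    # Initial guess: images evenly spaced on the straight line from start to end.
    path = [[start[0] + (end[0] - start[0]) * i / (n_images - 1),
             start[1] + (end[1] - start[1]) * i / (n_images - 1)]
            for i in range(n_images)]
    for it in range(n_steps):
        energies = [potential(p[0], p[1]) for p in path]
        top = max(range(1, n_images - 1), key=lambda i: energies[i])
        new_path = [p[:] for p in path]
        for i in range(1, n_images - 1):
            prev, cur, nxt = path[i - 1], path[i], path[i + 1]
            # Central-difference tangent (production codes often use an improved tangent).
            tx, ty = nxt[0] - prev[0], nxt[1] - prev[1]
            norm = math.hypot(tx, ty)
            tx, ty = tx / norm, ty / norm
            gx, gy = gradient(cur[0], cur[1])
            fx, fy = -gx, -gy
            f_par = fx * tx + fy * ty
            if it >= climb_after and i == top:
                # Climbing image: no springs, parallel force inverted to push it uphill.
                fx, fy = fx - 2.0 * f_par * tx, fy - 2.0 * f_par * ty
            else:
                # Perpendicular true force plus a spring force along the tangent.
                spring = k * (math.hypot(nxt[0] - cur[0], nxt[1] - cur[1])
                              - math.hypot(cur[0] - prev[0], cur[1] - prev[1]))
                fx = fx - f_par * tx + spring * tx
                fy = fy - f_par * ty + spring * ty
            new_path[i][0] += step * fx
            new_path[i][1] += step * fy
        path = new_path
    return path
```

With `neb((-1.0, 0.0), (1.0, 0.0))`, the highest image should converge near the saddle (0, 0.5), giving a barrier close to 1.0 in the toy potential's units, compared with 1.5 at the midpoint of the initial straight-line guess.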
Figure 2: (a) Atomic configurations of reactant, (b) transi…
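Figure 2 appears to show reactant and transition-state configurations of a reaction step; the energy difference between them is the barrier that a microkinetic model turns into a rate. The page does not show the paper's rate model, so this is a minimal sketch using the Eyring (transition-state theory) expression k = (k_B T / h) exp(-E_a / k_B T). It assumes the computed barrier E_a stands in for the activation free energy (no zero-point or entropy corrections) and a transmission coefficient of one.

```python
import math

K_B_J = 1.380649e-23      # Boltzmann constant, J/K (exact in SI since 2019)
H_J = 6.62607015e-34      # Planck constant, J*s (exact in SI since 2019)
EV_J = 1.602176634e-19    # one electronvolt in J (exact in SI since 2019)


def eyring_rate(barrier_ev: float, temperature_k: float) -> float:
    """Rate constant in 1/s from k = (k_B T / h) * exp(-Ea / (k_B T)).

    Assumes the barrier approximates the activation free energy and a
    transmission coefficient of 1.
    """
    kt_j = K_B_J * temperature_k
    return (kt_j / H_J) * math.exp(-barrier_ev * EV_J / kt_j)


def half_life(rate_per_s: float) -> float:
    """Half-life in seconds of a first-order step A -> B with the given rate constant."""
    return math.log(2.0) / rate_per_s
```

The prefactor k_B T / h is about 6.25e12 1/s at 300 K, so each additional 0.1 eV of barrier at that temperature slows the step by roughly a factor of 50.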

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript presents a scalable AI-driven framework combining agentic workflow automation, high-performance computing, and scientific surrogate models for autonomous microkinetics discovery. Using microkinetics as a testbed, it claims to demonstrate how AI reduces expert intervention, recovers from failed simulations, and systematically evaluates surrogate model reliability, thereby transforming complex domain workflows into robust capabilities for exascale materials research.

Significance. If the central claims are substantiated with quantitative evidence, the work could advance autonomous AI for science by automating complex simulation workflows and addressing reliability at scale. The integration of agentic systems with surrogates for failure recovery and reliability evaluation represents a potentially valuable direction for exascale computing in materials science.

major comments (2)
  1. [Abstract] Abstract: The central claim that the framework reduces expert intervention, recovers from failed simulations, and evaluates surrogate reliability is not supported by any quantitative metrics, derivations, error analysis, or specific results (e.g., recovery success rates, intervention hours saved, or scaling behavior). This absence is load-bearing because the demonstration reduces to a framework description without evidence that the agentic + surrogate combination avoids new failure modes or requires no substantial custom tuning.
  2. [Abstract] Abstract: The assumption that agentic workflow automation combined with surrogate models can reliably handle complex domain-specific simulations at exascale is presented without addressing potential new error modes such as incorrect surrogate invocation or simulation parameter drift. This is load-bearing for the reliability and scalability claims.
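Major comment 1 asks, among other things, for scaling behavior. A minimal sketch of the standard strong-scaling bookkeeping follows, assuming wall-clock timings of one fixed workload at several worker counts: speedup S(n) = T(n0) / T(n) and parallel efficiency E(n) = S(n) · n0 / n, relative to the smallest count n0. The function name and the timings used to exercise it are illustrative, not results from the paper.

```python
from typing import Dict, Tuple


def strong_scaling(timings: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Speedup and parallel efficiency relative to the smallest worker count.

    `timings` maps worker count -> wall-clock seconds for the same fixed workload.
    Efficiency = (T_base / T_n) * (n_base / n); 1.0 means ideal scaling.
    """
    if not timings:
        raise ValueError("no timings given")
    base_n = min(timings)
    base_t = timings[base_n]
    result = {}
    for n in sorted(timings):
        speedup = base_t / timings[n]
        result[n] = (speedup, speedup * base_n / n)
    return result
```

For instance, timings of 100 s, 50 s, and 30 s on 1, 2, and 4 workers give efficiencies of 1.0, 1.0, and about 0.83.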

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We agree that strengthening the quantitative support for our claims will improve the manuscript and address the concerns about evidence for the framework's effectiveness.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that the framework reduces expert intervention, recovers from failed simulations, and evaluates surrogate reliability is not supported by any quantitative metrics, derivations, error analysis, or specific results (e.g., recovery success rates, intervention hours saved, or scaling behavior). This absence is load-bearing because the demonstration reduces to a framework description without evidence that the agentic + surrogate combination avoids new failure modes or requires no substantial custom tuning.

    Authors: We acknowledge this point. While the manuscript includes demonstrations through case studies in microkinetics discovery, specific quantitative metrics such as success rates for recovery and scaling behavior are not explicitly highlighted in the abstract. In the revision, we will update the abstract to include key quantitative results from our experiments and add a dedicated subsection on performance metrics to substantiate the claims. revision: yes

  2. Referee: [Abstract] Abstract: The assumption that agentic workflow automation combined with surrogate models can reliably handle complex domain-specific simulations at exascale is presented without addressing potential new error modes such as incorrect surrogate invocation or simulation parameter drift. This is load-bearing for the reliability and scalability claims.

    Authors: The manuscript describes mechanisms for handling failures and evaluating surrogate reliability, but we agree that explicit discussion of potential new error modes introduced by the AI components is important. We will revise the manuscript to include an analysis of these potential issues and how the framework mitigates them, drawing from our testbed experiments. revision: yes
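The authors promise an analysis of error modes such as simulation parameter drift. One simple guard, sketched under the assumption that each run's numerical settings are available as a flat name-to-value mapping, compares a run against a validated baseline and hard bounds before its results are accepted. `check_parameters`, `Bound`, and any parameter names used with them are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Bound:
    low: float
    high: float


def check_parameters(
    baseline: Mapping[str, float],
    current: Mapping[str, float],
    bounds: Mapping[str, Bound],
    rel_tol: float,
) -> List[str]:
    """Return human-readable issues; an empty list means no drift was detected."""
    issues: List[str] = []
    for name in sorted(set(baseline) | set(current)):
        if name not in current:
            issues.append(f"{name}: missing from current run")
            continue
        if name not in baseline:
            issues.append(f"{name}: not in validated baseline")
            continue
        value, ref = current[name], baseline[name]
        bound = bounds.get(name)
        if bound is not None and not (bound.low <= value <= bound.high):
            issues.append(f"{name}={value} outside [{bound.low}, {bound.high}]")
        elif abs(value - ref) > rel_tol * abs(ref):
            # Relative drift check; a zero baseline flags any nonzero value.
            issues.append(f"{name}={value} drifted from baseline {ref}")
    return issues
```

Returning a list instead of raising lets the agent decide whether to repair the settings or stop and escalate; the same pattern could bound where a surrogate is allowed to be invoked.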

Circularity Check

0 steps flagged

No circularity: framework description contains no equations or self-referential derivations

Full rationale

The paper describes an AI-driven framework combining agentic automation, HPC, and surrogate models, tested on microkinetics discovery. The abstract and available text contain no equations, fitted parameters, predictions derived from inputs, or self-citations invoked as load-bearing uniqueness theorems. Claims about reducing expert intervention and recovering from failures are presented as demonstrations rather than mathematical derivations that reduce to their own inputs by construction. No load-bearing steps match any of the enumerated circularity patterns. This is the expected non-finding for a methods/framework paper without algebraic content.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5616 in / 905 out tokens · 21947 ms · 2026-06-30T07:59:25.705218+00:00

