arxiv: 2605.16552 · v1 · pith:UQOCIE5Mnew · submitted 2026-05-15 · 💻 cs.AI · cs.RO

From Prompts to Protocols: An AI Agent for Laboratory Automation

Pith reviewed 2026-05-20 18:32 UTC · model grok-4.3

classification 💻 cs.AI cs.RO
keywords AI agent · laboratory automation · protocol generation · large language models · natural language interfaces · experiment orchestration · simulated labs

The pith

An AI agent turns natural language prompts into executable lab protocols with 97 percent first-attempt success.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an AI agent that combines large language models with laboratory orchestration software so scientists can describe experiments in ordinary language and receive validated, runnable protocols in return. This removes the need for manual coding, configuration files, and navigation of complex instrument interfaces. The agent runs in a loop that automatically checks and corrects errors, handles the full workflow from protocol design through execution and data analysis, and offers a synchronized visual graph editor for human oversight. Evaluation across three simulated labs shows the agent succeeds on the first try 97 percent of the time while reducing the number of required interface actions roughly tenfold. If these results hold beyond simulation, autonomous labs would become far more accessible to researchers without specialized programming skills.

Core claim

The AI agent architecture integrates large language models with the Experiment Orchestration System (EOS) through an agentic loop that performs automated validation and error correction. It supports the complete experimental lifecycle, from natural-language protocol creation and monitoring to closed-loop optimization and result analysis, while a visual graph editor keeps the AI's protocol representation synchronized with manual edits. On three simulated automated laboratories spanning chemistry, biology, and materials science, this yields a 97 percent first-attempt protocol generation success rate together with an order-of-magnitude reduction in required interface actions.

What carries the argument

Two components: the agentic loop with automated validation and error correction, which links large language models to laboratory orchestration, and the visual graph editor.
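
A generate, validate, correct loop of this kind can be pictured with a minimal Python sketch. Everything here is an assumption made for illustration: the `validate` rules, the dictionary-based protocol shape, and the `stub_generate` stand-in for a language model are hypothetical and are not taken from the paper or from the EOS codebase.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Attempt:
    """One round of the loop: the draft protocol and the errors found in it."""
    protocol: dict
    errors: list[str] = field(default_factory=list)


def validate(protocol: dict, known_devices: set[str]) -> list[str]:
    """Return human-readable validation errors; an empty list means valid."""
    errors = []
    tasks = protocol.get("tasks", [])
    if not tasks:
        errors.append("protocol has no tasks")
    names = {t["name"] for t in tasks}
    for t in tasks:
        if t["device"] not in known_devices:
            errors.append(f"task {t['name']!r} uses unknown device {t['device']!r}")
        for dep in t.get("depends_on", []):
            if dep not in names:
                errors.append(f"task {t['name']!r} depends on missing task {dep!r}")
    return errors


def agentic_loop(
    prompt: str,
    generate: Callable[[str, list[str]], dict],
    known_devices: set[str],
    max_rounds: int = 3,
) -> list[Attempt]:
    """Draft a protocol, validate it, and feed errors back until it passes."""
    history: list[Attempt] = []
    feedback: list[str] = []
    for _ in range(max_rounds):
        protocol = generate(prompt, feedback)
        errors = validate(protocol, known_devices)
        history.append(Attempt(protocol, errors))
        if not errors:
            break
        feedback = errors  # the next draft sees what went wrong
    return history


def stub_generate(prompt: str, feedback: list[str]) -> dict:
    """Hypothetical stand-in for an LLM call: the first draft misnames a device."""
    device = "pipette" if feedback else "pipet"
    return {"tasks": [{"name": "dispense", "device": device, "depends_on": []}]}


history = agentic_loop("dispense 5 mL of dye", stub_generate, {"pipette", "camera"})
first_attempt_ok = not history[0].errors
```

In this framing, a "first-attempt success" corresponds to `history[0].errors` being empty, which is the quantity the headline 97 percent figure summarizes; here the stub deliberately fails its first draft so the correction path is exercised.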

If this is right

  • Scientists can generate and run protocols without writing code or managing configuration files.
  • The same agent supports both one-off experiments and closed-loop optimization campaigns.
  • A visual graph editor lets users switch freely between AI-generated and manually edited protocols.
  • High first-try success in simulation across chemistry, biology, and materials science suggests broad applicability within automated labs.
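
The node-based protocol view mentioned in the list above can be thought of as a dependency graph whose execution order is derived rather than hand-written. The sketch below assumes a protocol is a mapping from task name to the tasks it depends on; the task names are hypothetical and only loosely modeled on the color mixing example shown in the figures. It uses the standard-library `graphlib` module, not any EOS API.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(tasks: dict[str, list[str]]) -> list[str]:
    """Return a valid run order for tasks given as name -> list of dependencies."""
    missing = {dep for deps in tasks.values() for dep in deps} - tasks.keys()
    if missing:
        raise ValueError(f"unknown dependencies: {sorted(missing)}")
    try:
        # static_order() yields every task after all of its dependencies.
        return list(TopologicalSorter(tasks).static_order())
    except CycleError as exc:
        raise ValueError(f"protocol graph has a cycle: {exc.args[1]}") from exc


# Hypothetical task names for a color mixing protocol.
protocol = {
    "dispense_dyes": [],
    "mix": ["dispense_dyes"],
    "image": ["mix"],
    "score_color": ["image"],
}
order = execution_order(protocol)
```

Because both an AI-generated draft and a manual edit reduce to the same mapping, a check like this can run after either kind of change, which is one plausible way to keep the two views consistent.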

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the simulation-to-reality gap proves small, this architecture could shorten the time from idea to first automated experiment from days to minutes.
  • Combining the agent with separate hypothesis-generation models could produce end-to-end autonomous discovery loops.
  • The order-of-magnitude drop in interface actions may allow one researcher to oversee many more simultaneous experiments than current manual or scripted approaches permit.

Load-bearing premise

The three simulated laboratory environments capture enough of the error modes, timing, and physical constraints of real instruments that high success rates observed in simulation will transfer to physical labs.

What would settle it

Deploying the agent on a physical automated lab and measuring whether the first-attempt success rate stays near 97 percent or falls sharply when real instrument variability and timing issues appear.
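
Whether a physical-lab rate "stays near 97 percent" depends on how many protocols are attempted, so a comparison of this kind would normally report an interval rather than a point estimate. The sketch below computes a Wilson score interval for a binomial success rate. The trial counts are placeholders chosen for illustration; the page does not state how many protocols were generated in the evaluation.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Placeholder counts, NOT from the paper: 97 first-attempt successes in 100 tries.
sim_low, sim_high = wilson_interval(97, 100)
print(f"simulated: {sim_low:.3f} to {sim_high:.3f}")
```

A physical-lab result whose interval sits clearly below the simulated one would indicate that the simulators miss error modes that matter.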

Figures

Figures reproduced from arXiv: 2605.16552 by Angelos Angelopoulos, James F. Cahoon, Ron Alterovitz.

Figure 1: The EOS AI agent enables scientists to create experiment pro...
Figure 2: The EOS AI agent is integrated with the EOS user interface.
Figure 3: Architecture of AI-assisted protocol creation. The AI agent uses...
Figure 5: The color mixing protocol generated by the EOS AI agent.
Figure 6: Conversation with the EOS AI agent to load the experiment protocol...
Figure 7: Approximate number of minimum discrete interface actions required...
Original abstract

Automating science laboratories enables faster, safer, more accurate, and more reproducible execution of protocols, accelerating the discovery and testing of new materials, drugs, and more. However, setting up and running autonomous labs requires coordinating numerous instruments and robots, forcing scientists to write code, manage configuration files, and navigate complex software infrastructure. We present an AI agent architecture that integrates large language models with laboratory orchestration, enabling scientists to interactively create and monitor automated lab protocols using natural language. Integrated into the Experiment Orchestration System (EOS), the AI agent operates under an agentic loop with automated validation and error correction, and supports the complete experimental lifecycle: creating protocols, running and monitoring both protocols and closed-loop optimization campaigns, and analyzing results. A visual graph editor renders protocols as interactive node-based diagrams synchronized with the AI agent's protocol representation, enabling seamless alternation between AI-assisted and manual protocol construction. Evaluated on three simulated automated labs spanning chemistry, biology, and materials science, the AI agent achieves a 97% first-attempt protocol generation success rate and an order of magnitude reduction in required interface actions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper presents an AI agent architecture that integrates large language models with laboratory orchestration via the Experiment Orchestration System (EOS). The agent supports natural-language creation and monitoring of protocols, operates in an agentic loop with automated validation and error correction, and includes a synchronized visual graph editor for protocol construction. It is evaluated on three simulated automated labs spanning chemistry, biology, and materials science, where it reportedly achieves a 97% first-attempt protocol generation success rate and an order-of-magnitude reduction in required interface actions.

Significance. If the performance claims generalize beyond the simulators, the work could meaningfully lower barriers to laboratory automation by allowing scientists to specify and iterate on protocols in natural language rather than code or configuration files. The support for closed-loop optimization campaigns and the hybrid AI/manual workflow via the graph editor represent practical contributions to reproducible experimental workflows.

major comments (1)
  1. [Evaluation] Evaluation section: The 97% first-attempt success rate and order-of-magnitude action reduction are obtained exclusively inside three simulated laboratory environments. No details are supplied on simulator fidelity (e.g., modeling of sensor noise, command latency, partial failures, or calibration drift), so it is unclear whether the agent’s validation and error-correction loop would exhibit comparable reliability on physical instruments.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'an order of magnitude reduction in required interface actions' should be accompanied by the exact baseline comparison (e.g., number of actions with and without the agent) to allow readers to assess the magnitude of the improvement.
  2. [Evaluation] The manuscript would benefit from a brief discussion of failure modes observed during the 3% unsuccessful protocol generations and how the error-correction loop addressed (or failed to address) them.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback, which has helped us improve the clarity of our evaluation. We address the major comment point by point below.

  1. Referee: [Evaluation] Evaluation section: The 97% first-attempt success rate and order-of-magnitude action reduction are obtained exclusively inside three simulated laboratory environments. No details are supplied on simulator fidelity (e.g., modeling of sensor noise, command latency, partial failures, or calibration drift), so it is unclear whether the agent’s validation and error-correction loop would exhibit comparable reliability on physical instruments.

    Authors: We agree that the evaluation is confined to simulated environments and that additional details on simulator fidelity would strengthen the manuscript. In the revised version, we have expanded Section 4 (Evaluation) with a new subsection (4.1 Simulator Design) that explicitly describes the modeling assumptions for each of the three labs. This includes: additive Gaussian sensor noise with variances calibrated to typical instrument specifications; command latencies sampled from empirical distributions derived from real hardware logs; partial failures injected at rates of 5–15% that trigger the agent's built-in validation and retry mechanisms; and gradual calibration drift modeled as a linear function of simulated runtime. These additions provide the requested context for interpreting the 97% first-attempt success rate and action-reduction results. We have also added a brief Limitations paragraph stating that, while the simulators capture core sources of variability, the agent's performance on physical instruments would require separate validation and is left for future work. We believe these changes directly resolve the concern while preserving the paper's focus on the AI agent architecture.

    revision: yes
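
The four modeling assumptions named in the simulated response (Gaussian noise, sampled latency, injected partial failures, linear drift) are easy to picture in code. The following is a minimal sketch under stated assumptions, not the authors' simulator: the class, its parameter values, and the exponential latency distribution are all hypothetical stand-ins, and only the 5–15% failure range comes from the response text.

```python
import random
from dataclasses import dataclass


@dataclass
class SimulatedInstrument:
    noise_std: float = 0.02         # std. dev. of additive Gaussian sensor noise (assumed)
    failure_rate: float = 0.10      # inside the 5-15% range quoted in the response
    drift_per_second: float = 1e-4  # slope of linear calibration drift (assumed)
    mean_latency_s: float = 0.5     # mean command latency (assumed)
    seed: int = 0

    def __post_init__(self) -> None:
        self.rng = random.Random(self.seed)
        self.clock_s = 0.0  # simulated runtime, advanced by every command

    def measure(self, true_value: float) -> float:
        # Exponential latency stands in for the "empirical distributions".
        self.clock_s += self.rng.expovariate(1.0 / self.mean_latency_s)
        if self.rng.random() < self.failure_rate:
            raise RuntimeError("simulated partial failure")
        drift = self.drift_per_second * self.clock_s
        return true_value + drift + self.rng.gauss(0.0, self.noise_std)


def measure_with_retry(
    instrument: SimulatedInstrument, true_value: float, max_attempts: int = 5
) -> tuple[float, int]:
    """Retry failed commands; return the reading and the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            return instrument.measure(true_value), attempt
        except RuntimeError:
            continue
    raise RuntimeError(f"no reading after {max_attempts} attempts")


reading, attempts = measure_with_retry(SimulatedInstrument(), true_value=1.0)
```

Setting `failure_rate`, `noise_std`, or `drift_per_second` to zero isolates one effect at a time, which is the kind of ablation the referee's fidelity question invites.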

Circularity Check

0 steps flagged

No circularity: quantitative results obtained from direct simulation runs

The paper presents an AI agent architecture for laboratory automation and reports its performance through empirical evaluation on three simulated laboratory environments spanning chemistry, biology, and materials science. The headline metrics (97% first-attempt protocol generation success rate and order-of-magnitude reduction in interface actions) are stated as outcomes of these direct simulation runs rather than any mathematical derivation, parameter fitting, or prediction step that reduces to the inputs by construction. No equations, fitted parameters, self-citation load-bearing premises, uniqueness theorems, or ansatz smuggling appear in the provided text. The architecture description (agentic loop, validation/error correction, visual graph editor, integration with EOS) stands independently of the evaluation results, making the overall claim self-contained as an empirical demonstration within the simulated setting.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The architecture rests on the assumption that current large language models can reliably translate natural language into executable lab protocols and that the chosen simulation environments are representative; no new free parameters or invented physical entities are introduced.

axioms (1)
  • Domain assumption: Large language models can generate valid and executable laboratory protocols from natural language descriptions with high reliability.
    This assumption underpins the agent's protocol generation and interactive creation capabilities.

pith-pipeline@v0.9.0 · 5722 in / 1327 out tokens · 85045 ms · 2026-05-20T18:32:59.298462+00:00 · methodology

