From Prompts to Protocols: An AI Agent for Laboratory Automation
Pith reviewed 2026-05-20 18:32 UTC · model grok-4.3
The pith
An AI agent turns natural language prompts into executable lab protocols, with 97 percent first-attempt success in simulation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The AI agent architecture integrates large language models with the Experiment Orchestration System (EOS) through an agentic loop that performs automated validation and error correction. It covers the complete experimental lifecycle, from natural-language protocol creation and monitoring to closed-loop optimization and result analysis, while a visual graph editor keeps the AI representation synchronized with manual edits. On three simulated automated laboratories spanning chemistry, biology, and materials science, the agent reaches a 97 percent first-attempt protocol generation success rate together with an order-of-magnitude reduction in required interface actions.
What carries the argument
An agentic loop with automated validation and error correction that links large language models to laboratory orchestration, together with a visual graph editor kept in sync with the agent's protocol representation.
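A generate, validate, correct loop of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Protocol` shape, the `KNOWN_DEVICES` registry, the `validate` rules, and the `fake_llm` stand-in are all hypothetical, and no EOS API is used.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical protocol shape: an ordered list of steps, each naming a device and an action.
@dataclass
class Protocol:
    steps: List[dict] = field(default_factory=list)

# Assumed device registry; a real lab would load this from its configuration.
KNOWN_DEVICES = {"pipette", "heater", "spectrometer"}

def validate(protocol: Protocol) -> List[str]:
    """Return human-readable errors; an empty list means the protocol is valid."""
    errors = []
    if not protocol.steps:
        errors.append("protocol has no steps")
    for i, step in enumerate(protocol.steps):
        if step.get("device") not in KNOWN_DEVICES:
            errors.append(f"step {i}: unknown device {step.get('device')!r}")
        if not step.get("action"):
            errors.append(f"step {i}: missing action")
    return errors

def agentic_loop(
    prompt: str,
    generate: Callable[[str, List[str]], Protocol],
    max_attempts: int = 3,
) -> Tuple[Protocol, int]:
    """Generate a protocol, validate it, and feed errors back until it passes."""
    feedback: List[str] = []
    for attempt in range(1, max_attempts + 1):
        protocol = generate(prompt, feedback)
        feedback = validate(protocol)
        if not feedback:
            return protocol, attempt
    raise RuntimeError(f"no valid protocol after {max_attempts} attempts: {feedback}")

def fake_llm(prompt: str, feedback: List[str]) -> Protocol:
    """Stand-in for a model call: the first draft misspells a device, the retry fixes it."""
    device = "pipette" if feedback else "pipet"
    return Protocol(steps=[{"device": device, "action": "dispense"}])

protocol, attempts = agentic_loop("dispense 5 mL of buffer", fake_llm)
print(f"valid protocol after {attempts} attempt(s)")
```

The point of the sketch is that the validator's error messages become the next generation call's input, so correction happens without the user intervening.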
If this is right
- Scientists can generate and run protocols without writing code or managing configuration files.
- The same agent supports both one-off experiments and closed-loop optimization campaigns.
- A visual graph editor lets users switch freely between AI-generated and manually edited protocols.
- High first-try success in simulation across chemistry, biology, and materials science suggests broad applicability within automated labs.
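One way to picture the synchronization between the graph editor and the agent's protocol representation is a lossless round trip between a task list and a node-and-edge view. The sketch below assumes a deliberately simple representation (task name mapped to its upstream tasks); the actual EOS protocol format is not described in this review, so every name here is hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed protocol representation: task name -> names of the tasks it depends on.
Protocol = Dict[str, List[str]]
Edge = Tuple[str, str]  # (upstream task, downstream task)

def to_graph(protocol: Protocol) -> Tuple[List[str], List[Edge]]:
    """Derive the node-based diagram from the protocol."""
    nodes = sorted(protocol)
    edges = sorted((dep, task) for task, deps in protocol.items() for dep in deps)
    return nodes, edges

def from_graph(nodes: List[str], edges: List[Edge]) -> Protocol:
    """Rebuild the protocol from an (optionally hand-edited) diagram."""
    protocol: Protocol = {node: [] for node in nodes}
    for src, dst in edges:
        protocol[dst].append(src)
    return {task: sorted(deps) for task, deps in protocol.items()}

protocol = {"mix": [], "heat": ["mix"], "measure": ["heat"]}
nodes, edges = to_graph(protocol)

# A manual edit in the diagram: insert a "cool" task between "heat" and "measure".
nodes.append("cool")
edges.remove(("heat", "measure"))
edges += [("heat", "cool"), ("cool", "measure")]

edited = from_graph(nodes, edges)
print(edited)
```

Because both views are derived from the same data, an AI-generated protocol and a hand-edited diagram cannot drift apart as long as every edit passes through one of the two conversion functions.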
Where Pith is reading between the lines
- If the simulation-to-reality gap proves small, this architecture could shorten the time from idea to first automated experiment from days to minutes.
- Combining the agent with separate hypothesis-generation models could produce end-to-end autonomous discovery loops.
- The order-of-magnitude drop in interface actions may allow one researcher to oversee many more simultaneous experiments than current manual or scripted approaches permit.
Load-bearing premise
The three simulated laboratory environments capture enough of the error modes, timing, and physical constraints of real instruments that high success rates observed in simulation will transfer to physical labs.
What would settle it
Deploying the agent on a physical automated lab and measuring whether the first-attempt success rate stays near 97 percent or falls sharply when real instrument variability and timing issues appear.
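Whether a physical-lab rate "stays near 97 percent" depends on how many protocols are tried. A rough way to frame the check is a Wilson score interval around the observed physical-lab rate. The sketch below is an assumption about how such a comparison could be run, not a procedure from the paper, and the trial counts are hypothetical placeholders.

```python
import math
from typing import Tuple

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half

def consistent_with(rate: float, successes: int, trials: int) -> bool:
    """True if the reference rate lies inside the interval around the observed rate."""
    low, high = wilson_interval(successes, trials)
    return low <= rate <= high

SIMULATED_RATE = 0.97  # first-attempt success rate reported for the simulated labs

# HYPOTHETICAL physical-lab outcomes, for illustration only.
print(consistent_with(SIMULATED_RATE, successes=96, trials=100))
print(consistent_with(SIMULATED_RATE, successes=80, trials=100))
```

With only a handful of physical trials the interval is wide, so a small pilot could neither confirm nor rule out transfer from simulation.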
Original abstract
Automating science laboratories enables faster, safer, more accurate, and more reproducible execution of protocols, accelerating the discovery and testing of new materials, drugs, and more. However, setting up and running autonomous labs requires coordinating numerous instruments and robots, forcing scientists to write code, manage configuration files, and navigate complex software infrastructure. We present an AI agent architecture that integrates large language models with laboratory orchestration, enabling scientists to interactively create and monitor automated lab protocols using natural language. Integrated into the Experiment Orchestration System (EOS), the AI agent operates under an agentic loop with automated validation and error correction, and supports the complete experimental lifecycle: creating protocols, running and monitoring both protocols and closed-loop optimization campaigns, and analyzing results. A visual graph editor renders protocols as interactive node-based diagrams synchronized with the AI agent's protocol representation, enabling seamless alternation between AI-assisted and manual protocol construction. Evaluated on three simulated automated labs spanning chemistry, biology, and materials science, the AI agent achieves a 97% first-attempt protocol generation success rate and an order of magnitude reduction in required interface actions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents an AI agent architecture that integrates large language models with laboratory orchestration via the Experiment Orchestration System (EOS). The agent supports natural-language creation and monitoring of protocols, operates in an agentic loop with automated validation and error correction, and includes a synchronized visual graph editor for protocol construction. It is evaluated on three simulated automated labs spanning chemistry, biology, and materials science, where it reportedly achieves a 97% first-attempt protocol generation success rate and an order-of-magnitude reduction in required interface actions.
Significance. If the performance claims generalize beyond the simulators, the work could meaningfully lower barriers to laboratory automation by allowing scientists to specify and iterate on protocols in natural language rather than code or configuration files. The support for closed-loop optimization campaigns and the hybrid AI/manual workflow via the graph editor represent practical contributions to reproducible experimental workflows.
major comments (1)
- [Evaluation] Evaluation section: The 97% first-attempt success rate and order-of-magnitude action reduction are obtained exclusively inside three simulated laboratory environments. No details are supplied on simulator fidelity (e.g., modeling of sensor noise, command latency, partial failures, or calibration drift), so it is unclear whether the agent’s validation and error-correction loop would exhibit comparable reliability on physical instruments.
minor comments (2)
- [Abstract] Abstract: The phrase 'an order of magnitude reduction in required interface actions' should be accompanied by the exact baseline comparison (e.g., number of actions with and without the agent) to allow readers to assess the magnitude of the improvement.
- [Evaluation] The manuscript would benefit from a brief discussion of failure modes observed during the 3% unsuccessful protocol generations and how the error-correction loop addressed (or failed to address) them.
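The baseline comparison requested in the Abstract comment amounts to reporting raw action counts alongside their ratio. A minimal sketch of such a report follows; the lab labels and all counts are hypothetical placeholders, not figures from the paper.

```python
from typing import Dict, Tuple

def reduction_factor(baseline_actions: int, agent_actions: int) -> float:
    """How many times fewer interface actions the agent workflow needs."""
    if baseline_actions <= 0 or agent_actions <= 0:
        raise ValueError("action counts must be positive")
    return baseline_actions / agent_actions

# HYPOTHETICAL counts (baseline, with agent), for illustration only.
hypothetical_counts: Dict[str, Tuple[int, int]] = {
    "lab_a": (120, 10),
    "lab_b": (90, 9),
    "lab_c": (150, 12),
}

report = {lab: reduction_factor(b, a) for lab, (b, a) in hypothetical_counts.items()}

for lab, (baseline, agent) in hypothetical_counts.items():
    # An "order of magnitude" claim corresponds to a factor of about 10 or more.
    print(f"{lab}: {baseline} -> {agent} actions ({report[lab]:.1f}x fewer)")
```

Reporting both columns, rather than the ratio alone, lets a reader judge whether the baseline workflow was itself reasonable.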
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which has helped us improve the clarity of our evaluation. We address the major comment point by point below.
- Referee: [Evaluation] Evaluation section: The 97% first-attempt success rate and order-of-magnitude action reduction are obtained exclusively inside three simulated laboratory environments. No details are supplied on simulator fidelity (e.g., modeling of sensor noise, command latency, partial failures, or calibration drift), so it is unclear whether the agent’s validation and error-correction loop would exhibit comparable reliability on physical instruments.
Authors: We agree that the evaluation is confined to simulated environments and that additional details on simulator fidelity would strengthen the manuscript. In the revised version, we have expanded Section 4 (Evaluation) with a new subsection (4.1 Simulator Design) that explicitly describes the modeling assumptions for each of the three labs. This includes: additive Gaussian sensor noise with variances calibrated to typical instrument specifications; command latencies sampled from empirical distributions derived from real hardware logs; partial failures injected at rates of 5–15% that trigger the agent's built-in validation and retry mechanisms; and gradual calibration drift modeled as a linear function of simulated runtime. These additions provide the requested context for interpreting the 97% first-attempt success rate and action-reduction results. We have also added a brief Limitations paragraph stating that, while the simulators capture core sources of variability, the agent's performance on physical instruments would require separate validation and is left for future work. We believe these changes directly resolve the concern while preserving the paper's focus on the AI agent architecture.
Revision: yes
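The four modeling assumptions listed in the rebuttal can be combined in a small instrument model. The sketch below is one possible reading, with several loud assumptions: every numeric default is invented for illustration, the exponential latency is a stand-in for the empirical distributions the authors mention, and the class and function names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class SimulatedInstrument:
    true_value: float = 1.0
    noise_std: float = 0.01        # assumed Gaussian sensor-noise standard deviation
    drift_per_hour: float = 0.002  # assumed slope of the linear calibration drift
    failure_rate: float = 0.10     # inside the 5-15% range given in the rebuttal
    mean_latency_s: float = 0.5    # assumed; stands in for an empirical distribution

    def read(self, runtime_hours: float, rng: random.Random) -> Tuple[Optional[float], float]:
        """Return (reading or None on a partial failure, command latency in seconds)."""
        latency = rng.expovariate(1.0 / self.mean_latency_s)
        if rng.random() < self.failure_rate:
            return None, latency  # injected partial failure
        drift = self.drift_per_hour * runtime_hours
        value = self.true_value + drift + rng.gauss(0.0, self.noise_std)
        return value, latency

def read_with_retry(
    instrument: SimulatedInstrument,
    runtime_hours: float,
    rng: random.Random,
    max_retries: int = 5,
) -> Tuple[float, int, float]:
    """Retry failed reads; return (value, attempts used, total latency in seconds)."""
    total_latency = 0.0
    for attempt in range(1, max_retries + 1):
        value, latency = instrument.read(runtime_hours, rng)
        total_latency += latency
        if value is not None:
            return value, attempt, total_latency
    raise RuntimeError("instrument failed on every retry")

rng = random.Random(0)  # seeded so runs are repeatable
instrument = SimulatedInstrument()
value, attempts, latency = read_with_retry(instrument, runtime_hours=10.0, rng=rng)
print(f"reading={value:.4f} after {attempts} attempt(s), {latency:.2f}s total latency")
```

Keeping each effect as a separate parameter makes it possible to switch them off one at a time and see which one the agent's retry logic is most sensitive to.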
Circularity Check
No circularity: quantitative results obtained from direct simulation runs
The paper presents an AI agent architecture for laboratory automation and reports its performance through empirical evaluation on three simulated laboratory environments spanning chemistry, biology, and materials science. The headline metrics (97% first-attempt protocol generation success rate and order-of-magnitude reduction in interface actions) are stated as outcomes of these direct simulation runs rather than any mathematical derivation, parameter fitting, or prediction step that reduces to the inputs by construction. No equations, fitted parameters, self-citation load-bearing premises, uniqueness theorems, or ansatz smuggling appear in the provided text. The architecture description (agentic loop, validation/error correction, visual graph editor, integration with EOS) stands independently of the evaluation results, making the overall claim self-contained as an empirical demonstration within the simulated setting.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Large language models can generate valid and executable laboratory protocols from natural language descriptions with high reliability.