pith · machine review for the scientific record

arXiv: 2604.03286 · v1 · submitted 2026-03-25 · 💻 cs.AI · cond-mat.mtrl-sci · cs.HC

Recognition: 1 theorem link · Lean Theorem

Toward Full Autonomous Laboratory Instrumentation Control with Large Language Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 00:05 UTC · model grok-4.3

classification 💻 cs.AI · cond-mat.mtrl-sci · cs.HC
keywords large language models · laboratory automation · AI agents · instrumentation control · ChatGPT · autonomous systems · scientific equipment

The pith

Large language models can generate and refine code to autonomously control laboratory instruments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that large language models such as ChatGPT can produce working scripts for operating scientific equipment, removing the need for researchers to possess programming skills. A case study implements a flexible setup usable as either a single-pixel camera or a scanning photocurrent microscope, with the model handling the required control code. The authors then extend the approach to autonomous AI agents that run the instruments on their own and improve the control logic through iteration. A sympathetic reader cares because this lowers the entry cost for custom automation in labs, letting more scientists adjust experiments without hiring programmers or learning code themselves. If the approach holds, it would speed up the creation and testing of new instrumentation configurations across many fields.

Core claim

Large language models and LLM-based AI agents can write custom control scripts for laboratory instruments and then operate those instruments independently while iteratively refining the control strategies, as demonstrated by the successful implementation of a dual-use single-pixel camera and scanning photocurrent microscope setup.

What carries the argument

LLM-based AI agents that generate, execute, and iteratively improve instrumentation control code.
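The generate-execute-refine pattern named here can be sketched as a loop. Everything below is illustrative, not taken from the paper: `ask_llm` is a hard-coded stand-in for a real model call, and `scan` stands in for the instrument driver.

```python
# Minimal sketch of a generate-execute-refine agent loop (assumed
# structure, not the paper's implementation).

def ask_llm(prompt, previous_error):
    # A real agent would send the prompt plus any traceback to an LLM
    # API and receive revised Python source; this stub is hard-coded.
    if previous_error is None:
        return "result = scan(step=5)"   # first draft: step too coarse
    return "result = scan(step=1)"       # refined draft after feedback

def scan(step):
    # Stand-in for driving the instrument through a 1-D scan.
    if step > 2:
        raise ValueError("step too large: scan undersampled")
    return list(range(0, 10, step))

def agent_loop(max_iters=3):
    error = None
    for _ in range(max_iters):
        script = ask_llm("write a scan script", error)
        namespace = {"scan": scan}
        try:
            exec(script, namespace)       # run the generated code
            return namespace["result"]    # success: measurement complete
        except Exception as exc:
            error = str(exc)              # feed the failure back to the model
    raise RuntimeError("agent did not converge within max_iters")
```

The key design point is that the failure message, not a human, closes the loop; in the stub above the second draft succeeds, so `agent_loop()` returns the completed scan.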

If this is right

  • Researchers without coding experience can rapidly create and customize scripts for new experimental setups.
  • Instrumentation control becomes faster to adapt when switching between different measurement modes.
  • Autonomous agents can test and adjust operating parameters without constant human input.
  • The overall technical barrier for building specialized lab automation drops substantially.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same agent workflow could be applied to other instruments such as spectrometers or manipulators once initial safety wrappers are added.
  • Over time, these systems might chain together multiple instruments into end-to-end experimental pipelines with only high-level goals supplied by the user.
  • Validation on a broader range of hardware would show how much domain-specific fine-tuning the agents still require.
  • Real-world deployment would need explicit safety layers to prevent the agent from issuing commands that could harm equipment or samples.
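The safety layer the last bullet calls for could take a very simple form: validate every agent-issued command against a hard-coded envelope before it can reach hardware. The parameter names and limits below are invented for illustration; the paper does not specify such a wrapper.

```python
# Hypothetical safety envelope for agent-issued commands. Keys and
# limits are made up for this sketch.
SAFE_LIMITS = {
    "x_um": (0.0, 100.0),    # stage travel, micrometres
    "y_um": (0.0, 100.0),
    "laser_mw": (0.0, 5.0),  # optical power ceiling
}

def safe_command(params):
    """Return params unchanged only if every value lies inside its envelope."""
    for key, value in params.items():
        if key not in SAFE_LIMITS:
            raise PermissionError(f"unknown parameter: {key!r}")
        lo, hi = SAFE_LIMITS[key]
        if not lo <= value <= hi:
            raise PermissionError(f"{key}={value} outside [{lo}, {hi}]")
    return params  # only validated commands would be forwarded to hardware
```

Because the wrapper sits between the agent and the driver, a model that hallucinates an out-of-range setpoint gets an exception rather than a moving stage.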

Load-bearing premise

Code produced by the language model will be reliable and safe enough to run directly on physical instruments without frequent human checks or corrections.

What would settle it

Run the LLM-generated code on the actual single-pixel camera or photocurrent microscope hardware and check whether it completes measurements without errors, hardware damage, or the need for manual fixes, while also testing if the autonomous agent can produce measurable improvements over several cycles.
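The acceptance test proposed above could be logged and scored mechanically: record each autonomous cycle, then check the error-free completion rate and whether a quality metric improves across cycles. SNR is chosen here as an illustrative stand-in metric, and the log entries are made-up placeholders, not data from the paper.

```python
# Sketch of scoring autonomous cycles (assumed log schema, invented data).

def evaluate_cycles(cycles):
    """Summarize autonomous runs: completion rate and metric trend."""
    clean = [c for c in cycles if c["errors"] == 0 and not c["manual_fix"]]
    return {
        "completion_rate": len(clean) / len(cycles),
        "metric_improved": cycles[-1]["snr"] > cycles[0]["snr"],
    }

log = [
    {"errors": 1, "manual_fix": True,  "snr": 3.2},  # cycle 1: needed a fix
    {"errors": 0, "manual_fix": False, "snr": 4.8},  # cycle 2
    {"errors": 0, "manual_fix": False, "snr": 6.1},  # cycle 3
]
```

A claim of "measurable improvement" would then reduce to `metric_improved` being true over enough cycles, with `completion_rate` quantifying how often human fixes were still needed.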

Figures

Figures reproduced from arXiv: 2604.03286 by Andres Castellanos-Gomez, Kexin He, Yong Xie.

Figure 2. Caption (truncated at source): "As shown in Figure 5a, we began by establishing a Python interface to the …"
Original abstract

The control of complex laboratory instrumentation often requires significant programming expertise, creating a barrier for researchers lacking computational skills. This work explores the potential of large language models (LLMs), such as ChatGPT, and LLM-based artificial intelligence (AI) agents to enable efficient programming and automation of scientific equipment. Through a case study involving the implementation of a setup that can be used as a single-pixel camera or a scanning photocurrent microscope, we demonstrate how ChatGPT can facilitate the creation of custom scripts for instrumentation control, significantly reducing the technical barrier for experimental customization. Building on this capability, we further illustrate how LLM-assisted tools can be extended into autonomous AI agents capable of independently operating laboratory instruments and iteratively refining control strategies. This approach underscores the transformative role of LLM-based tools and AI agents in democratizing laboratory automation and accelerating scientific progress.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript presents a case study on using LLMs such as ChatGPT to generate Python scripts for controlling a single-pixel camera or scanning photocurrent microscope setup. It extends this to LLM-based AI agents that independently operate instruments and iteratively refine control strategies, with the goal of reducing programming barriers in laboratory automation.

Significance. If the central claim of reliable autonomous operation holds, the work would be significant for democratizing access to complex instrumentation and accelerating scientific experimentation. The practical demonstration of LLM-assisted scripting is a clear strength, but the absence of quantitative validation metrics limits its immediate impact.

major comments (3)
  1. [Case Study] Case Study section: The central claim that LLM agents can 'independently operate laboratory instruments and iteratively refine control strategies' is only weakly supported, as the description provides no quantitative metrics on success rates, error recovery frequency, failure modes, or required human interventions during autonomous loops.
  2. [Case Study] Case Study section: The demonstration implies human review and validation of generated code before hardware execution, which directly undermines the assertion of full autonomy without frequent human intervention.
  3. [Case Study] Case Study section: No details are provided on safety interlocks, hardware error handling, or physical instrument safeguards, which are load-bearing requirements for any claim of direct autonomous control on real equipment.
minor comments (3)
  1. Clarify the precise agent architecture, including how iteration loops are implemented and what triggers termination or human escalation.
  2. Include example prompts and full generated code snippets to improve reproducibility.
  3. Add a dedicated section or paragraph discussing limitations, including reliability risks of LLM-generated code.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The comments highlight important aspects of the case study that require clarification and additional discussion. We address each major comment below and have revised the manuscript to strengthen the presentation of our work as a proof-of-concept demonstration rather than a fully validated autonomous system.

Point-by-point responses
  1. Referee: The central claim that LLM agents can 'independently operate laboratory instruments and iteratively refine control strategies' is only weakly supported, as the description provides no quantitative metrics on success rates, error recovery frequency, failure modes, or required human interventions during autonomous loops.

    Authors: We agree that the case study is illustrative and does not include quantitative performance metrics. The manuscript is framed as an exploration of feasibility through a specific example rather than a statistical evaluation of reliability. In the revised manuscript, we have added a dedicated limitations subsection that explicitly states the absence of such metrics, discusses potential failure modes observed during development, and outlines future work to collect quantitative data on success rates and intervention frequency. revision: yes

  2. Referee: The demonstration implies human review and validation of generated code before hardware execution, which directly undermines the assertion of full autonomy without frequent human intervention.

    Authors: The original text did describe human review during the initial script generation phase. We have revised the Case Study section to distinguish between the one-time setup (where human validation occurs) and the subsequent autonomous loops in which the agent executes, monitors, and refines strategies with reduced intervention. The revised wording clarifies that 'full autonomy' refers to the agent's operation within a controlled loop after initial configuration, while acknowledging that complete hands-off operation from start to finish is not claimed. revision: yes

  3. Referee: No details are provided on safety interlocks, hardware error handling, or physical instrument safeguards, which are load-bearing requirements for any claim of direct autonomous control on real equipment.

    Authors: We acknowledge that the manuscript does not address hardware-level safety mechanisms. The focus of this work is on the LLM-driven software layer for script generation and agent-based control. In the revised discussion, we have added a paragraph noting that any real-world deployment would require appropriate safety interlocks, error handling, and physical safeguards, which are outside the scope of the present software-oriented case study. We do not claim to have implemented or tested such hardware protections. revision: partial

Circularity Check

0 steps flagged

No circularity: practical case study without derivations or self-referential predictions

full rationale

The paper is a demonstration of LLM use for generating control scripts and agent-based iteration in a lab setup, with no equations, fitted parameters, uniqueness theorems, or predictions that reduce to inputs by construction. The central narrative relies on empirical examples of ChatGPT-assisted Python scripting for a photocurrent microscope rather than any closed derivation chain. Self-citations, if present, are not load-bearing for any claimed result and do not substitute for external validation. This matches the default expectation for non-circular practical papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper relies on the existing capability of LLMs for code generation and standard assumptions about instrument interfaces; no free parameters, new axioms, or invented entities are introduced.

pith-pipeline@v0.9.0 · 5442 in / 1117 out tokens · 59041 ms · 2026-05-15T00:05:20.569076+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages

  1. G. Binnig, H. Rohrer, C. Gerber, E. Weibel, Tunneling through a controllable vacuum gap, Appl. Phys. Lett. (1982), 40, 178–180. https://doi.org/10.1063/1.92999
  2. G. Binnig, H. Rohrer, Scanning tunneling microscopy, Surf. Sci. (1983), 126 (1-3), 236–244. https://doi.org/10.1016/0039-6028(83)90716-1
  3. G. Binnig, H. Rohrer, C. Gerber, E. Weibel, 7×7 Reconstruction on Si(111) resolved in real space, Phys. Rev. Lett. (1983), 50 (2), 120–123. https://doi.org/10.1103/PhysRevLett.50.120
  4. E. Betzig, G.H. Patterson, R. Sougrat, O.W. Lindwasser, S. Olenych, J.S. Bonifacino, M.W. Davidson, J. Lippincott-Schwartz, H.F. Hess, Imaging intracellular fluorescent proteins at nanometer resolution, Science (2006), 313 (5793), 1642–1645. https://doi.org/10.1126/science.1127344
  5. W.E. Moerner, L. Kador, Optical detection and spectroscopy of single molecules in a solid, Phys. Rev. Lett. (1989), 62 (21), 2535–2538. https://doi.org/10.1103/PhysRevLett.62.2535
  6. J. Dubochet, J. Lepault, R. Freeman, J. Berriman, J.-C. Homo, Electron microscopy of frozen water and aqueous solutions, J. Microsc. (1982), 128 (3), 219–237. https://doi.org/10.1111/j.1365-2818.1982.tb04625.x
  7. J. Frank, Averaging of low exposure electron micrographs of nonperiodic objects, Ultramicroscopy (1975), 8, 159–162. https://doi.org/10.1016/S0304-3991(75)80020-9
  8. R. Henderson, P.N.T. Unwin, Three-dimensional model of purple membrane obtained by electron microscopy, Nature (1975), 257 (5521), 28–32. https://doi.org/10.1038/257028a0
  9. S. Frank, P. Poncharal, Z.L. Wang, W.A. de Heer, Carbon Nanotube Quantum Resistors, Science (1998), 280 (5370), 1744–1746. https://doi.org/10.1126/science.280.5370.1744
  10. C. Reuter, R. Frisenda, D.Y. Lin, T.S. Ko, D. Perez de Lara, A. Castellanos-Gomez, A versatile scanning photocurrent mapping system to characterize optoelectronic devices based on 2D materials, Small Methods (2017), 1 (7), 1700119. https://doi.org/10.1002/smtd.201700119
  11. T. Dai, S. Vijayakrishnan, F.T. Szczypiński, J.-F. Ayme, E. Simaei, T. Fellowes, R. Clowes, L. Kotopanov, C.E. Shields, Z. Zhou, J.W. Ward, A.I. Cooper, Autonomous mobile robots for exploratory synthetic chemistry, Nature (2024), 635, 890–897. https://doi.org/10.1038/s41586-024-08173-7
  12. B. Hou, J. Wu, D.Y. Qiu, Unsupervised representation learning of Kohn–Sham states and consequences for downstream predictions of many-body effects, Nat. Commun. (2024), 15, 9481. https://doi.org/10.1038/s41467-024-53748-7
  13. H. Yang, R. Hu, H. Wu, X. He, Y. Zhou, Y. Xue, K. He, W. Hu, H. Chen, M. Gong, X. Zhang, P.-H. Tan, E.R. Hernández, Y. Xie, Identification and Structural Characterization of Twisted Atomically Thin Bilayer Materials by Deep Learning, Nano Lett. (2024), 24 (9), 2789–2797. https://doi.org/10.1021/acs.nanolett.3c04815
  14. J.M. Buriak, D. Akinwande, N. Artzi, et al., Best Practices for Using AI When Writing Scientific Manuscripts, ACS Nano (2023), 17 (5), 4091–4093. https://doi.org/10.1021/acsnano.3c01544
  15. A. Castellanos-Gomez, Good Practices for Scientific Article Writing with ChatGPT and Other Artificial Intelligence Language Models, Nanomanufacturing (2023), 3 (2), 135–138. https://doi.org/10.3390/nanomanufacturing3020009
  16. X. Zhang, Z. Zhou, C. Ming, Y.-Y. Sun, GPT-Assisted Learning of Structure–Property Relationships by Graph Neural Networks: Application to Rare-Earth-Doped Phosphors, J. Phys. Chem. Lett. (2023), 14 (50), 11342–11349. https://doi.org/10.1021/acs.jpclett.3c02848
  17. Y.J. Park, S.E. Jerng, S. Yoon, J. Li, 1.5 Million Materials Narratives Generated by Chatbots, Sci. Data (2024), 11 (1), 1060. https://doi.org/10.1038/s41597-024-03886-w
  18. Y.J. Park, D. Kaplan, Z. Ren, C.-W. Hsu, C. Li, H. Xu, S. Li, J. Li, Can ChatGPT Be Used to Generate Scientific Hypotheses? J. Materiomics (2024), 10 (3), 578–584. https://doi.org/10.1016/j.jmat.2023.08.007
  19. J.M. Buriak, M.C. Hersam, P.V. Kamat, Can ChatGPT and Other AI Bots Serve as Peer Reviewers? ACS Energy Lett. (2023), 9 (1), 191–192. https://pubs.acs.org/doi/10.1021/acsenergylett.3c02586
  20. Z. Ren, Z. Ren, Z. Zhang, T. Buonassisi, J. Li, Autonomous experiments using active learning and AI, Nat. Rev. Mater. (2023), 8 (9), 563–564. https://doi.org/10.1038/s41578-023-00588-4
  21. K. Darvish, M. Skreta, Y. Zhao, et al., ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization, arXiv preprint (2024), arXiv:2401.06949. https://doi.org/10.48550/arXiv.2401.06949
  22. S. Tao, L. Man, Z. Xiao, C. Lin, H. Yan, C. Jia, Z. Qing, L. Dao, Z. Bai, Z. Gang, Z. Guo, Z. Fei, S. Wei, F. Yao, J. Jiang, L. Yi, A multiagent-driven robotic AI chemist enabling autonomous chemical research on demand, J. Am. Chem. Soc. (2025), 147 (15), 12534–12545. https://doi.org/10.1021/jacs.4c17738
  23. X. Shan, Y. Pan, F. Cai, H. Gao, J. Xu, D. Liu, Q. Zhu, P. Li, Z. Jin, J. Jiang, M. Zhou, Accelerating the Discovery of Efficient High-Entropy Alloy Electrocatalysts: High-Throughput Experimentation and Data-Driven Strategies, Nano Lett. (2024), 24 (37), 11632–11640. https://doi.org/10.1021/acs.nanolett.4c03208
  24. Y. Pan, X. Shan, F. Cai, H. Gao, J. Xu, M. Zhou, Accelerating the Discovery of Oxygen Reduction Electrocatalysts: High-Throughput Screening of Element Combinations in Pt-Based High-Entropy Alloys, Angew. Chem. Int. Ed. (2024), 63 (37), e202407116. https://doi.org/10.1002/anie.202407116
  25. D. Fébba, K. Egbo, W.A. Callahan, A. Zakutayev, From text to test: AI-generated control software for materials science instruments, Digital Discovery (2025), 4, 35–45. https://www.sciencedirect.com/science/article/pii/S2635098X24002134
  26. D.A. Boiko, R. MacKnight, B. Kline, G. Gomes, Autonomous chemical research with large language models, Nature (2023), 624, 570–578. https://www.nature.com/articles/s41586-023-06792-0
  27. N. Yoshikawa, M. Skreta, K. Darvish, S. Arellano-Rubach, Z. Ji, L.B. Kristensen, A.Z. Li, Y. Zhao, H. Xu, A. Kuramshin, A. Aspuru-Guzik, F. Shkurti, A. Garg, Large language models for chemistry robotics, Autonomous Robots (2023), 47, 1057–1086. https://link.springer.com/article/10.1007/s10514-023-10136-2
  28. S.G. Baird, T.D. Sparks, What is a minimal working example for a self-driving laboratory? Matter (2022), 5 (12), 4170–4178. https://doi.org/10.1016/j.matt.2022.11.007