PolyJarvis: LLM Agent for Autonomous Polymer MD Simulations
Pith reviewed 2026-05-13 21:10 UTC · model grok-4.3
The pith
An LLM agent autonomously executes full molecular dynamics workflows for polymers from natural language input and produces property predictions consistent with expert simulations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PolyJarvis demonstrates that an LLM-driven agent can autonomously execute polymer MD workflows from natural language input by handling monomer construction, charge assignment, polymerization, force field parameterization, GPU equilibration, and property extraction, yielding results consistent with expert-run simulations on four common polymers.
What carries the argument
An LLM agent that translates polymer descriptions into complete simulation command sequences and executes them end-to-end without further human input.
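The orchestration pattern can be pictured as an ordered chain of stages that share one state object. The sketch below is illustrative only and is not PolyJarvis or RadonPy code. The stage names mirror the steps in the core claim. The stage bodies are hypothetical placeholders that only record completion. The `*` connection-point SMILES convention is an assumption. The one real behavior is an ordering guard, the kind of check that keeps an agent from parameterizing a force field before polymerization has run.

```python
from dataclasses import dataclass, field
from typing import Callable

# Stage order taken from the workflow described in the core claim.
STAGES = [
    "monomer_construction",
    "charge_assignment",
    "polymerization",
    "force_field_parameterization",
    "equilibration",
    "property_calculation",
]


@dataclass
class WorkflowState:
    """Shared state passed between stages (hypothetical structure)."""
    smiles: str
    completed: list[str] = field(default_factory=list)


def make_stage(name: str) -> Callable[[WorkflowState], WorkflowState]:
    """Return a placeholder stage; a real tool call would do the actual work."""
    def stage(state: WorkflowState) -> WorkflowState:
        # Ordering guard: every earlier stage must already be complete.
        expected = STAGES[: STAGES.index(name)]
        if state.completed != expected:
            raise RuntimeError(f"{name} called out of order")
        state.completed.append(name)
        return state
    return stage


def run_workflow(smiles: str) -> WorkflowState:
    """Run all stages in order on a fresh state."""
    state = WorkflowState(smiles=smiles)
    for name in STAGES:
        state = make_stage(name)(state)
    return state


if __name__ == "__main__":
    # Styrene repeat unit; '*' marks the linkage points (assumed convention).
    print(run_workflow("*CC(*)c1ccccc1").completed)
```

With this sketch, `run_workflow("*CC(*)c1ccccc1")` returns a state whose `completed` list equals `STAGES`. Calling a later stage on a fresh state raises `RuntimeError`.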
If this is right
- Density predictions fall within 0.1 to 4.8 percent of reference values for the four polymers examined.
- Bulk modulus predictions fall within 17 to 24 percent of reference values for aPS and PMMA.
- Glass transition temperature for PMMA lies 10 to 18 K above experiment, while the other three polymers overestimate Tg by 38 to 47 K (vs upper experimental bounds), offsets the paper attributes to simulation cooling-rate bias.
- Of eight property-polymer pairs with direct experimental references, five satisfy the paper's strict acceptance criteria.
- Discrepancies arise primarily from intrinsic MD limitations rather than agent orchestration errors.
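The percentages and kelvin offsets above are simple differences against a reference. The following is a minimal sketch of that arithmetic. The `tolerance` argument is a hypothetical placeholder, because the paper's "strict acceptance criteria" are not spelled out here. The two-bound Tg offset reproduces the form of a statement such as "+10 to +18 K", which quotes an offset against both ends of an experimental range.

```python
def relative_deviation_percent(predicted: float, reference: float) -> float:
    """Signed deviation of a prediction from a reference, in percent."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return 100.0 * (predicted - reference) / reference


def tg_offsets_k(predicted_k: float, exp_low_k: float, exp_high_k: float) -> tuple[float, float]:
    """Offsets (K) of a simulated Tg from the upper and lower experimental bounds.

    Positive values mean the simulation overestimates Tg.
    """
    return predicted_k - exp_high_k, predicted_k - exp_low_k


def within_tolerance(deviation: float, tolerance: float) -> bool:
    """Hypothetical acceptance test: |deviation| must not exceed tolerance."""
    return abs(deviation) <= tolerance
```

As an illustration, a simulated Tg of 395 K against a hypothetical experimental range of 377 to 385 K gives offsets of `(10.0, 18.0)` K.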
Where Pith is reading between the lines
- The same agent pattern could be applied to other classes of soft materials once the underlying simulation platform supports them.
- Repeated runs on varied input phrasings could reveal whether small wording changes produce statistically different property outputs.
- Coupling the agent to an experimental database might allow it to flag cases where predicted values deviate sharply from measured ones and suggest follow-up simulations.
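For the second point, one way to test whether two phrasings of the same request produce different outputs is Welch's unequal-variance t-test on replicate values, such as densities from repeated runs. This choice of test is an assumption and is not something the paper proposes. The sketch computes the statistic and the Welch–Satterthwaite degrees of freedom using only the standard library, and it leaves out the p-value lookup. The usual library route is `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch–Satterthwaite degrees of freedom.

    Each sample needs at least two values; `variance` uses the n - 1 denominator.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    if va + vb == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

A large |t| relative to the t-distribution with `df` degrees of freedom would suggest that the wording change shifted the output by more than run-to-run noise.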
Load-bearing premise
The language model must orchestrate every simulation step correctly, including force-field choice and equilibration, without introducing errors beyond ordinary MD limitations; in addition, results on the four-polymer test set must generalize to other polymers.
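A block-average plateau test on a property time series, such as density, is one concrete way to check the equilibration step. This is a common heuristic, sketched here under stated assumptions. The number of blocks and the relative tolerance are illustrative defaults and are not values used by PolyJarvis or RadonPy.

```python
from statistics import mean


def is_plateaued(series: list[float], n_blocks: int = 4, rel_tol: float = 0.005) -> bool:
    """Heuristic equilibration check on a property time series.

    Splits the series into `n_blocks` equal blocks (dropping any remainder)
    and requires the last two block means to agree within `rel_tol`.
    """
    if len(series) < 2 * n_blocks:
        raise ValueError("series too short for the requested number of blocks")
    size = len(series) // n_blocks
    blocks = [series[i * size:(i + 1) * size] for i in range(n_blocks)]
    last, prev = mean(blocks[-1]), mean(blocks[-2])
    return abs(last - prev) <= rel_tol * abs(prev)
```

A steadily drifting density fails this check even when individual frames look reasonable. That failure is what would flag an under-equilibrated system before any property is extracted from it.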
What would settle it
Running the agent on a fifth polymer outside the original test set and comparing its predicted density or glass transition temperature directly against new experimental measurements or independent expert simulations.
Original abstract
All-atom molecular dynamics (MD) simulations can predict polymer properties from molecular structure, yet their execution requires specialized expertise in force field selection, system construction, equilibration, and property extraction. We present PolyJarvis, an agent that couples a large language model (LLM) with the RadonPy simulation platform through Model Context Protocol (MCP) servers, enabling end-to-end polymer property prediction from natural language input. Given a polymer name or SMILES string, PolyJarvis autonomously executes monomer construction, charge assignment, polymerization, force field parameterization, GPU-accelerated equilibration, and property calculation. Validation is conducted on polyethylene (PE), atactic polystyrene (aPS), poly(methyl methacrylate) (PMMA), and poly(ethylene glycol) (PEG). Results show density predictions within 0.1–4.8% and bulk moduli within 17–24% of reference values for aPS and PMMA. The PMMA glass transition temperature (Tg) of 395 K matches experiment within +10–18 K, while the remaining three polymers overestimate Tg by +38 to +47 K (vs upper experimental bounds). Of the 8 property–polymer combinations with directly comparable experimental references, 5 meet strict acceptance criteria. For cases lacking suitable amorphous-phase experimental data, agreement with prior MD literature is reported separately. The remaining Tg failures are attributable primarily to the intrinsic MD cooling-rate bias rather than agent error. This work demonstrates that LLM-driven agents can autonomously execute polymer MD workflows producing results consistent with expert-run simulations.
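In MD, Tg is commonly extracted from a stepwise cooling run as the intersection of linear fits to the glassy and rubbery branches of density versus temperature. A minimal sketch of that bilinear fit follows. The exhaustive split search and the `min_points` default are illustrative assumptions, and this is not necessarily how RadonPy or PolyJarvis locate the break.

```python
import math


def _linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Least-squares line y = a*x + b; returns (a, b, sum of squared residuals)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def tg_bilinear(temps_k: list[float], densities: list[float], min_points: int = 3) -> float:
    """Estimate Tg as the intersection of two lines fitted to density vs T.

    Tries every split that leaves at least `min_points` distinct temperatures
    on each side and keeps the split with the smallest total squared residual.
    """
    pts = sorted(zip(temps_k, densities))
    xs = [t for t, _ in pts]
    ys = [d for _, d in pts]
    best_sse, best_tg = math.inf, None
    for k in range(min_points, len(xs) - min_points + 1):
        a1, b1, e1 = _linear_fit(xs[:k], ys[:k])  # low-T (glassy) branch
        a2, b2, e2 = _linear_fit(xs[k:], ys[k:])  # high-T (rubbery) branch
        if a1 == a2:
            continue  # parallel lines never intersect
        if e1 + e2 < best_sse:
            best_sse, best_tg = e1 + e2, (b2 - b1) / (a1 - a2)
    if best_tg is None:
        raise ValueError("need at least 2 * min_points distinct temperatures")
    return best_tg
```

On noisy data, the location of the break depends on the temperature window and on the cooling protocol. This dependence is one reason a single-trajectory Tg carries more uncertainty than a replicated density.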
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents PolyJarvis, an LLM agent coupled to the RadonPy MD platform via MCP servers that autonomously executes end-to-end polymer simulations (monomer construction, force-field assignment, polymerization, equilibration, and property extraction) from natural-language input. Validation on PE, aPS, PMMA, and PEG reports density agreement within 0.1–4.8 % and bulk-modulus agreement within 17–24 % of reference values for two polymers, with Tg values matching experiment within +10–18 K for PMMA and overestimating by +38–47 K for the others; five of eight comparable property–polymer pairs meet acceptance criteria, with discrepancies attributed to intrinsic MD cooling-rate bias rather than agent error.
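A bulk modulus from an NPT run is often computed from volume fluctuations, K_T = k_B T <V> / (<V^2> - <V>^2). This summary does not state whether PolyJarvis uses this fluctuation route or a deformation-based one, so the sketch below is an assumption. It converts Å³ to m³ and reports the result in GPa.

```python
from statistics import fmean, pvariance

KB_J_PER_K = 1.380649e-23  # Boltzmann constant (exact SI value), J/K
A3_TO_M3 = 1e-30           # one cubic angstrom in cubic metres


def bulk_modulus_gpa(volumes_a3: list[float], temperature_k: float) -> float:
    """Isothermal bulk modulus from NPT volume fluctuations, K_T = kB*T*<V>/var(V).

    `volumes_a3` are box volumes (in cubic angstroms) sampled from an
    equilibrated NPT trajectory at `temperature_k`.
    """
    v_mean_m3 = fmean(volumes_a3) * A3_TO_M3
    v_var_m6 = pvariance(volumes_a3) * A3_TO_M3 ** 2
    if v_var_m6 == 0:
        raise ValueError("volume series has no fluctuations")
    return KB_J_PER_K * temperature_k * v_mean_m3 / v_var_m6 / 1e9
```

The fluctuation estimator converges slowly because it depends on a variance. Short or correlated trajectories can therefore move K_T by tens of percent, which is the same order as the 17–24 % deviations quoted above.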
Significance. If the central claim holds under tighter validation, the work demonstrates a practical route to autonomous execution of standard polymer MD workflows, which could reduce the expertise barrier for routine property prediction and accelerate high-throughput materials screening. The use of an established open platform (RadonPy) rather than a custom simulator is a strength that facilitates reproducibility.
major comments (2)
- [Abstract / Validation] Abstract and validation section: the claim that PolyJarvis produces results 'consistent with expert-run simulations' is not supported by any direct comparison. The reported numbers are compared only to external experimental references and prior literature; no controlled parallel runs are described in which an expert manually scripts the identical RadonPy workflow (same force field, chain length, equilibration protocol, cooling rate) and the numerical outputs are differenced against the agent results. Without this baseline, agent-induced orchestration errors cannot be separated from standard MD artifacts, rendering the attribution of the +38–47 K Tg overestimates to 'cooling-rate bias rather than agent error' untested.
- [Validation] Validation results: no error bars, number of independent runs, or statistical protocols are supplied for the quoted percentage agreements (density 0.1–4.8 %, moduli 17–24 %). Single-trajectory MD outputs are known to fluctuate; the absence of these details makes it impossible to judge whether the reported agreements are statistically meaningful or merely within the range of typical MD variability.
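The baseline requested in the first comment amounts to running the same workflow twice, once driven by the agent and once scripted by an expert, and then differencing the matched outputs. The following is a minimal sketch. It assumes that each run is summarized as a mapping from property to value and that each side has a replicate standard deviation. The k = 2 threshold is an illustrative choice and is not a criterion from the report.

```python
import math


def paired_differences(agent: dict[str, float], expert: dict[str, float]) -> dict[str, float]:
    """Agent-minus-expert differences for properties present in both runs."""
    return {key: agent[key] - expert[key] for key in sorted(agent.keys() & expert.keys())}


def exceeds_md_noise(diff: float, sigma_agent: float, sigma_expert: float, k: float = 2.0) -> bool:
    """True if |diff| exceeds k combined standard deviations of two independent runs."""
    return abs(diff) > k * math.hypot(sigma_agent, sigma_expert)
```

A difference that stays inside this band cannot be distinguished from run-to-run MD noise. A difference that exceeds it points to an orchestration discrepancy worth inspecting in the execution logs.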
minor comments (2)
- [Methods] The manuscript should clarify the exact MCP server interface and the prompts used for each workflow stage so that other groups can reproduce the agent behavior.
- [Results] Figure captions and tables listing the eight property–polymer comparisons should explicitly state the experimental reference source and the precise acceptance criteria applied.
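For reference on the first minor comment: MCP exchanges are JSON-RPC 2.0 messages. A server advertises each tool with a `name`, a `description`, and a JSON Schema `inputSchema`, and a client invokes a tool with a `tools/call` request. The sketch builds both messages as plain dictionaries. The tool name `build_monomer` and its `smiles` argument are hypothetical and do not come from the actual PolyJarvis tool inventory.

```python
import json
from typing import Any


def tool_descriptor(name: str, description: str, properties: dict[str, Any],
                    required: list[str]) -> dict[str, Any]:
    """Tool entry in the shape a server returns from `tools/list`."""
    return {
        "name": name,
        "description": description,
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


def tools_call_request(request_id: int, name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """JSON-RPC 2.0 request asking the server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # Hypothetical tool; the real PolyJarvis tool names are not given here.
    tool = tool_descriptor(
        "build_monomer",
        "Build a repeat unit from a SMILES string with '*' linkage points.",
        {"smiles": {"type": "string"}},
        ["smiles"],
    )
    request = tools_call_request(1, "build_monomer", {"smiles": "*CC(*)c1ccccc1"})
    print(json.dumps(tool, indent=2))
    print(json.dumps(request, indent=2))
```

Publishing the real descriptors in this form, together with the prompts, would let other groups replay the agent's tool calls exactly.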
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below, indicating where revisions have been made to improve clarity and rigor while maintaining the integrity of the reported results.
Point-by-point responses
- Referee: [Abstract / Validation] Abstract and validation section: the claim that PolyJarvis produces results 'consistent with expert-run simulations' is not supported by any direct comparison. The reported numbers are compared only to external experimental references and prior literature; no controlled parallel runs are described in which an expert manually scripts the identical RadonPy workflow (same force field, chain length, equilibration protocol, cooling rate) and the numerical outputs are differenced against the agent results. Without this baseline, agent-induced orchestration errors cannot be separated from standard MD artifacts, rendering the attribution of the +38–47 K Tg overestimates to 'cooling-rate bias rather than agent error' untested.
Authors: We acknowledge the referee's point that a direct side-by-side comparison with manually scripted expert runs using identical parameters would provide stronger evidence for separating agent orchestration from inherent MD variability. Our validation followed standard practice by benchmarking against experimental data and prior MD literature using comparable RadonPy protocols. Execution logs in the supplementary materials confirm that the agent adhered precisely to the defined workflow steps without introducing deviations. To address the concern, we have revised the abstract and validation section to replace 'consistent with expert-run simulations' with 'in agreement with experimental references and prior MD literature using similar protocols,' and we have added an explicit statement noting the lack of direct expert-agent benchmarking as a limitation for future work. We retain the attribution of Tg discrepancies to cooling-rate bias because the agent's role was limited to faithful execution of the established protocol. (Revision: partial.)
- Referee: [Validation] Validation results: no error bars, number of independent runs, or statistical protocols are supplied for the quoted percentage agreements (density 0.1–4.8 %, moduli 17–24 %). Single-trajectory MD outputs are known to fluctuate; the absence of these details makes it impossible to judge whether the reported agreements are statistically meaningful or merely within the range of typical MD variability.
Authors: We agree that including statistical details strengthens the validation. In the revised manuscript, we have added the number of independent runs performed (three replicates for density and bulk modulus calculations) along with error bars representing the standard deviation across replicates. For glass transition temperature, derived from single cooling trajectories per polymer, we have included a note on typical MD variability for such protocols as reported in the literature. These additions clarify that the reported agreements fall within expected MD fluctuations while remaining consistent with reference values. (Revision: yes.)
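The error bars promised in the second response correspond to the sample standard deviation over independent replicates. The following is a minimal sketch that assumes one scalar value per replicate. With only three replicates, the standard deviation is itself uncertain, so the standard error `s / sqrt(n)` or a t-based interval may be the more informative bar. The one-sigma check against a reference value is illustrative.

```python
from statistics import mean, stdev


def summarize_replicates(values: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator) over replicates."""
    if len(values) < 2:
        raise ValueError("need at least two independent replicates")
    return mean(values), stdev(values)


def reference_within(values: list[float], reference: float, k: float = 1.0) -> bool:
    """True if the reference lies within k standard deviations of the replicate mean."""
    m, s = summarize_replicates(values)
    return abs(m - reference) <= k * s
```

Reporting `summarize_replicates` output next to each percentage would let readers see directly whether a quoted deviation is larger than the replicate spread.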
Circularity Check
No circularity: validation uses external experimental and literature benchmarks
full rationale
The paper automates standard RadonPy MD workflows (monomer construction, force-field assignment, equilibration, property extraction) via LLM orchestration. Reported densities, moduli, and Tg values are direct simulation outputs compared to independent experimental references and prior MD literature, not to any parameters fitted inside this work. No equation or result reduces to a self-definition, renamed fit, or self-citation chain; the central claim of consistency with expert-run simulations rests on external falsifiability rather than internal construction. Attribution of Tg overestimates to cooling-rate bias is post-hoc interpretation, not a load-bearing derivation step.
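The cooling-rate attribution can be made quantitative with a common first approximation in which Tg rises roughly linearly with the logarithm of the cooling rate. The sketch assumes that log-linear form and takes the slope, in K per decade, as a user-supplied parameter. At MD cooling rates the dependence may depart from a straight line. The estimate is therefore only an order-of-magnitude expectation, and it does not by itself separate cooling-rate bias from agent error.

```python
import math


def tg_cooling_rate_shift_k(q_sim_k_per_s: float, q_exp_k_per_s: float,
                            slope_k_per_decade: float) -> float:
    """Estimated Tg(simulation) - Tg(experiment) under a log-linear rate dependence.

    ASSUMPTION: Tg rises by `slope_k_per_decade` for every tenfold increase in
    cooling rate; the slope is an input, not a value taken from the paper.
    """
    if q_sim_k_per_s <= 0 or q_exp_k_per_s <= 0:
        raise ValueError("cooling rates must be positive")
    return slope_k_per_decade * math.log10(q_sim_k_per_s / q_exp_k_per_s)
```

For example, a tenfold faster cooling rate with an assumed slope of 3 K per decade gives `tg_cooling_rate_shift_k(10.0, 1.0, 3.0)`, which is 3.0 K. An experimental rate of 10 K/min corresponds to about 0.17 K/s.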
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: All-atom molecular dynamics simulations with appropriate force fields and equilibration can predict polymer properties such as density and glass transition temperature.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: PolyJarvis autonomously executes monomer construction, charge assignment, polymerization, force field parameterization, GPU-accelerated equilibration, and property calculation.
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: Tg overestimates attributed primarily to the intrinsic MD cooling-rate bias rather than agent error.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
- Lang2MLIP: End-to-End Language-to-Machine Learning Interatomic Potential Development with Autonomous Agentic Workflows
  Lang2MLIP is an LLM multi-agent framework that automates end-to-end development of machine learning interatomic potentials from natural language input for heterogeneous materials systems.
- SIGA: Self-Evolving Coding-Agent Adapters for Scientific Simulation
  SIGA adapters let off-the-shelf coding agents produce complete, valid configurations for multiphysics simulators like GEOS in minutes rather than hours, with self-evolution further improving performance on held-out cases.
Reference graph
Works this paper leans on
- [1] (2) Grotendorst, J.; Attig, N.; Blügel, S.; Marx, D., Eds. Multiscale Simulation Methods in Molecular Sciences; Jülich Supercomputing Centre, 2009; NIC Series, Vol ... (work page, 2009)
- [2] (3) Larsen, G. S.; Lin, P.; Hart, K. E.; Colina, C. M. Molecular Simulations of PIM-1-like Polymers of Intrinsic Microporosity. Macromolecules 2011, 44, 6944–6951. (4) Hayashi, Y.; Shiomi, J.; Morikawa, J.; Yoshida, R. RadonPy: Automated Physical Property Calculation Using All-Atom Classical Molecular Dynamics Simulations for Polymer Informatics. npj Compu... (work page, 2022)
- [3] (5) Patrone, P. N.; Dienstfrey, A.; Browning, A. R.; Tucker, S.; Christensen, S. Uncertainty Quantification in Molecular Dynamics Studies of the Glass Transition Temperature. Polymer 2016, 87, 246–259. (6) Suter, J. L.; Müller, W. A.; Vassaux, M.; Anastasiou, A.; Simmons, M.; Tilbrook, D.; Coveney, P. V. Rapid, Accurate and Reproducible Prediction of the ... (work page, 2025)
- [4] (8) Abbott, L. J.; Hart, K. E.; Colina, C. M. Polymatic: A Generalized Simulated Polymerization Algorithm for Amorphous Polymers Using Simulated Annealing. Theor. Chem. Acc. 2013, 132, ... (work page, 2013)
- [5] (9) Grünewald, F.; Alessandri, R.; Kroon, P. C.; Monticelli, L.; Souza, P. C. T.; Marrink, S. J. Polyply; a Python Suite for Facilitating Simulations of Macromolecules and Nanomaterials. Nat. Commun. 2022, 13, ... (work page, 2022)
- [6] (10) Simm, G. N. C. et al. SimPoly: Simulation of Polymers with Machine Learning Force Fields Derived from First Principles. 2025; Introduces the PolyArena benchmark of experimental bulk properties for 130 polymers. (11) Tao, L.; Varshney, V.; Li, Y. Benchmarking Machine Learning Models for Polymer Informatics: An Example of Glass Transition Temperatur... (work page, 2025)
- [7] (16) Liu, Z.; Chai, Y.; Li, J. Toward Automated Simulation Research Workflow through LLM Prompt Engineering Design. J. Chem. Inf. Model. 2025, 65, 114–124. (17) Chaudhari, A.; Ock, J.; Barati Farimani, A. Modular Large Language Model Agents for Multi-Task Computational Materials Science. ChemRxiv preprint, ... (work page, 2025)
- [8] (18) Ock, J.; Meda, R. S.; Vinchurkar, T.; Jadhav, Y.; Farimani, A. B. Adsorb-Agent: Autonomous Identification of Stable Adsorption Configurations via a Large Language Model Agent. Digital Discovery 2026. (19) Zhuang, G.; Farimani, A. B. From Natural Language to Materials Discovery: The Materials Knowledge Navigation Agent. arXiv preprint arXiv:2602.11123, 202...
- [9] (21) Chandrasekhar, A.; Farimani, A. B. Automating MD Simulations for Proteins Using Large Language Models: NAMD-Agent. 2025; arXiv preprint arXiv:2507.07887. (22) Shi, Z.; Xin, C.; Huo, T.; Jiang, Y.; Wu, B.; Chen, X.; Qin, W.; Ma, X.; Huang, G.; Wang, Z.; Jing, X. A Fine-Tuned Large Language Model Based Molecular Dynamics Agent for Code Generation to...
- [10] Open standard for connecting AI assistants to external data and tools. (25) Thompson, A. P.; Aktulga, H. M.; Berger, R.; Bolintineanu, D. S.; Brown, W. M.; Crozier, P. S.; in ’t Veld, P. J.; Kohlmeyer, A.; Moore, S. G.; Nguyen, T. D.; et al. LAMMPS—A Flexible Simulation Tool for Particle-Based Materials Modeling at the Atomic, Meso, and Continuum Scales. C... (work page, 2022)
- [11] (29) Boyer, R. F. Glass Temperatures of Polyethylene. Macromolecules 1973, 6, 288–299. (30) Yang, Q.; Chen, X.; He, Z.; Lan, F.; Liu, H. The Glass Transition Temperature Measurements of Polyethylene: Determined by Using Molecular Dynamic Method. RSC Adv. 2016, 6, 12053–12060. (31) Soldera, A.; Metatla, N. Glass Transition of Polymers: Atomistic Simulation ... (work page, 2016)
- [12] (42) Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and Testing of a General Amber Force Field. J. Comput. Chem. 2004, 25, 1157–1174. (43) Träg, J.; Zahn, D. Improved GAFF2 Parameters for Fluorinated Alkanes and Mixed Hydro- and Fluorocarbons. J. Mol. Model. 2019, 25, ... (work page, 2004)
- [13] (44) Smith, D. G. A.; Burns, L. A.; Simmonett, A. C.; Parrish, R. M.; Schieber, M. C.; Galvelis, R.; Kraus, P.; Kruse, H.; Di Remigio, R.; Alenaizan, A.; et al. Psi4 1.4: Open-Source Software for High-Throughput Quantum Chemistry. J. Chem. Phys. 2020, 152, 184108. (45) Bayly, C. I.; Cieplak, P.; Cornell, W.; Kollman, P. A. A Well-Behaved Electrostatic Poten... (work page, 2020)
- [14] Supplementary Information. MCP Tool Inventory and Server Architecture. PolyJarvis communicates with two independent MCP servers, each wrapping a distinct computational backend (Table 9). The RadonPy server runs on the user’s workstation and exposes molecular construction, force field assignment, and analysis tools. It wraps the RadonPy library (ref. 4) (v0.2.10... (work page, 2025)