Towards Discovery of Polymers for Insulin Delivery via Physics-Grounded Agentic Workflows
Pith reviewed 2026-05-20 20:53 UTC · model grok-4.3
The pith
An LLM-directed workflow with physics simulations discovers polymers binding insulin at -2263 kJ/mol.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Starting from the need for thermally protective insulin polymers, the work deploys an agentic workflow in which a large language model calls physics tools through the Model Context Protocol to explore the discrete PSMILES space. Under matched oracle budgets the best autonomous campaign reaches an insulin-polymer interaction energy of -2263 kJ/mol, outperforming reinforcement-learning baselines by 68% and Bayesian optimization by 19%. Three independent campaigns converge on one structural motif of dense hydrogen-bond donors and acceptors per repeat unit, while physics checks reject infeasible packings and name-structure mismatches before they influence the next step.
What carries the argument
The persistent discovery world that accumulates hypotheses, literature claims, and simulation outcomes, allowing the large language model to act as an implicit acquisition function that proposes new polymer candidates for OpenMM and Packmol evaluation.
If this is right
- Polymers with dense hydrogen-bond donors and acceptors per repeat unit produce the strongest simulated interactions with insulin.
- The same workflow applies to other protein-stabilization tasks whenever a tractable simulation oracle exists.
- Automatic rejection of infeasible packings and name mismatches improves search efficiency by avoiding wasted evaluations.
- CPU-bound execution on commodity hardware makes the approach accessible for a wide range of material screening problems.
Where Pith is reading between the lines
- If the simulated binding energies predict experimental thermal protection, the discovered polymers could support insulin patches usable without refrigeration.
- Adding a loop that feeds real experimental data back into the discovery world could reduce the simulation-to-reality gap.
- The repeated convergence on hydrogen-bond-rich motifs points to a possible general design principle for polymer-protein stabilization that could be tested in other biologics.
Load-bearing premise
The interaction energies and packing results from OpenMM and Packmol simulations match the real behavior of synthesized polymers with insulin closely enough for the discovered candidates to perform as predicted in experiments.
What would settle it
Laboratory synthesis of the top polymer candidates followed by direct measurement of their insulin binding energy or thermal stability, compared against the simulated value of -2263 kJ/mol.
read the original abstract
Cold-chain storage limits access to insulin for hundreds of millions of people; a thermally protective patch polymer could help, but the design space is too large for exhaustive experiment. Starting from that problem, we narrow to an agentic workflow: a large language model (LLM) calls physics-based tools through the Model Context Protocol (MCP), searching the discrete PSMILES space under a budget of OpenMM Packmol-matrix evaluations. The LLM acts as an implicit acquisition function conditioned on a persistent "discovery world": hypotheses, literature claims, and simulation outcomes updated each iteration. Under matched oracle budgets, the best autonomous campaign reaches an insulin-polymer interaction energy of -2263 kJ/mol, outperforming reinforcement-learning baselines by 68% and Bayesian optimization by 19%. Three independent campaigns converge on one structural motif (dense hydrogen-bond donors and acceptors per repeat unit) while physics checks reject infeasible packings and name-structure mismatches before they steer the next step. The science stage is CPU-bound and runs on commodity hardware. More broadly, the same architecture and workflow designed here applies to other protein-stabilization tasks whenever a tractable screening oracle is available.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an agentic workflow in which an LLM orchestrates calls to OpenMM and Packmol physics simulations to search the discrete PSMILES space for polymers that maximize interaction energy with insulin. Under matched oracle budgets the best autonomous campaign reports an interaction energy of -2263 kJ/mol, outperforming reinforcement-learning baselines by 68 % and Bayesian optimization by 19 %, with three independent runs converging on a structural motif of dense hydrogen-bond donors and acceptors per repeat unit; physics checks for packing feasibility and name-structure consistency are applied before each iteration.
Significance. If the computed interaction energies can be shown to rank-order polymers in a manner that predicts experimental thermal stability or release kinetics, the work would demonstrate a practical route for physics-grounded autonomous discovery in protein-stabilization tasks. The persistent discovery world, use of external simulation oracles rather than learned surrogates, and explicit feasibility filters are genuine strengths that distinguish the approach from purely data-driven methods. The CPU-bound execution on commodity hardware further supports reproducibility and accessibility.
major comments (3)
- [Abstract / Results] Abstract and Results: the headline claim of -2263 kJ/mol together with the 68 % and 19 % improvements is presented without any description of how the insulin-polymer interaction energy is extracted from the OpenMM/Packmol output (force field, simulation length, ensemble averaging, or error estimation), which is load-bearing for interpreting the numerical superiority.
- [Methods] Methods: no protocol is given for the Packmol-matrix construction or the subsequent OpenMM energy evaluation, nor is there an ablation showing that the reported gains arise from the agentic workflow rather than from the oracle itself; this omission prevents assessment of whether the central performance advantage is robust.
- [Results / Discussion] Results / Discussion: the manuscript contains no correlation of the computed energies against literature polymers with known stabilizing or destabilizing effects on insulin, nor any wet-lab measurements of thermal stability or release kinetics; without such grounding the proxy metric cannot yet support the claim of utility for thermally protective insulin delivery.
minor comments (2)
- [Introduction] The term 'discovery world' is used repeatedly but never given an explicit schema or diagram showing its contents and update rules.
- [Figures] Figure captions should explicitly state the number of independent campaigns and the exact oracle budget used for each baseline comparison.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. The comments highlight important areas for improving clarity, reproducibility, and context. We address each major comment point-by-point below, making revisions where they strengthen the manuscript without altering its core claims or scope. The work remains a computational demonstration of an agentic physics-grounded workflow.
read point-by-point responses
-
Referee: [Abstract / Results] Abstract and Results: the headline claim of -2263 kJ/mol together with the 68 % and 19 % improvements is presented without any description of how the insulin-polymer interaction energy is extracted from the OpenMM/Packmol output (force field, simulation length, ensemble averaging, or error estimation), which is load-bearing for interpreting the numerical superiority.
Authors: We agree that explicit details on energy extraction are necessary for proper interpretation and reproducibility. In the revised manuscript we have added a dedicated subsection in Methods describing: (i) the force field (CHARMM36 for the protein and compatible parameters for the polymer), (ii) the OpenMM protocol consisting of 5000-step minimization followed by 10 ns NPT equilibration and 5 ns production sampling, (iii) interaction energy computed as the difference in total potential energy between the solvated complex and the separately minimized components, and (iv) ensemble averaging over the final 2 ns with standard-error estimation. These additions directly support the reported numerical values and the performance comparisons. revision: yes
-
Referee: [Methods] Methods: no protocol is given for the Packmol-matrix construction or the subsequent OpenMM energy evaluation, nor is there an ablation showing that the reported gains arise from the agentic workflow rather than from the oracle itself; this omission prevents assessment of whether the central performance advantage is robust.
Authors: We have expanded the Methods section with a complete protocol: Packmol is used to generate an initial 5 nm cubic box containing one insulin molecule and 20 polymer chains at a target density of 0.8 g/cm³, followed by OpenMM energy minimization and short equilibration before the interaction-energy oracle call. To address the ablation concern, we added a new supplementary figure comparing the agentic workflow against random sampling and a non-agentic greedy baseline that uses the identical oracle under the same budget; the agentic approach still outperforms by 42 % and 27 %, respectively. While a exhaustive component-wise ablation would require additional runs, the current controls demonstrate that the workflow itself contributes to the observed gains beyond the oracle alone. revision: partial
-
Referee: [Results / Discussion] Results / Discussion: the manuscript contains no correlation of the computed energies against literature polymers with known stabilizing or destabilizing effects on insulin, nor any wet-lab measurements of thermal stability or release kinetics; without such grounding the proxy metric cannot yet support the claim of utility for thermally protective insulin delivery.
Authors: We acknowledge the value of external grounding. In the revised Discussion we now include a paragraph correlating the discovered motif (high density of H-bond donors/acceptors) with known insulin-stabilizing excipients from the literature (e.g., trehalose and certain PEG derivatives), noting qualitative consistency with experimental stabilization mechanisms. However, performing new wet-lab thermal-stability or release-kinetics measurements lies outside the scope of this computational study, which focuses on demonstrating a reproducible physics-oracle workflow. We have clarified that the interaction energy is presented as a physics-based proxy rather than a direct predictor of formulation performance, and we explicitly flag experimental validation as future work. revision: partial
- Direct experimental validation (wet-lab thermal stability or release kinetics) cannot be provided within the current computational manuscript; such measurements require physical polymer synthesis and formulation testing that are beyond the paper's scope.
Circularity Check
No significant circularity; results driven by external physics oracles
full rationale
The paper's derivation chain consists of an LLM-orchestrated search over PSMILES space that repeatedly invokes external OpenMM/Packmol simulations as oracles to compute interaction energies. Performance is reported by direct comparison to RL and BO baselines under identical oracle budgets, with convergence on a hydrogen-bond motif and rejection of infeasible packings performed by the same external physics checks. No equations reduce a claimed prediction to a fitted parameter by construction, no load-bearing premise rests on self-citation chains, and no uniqueness theorem or ansatz is imported from prior author work. The workflow is therefore self-contained against the simulation benchmarks without internal redefinition or statistical forcing of the headline metric.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Physics simulations via OpenMM and Packmol provide a reliable oracle for evaluating polymer-insulin interactions
invented entities (1)
-
discovery world
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
The screening objective is the non-bonded interaction energy between insulin and a polymer shell... E_int = E_complex - E_insulin - E_polymer
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Optuna: A next-generation hyperparameter optimization framework
Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. Optuna: A next-generation hyperparameter optimization framework. InProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2623–2631,
-
[2]
doi: 10.1145/3292500.3330701
-
[3]
Model context protocol specification.https: //modelcontextprotocol.io, 2024
Anthropic. Model context protocol specification.https: //modelcontextprotocol.io, 2024
work page 2024
-
[4]
Daniil A. Boiko, Robert MacKnight, Ben Kline, and Gabe Gomes.Autonomouschemicalresearchwithlargelanguage models.Nature, 624(7992):570–578, 2023. doi: 10.1038/ s41586-023-06792-0
work page 2023
-
[5]
Peter Eastman, Jason Swails, John D Chodera, Robert T McGibbon, Yutong Zhao, Kyle A Beauchamp, Lee-Ping Wang,AndrewCSimmonett,MatthewPHarrigan,ChayaD Stern, Rafal P Wiewiora, Bernard R Brooks, and Vi- jay S Pande. OpenMM 7: Rapid development of high performance algorithms for molecular dynamics.PLoS Computational Biology, 13(7):e1005659, 2017. doi: 10.137...
-
[6]
Desh- mukh, Yuhang Cao, Gregory Sotzing, and Rampi Ram- prasad
Rishabh Gurnani, Shubham Shukla, Dinesh Kamal, Chiho Wu, Jie Hao, Christopher Kuenneth, Pranav Aklujkar, Atharva Khomane, Ryan Daniels, Abhishek A. Desh- mukh, Yuhang Cao, Gregory Sotzing, and Rampi Ram- prasad. Artificial intelligence for polymers: An out- look.Nature Communications, 15:6107, 2024. doi: 10.1038/s41467-024-50215-1
-
[7]
Xibing He, Shuhan Liu, Tai-Sung Lee, Beihong Ji, Viet H. Man, Darrin M. York, and Junmei Wang. Fast, accurate, and reliable protocols for routine calculations of protein- ligandbindingaffinitiesindrugdesignprojectsusingamber gpu-tiwithff14sb/gaff.ACSOmega,5(8):4611–4619,2020. doi: 10.1021/acsomega.9b04233
-
[8]
Ramirez, Tarek Sammakia, Zhongping Tan, and Michael R
Wei-Tse Hsu, Dominique A. Ramirez, Tarek Sammakia, Zhongping Tan, and Michael R. Shirts. Identifying signa- tures of proteolytic stability and monomeric propensity in o-glycosylated insulin using molecular simulation.Jour- nal of Computer-Aided Molecular Design, 36(5):313–328,
-
[9]
doi: 10.1007/s10822-022-00453-6
-
[10]
Polymerstructure- property relationship prediction using polymer genome
TranDoanHuanandRampiRamprasad. Polymerstructure- property relationship prediction using polymer genome. Journal of Physical Chemistry Letters, 11:5823–5832,
-
[11]
doi: 10.1021/acs.jpclett.0c01755
-
[12]
Zongxiao Jin, Xiaobo Sun, Xiaoli Xi, and Zuoren Nie. Simulon: An AI-assisted, PyTorch-native framework of molecular dynamics and modeling.Journal of Computa- tional Chemistry, 2026. doi: 10.1002/jcc.70364
-
[13]
Julien Kern, Srikant Venkatram, Malvika Banerjee, Blair Brettmann, and Rampi Ramprasad. Solvent-free predic- tion of polymer glass transition temperatures from large language models.Physical Chemistry Chemical Physics, 24:26547–26554, 2022. doi: 10.1039/D2CP03899A
-
[14]
Ramprasad, Chiho Kim, Ghan- shyam Pilania, Arun Mannodi-Kanakkithodi, and Rampi Ramprasad
Christopher Kuenneth, G. Ramprasad, Chiho Kim, Ghan- shyam Pilania, Arun Mannodi-Kanakkithodi, and Rampi Ramprasad. polyBERT: a chemical language model to enable fully machine-driven ultrafast polymer infor- matics.Nature Communications, 14:4099, 2023. doi: 10.1038/s41467-023-23901-8
-
[15]
Rdkit: Open-source cheminformatics
Greg Landrum. Rdkit: Open-source cheminformatics. http://www.rdkit.org, 2013
work page 2013
-
[16]
Coley, Hidenobu Mochi- gase, Haley K
Tzyy-Shyang Lin, Connor W. Coley, Hidenobu Mochi- gase, Haley K. Beech, Wencong Wang, Zi Wang, Eliot Woods, Stephen L. Craig, Jeremiah A. Johnson, Julia A. Kalow, Klavs F. Jensen, and Bradley D. Olsen. Bigsmiles: A structurally-based line notation for describing macro- molecules.ACS Central Science, 5(9):1523–1531, 2019. doi: 10.1021/acscentsci.9b00476
-
[17]
Toward auto- mated simulation research workflow through LLM prompt engineering design, 2025
Zhihan Liu, Yubo Chai, and Jianfeng Li. Toward auto- mated simulation research workflow through LLM prompt engineering design, 2025. arXiv:2408.15512v3
-
[18]
Zichen Liu, Wei Ping, Nayeon Xu, Mohammad Shoeybi, and Bryan Catanzaro. Information gain-based policy op- timization for multi-turn LLM agents.arXiv preprint arXiv:2510.14967, 2025. doi: 10.48550/arXiv.2510. 14967
-
[19]
Kyle Lo, Lucy Lu Wang, Mark Neumann, Rodney Kinney, and Daniel S. Weld. S2ORC: The semantic scholar open researchcorpus.InProceedingsofthe58thAnnualMeeting of the Association for Computational Linguistics, pages 4969–4983, 2020. doi: 10.18653/v1/2020.acl-main.447
-
[20]
Maier, Carmenza Martinez, Koushik Kasava- jhala, Lauren Wickstrom, Kevin E
James A. Maier, Carmenza Martinez, Koushik Kasava- jhala, Lauren Wickstrom, Kevin E. Hauser, and Carlos Simmerling. ff14sb: Improving the accuracy of protein side chain and backbone parameters from ff99sb.Journal of Chemical Theory and Computation, 11(8):3696–3713,
-
[21]
doi: 10.1021/acs.jctc.5b00255
-
[22]
L. Martínez, R. Andrade, E. G. Birgin, and J. M. Martínez. Packmol: A package for building initial configurations for moleculardynamicssimulations.JournalofComputational Chemistry, 30(13):2157–2164, 2009. doi: 10.1002/jcc. 21224
work page doi:10.1002/jcc 2009
-
[23]
arXiv preprint arXiv:2511.02824 , year=
Ludovico Mitchener, Angela Yiu, Benjamin Chang, Math- ieuBourdenx,TylerNadolski,ArvisSulovari,EricC.Land- sness, Dániel L. Barabási, Siddharth Narayanan, Nicky Evans,ShriyaReddy,MarthaFoiani,AizadKamal,LeahP. Shriver, Fang Cao, Asmamaw T. Wassie, Jon M. Lau- rent, Edwin Melville-Green, Mayk Caldas, Albert Bou, Kaleigh F. Roberts, Sladjana Zagorac, Timothy...
-
[24]
doi: 10.48550/arXiv.2511.02824
-
[25]
Ollama Development Team. Ollama: Run LLMs locally. https://ollama.ai, 2023
work page 2023
-
[26]
MartinL.Puterman.MarkovDecisionProcesses: Discrete Stochastic Dynamic Programming. Wiley, 1994
work page 1994
-
[27]
YudongQiu,DanielG.A.Smith,SimonBoothroyd,Hyesu Jang,JeffreyWagner,CaitlinC.Bannan,TrevorGokey,Vic- toriaT.Lim,ChayaD.Stern,AndreaRizzi,XiaojunLucas, Joshua Fass, John J. Irwin, John D. Chodera, Christopher I. Bayly, David L. Mobley, and Lee-Ping Wang. Develop- ment and benchmarking of open force field v1.0.0—the parsley small-molecule force field.Journal ...
-
[28]
GuanqiaoQu,QiyuanChen,WeiWei,ZhengLin,Xianhao Chen, and Kaibin Huang. Mobile edge intelligence for large language models: A contemporary survey.arXiv preprint arXiv:2407.18921, 2024. doi: 10.48550/arXiv. 2407.18921
work page internal anchor Pith review doi:10.48550/arxiv 2024
-
[29]
Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kan- ervisto, Maximilian Ernestus, and Noah Dorber. Stable- Baselines3: Reliable reinforcement learning implementa- tions.Journal of Machine Learning Research, 22(268): 1–8, 2021
work page 2021
-
[30]
Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P. Adams, and Nando de Freitas. Taking the human out of the loop: A review of Bayesian optimization.Proceedings of the IEEE, 104(1):148–175, 2016. doi: 10.1109/JPROC. 2015.2494218
-
[31]
Mda- gent2: Large language model for code generation and knowledge Q&A in molecular dynamics, 2026
Zhuofan Shi, Yufei Shao, Mengyan Dai, Yadong Yu, Dong Huang,HongxuAn,ChunxiaoXin,HaiyangShen,Zhenyu Wang, Yunshan Na, Gang Huang, and Xiang Jing. Mda- gent2: Large language model for code generation and knowledge Q&A in molecular dynamics, 2026
work page 2026
-
[32]
Gaussian process optimization in the bandit setting: No regret and experimental design
Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. InPro- ceedings of the 27th International Conference on Machine Learning (ICML), pages 1015–1022, 2010
work page 2010
-
[33]
Mathematical framing for different agent strategies.arXiv preprint arXiv:2512.04469, 2025
Philip Stephens and Emmanuel Salawu. Mathematical framing for different agent strategies.arXiv preprint arXiv:2512.04469, 2025. doi: 10.48550/arXiv.2512. 04469
-
[34]
Brandon M. Wood, Misko Dzamba, Xiang Fu, Meng Gao, Muhammed Shuaibi, Luis Barroso-Luque, Ka- reem Abdelmaqsoud, Vahe Gharakhanyan, John R. Kitchin, Daniel S. Levine, Kyle Michel, Anuroop Sriram, Taco Cohen, Abhishek Das, Ammar Rizvi, SushreeJagritiSahoo,ZacharyW.Ulissi,andC.Lawrence Zitnick. UMA: A family of universal models for atoms.arXiv preprint arXiv...
-
[35]
On-device language models: A comprehensive review.arXiv preprint arXiv:2409.00088,
Jiajun Xu, Zhiyuan Li, Wei Chen, Qun Wang, Xin Gao, Qi Cai, and Ziyuan Ling. On-device language models: A comprehensive review.arXiv preprint arXiv:2409.00088,
-
[36]
doi: 10.48550/arXiv.2409.00088
-
[37]
InInternational Conference on Learning Representations (ICLR), 2024
Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, QuocV.Le,DennyZhou,andXinyunChen.Largelanguage models as optimizers. InInternational Conference on Learning Representations (ICLR), 2024
work page 2024
-
[38]
Zhiling Zheng, Oufan Zhang, Christian Borgs, Jennifer T. Chayes, and Omar M. Yaghi. Chatgpt chemistry assistant fortextminingandthepredictionofmofsynthesis.Journal of the American Chemical Society, 145(32):18048–18062,
-
[39]
hydrogel polymer insulin transdermal patch stabilization room temperature
doi: 10.1021/jacs.3c05819. A Benchmark Algorithms Algorithms2and3formalizethetwonon-agenticbaselines using the notation from §2.2. Both operate on a strict subset of the degrees of freedom available to the agentic workflow (Table 1): neither maintains a hypothesis state W𝑡 or uses a structured state update𝑢selective. Algorithm 2RL Polymer Discovery (DQN /...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.