pith. machine review for the scientific record.

arxiv: 2604.09308 · v1 · submitted 2026-04-10 · 💻 cs.AI

Recognition: unknown

Constraint-Aware Corrective Memory for Language-Based Drug Discovery Agents


Pith reviewed 2026-05-10 16:33 UTC · model grok-4.3

classification 💻 cs.AI
keywords drug discovery · language models · AI agents · corrective memory · protocol constraints · set-level diagnosis · memory compression

The pith

A corrective memory system for drug discovery agents improves success rates by precisely diagnosing set-level protocol violations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that language-based drug discovery agents struggle because they plan actions step by step, while success depends on the entire candidate set jointly meeting constraints such as size, diversity, binding quality, and developability. Constraint-Aware Corrective Memory (CACM) addresses this by adding protocol auditing and a grounded diagnostician that examine evidence from task requirements, pocket context, and the candidate set to localize violations and suggest fixes. It also organizes memory into static, dynamic, and corrective channels and compresses them to keep the agent's state compact and focused. By making diagnosis more precise and memory more economical, CACM improves the target-level success rate by 36.4% over the state-of-the-art baseline. Readers should care because it shifts the focus from better tools to better self-correction mechanisms for reliable autonomous discovery.
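The set-level framing above can be made concrete with a toy protocol audit. This is a sketch under stated assumptions, not the paper's implementation: the `Candidate` fields, thresholds, and violation labels are all illustrative choices (e.g. treating a docking-style score where lower is better, and QED as a developability proxy).

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    smiles: str           # structure identifier
    binding_score: float  # docking-style score; lower is better (assumed convention)
    qed: float            # developability proxy in [0, 1] (assumed)

def audit_candidate_set(candidates, min_size=5, max_binding=-7.0,
                        min_qed=0.5, min_unique=5):
    """Return set-level violations as 'type: detail' strings; empty means pass.

    The key point: every check is over the whole returned set, not any
    single action or molecule.
    """
    violations = []
    if len(candidates) < min_size:
        violations.append(f"set_size: {len(candidates)} < {min_size}")
    unique = {c.smiles for c in candidates}
    if len(unique) < min_unique:
        violations.append(f"diversity: only {len(unique)} unique structures")
    weak = [c.smiles for c in candidates if c.binding_score > max_binding]
    if weak:
        violations.append(f"binding_quality: {len(weak)} candidates above {max_binding}")
    undevelopable = [c.smiles for c in candidates if c.qed < min_qed]
    if undevelopable:
        violations.append(f"developability: {len(undevelopable)} candidates below QED {min_qed}")
    return violations
```

A step-by-step planner can pass every local check and still fail this audit, which is exactly the control problem the paper identifies.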

Core claim

The central claim is that constraint-aware corrective memory, combined with protocol auditing and a grounded diagnostician, lets the agent localize protocol violations in the candidate set from multimodal evidence, generate actionable hints for correction, and keep memory concise through channel compression. Together, these mechanisms yield substantially higher success in returning valid candidate sets.

What carries the argument

Constraint-Aware Corrective Memory (CACM) that organizes memory into static, dynamic, and corrective channels and employs protocol auditing with a diagnostician for violation localization and remediation hints.
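The three-channel organization can be sketched minimally as below. Field names, the truncation-based compression, and the planner-facing view are assumptions for illustration; the paper specifies only that channels are compressed before write-back so that persistent task information survives while only decision-relevant failures reach the planner.

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Toy three-channel memory in the spirit of CACM (names illustrative)."""
    static: dict = field(default_factory=dict)      # persistent task info: target, protocol
    dynamic: list = field(default_factory=list)     # evolving trajectory summaries
    corrective: list = field(default_factory=list)  # localized violations + remediation hints

    def compress(self, keep_last=3, keep_failures=2):
        """Truncate channels before write-back so planner context stays compact.

        Real compression would summarize rather than drop; truncation is the
        simplest stand-in that preserves the static channel untouched.
        """
        self.dynamic = self.dynamic[-keep_last:]
        self.corrective = self.corrective[-keep_failures:]
        return self

    def planner_view(self):
        """Compact state exposed to the planner on the next iteration."""
        return {"task": self.static,
                "recent": self.dynamic,
                "fix_next": self.corrective}
```

The design choice the paper highlights is that the static channel is never compressed away, while the corrective channel surfaces only the most recent, most relevant failures.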

If this is right

  • Precise set-level diagnosis reduces reliance on long raw histories and vague self-reflection.
  • Compressed memory preserves persistent task info while exposing only relevant failures to the planner.
  • Actionable remediation hints bias the next actions toward fixing specific violations.
  • Overall, reliable drug discovery benefits from better diagnosis and economical agent states in addition to molecular tools.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar approaches could enhance other LLM agents where aggregate outputs must satisfy global constraints, such as in scientific hypothesis generation.
  • Future work might explore how this interacts with increasingly capable base models to further reduce failure rates.
  • Applying the auditing to different constraint types could identify which protocols are hardest to satisfy without such memory.

Load-bearing premise

Protocol auditing and the diagnostician can accurately localize violations from multimodal evidence and generate hints that reliably steer the planner to corrections while memory compression keeps all decision-relevant information intact.
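One way to see what "actionable" means here is a rule table mapping violation types to next-step hints. This is a hypothetical sketch: the rule names and hint texts are invented for illustration, and the paper's diagnostician reasons over multimodal evidence rather than a fixed lookup.

```python
# Hypothetical mapping from audited violation types to remediation hints.
REMEDIATION_RULES = {
    "set_size": "generate additional candidates before returning the set",
    "diversity": "resample with a scaffold-diversity constraint",
    "binding_quality": "re-dock or filter candidates against the pocket",
    "developability": "filter or optimize candidates toward the QED threshold",
}

def remediation_hints(violations):
    """Turn localized violations ('type: detail') into actionable next-step hints."""
    hints = []
    for v in violations:
        vtype = v.split(":", 1)[0]
        hint = REMEDIATION_RULES.get(vtype, "inspect the violation manually")
        hints.append(f"{v} -> {hint}")
    return hints
```

The load-bearing premise is precisely that such hints, however generated, reliably bias the planner toward the specific fix rather than a generic retry.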

What would settle it

An experiment where the diagnostician is removed or replaced with non-actionable feedback, checking if the success rate improvement over baseline vanishes.
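The proposed ablation can be sketched as a toy simulation: compare success rates when feedback is actionable (each hint fixes a specific violation with higher probability) versus non-actionable. The probabilities and trial structure are assumptions purely for illustration, not results from the paper.

```python
import random

def run_trial(actionable, rng, max_iters=10):
    """Toy trial: start with 3 violations; actionable hints fix them faster."""
    p_fix = 0.6 if actionable else 0.2  # assumed effect sizes, illustration only
    violations = 3
    for _ in range(max_iters):
        if violations == 0:
            return True
        if rng.random() < p_fix:
            violations -= 1
    return violations == 0

def success_rate(actionable, n=2000, seed=0):
    """Seeded Monte Carlo estimate of target-level success under one condition."""
    rng = random.Random(seed)
    return sum(run_trial(actionable, rng) for _ in range(n)) / n
```

If the real improvement vanished when the diagnostician's hints were replaced by non-actionable feedback, that would confirm the diagnostician, not prompt or tool differences, carries the gain.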

Figures

Figures reproduced from arXiv: 2604.09308 by Gaofeng Meng, Maochen Sun, Youzhi Zhang.

Figure 1. Overview of constraint-aware corrective memory (CACM). CACM maintains an agent state composed of static, dynamic, and corrective channels.
Figure 2. Planner-facing memory length across iterations.
Original abstract

Large language models are making autonomous drug discovery agents increasingly feasible, but reliable success in this setting is not determined by any single action or molecule. It is determined by whether the final returned set jointly satisfies protocol-level requirements such as set size, diversity, binding quality, and developability. This creates a fundamental control problem: the agent plans step by step, while task validity is decided at the level of the whole candidate set. Existing language-based drug discovery systems therefore tend to rely on long raw history and under-specified self-reflection, making failure localization imprecise and planner-facing agent states increasingly noisy. We present CACM (Constraint-Aware Corrective Memory), a language-based drug discovery framework built around precise set-level diagnosis and a concise memory write-back mechanism. CACM introduces protocol auditing and a grounded diagnostician, which jointly analyze multimodal evidence spanning task requirements, pocket context, and candidate-set evidence to localize protocol violations, generate actionable remediation hints, and bias the next action toward the most relevant correction. To keep planning context compact, CACM organizes memory into static, dynamic, and corrective channels and compresses them before write-back, thereby preserving persistent task information while exposing only the most decision-relevant failures. Our experimental results show that CACM improves the target-level success rate by 36.4% over the state-of-the-art baseline. The results show that reliable language-based drug discovery benefits not only from more powerful molecular tools, but also from more precise diagnosis and more economical agent states.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces Constraint-Aware Corrective Memory (CACM), a framework for language-based drug discovery agents. It proposes protocol auditing and a grounded diagnostician that jointly analyze multimodal evidence (task requirements, pocket context, candidate sets) to localize protocol violations and generate remediation hints. Memory is partitioned into static, dynamic, and corrective channels that are compressed before write-back to keep planner context compact. The central empirical claim is a 36.4% improvement in target-level success rate over the state-of-the-art baseline.

Significance. If the empirical result is substantiated with proper controls and ablations, the work would usefully demonstrate that set-level constraint satisfaction in LLM agents benefits from explicit violation localization and compressed corrective memory rather than raw history or generic reflection. This could inform memory and diagnosis designs for other long-horizon, set-constrained tasks.

major comments (3)
  1. [Abstract] The 36.4% target-level success-rate improvement is stated without any description of the experimental protocol, baseline systems, evaluation dataset, statistical tests, or data-exclusion criteria. This leaves the central claim unsupported by visible evidence.
  2. [Method (CACM components)] The grounded diagnostician and protocol-auditing mechanism are presented as jointly localizing violations from multimodal evidence, yet no quantitative evaluation (precision/recall of violation detection, inter-rater agreement with human experts, or ablation that disables the diagnostician) is supplied to confirm that this component drives the reported gain rather than prompt or tool differences.
  3. [Method (memory organization)] The memory-compression step (static/dynamic/corrective channels) is asserted to preserve all decision-relevant information, but the manuscript provides no information-loss metrics, reconstruction accuracy, or ablation comparing compressed versus uncompressed memory on downstream planner performance.
minor comments (1)
  1. [Abstract] The abstract would be clearer if it named the specific state-of-the-art baseline and the primary evaluation metric (e.g., success rate definition) used for the 36.4% figure.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and have outlined revisions to strengthen the presentation of our results and methods.

Point-by-point responses
  1. Referee: [Abstract] The 36.4% target-level success-rate improvement is stated without any description of the experimental protocol, baseline systems, evaluation dataset, statistical tests, or data-exclusion criteria. This leaves the central claim unsupported by visible evidence.

    Authors: We agree that the abstract would benefit from additional context to support the central empirical claim. In the revised manuscript, we will expand the abstract to briefly describe the experimental protocol, including the evaluation dataset, baseline systems, key metrics, statistical tests used, and any data exclusion criteria. This will provide readers with immediate visibility into the evidence supporting the 36.4% improvement while keeping the abstract concise. revision: yes

  2. Referee: [Method (CACM components)] The grounded diagnostician and protocol-auditing mechanism are presented as jointly localizing violations from multimodal evidence, yet no quantitative evaluation (precision/recall of violation detection, inter-rater agreement with human experts, or ablation that disables the diagnostician) is supplied to confirm that this component drives the reported gain rather than prompt or tool differences.

    Authors: We acknowledge that the manuscript lacks direct quantitative metrics for the diagnostician's violation detection performance. The overall gains are shown through system-level comparisons and ablations in the experiments section. To address this, we will add a dedicated evaluation subsection reporting precision and recall for violation localization against human expert annotations, inter-rater agreement scores, and an ablation study that disables the diagnostician while keeping other components fixed. This will help confirm its specific contribution. revision: yes

  3. Referee: [Method (memory organization)] The memory-compression step (static/dynamic/corrective channels) is asserted to preserve all decision-relevant information, but the manuscript provides no information-loss metrics, reconstruction accuracy, or ablation comparing compressed versus uncompressed memory on downstream planner performance.

    Authors: We agree that explicit validation of the compression's fidelity would strengthen the presentation. Although the design rationale for the channels is detailed in the method section, we will include in the revision information-loss metrics such as reconstruction accuracy for each memory channel and an ablation experiment comparing the compressed memory setup against an uncompressed variant in terms of planner performance and success rates. revision: yes

Circularity Check

0 steps flagged

No derivation chain present; central claim is an empirical experimental outcome with no self-referential definitions or fitted predictions.

full rationale

The paper describes a framework (CACM) with protocol auditing, a grounded diagnostician, and compressed memory channels, then reports an experimental 36.4% improvement in target-level success rate over a baseline. No equations, parameters, or mathematical derivations are introduced that could reduce to their own inputs by construction. There are no self-citations invoked as uniqueness theorems, no ansatzes smuggled via prior work, and no renaming of known results as novel organization. The improvement is framed purely as an observed experimental result rather than a quantity defined in terms of the method itself. This is the most common honest finding for system-description papers whose claims rest on empirical evaluation rather than closed-form derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 3 invented entities

The framework rests on several new components whose effectiveness is asserted without independent evidence or derivation in the abstract.

axioms (2)
  • domain assumption Language models can serve as effective planners for multi-step drug discovery when provided with appropriate state feedback
    Implicit foundation for the entire language-based agent setup.
  • ad hoc to paper Multimodal evidence (task requirements, pocket context, candidate set) can be jointly analyzed to localize protocol violations
    Core assumption enabling the diagnostician component.
invented entities (3)
  • Protocol auditing no independent evidence
    purpose: Joint analysis of requirements and candidate-set evidence to detect violations
    New diagnostic layer introduced by CACM
  • Grounded diagnostician no independent evidence
    purpose: Generate actionable remediation hints from audit results
    New component that translates diagnosis into planner guidance
  • Corrective memory channels (static, dynamic, corrective) no independent evidence
    purpose: Organize and compress persistent versus failure-specific information
    New memory architecture to maintain compact agent state

pith-pipeline@v0.9.0 · 5565 in / 1452 out tokens · 53136 ms · 2026-05-10T16:33:02.913177+00:00 · methodology

discussion (0)

