Constraint-Aware Corrective Memory for Language-Based Drug Discovery Agents
Pith reviewed 2026-05-10 16:33 UTC · model grok-4.3
The pith
A corrective memory system for drug discovery agents improves success rates by precisely diagnosing set-level protocol violations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that constraint-aware corrective memory, which pairs protocol auditing with a grounded diagnostician, lets the agent localize protocol violations in the candidate set from multimodal evidence, generate actionable correction hints, and keep memory concise through channel compression, yielding substantially higher success rates in returning valid candidate sets.
What carries the argument
Constraint-Aware Corrective Memory (CACM) that organizes memory into static, dynamic, and corrective channels and employs protocol auditing with a diagnostician for violation localization and remediation hints.
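The three-channel layout with compression before write-back can be sketched in a few lines. The class, field names, and retention limits below are illustrative assumptions, not taken from the paper:

```python
from dataclasses import dataclass, field

MAX_CORRECTIVE = 3  # assumed cap: keep only the most recent decision-relevant failures

@dataclass
class MemoryChannelStore:
    static: dict = field(default_factory=dict)     # persistent task info (target, protocol)
    dynamic: list = field(default_factory=list)    # evolving run state (tool outputs, scores)
    corrective: list = field(default_factory=list) # localized violations + remediation hints

    def write_back(self, violation: str, hint: str) -> None:
        """Record a diagnosed violation, then compress before exposing to the planner."""
        self.corrective.append({"violation": violation, "hint": hint})
        self.compress()

    def compress(self) -> None:
        # Trim stale dynamic entries and keep only the newest corrective hints,
        # leaving persistent static task info untouched.
        self.dynamic = self.dynamic[-5:]
        self.corrective = self.corrective[-MAX_CORRECTIVE:]

    def planner_context(self) -> dict:
        """The compact state handed to the planner at the next step."""
        return {"task": self.static, "recent": self.dynamic, "fixes": self.corrective}
```

The point of the sketch is the asymmetry: the static channel is never compressed away, while the corrective channel is deliberately lossy so the planner sees only the freshest failures.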
If this is right
- Precise set-level diagnosis reduces reliance on long raw histories and vague self-reflection.
- Compressed memory preserves persistent task info while exposing only relevant failures to the planner.
- Actionable remediation hints bias the next actions toward fixing specific violations.
- Overall, reliable drug discovery benefits from better diagnosis and economical agent states in addition to molecular tools.
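A minimal sketch of what set-level protocol auditing could look like. The constraint names, thresholds, and candidate fields here are assumptions for illustration; the paper's system derives such signals from docking and developability tools rather than precomputed dictionaries:

```python
def audit_candidate_set(candidates, *, min_size=5, min_diversity=0.4,
                        max_affinity=-7.0, min_qed=0.5):
    """Return a list of (violation, hint) pairs; an empty list means the set passes."""
    findings = []
    if len(candidates) < min_size:
        findings.append(("set_size",
                         f"generate {min_size - len(candidates)} more candidates"))
    low_qed = [c["id"] for c in candidates if c["qed"] < min_qed]
    if low_qed:
        findings.append(("developability", f"replace low-QED candidates: {low_qed}"))
    # Docking affinity in kcal/mol: lower (more negative) is better.
    weak = [c["id"] for c in candidates if c["affinity"] > max_affinity]
    if weak:
        findings.append(("binding_quality", f"re-dock or swap weak binders: {weak}"))
    if candidates:
        # Crude diversity proxy: fraction of unique scaffolds in the set.
        diversity = len({c["scaffold"] for c in candidates}) / len(candidates)
        if diversity < min_diversity:
            findings.append(("diversity", "sample from unexplored scaffolds"))
    return findings
```

Note that every hint names the candidates or constraint at fault, which is what distinguishes this style of diagnosis from generic "the set failed, try again" reflection.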
Where Pith is reading between the lines
- Similar approaches could enhance other LLM agents where aggregate outputs must satisfy global constraints, such as in scientific hypothesis generation.
- Future work might explore how this interacts with increasingly capable base models to further reduce failure rates.
- Applying the auditing to different constraint types could identify which protocols are hardest to satisfy without such memory.
Load-bearing premise
Protocol auditing and the diagnostician can accurately localize violations from multimodal evidence and generate hints that reliably steer the planner toward corrections, while memory compression keeps all decision-relevant information intact.
What would settle it
An ablation in which the diagnostician is removed or replaced with non-actionable feedback, checking whether the success-rate improvement over the baseline vanishes.
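Such an ablation could be prototyped with a toy repair loop in which actionable hints name the violated constraint, while generic feedback forces the agent to guess. All dynamics below are synthetic assumptions meant to illustrate the experimental design, not the paper's results:

```python
import random

def run_episode(actionable: bool, n_constraints=4, budget=6, seed=0) -> bool:
    """One synthetic repair episode; returns True if all constraints end satisfied."""
    rng = random.Random(seed)
    violated = set(range(n_constraints))  # start with every constraint violated
    for _ in range(budget):
        if not violated:
            return True
        if actionable:
            # Diagnostician condition: the hint names a specific violation to fix.
            violated.discard(next(iter(violated)))
        else:
            # Generic-feedback condition: the agent guesses which constraint to fix.
            violated.discard(rng.randrange(n_constraints))
    return not violated

def success_rate(actionable: bool, trials=200) -> float:
    return sum(run_episode(actionable, seed=s) for s in range(trials)) / trials
```

Under these assumptions the actionable condition always repairs the set within budget, while guessing wastes steps on already-satisfied constraints; the paper's claim predicts the real system shows the same ordering, and its disappearance under the ablation would undercut the diagnostician's claimed role.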
Original abstract
Large language models are making autonomous drug discovery agents increasingly feasible, but reliable success in this setting is not determined by any single action or molecule. It is determined by whether the final returned set jointly satisfies protocol-level requirements such as set size, diversity, binding quality, and developability. This creates a fundamental control problem: the agent plans step by step, while task validity is decided at the level of the whole candidate set. Existing language-based drug discovery systems therefore tend to rely on long raw history and under-specified self-reflection, making failure localization imprecise and planner-facing agent states increasingly noisy. We present CACM (Constraint-Aware Corrective Memory), a language-based drug discovery framework built around precise set-level diagnosis and a concise memory write-back mechanism. CACM introduces protocol auditing and a grounded diagnostician, which jointly analyze multimodal evidence spanning task requirements, pocket context, and candidate-set evidence to localize protocol violations, generate actionable remediation hints, and bias the next action toward the most relevant correction. To keep planning context compact, CACM organizes memory into static, dynamic, and corrective channels and compresses them before write-back, thereby preserving persistent task information while exposing only the most decision-relevant failures. Our experimental results show that CACM improves the target-level success rate by 36.4% over the state-of-the-art baseline. The results show that reliable language-based drug discovery benefits not only from more powerful molecular tools, but also from more precise diagnosis and more economical agent states.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Constraint-Aware Corrective Memory (CACM), a framework for language-based drug discovery agents. It proposes protocol auditing and a grounded diagnostician that jointly analyze multimodal evidence (task requirements, pocket context, candidate sets) to localize protocol violations and generate remediation hints. Memory is partitioned into static, dynamic, and corrective channels that are compressed before write-back to keep planner context compact. The central empirical claim is a 36.4% improvement in target-level success rate over the state-of-the-art baseline.
Significance. If the empirical result is substantiated with proper controls and ablations, the work would usefully demonstrate that set-level constraint satisfaction in LLM agents benefits from explicit violation localization and compressed corrective memory rather than raw history or generic reflection. This could inform memory and diagnosis designs for other long-horizon, set-constrained tasks.
major comments (3)
- [Abstract] The 36.4% target-level success-rate improvement is stated without any description of the experimental protocol, baseline systems, evaluation dataset, statistical tests, or data-exclusion criteria. This leaves the central claim unsupported by visible evidence.
- [Method (CACM components)] The grounded diagnostician and protocol-auditing mechanism are presented as jointly localizing violations from multimodal evidence, yet no quantitative evaluation (precision/recall of violation detection, inter-rater agreement with human experts, or ablation that disables the diagnostician) is supplied to confirm that this component drives the reported gain rather than prompt or tool differences.
- [Method (memory organization)] The memory-compression step (static/dynamic/corrective channels) is asserted to preserve all decision-relevant information, but the manuscript provides no information-loss metrics, reconstruction accuracy, or ablation comparing compressed versus uncompressed memory on downstream planner performance.
minor comments (1)
- [Abstract] The abstract would be clearer if it named the specific state-of-the-art baseline and the primary evaluation metric (e.g., success rate definition) used for the 36.4% figure.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and have outlined revisions to strengthen the presentation of our results and methods.
Point-by-point responses
-
Referee: [Abstract] The 36.4% target-level success-rate improvement is stated without any description of the experimental protocol, baseline systems, evaluation dataset, statistical tests, or data-exclusion criteria. This leaves the central claim unsupported by visible evidence.
Authors: We agree that the abstract would benefit from additional context to support the central empirical claim. In the revised manuscript, we will expand the abstract to briefly describe the experimental protocol, including the evaluation dataset, baseline systems, key metrics, statistical tests used, and any data exclusion criteria. This will provide readers with immediate visibility into the evidence supporting the 36.4% improvement while keeping the abstract concise. revision: yes
-
Referee: [Method (CACM components)] The grounded diagnostician and protocol-auditing mechanism are presented as jointly localizing violations from multimodal evidence, yet no quantitative evaluation (precision/recall of violation detection, inter-rater agreement with human experts, or ablation that disables the diagnostician) is supplied to confirm that this component drives the reported gain rather than prompt or tool differences.
Authors: We acknowledge that the manuscript lacks direct quantitative metrics for the diagnostician's violation detection performance. The overall gains are shown through system-level comparisons and ablations in the experiments section. To address this, we will add a dedicated evaluation subsection reporting precision and recall for violation localization against human expert annotations, inter-rater agreement scores, and an ablation study that disables the diagnostician while keeping other components fixed. This will help confirm its specific contribution. revision: yes
-
Referee: [Method (memory organization)] The memory-compression step (static/dynamic/corrective channels) is asserted to preserve all decision-relevant information, but the manuscript provides no information-loss metrics, reconstruction accuracy, or ablation comparing compressed versus uncompressed memory on downstream planner performance.
Authors: We agree that explicit validation of the compression's fidelity would strengthen the presentation. Although the design rationale for the channels is detailed in the method section, we will include in the revision information-loss metrics such as reconstruction accuracy for each memory channel and an ablation experiment comparing the compressed memory setup against an uncompressed variant in terms of planner performance and success rates. revision: yes
Circularity Check
No derivation chain present; central claim is an empirical experimental outcome with no self-referential definitions or fitted predictions.
full rationale
The paper describes a framework (CACM) with protocol auditing, a grounded diagnostician, and compressed memory channels, then reports an experimental 36.4% improvement in target-level success rate over a baseline. No equations, parameters, or mathematical derivations are introduced that could reduce to their own inputs by construction. There are no self-citations invoked as uniqueness theorems, no ansatzes smuggled via prior work, and no renaming of known results as novel organization. The improvement is framed purely as an observed experimental result rather than a quantity defined in terms of the method itself. This is the most common honest finding for system-description papers whose claims rest on empirical evaluation rather than closed-form derivation.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Language models can serve as effective planners for multi-step drug discovery when provided with appropriate state feedback.
- Ad hoc to this paper: Multimodal evidence (task requirements, pocket context, candidate set) can be jointly analyzed to localize protocol violations.
invented entities (3)
- Protocol auditing: no independent evidence
- Grounded diagnostician: no independent evidence
- Corrective memory channels (static, dynamic, corrective): no independent evidence
Reference graph
Works this paper leans on
- [1] Amira Alakhdar, Barnabás Póczos, and Newell Washburn. 2024. Diffusion Models in De Novo Drug Design. Journal of Chemical Information and Modeling 64, 19 (2024), 7238–7256. doi:10.1021/acs.jcim.4c01107
- [2] Reza Averly, Frazier N. Baker, Ian A. Watson, and Xia Ning. 2025. LIDDIA: Language-based Intelligent Drug Discovery Agent. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng (Eds.). Association for Computational Linguistics, Suzhou, Ch...
- [3] Andrius Bernatavicius, Martin Šícho, Antonius P. A. Janssen, Alan Kai Hassen, Mike Preuss, and Gerard J. P. van Westen. 2024. AlphaFold Meets de Novo Drug Design: Leveraging Structural Protein Information in Multitarget Molecular Generative Models. Journal of Chemical Information and Modeling 64, 21 (2024), 8113–8122. doi:10.1021/acs.jcim.4c00309
- [4] George R. Bickerton, Gaia V. Paolini, Jérôme Besnard, Sorel Muresan, and Andrew L. Hopkins. 2012. Quantifying the Chemical Beauty of Drugs. Nature Chemistry 4, 2 (2012), 90–98. doi:10.1038/nchem.1243
- [5] Benjamin E. Blass. 2021. Drug Discovery and Development: An Overview of Modern Methods and Principles. In Basic Principles of Drug Discovery and Development (2nd ed.), Benjamin E. Blass (Ed.). Academic Press, 1–41. doi:10.1016/B978-0-12-817214-8.00001-4
- [6] Daniil A. Boiko, Robert MacKnight, Ben Kline, and Gabe Gomes. 2023. Autonomous Chemical Research with Large Language Models. Nature 624, 7992 (2023), 570–578. doi:10.1038/s41586-023-06792-0
- [7] Andres M. Bran, Sam Cox, Oliver Schilter, Carlo Baldassari, Andrew D. White, and Philippe Schwaller. 2024. Augmenting Large Language Models with Chemistry Tools. Nature Machine Intelligence 6, 5 (2024), 525–535. doi:10.1038/s42256-024-00832-8
- [8] Ziqi Chen, Bo Peng, Tianhua Zhai, Daniel Adu-Ampratwum, and Xia Ning. 2025. Generating 3D Small Binding Molecules Using Shape-Conditioned Diffusion Models with Guidance. Nature Machine Intelligence (2025). doi:10.1038/s42256-025-01030-w
- [9] Peter Ertl and Ansgar Schuffenhauer. 2009. Estimation of Synthetic Accessibility Score of Drug-Like Molecules Based on Molecular Complexity and Fragment Contributions. Journal of Cheminformatics 1 (2009), 8. doi:10.1186/1758-2946-1-8
- [10] Shanghua Gao, Ada Fang, Yepeng Huang, Valentina Giunchiglia, Ayush Noori, Jonathan Richard Schwarz, Yasha Ektefaie, Jovana Kondic, and Marinka Zitnik. 2024. Empowering Biomedical Discovery with AI Agents. Cell 187, 22 (2024), 6125–6151. doi:10.1016/j.cell.2024.09.022
- [12] Alireza Ghafarollahi and Markus J. Buehler. 2024. ProtAgents: Protein Discovery via Large Language Model Multi-Agent Collaborations Combining Physics and Machine Learning. Digital Discovery 3, 7 (2024), 1389–1409. doi:10.1039/D4DD00013G
- [13] Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, and Weizhu Chen. 2023. CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing. arXiv preprint arXiv:2305.11738 (2023). https://arxiv.org/abs/2305.11738
- [14] Jiaqi Guan, Wesley Wei Qian, Xingang Peng, Yufeng Su, Jian Peng, and Jianzhu Ma. 2023. 3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction. In The Eleventh International Conference on Learning Representations. https://openreview.net/forum?id=kJqXEPXMsE0
- [15] Jiaqi Guan, Xiangxin Zhou, Yuwei Yang, Yu Bao, Jian Peng, Jianzhu Ma, Qiang Liu, Liang Wang, and Quanquan Gu. 2023. DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design. In Proceedings of the 40th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 202). PMLR, 11827–11846. https://proc...
- [16] Lei Huang, Tingyang Xu, Yang Yu, Peilin Zhao, Xingjian Chen, Jing Han, Zhi Xie, Hailong Li, Wenge Zhong, Ka-Chun Wong, and Hengtong Zhang. 2024. A Dual Diffusion Model Enables 3D Molecule Generation and Lead Optimization Based on Target Pockets. Nature Communications 15, 1 (2024), 2657. doi:10.1038/s41467-024-46569-1
- [17] Yoshitaka Inoue, Tianci Song, Xinling Wang, Augustin Luna, and Tianfan Fu. 2024. DrugAgent: Multi-Agent Large Language Model-Based Reasoning for Drug-Target Interaction Prediction. arXiv preprint arXiv:2408.13378 (2024). https://arxiv.org/abs/2408.13378
- [19] Shoichi Ishida, Tomohiro Sato, Teruki Honma, and Kei Terayama. 2025. Large Language Models Open New Way of AI-Assisted Molecule Design for Chemists. Journal of Cheminformatics 17, 1 (2025), 36. doi:10.1186/s13321-025-00984-8
- [20] Jan H. Jensen. 2019. A Graph-Based Genetic Algorithm and Generative Model/Monte Carlo Tree Search for the Exploration of Chemical Space. Chemical Science 10, 12 (2019), 3567–3572. doi:10.1039/C8SC05372C
- [21] Tuan Le, Julian Cremer, Djork-Arné Clevert, and Kristof T. Schütt. 2025. Equivariant Diffusion for Structure-Based de Novo Ligand Generation with Latent-Conditioning. Journal of Cheminformatics 17, 1 (2025), 90. doi:10.1186/s13321-025-01028-x
- [22] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. 2020. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In Advances in Neural Information Processing Systems. https://proceedings.neurips.cc/paper/202...
- [23] Yibo Li, Jiezhong Pei, and Luhua Lai. 2021. Structure-Based de Novo Drug Design Using 3D Deep Generative Models. Chemical Science 12 (2021), 13664–13675. doi:10.1039/D1SC04444C
- [24] Haitao Lin, Yufei Huang, Odin Zhang, Yunfan Liu, Lirong Wu, Siyuan Li, Zhiyuan Chen, and Stan Z. Li. 2023. Functional-Group-Based Diffusion for Pocket-Specific Molecule Generation and Elaboration. In Advances in Neural Information Processing Systems. https://openreview.net/forum?id=lRG11M91dx
- [25] Christopher A. Lipinski, Franco Lombardo, Beryl W. Dominy, and Paul J. Feeney. 2001. Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings. Advanced Drug Delivery Reviews 46, 1–3 (2001), 3–26. doi:10.1016/S0169-409X(00)00129-0
- [27] Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, and Peter Clark. 2023. SELF-REFINE: Iterative Refinement with Self-Feedback. arXiv preprint arXiv:2303.17651 (2023). htt...
- [28] Andrew D. McNaughton, Gautham Krishna Sankar Ramalaxmi, Agustin Kruel, Carter R. Knutson, Rohith A. Varikoti, and Neeraj Kumar. 2024. CACTUS: Chemistry Agent Connecting Tool Usage to Science. ACS Omega 9, 46 (2024), 46563–46573. doi:10.1021/acsomega.4c08408
- [29] Janghoon Ock, Radheesh Sharma Meda, Srivathsan Badrinarayanan, Neha S. Aluru, Achuth Chandrasekhar, and Amir Barati Farimani. 2026. Large Language Model Agent for Modular Task Execution in Drug Discovery. Journal of Chemical Information and Modeling 66, 4 (2026), 2055–2068. doi:10.1021/acs.jcim.5c02454
- [30] Xingang Peng, Shitong Luo, Jiaqi Guan, Qi Xie, Jian Peng, and Jianzhu Ma. Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets. In Proceedings of the 39th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 162). PMLR, 17644–17655. https://proceedings.mlr.press/v162/peng22b.html
- [32] Matthew Ragoza, Tomohide Masuda, and David R. Koes. 2022. Generating 3D Molecules Conditional on Receptor Binding Sites with Deep Generative Models. Chemical Science 13, 9 (2022), 2701–2713. doi:10.1039/D1SC05976A
- [33] Mayk Caldas Ramos, Christopher J. Collison, and Andrew D. White. 2025. A Review of Large Language Models and Autonomous Agents in Chemistry. Chemical Science 16, 6 (2025), 2514–2572. doi:10.1039/D4SC03921A
- [34] Timo Schick, Jane Dwivedi-Yu, Roberto Dessi, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language Models Can Teach Themselves to Use Tools. In Advances in Neural Information Processing Systems. https://openreview.net/forum?id=Yacmpz84TH
- [35] Arne Schneuing, Charles Harris, Yuanqi Du, Kieran Didi, Arian Jamasb, Ilia Igashov, Weitao Du, Carla Gomes, Tom Blundell, Pietro Liò, Max Welling, Michael Bronstein, and Bruno Correia. 2024. Structure-Based Drug Design with Equivariant Diffusion Models. Nature Computational Science 4, 12 (2024), 899–909. doi:10.1038/s43588-024-00737-x
- [36] Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik R. Narasimhan, and Shunyu Yao. 2023. Reflexion: Language Agents with Verbal Reinforcement Learning. In Advances in Neural Information Processing Systems. https://openreview.net/forum?id=vAElhFcKW6
- [37] Yidan Tang, Rocco Moretti, and Jens Meiler. 2024. Recent Advances in Automated Structure-Based De Novo Drug Design. Journal of Chemical Information and Modeling 64, 6 (2024), 1794–1805. doi:10.1021/acs.jcim.4c00247
- [38] Oleg Trott and Arthur J. Olson. 2010. AutoDock Vina: Improving the Speed and Accuracy of Docking with a New Scoring Function, Efficient Optimization, and Multithreading. Journal of Computational Chemistry 31, 2 (2010), 455–461. doi:10.1002/jcc.21334
- [39] Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, and Ji-Rong Wen. 2024. A Survey on Large Language Model Based Autonomous Agents. Frontiers of Computer Science 18 (2024), 186345. doi:10.1007/s11704-024-40231-1
- [40] Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W. White, Doug Burger, and Chi Wang. 2024. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. In First Conference on Language Modeling. https://openreview.net/forum?id=BAakY1hNKS
- [41] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. In The Eleventh International Conference on Learning Representations. https://openreview.net/forum?id=WE_vluYUL-X
- [42] Zeyu Zhang, Quanyu Dai, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Jieming Zhu, Zhenhua Dong, and Ji-Rong Wen. 2025. A Survey on the Memory Mechanism of Large Language Model-Based Agents. ACM Transactions on Information Systems 43 (2025), 1–47. doi:10.1145/3748302
- [43] Yizhen Zheng, Huan Yee Koh, Jiaxin Ju, Madeleine Yang, Lauren T. May, Geoffrey I. Webb, Li Li, Shirui Pan, and George Church. 2025. Large Language Models for Drug Discovery and Development. Patterns 6, 10 (2025), 101346. doi:10.1016/j.patter.2025.101346
- [45] Juexiao Zhou, Bin Zhang, Guowei Li, Xiuying Chen, Haoyang Li, Xiaopeng Xu, Siyuan Chen, Wenjia He, Chencheng Xu, Liwei Liu, and Xin Gao. 2024. An AI Agent for Fully Automated Multi-Omic Analyses. Advanced Science 11, 44 (2024), e2407094. doi:10.1002/advs.202407094