pith. machine review for the scientific record. sign in

arxiv: 2605.12784 · v2 · submitted 2026-05-12 · 💻 cs.LG · cs.NE· q-bio.QM

Recognition: no theorem link

ToolMol: Evolutionary Agentic Framework for Multi-objective Drug Discovery

Authors on Pith no claims yet

Pith reviewed 2026-05-15 04:52 UTC · model grok-4.3

classification 💻 cs.LG cs.NEq-bio.QM
keywords drug discoverymulti-objective optimizationgenetic algorithmlarge language modelsmolecular generationRDKit toolsagentic frameworkbinding affinity
0
0 comments X

The pith

ToolMol pairs an LLM agent with RDKit tools inside a genetic algorithm to generate drug ligands that show stronger predicted binding than earlier methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ToolMol as a way to combine a multi-objective genetic algorithm with an agentic large language model that calls chemical tools to edit candidate molecules. Existing LLM-only approaches often produce invalid or low-quality structures because they struggle with the syntax of molecular representations. ToolMol supplies the model with a toolbox of RDKit functions so the agent can make controlled, chemically valid changes at each evolutionary step. A sympathetic reader would care because this hybrid setup promises more reliable discovery of molecules that simultaneously meet multiple goals such as binding strength, drug-likeness, and ease of synthesis. The authors report that the resulting ligands outperform prior techniques on three protein targets and on absolute binding free energy calculations.

Core claim

ToolMol achieves state-of-the-art performance on multi-objective property optimization tasks, discovering drug-like and synthesizable ligands that have greater than 10 percent stronger predicted binding affinity compared to existing methods, evaluated on three protein targets. ToolMol ligands additionally achieve state-of-the-art results in gold-standard Absolute Binding Free Energy scores, gaining over existing methods by over 35 percent. Tool-calling enables the model to more faithfully execute its planned modifications, efficiently exploiting the strong chemical prior knowledge in LLMs.

What carries the argument

The agentic LLM operator that iteratively calls RDKit-backed functions to perform precise ligand modifications inside a multi-objective genetic algorithm that evolves a population of candidate molecules.

If this is right

  • Ligands produced by ToolMol exhibit more than 10 percent stronger predicted binding affinity on the tested protein targets.
  • The same ligands reach more than 35 percent better performance on absolute binding free energy calculations than prior approaches.
  • Tool-calling allows the language model to follow its own chain-of-thought plans more accurately than free-form generation.
  • The evolutionary loop maintains a population of molecules that remain drug-like and synthesizable throughout optimization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same agent-plus-tool pattern could be applied to other molecular design problems such as materials or catalyst discovery without changing the core loop.
  • Adding more specialized tools for reaction prediction or toxicity filtering would likely further reduce the rate of invalid candidates.
  • Because the framework separates planning from execution, it may scale to larger search spaces than pure genetic algorithms or pure LLM sampling.

Load-bearing premise

The agentic LLM operator, when given RDKit tools, will reliably produce chemically valid and fitness-improving edits rather than biased or invalid structures that weaken the genetic search.

What would settle it

An experiment in which synthesized ToolMol ligands show no measurable improvement in actual binding free energy over control molecules, or in which a large fraction of proposed structures fail basic validity checks when the same prompts are run without the tool layer.

Figures

Figures reproduced from arXiv: 2605.12784 by Andrew Y. Zhou, Michael K. Gilson, Peter Eckmann, Rose Yu, Sharvaree Vadgama, Sumanth Varambally.

Figure 1
Figure 1. Figure 1: Overview of ToolMol. (a) We sample an initial ligand population from ZINC 250K. (b) Parent ligands are sampled for crossovers & mutations with probability proportional to their fitness. (c) An agent with access to a set of modification tools generates new ligands using structures from the selected parents. (d) New offspring are evaluated by an oracle for all relevant objectives. (e) A new population is for… view at source ↗
Figure 2
Figure 2. Figure 2: An example tool-calling process. The agent first decides to perform a crossover on the input molecules, utilizing crossover molecules. Then it decides to attach a methoxy group to the benzene structure, utilizing add functional group. At this point, it decides that the modifications are sufficient, and the new molecule is added to the offspring population. hit the max steps iteration budget or until the LL… view at source ↗
Figure 3
Figure 3. Figure 3: ToolMol & MOLLEO modification steps and reasoning traces. MOLLEO fails to execute its planned modifications, while ToolMol successfully executes its ideas. 8 [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: This figure shows the initial molecules, and resultant molecules after LLM modifications using MOLLEO and ToolMol. We see that MOLLEO fails to generate the required molecule We observe that while many parts of the final molecule are consistent with what is described by the reasoning trace, there are certain parts that are entirely inconsistent with the LLM’s planned modifications. For instance, it insists … view at source ↗
Figure 5
Figure 5. Figure 5: Comparison of correlation between AutoDock & ABFE and Boltz-2 & ABFE for 32 known compounds for the c-MET protein target. We observe a significantly higher correlation between Boltz-2 and ABFE as compared to AutoDock. We see that ABFE and AutoDock docking show r 2 = 0.09 among the 32 compounds, while ABFE and Boltz-2 show r 2 = 0.42. As an oracle nearly 1000x less computationally expensive than ABFE, Boltz… view at source ↗
read the original abstract

Advances in large language models (LLMs) have recently opened new and promising avenues for small-molecule drug discovery. Yet existing LLM-based approaches for molecular generation often suffer from high rates of invalid and low-quality ligand candidates, a result of the syntactic limitations of current models with regard to molecular strings. In this paper, we introduce $\texttt{ToolMol}$, an evolutionary agentic framework for de novo drug design. $\texttt{ToolMol}$ combines a multi-objective genetic algorithm with an agentic LLM operator that iteratively updates the ligand population. We build a comprehensive toolbox of RDKit-backed functions that allows our agentic operator to consisently make precise ligand modifications. $\texttt{ToolMol}$ achieves state-of-the-art performance on multi-objective property optimization tasks, discovering drug-like and synthesizable ligands that have $>10\%$ stronger predicted binding affinity compared to existing methods, evaluated on three protein targets. $\texttt{ToolMol}$ ligands additionally achieve state-of-the-art results in gold-standard Absolute Binding Free Energy scores, gaining over existing methods by over $35\%$. By studying chain-of-thought reasoning traces, we observe that tool-calling enables the model to more faithfully execute its planned modifications, efficiently exploiting the strong chemical prior knowledge in LLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces ToolMol, an evolutionary agentic framework that combines a multi-objective genetic algorithm with an LLM-based operator equipped with an RDKit toolbox for precise ligand modifications in de novo drug design. It claims state-of-the-art performance on multi-objective property optimization across three protein targets, with discovered ligands showing >10% stronger predicted binding affinity than prior methods and >35% gains in gold-standard Absolute Binding Free Energy (ABFE) scores, while using chain-of-thought traces to argue that tool-calling improves execution fidelity over direct LLM generation.

Significance. If the empirical claims hold after addressing the missing validation details, ToolMol would offer a meaningful step forward in LLM-assisted molecular generation by reducing invalid structures via tool use and leveraging chemical priors, with potential implications for more reliable multi-objective optimization in drug discovery. The emphasis on reproducible tool-backed edits and CoT analysis is a constructive direction.

major comments (3)
  1. [Results section] Results section (and abstract claims): No quantitative metrics are reported on tool-call success rates, fraction of proposed edits rejected as invalid, or bias in LLM-proposed modifications, which directly undermines evaluation of the central assumption that the agentic operator with RDKit consistently produces fitness-improving edits. Without these, performance gains could stem primarily from GA selection rather than reliable operator behavior.
  2. [Experimental evaluation] Experimental evaluation: The manuscript provides no details on baselines used for the >10% affinity and >35% ABFE comparisons, statistical tests for significance, data splits, or handling of invalid molecules, leaving the SOTA claims difficult to assess or reproduce.
  3. [Methods] Methods (toolbox and operator description): There is no ablation comparing LLM-guided edits against random or rule-based edits on the same population, which is needed to attribute gains specifically to the agentic LLM operator rather than the underlying GA framework.
minor comments (2)
  1. [Methods] Clarify the exact definition and computation of the multi-objective fitness function and how synthesizability is quantified in the reported results.
  2. [Results] Ensure all tables reporting performance include standard deviations or confidence intervals for the affinity and ABFE metrics.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help clarify the presentation of ToolMol's contributions. We address each major point below and commit to revisions that strengthen the empirical support for the agentic operator without overstating current results.

read point-by-point responses
  1. Referee: Results section (and abstract claims): No quantitative metrics are reported on tool-call success rates, fraction of proposed edits rejected as invalid, or bias in LLM-proposed modifications, which directly undermines evaluation of the central assumption that the agentic operator with RDKit consistently produces fitness-improving edits. Without these, performance gains could stem primarily from GA selection rather than reliable operator behavior.

    Authors: We agree that explicit quantitative metrics on tool-call success rates, invalid edit rejection fractions, and modification bias were not reported in the submitted version, which limits direct assessment of operator reliability. The CoT traces provide qualitative evidence of faithful execution, but we will add a dedicated analysis subsection in Results reporting these metrics computed from the experimental logs (e.g., success rate of RDKit calls, rejection rate due to chemical validity checks, and distribution of edit types). This will allow readers to evaluate whether gains arise from the operator or solely from GA selection pressure. revision: yes

  2. Referee: Experimental evaluation: The manuscript provides no details on baselines used for the >10% affinity and >35% ABFE comparisons, statistical tests for significance, data splits, or handling of invalid molecules, leaving the SOTA claims difficult to assess or reproduce.

    Authors: We acknowledge the lack of explicit details on baselines, statistical tests, data splits, and invalid-molecule handling in the main text, which hinders reproducibility. In revision we will expand the Experimental evaluation section to specify the exact baselines (REINVENT, GraphGA, and direct LLM prompting variants from prior literature on the same targets), the statistical tests applied (e.g., Wilcoxon rank-sum tests with reported p-values), the data splits used, and the RDKit-based sanitization pipeline for filtering invalid molecules before fitness evaluation. These clarifications will be added without altering the reported performance numbers. revision: yes

  3. Referee: Methods (toolbox and operator description): There is no ablation comparing LLM-guided edits against random or rule-based edits on the same population, which is needed to attribute gains specifically to the agentic LLM operator rather than the underlying GA framework.

    Authors: We agree that a direct ablation isolating the LLM-guided operator from the GA framework would strengthen attribution of gains. While our existing comparisons to non-agentic baselines provide supporting evidence, we will add a targeted ablation in the revised Methods and Results: a controlled comparison of the full ToolMol operator against a rule-based edit variant (predefined RDKit transformations without LLM reasoning) on the same initial populations for at least one target. This will quantify the incremental benefit of the agentic component. revision: yes

Circularity Check

0 steps flagged

No circularity: performance claims rest on external benchmarks

full rationale

The paper's central claims concern empirical SOTA results on multi-objective optimization and ABFE scores evaluated against prior methods on three protein targets. These are external comparisons using standard benchmarks and gold-standard calculations, not quantities defined by the method's own parameters or self-referential metrics. No equations, derivations, or load-bearing self-citations appear in the abstract or described framework that would reduce predictions to inputs by construction. The agentic LLM + RDKit operator is presented as an implementation detail whose reliability is asserted via CoT traces, but the performance numbers themselves do not collapse into fitted or renamed internal quantities.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the method implicitly relies on standard GA operators and LLM priors but does not introduce new postulated entities.

pith-pipeline@v0.9.0 · 5547 in / 1253 out tokens · 38158 ms · 2026-05-15T04:52:05.971627+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 5 internal anchors

  1. [1]

    R., Paolini, G

    Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S., and Hopkins, A. L. Quantifying the chemical beauty of drugs. Nature Chemistry, 4 0 (2): 0 90–98, January 2012. ISSN 1755-4349. doi:10.1038/nchem.1243. URL http://dx.doi.org/10.1038/nchem.1243

  2. [2]

    A., MacKnight, R., Kline, B., and Gomes, G

    Boiko, D. A., MacKnight, R., Kline, B., and Gomes, G. Autonomous chemical research with large language models. Nature, 624 0 (7992): 0 570--578, 2023

  3. [3]

    ChemCrow: Augmenting large-language models with chemistry tools

    Bran, A. M., Cox, S., Schilter, O., Baldassari, C., White, A. D., and Schwaller, P. Chemcrow: Augmenting large-language models with chemistry tools. arXiv preprint arXiv:2304.05376, 2023

  4. [4]

    El Agente Estructural: An Artificially Intelligent Molecular Editor

    Choi, C., Zou, Y., Müller, M., Hao, H., Kang, Y., Pérez-Sánchez, J. B., Gustin, I., Xu, H., Wang, A., Vakili, M. G., Crebolder, C., Aspuru-Guzik, A., and Bernales, V. El agente estructural: An artificially intelligent molecular editor, 2026. URL https://arxiv.org/abs/2602.04849

  5. [5]

    A., Fernandez Prada, D

    Crucitti, D., Pérez Míguez, C., Díaz Arias, J. A., Fernandez Prada, D. B., and Mosquera Orgueira, A. De novo drug design through artificial intelligence: an introduction. Frontiers in Hematology, Volume 3 - 2024, 2024. ISSN 2813-3935. doi:10.3389/frhem.2024.1305741. URL https://www.frontiersin.org/journals/hematology/articles/10.3389/frhem.2024.1305741

  6. [6]

    Dorna, V., Subhalingam, D., Kolluru, K., Tuli, S., Singh, M., Singal, S., Krishnan, N. M. A., and Ranu, S. Tagmol: Target-aware gradient-guided molecule generation, 2024. URL https://arxiv.org/abs/2406.01650

  7. [7]

    Dunn, I., Toft, L., Katz, T., Gupta, J., Shah, R., Hettiarachchi, R., and Koes, D. R. Omtra: A multi-task generative model for structure-based drug design, 2025. URL https://arxiv.org/abs/2512.05080

  8. [8]

    K., and Yu, R

    Eckmann, P., Sun, K., Zhao, B., Feng, M., Gilson, M. K., and Yu, R. Limo: Latent inceptionism for targeted molecule generation, 2022. URL https://arxiv.org/abs/2206.09010

  9. [9]

    K., and Yu, R

    Eckmann, P., Wu, D., Heinzelmann, G., Gilson, M. K., and Yu, R. Mf-lal: Drug compound generation using multi-fidelity latent space active learning, 2025. URL https://arxiv.org/abs/2410.11226

  10. [10]

    and Schuffenhauer, A

    Ertl, P. and Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. Journal of Cheminformatics, 1 0 (1), June 2009. ISSN 1758-2946. doi:10.1186/1758-2946-1-8. URL http://dx.doi.org/10.1186/1758-2946-1-8

  11. [11]

    Feng, M., Heinzelmann, G., and Gilson, M. K. Absolute binding free energy calculations improve enrichment of actives in virtual compound screening. Scientific Reports, 12 0 (1), August 2022. ISSN 2045-2322. doi:10.1038/s41598-022-17480-w. URL http://dx.doi.org/10.1038/s41598-022-17480-w

  12. [12]

    and Aspuru-Guzik, A

    Flam-Shepherd, D. and Aspuru-Guzik, A. Language models can generate molecules, materials, and protein binding sites directly in three dimensions as xyz, cif, and pdb files, 2023. URL https://arxiv.org/abs/2305.05708

  13. [13]

    N., Duvenaud, D., Hern \'a ndez-Lobato, J

    G \'o mez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hern \'a ndez-Lobato, J. M., S \'a nchez-Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams, R. P., and Aspuru-Guzik, A. Automatic chemical design using a data-driven continuous representation of molecules. ACS central science, 4 0 (2): 0 268--276, 2018

  14. [14]

    Decompdiff: Diffusion models with decomposed priors for structure-based drug design, 2024

    Guan, J., Zhou, X., Yang, Y., Bao, Y., Peng, J., Ma, J., Liu, Q., Wang, L., and Gu, Q. Decompdiff: Diffusion models with decomposed priors for structure-based drug design, 2024. URL https://arxiv.org/abs/2403.07902

  15. [15]

    V., Wiest, O., and Zhang, X

    Guo, T., Guo, K., Nan, B., Liang, Z., Guo, Z., Chawla, N. V., Wiest, O., and Zhang, X. What can large language models do in chemistry? a comprehensive benchmark on eight tasks, 2023. URL https://arxiv.org/abs/2305.18365

  16. [16]

    and Gilson, M

    Heinzelmann, G. and Gilson, M. K. Automation of absolute protein-ligand binding free energy calculations for docking refinement and compound evaluation. Scientific Reports, 11 0 (1), January 2021. ISSN 2045-2322. doi:10.1038/s41598-020-80769-1. URL http://dx.doi.org/10.1038/s41598-020-80769-1

  17. [17]

    H., Eun, J

    Hong, S. H., Eun, J. W., Choi, S. K., Shen, Q., Choi, W. S., Han, J.-W., Nam, S. W., and You, J. S. Epigenetic reader brd4 inhibition as a therapeutic strategy to suppress e2f2-cell cycle regulation circuit in liver cancer. Oncotarget, 7 0 (22): 0 32628–32640, April 2016. ISSN 1949-2553. doi:10.18632/oncotarget.8701. URL http://dx.doi.org/10.18632/oncotarget.8701

  18. [18]

    G., Vignac, C., and Welling, M

    Hoogeboom, E., Satorras, V. G., Vignac, C., and Welling, M. Equivariant diffusion for molecule generation in 3d, 2022. URL https://arxiv.org/abs/2203.17003

  19. [19]

    Jensen, J. H. A graph-based genetic algorithm and generative model/monte carlo tree search for the exploration of chemical space. Chemical Science, 10 0 (12): 0 3567–3572, 2019. ISSN 2041-6539. doi:10.1039/c8sc05372c. URL http://dx.doi.org/10.1039/C8SC05372C

  20. [20]

    Junction tree variational autoencoder for molecular graph generation

    Jin, W., Barzilay, R., and Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. In International conference on machine learning, pp.\ 2323--2332. PMLR, 2018

  21. [21]

    K., Fu, X., Liao, Y .-L., Gharakhanyan, V ., Miller, B

    Joshi, C. K., Fu, X., Liao, Y.-L., Gharakhanyan, V., Miller, B. K., Sriram, A., and Ulissi, Z. W. All-atom diffusion transformers: Unified generative modelling of molecules and materials, 2025. URL https://arxiv.org/abs/2503.03965

  22. [22]

    ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution

    Lange, R. T., Imajuku, Y., and Cetin, E. Shinkaevolve: Towards open-ended and sample-efficient program evolution, 2025. URL https://arxiv.org/abs/2509.19349

  23. [23]

    Lee, S., Jo, J., and Hwang, S. J. Exploring chemical space with score-based out-of-distribution generation, 2023. URL https://arxiv.org/abs/2206.07632

  24. [24]

    Structure-informed machine learning for drug discovery: a task-centric perspective

    Li, Y., Zhan, R.-H., Rao, J., Liu, M., Sang, P., Zeng, X., Zheng, M., Li, X., and Yang, L. Structure-informed machine learning for drug discovery: a task-centric perspective. Brief. Bioinform., 27 0 (1), January 2026

  25. [25]

    Clifford group equivariant diffusion models for 3d molecular generation, 2025

    Liu, C., Vadgama, S., Ruhe, D., Bekkers, E., and Forré, P. Clifford group equivariant diffusion models for 3d molecular generation, 2025. URL https://arxiv.org/abs/2504.15773

  26. [26]

    N., and Gilson, M

    Liu, T., Lin, Y., Wen, X., Jorissen, R. N., and Gilson, M. K. Bindingdb: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Research, 35 0 (Database): 0 D198–D201, January 2007. ISSN 1362-4962. doi:10.1093/nar/gkl999. URL http://dx.doi.org/10.1093/nar/gkl999

  27. [27]

    A comparative study of deep learning and classical modeling approaches for protein–ligand binding pose and affinity prediction in coronavirus main proteases

    Liu, Y., Tang, H., Niu, T., and Wang, J. A comparative study of deep learning and classical modeling approaches for protein–ligand binding pose and affinity prediction in coronavirus main proteases. Journal of Chemical Information and Modeling, 66 0 (1): 0 731--743, 2026. doi:10.1021/acs.jcim.5c02481. URL https://doi.org/10.1021/acs.jcim.5c02481. PMID: 41429653

  28. [28]

    Y-mol: A multiscale biomedical knowledge-guided large language model for drug development, 2024

    Ma, T., Lin, X., Li, T., Li, C., Chen, L., Zhou, P., Cai, X., Yang, X., Zeng, D., Cao, D., and Zeng, X. Y-mol: A multiscale biomedical knowledge-guided large language model for drug development, 2024. URL https://arxiv.org/abs/2410.11550

  29. [29]

    Illuminating search spaces by mapping elites

    Mouret, J.-B. and Clune, J. Illuminating search spaces by mapping elites, 2015. URL https://arxiv.org/abs/1504.04909

  30. [30]

    Path-aware and structure-preserving generation of synthetically accessible molecules

    Noh, J., Jeong, D.-W., Kim, K., Han, S., Lee, M., Lee, H., and Jung, Y. Path-aware and structure-preserving generation of synthetically accessible molecules. In Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., and Sabato, S. (eds.), Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Lea...

  31. [31]

    L., Piraud, M., and Becker, M

    Oestreich, M., Merdivan, E., Lee, M., Schultze, J. L., Piraud, M., and Becker, M. DrugDiff : small molecule diffusion model with flexible guidance towards molecular properties. J. Cheminform., 17 0 (1): 0 23, February 2025

  32. [32]

    OpenAI, :, Agarwal, S., Ahmad, L., Ai, J., Altman, S., Applebaum, A., Arbus, E., Arora, R. K., Bai, Y., Baker, B., Bao, H., Barak, B., Bennett, A., Bertao, T., Brett, N., Brevdo, E., Brockman, G., Bubeck, S., Chang, C., Chen, K., Chen, M., Cheung, E., Clark, A., Cook, D., Dukhan, M., Dvorak, C., Fives, K., Fomenko, V., Garipov, T., Georgiev, K., Glaese, M...

  33. [33]

    Organ, S. L. and Tsao, M.-S. An overview of the c-met signaling pathway. Therapeutic Advances in Medical Oncology, 3 0 (1 suppl): 0 S7–S19, November 2011. ISSN 1758-8359. doi:10.1177/1758834011422556. URL http://dx.doi.org/10.1177/1758834011422556

  34. [34]

    URL https://www.biorxiv.org/content/early/2025/06/18/2025.06.14.659707

    Passaro, S., Corso, G., Wohlwend, J., Reveiz, M., Thaler, S., Somnath, V. R., Getz, N., Portnoi, T., Roy, J., Stark, H., Kwabi-Addo, D., Beaini, D., Jaakkola, T., and Barzilay, R. Boltz-2: Towards accurate and efficient binding affinity prediction. June 2025. doi:10.1101/2025.06.14.659707. URL http://dx.doi.org/10.1101/2025.06.14.659707

  35. [35]

    Pocket2mol: Efficient molecular sampling based on 3d protein pockets, 2025

    Peng, X., Luo, S., Guan, J., Xie, Q., Peng, J., and Ma, J. Pocket2mol: Efficient molecular sampling based on 3d protein pockets, 2025. URL https://arxiv.org/abs/2205.07249

  36. [36]

    F., Goddard, T

    Pettersen, E. F., Goddard, T. D., Huang, C. C., Meng, E. C., Couch, G. S., Croll, T. I., Morris, J. H., and Ferrin, T. E. <scp>ucsf chimerax</scp>: Structure visualization for researchers, educators, and developers. Protein Science, 30 0 (1): 0 70–82, October 2020. ISSN 1469-896X. doi:10.1002/pro.3943. URL http://dx.doi.org/10.1002/pro.3943

  37. [37]

    Druggen enhances drug discovery with large language models and reinforcement learning

    Sheikholeslami, M., Mazrouei, N., Gheisari, Y., Fasihi, A., Irajpour, M., and Motahharynia, A. Druggen enhances drug discovery with large language models and reinforcement learning. Scientific Reports, 15 0 (1), 2025. ISSN 2045-2322. doi:10.1038/s41598-025-98629-1. URL http://dx.doi.org/10.1038/s41598-025-98629-1

  38. [38]

    A., Mistryukova, L., Avchaciov, K., and Fedichev, P

    Shepard, V., Musin, A., Chebykina, K., Zeninskaya, N. A., Mistryukova, L., Avchaciov, K., and Fedichev, P. O. Harvest: Unlocking the dark bioactivity data of pharmaceutical patents via agentic ai. March 2026. doi:10.64898/2026.03.15.711910. URL http://dx.doi.org/10.64898/2026.03.15.711910

  39. [39]

    URL https: //doi.org/10.1021/acs.jcim.5b00559

    Sterling, T. and Irwin, J. J. Zinc 15 – ligand discovery for everyone. Journal of Chemical Information and Modeling, 55 0 (11): 0 2324–2337, November 2015. ISSN 1549-960X. doi:10.1021/acs.jcim.5b00559. URL http://dx.doi.org/10.1021/acs.jcim.5b00559

  40. [40]

    and Olson, A

    Trott, O. and Olson, A. J. Autodock vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of Computational Chemistry, 31 0 (2): 0 455–461, June 2009. ISSN 1096-987X. doi:10.1002/jcc.21334. URL http://dx.doi.org/10.1002/jcc.21334

  41. [41]

    M., Buracas, D., Shewmake, C

    Vadgama, S., Islam, M. M., Buracas, D., Shewmake, C. A., Moskalev, A., and Bekkers, E. J. Probing equivariance and symmetry breaking in convolutional networks. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview.net/forum?id=ghyYc7hgSU

  42. [42]

    Efficient evolutionary search over chemical space with large language models, 2025

    Wang, H., Skreta, M., Ser, C.-T., Gao, W., Kong, L., Strieth-Kalthoff, F., Duan, C., Zhuang, Y., Yu, Y., Zhu, Y., Du, Y., Aspuru-Guzik, A., Neklyudov, K., and Zhang, C. Efficient evolutionary search over chemical space with large language models, 2025. URL https://arxiv.org/abs/2406.16976

  43. [43]

    White, A. D. The future of chemistry is language. Nature Reviews Chemistry, 7 0 (7): 0 457–458, May 2023. ISSN 2397-3358. doi:10.1038/s41570-023-00502-0. URL http://dx.doi.org/10.1038/s41570-023-00502-0

  44. [44]

    Prior-guided flow matching for target-aware molecule design with learnable atom number, 2025

    Zhou, J., Qian, H., Tu, S., and Xu, L. Prior-guided flow matching for target-aware molecule design with learnable atom number, 2025. URL https://arxiv.org/abs/2509.01486

  45. [45]

    Decompopt: Controllable and decomposed diffusion models for structure-based molecular optimization, 2024

    Zhou, X., Cheng, X., Yang, Y., Bao, Y., Wang, L., and Gu, Q. Decompopt: Controllable and decomposed diffusion models for structure-based molecular optimization, 2024. URL https://arxiv.org/abs/2403.13829

  46. [46]

    Sample-efficient multi-objective molecular optimization with gflownets, 2023

    Zhu, Y., Wu, J., Hu, C., Yan, J., Hsieh, C.-Y., Hou, T., and Wu, J. Sample-efficient multi-objective molecular optimization with gflownets, 2023. URL https://arxiv.org/abs/2302.04040