arxiv: 2605.12784 · v2 · submitted 2026-05-12 · 💻 cs.LG · cs.NE· q-bio.QM

Recognition: no theorem link

ToolMol: Evolutionary Agentic Framework for Multi-objective Drug Discovery

Andrew Y. Zhou , Sharvaree Vadgama , Sumanth Varambally , Peter Eckmann , Michael K. Gilson , Rose Yu

Authors on Pith no claims yet

Pith reviewed 2026-05-15 04:52 UTC · model grok-4.3

classification 💻 cs.LG cs.NEq-bio.QM

keywords drug discoverymulti-objective optimizationgenetic algorithmlarge language modelsmolecular generationRDKit toolsagentic frameworkbinding affinity

0 comments

The pith

ToolMol pairs an LLM agent with RDKit tools inside a genetic algorithm to generate drug ligands that show stronger predicted binding than earlier methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ToolMol as a way to combine a multi-objective genetic algorithm with an agentic large language model that calls chemical tools to edit candidate molecules. Existing LLM-only approaches often produce invalid or low-quality structures because they struggle with the syntax of molecular representations. ToolMol supplies the model with a toolbox of RDKit functions so the agent can make controlled, chemically valid changes at each evolutionary step. A sympathetic reader would care because this hybrid setup promises more reliable discovery of molecules that simultaneously meet multiple goals such as binding strength, drug-likeness, and ease of synthesis. The authors report that the resulting ligands outperform prior techniques on three protein targets and on absolute binding free energy calculations.

Core claim

ToolMol achieves state-of-the-art performance on multi-objective property optimization tasks, discovering drug-like and synthesizable ligands that have greater than 10 percent stronger predicted binding affinity compared to existing methods, evaluated on three protein targets. ToolMol ligands additionally achieve state-of-the-art results in gold-standard Absolute Binding Free Energy scores, gaining over existing methods by over 35 percent. Tool-calling enables the model to more faithfully execute its planned modifications, efficiently exploiting the strong chemical prior knowledge in LLMs.

What carries the argument

The agentic LLM operator that iteratively calls RDKit-backed functions to perform precise ligand modifications inside a multi-objective genetic algorithm that evolves a population of candidate molecules.

If this is right

Ligands produced by ToolMol exhibit more than 10 percent stronger predicted binding affinity on the tested protein targets.
The same ligands reach more than 35 percent better performance on absolute binding free energy calculations than prior approaches.
Tool-calling allows the language model to follow its own chain-of-thought plans more accurately than free-form generation.
The evolutionary loop maintains a population of molecules that remain drug-like and synthesizable throughout optimization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same agent-plus-tool pattern could be applied to other molecular design problems such as materials or catalyst discovery without changing the core loop.
Adding more specialized tools for reaction prediction or toxicity filtering would likely further reduce the rate of invalid candidates.
Because the framework separates planning from execution, it may scale to larger search spaces than pure genetic algorithms or pure LLM sampling.

Load-bearing premise

The agentic LLM operator, when given RDKit tools, will reliably produce chemically valid and fitness-improving edits rather than biased or invalid structures that weaken the genetic search.

What would settle it

An experiment in which synthesized ToolMol ligands show no measurable improvement in actual binding free energy over control molecules, or in which a large fraction of proposed structures fail basic validity checks when the same prompts are run without the tool layer.

Figures

Figures reproduced from arXiv: 2605.12784 by Andrew Y. Zhou, Michael K. Gilson, Peter Eckmann, Rose Yu, Sharvaree Vadgama, Sumanth Varambally.

**Figure 1.** Figure 1: Overview of ToolMol. (a) We sample an initial ligand population from ZINC 250K. (b) Parent ligands are sampled for crossovers & mutations with probability proportional to their fitness. (c) An agent with access to a set of modification tools generates new ligands using structures from the selected parents. (d) New offspring are evaluated by an oracle for all relevant objectives. (e) A new population is for… view at source ↗

**Figure 2.** Figure 2: An example tool-calling process. The agent first decides to perform a crossover on the input molecules, utilizing crossover molecules. Then it decides to attach a methoxy group to the benzene structure, utilizing add functional group. At this point, it decides that the modifications are sufficient, and the new molecule is added to the offspring population. hit the max steps iteration budget or until the LL… view at source ↗

**Figure 3.** Figure 3: ToolMol & MOLLEO modification steps and reasoning traces. MOLLEO fails to execute its planned modifications, while ToolMol successfully executes its ideas. 8 [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: This figure shows the initial molecules, and resultant molecules after LLM modifications using MOLLEO and ToolMol. We see that MOLLEO fails to generate the required molecule We observe that while many parts of the final molecule are consistent with what is described by the reasoning trace, there are certain parts that are entirely inconsistent with the LLM’s planned modifications. For instance, it insists … view at source ↗

**Figure 5.** Figure 5: Comparison of correlation between AutoDock & ABFE and Boltz-2 & ABFE for 32 known compounds for the c-MET protein target. We observe a significantly higher correlation between Boltz-2 and ABFE as compared to AutoDock. We see that ABFE and AutoDock docking show r 2 = 0.09 among the 32 compounds, while ABFE and Boltz-2 show r 2 = 0.42. As an oracle nearly 1000x less computationally expensive than ABFE, Boltz… view at source ↗

read the original abstract

Advances in large language models (LLMs) have recently opened new and promising avenues for small-molecule drug discovery. Yet existing LLM-based approaches for molecular generation often suffer from high rates of invalid and low-quality ligand candidates, a result of the syntactic limitations of current models with regard to molecular strings. In this paper, we introduce $\texttt{ToolMol}$, an evolutionary agentic framework for de novo drug design. $\texttt{ToolMol}$ combines a multi-objective genetic algorithm with an agentic LLM operator that iteratively updates the ligand population. We build a comprehensive toolbox of RDKit-backed functions that allows our agentic operator to consisently make precise ligand modifications. $\texttt{ToolMol}$ achieves state-of-the-art performance on multi-objective property optimization tasks, discovering drug-like and synthesizable ligands that have $>10\%$ stronger predicted binding affinity compared to existing methods, evaluated on three protein targets. $\texttt{ToolMol}$ ligands additionally achieve state-of-the-art results in gold-standard Absolute Binding Free Energy scores, gaining over existing methods by over $35\%$. By studying chain-of-thought reasoning traces, we observe that tool-calling enables the model to more faithfully execute its planned modifications, efficiently exploiting the strong chemical prior knowledge in LLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ToolMol puts an LLM agent with RDKit tools inside a multi-objective GA to edit ligands more reliably than pure string generation, claiming clear gains on affinity and ABFE, but the operator's actual reliability is not quantified.

read the letter

ToolMol pairs a multi-objective genetic algorithm with an LLM that calls RDKit functions to edit ligand structures step by step. The main claim is that this produces drug-like molecules with stronger predicted binding to three protein targets and better absolute binding free energy scores than previous approaches. The new piece is the agentic operator that plans modifications via chain-of-thought and then executes them through the toolbox instead of relying on the LLM to output valid strings on its own. That setup directly targets the common problem of invalid or low-quality candidates in pure LLM generation. The paper shows some reasoning traces that illustrate how the model uses the tools to follow through on its plans. The paper does a reasonable job of framing the problem of invalid outputs in LLM molecular generation and proposing a tool-use fix. The multi-objective setup and the focus on drug-like, synthesizable ligands also align with real needs in early drug discovery. The main weakness is the lack of supporting numbers for the operator's reliability. There are no reported rates for successful tool calls, no count of how many proposed edits were invalid and discarded, and no comparison of the LLM edits against simpler alternatives on the same populations. Without that, the performance edge could be coming mostly from the genetic algorithm's selection rather than consistent, high-quality guidance from the agent. The abstract also skips details on the exact baselines, statistical testing, and how they handled data splits or invalid structures in the final evaluation. This kind of hybrid method will interest researchers who already work on evolutionary algorithms for molecules or who are trying to make LLMs more useful in chemistry pipelines. It is concrete enough that a serious referee could check the implementation and run the necessary controls. I would send it to peer review.

Referee Report

3 major / 2 minor

Summary. The paper introduces ToolMol, an evolutionary agentic framework that combines a multi-objective genetic algorithm with an LLM-based operator equipped with an RDKit toolbox for precise ligand modifications in de novo drug design. It claims state-of-the-art performance on multi-objective property optimization across three protein targets, with discovered ligands showing >10% stronger predicted binding affinity than prior methods and >35% gains in gold-standard Absolute Binding Free Energy (ABFE) scores, while using chain-of-thought traces to argue that tool-calling improves execution fidelity over direct LLM generation.

Significance. If the empirical claims hold after addressing the missing validation details, ToolMol would offer a meaningful step forward in LLM-assisted molecular generation by reducing invalid structures via tool use and leveraging chemical priors, with potential implications for more reliable multi-objective optimization in drug discovery. The emphasis on reproducible tool-backed edits and CoT analysis is a constructive direction.

major comments (3)

[Results section] Results section (and abstract claims): No quantitative metrics are reported on tool-call success rates, fraction of proposed edits rejected as invalid, or bias in LLM-proposed modifications, which directly undermines evaluation of the central assumption that the agentic operator with RDKit consistently produces fitness-improving edits. Without these, performance gains could stem primarily from GA selection rather than reliable operator behavior.
[Experimental evaluation] Experimental evaluation: The manuscript provides no details on baselines used for the >10% affinity and >35% ABFE comparisons, statistical tests for significance, data splits, or handling of invalid molecules, leaving the SOTA claims difficult to assess or reproduce.
[Methods] Methods (toolbox and operator description): There is no ablation comparing LLM-guided edits against random or rule-based edits on the same population, which is needed to attribute gains specifically to the agentic LLM operator rather than the underlying GA framework.

minor comments (2)

[Methods] Clarify the exact definition and computation of the multi-objective fitness function and how synthesizability is quantified in the reported results.
[Results] Ensure all tables reporting performance include standard deviations or confidence intervals for the affinity and ABFE metrics.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help clarify the presentation of ToolMol's contributions. We address each major point below and commit to revisions that strengthen the empirical support for the agentic operator without overstating current results.

read point-by-point responses

Referee: Results section (and abstract claims): No quantitative metrics are reported on tool-call success rates, fraction of proposed edits rejected as invalid, or bias in LLM-proposed modifications, which directly undermines evaluation of the central assumption that the agentic operator with RDKit consistently produces fitness-improving edits. Without these, performance gains could stem primarily from GA selection rather than reliable operator behavior.

Authors: We agree that explicit quantitative metrics on tool-call success rates, invalid edit rejection fractions, and modification bias were not reported in the submitted version, which limits direct assessment of operator reliability. The CoT traces provide qualitative evidence of faithful execution, but we will add a dedicated analysis subsection in Results reporting these metrics computed from the experimental logs (e.g., success rate of RDKit calls, rejection rate due to chemical validity checks, and distribution of edit types). This will allow readers to evaluate whether gains arise from the operator or solely from GA selection pressure. revision: yes
Referee: Experimental evaluation: The manuscript provides no details on baselines used for the >10% affinity and >35% ABFE comparisons, statistical tests for significance, data splits, or handling of invalid molecules, leaving the SOTA claims difficult to assess or reproduce.

Authors: We acknowledge the lack of explicit details on baselines, statistical tests, data splits, and invalid-molecule handling in the main text, which hinders reproducibility. In revision we will expand the Experimental evaluation section to specify the exact baselines (REINVENT, GraphGA, and direct LLM prompting variants from prior literature on the same targets), the statistical tests applied (e.g., Wilcoxon rank-sum tests with reported p-values), the data splits used, and the RDKit-based sanitization pipeline for filtering invalid molecules before fitness evaluation. These clarifications will be added without altering the reported performance numbers. revision: yes
Referee: Methods (toolbox and operator description): There is no ablation comparing LLM-guided edits against random or rule-based edits on the same population, which is needed to attribute gains specifically to the agentic LLM operator rather than the underlying GA framework.

Authors: We agree that a direct ablation isolating the LLM-guided operator from the GA framework would strengthen attribution of gains. While our existing comparisons to non-agentic baselines provide supporting evidence, we will add a targeted ablation in the revised Methods and Results: a controlled comparison of the full ToolMol operator against a rule-based edit variant (predefined RDKit transformations without LLM reasoning) on the same initial populations for at least one target. This will quantify the incremental benefit of the agentic component. revision: yes

Circularity Check

0 steps flagged

No circularity: performance claims rest on external benchmarks

full rationale

The paper's central claims concern empirical SOTA results on multi-objective optimization and ABFE scores evaluated against prior methods on three protein targets. These are external comparisons using standard benchmarks and gold-standard calculations, not quantities defined by the method's own parameters or self-referential metrics. No equations, derivations, or load-bearing self-citations appear in the abstract or described framework that would reduce predictions to inputs by construction. The agentic LLM + RDKit operator is presented as an implementation detail whose reliability is asserted via CoT traces, but the performance numbers themselves do not collapse into fitted or renamed internal quantities.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the method implicitly relies on standard GA operators and LLM priors but does not introduce new postulated entities.

pith-pipeline@v0.9.0 · 5547 in / 1253 out tokens · 38158 ms · 2026-05-15T04:52:05.971627+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 5 internal anchors

[1]

R., Paolini, G

Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S., and Hopkins, A. L. Quantifying the chemical beauty of drugs. Nature Chemistry, 4 0 (2): 0 90–98, January 2012. ISSN 1755-4349. doi:10.1038/nchem.1243. URL http://dx.doi.org/10.1038/nchem.1243

work page doi:10.1038/nchem.1243 2012
[2]

A., MacKnight, R., Kline, B., and Gomes, G

Boiko, D. A., MacKnight, R., Kline, B., and Gomes, G. Autonomous chemical research with large language models. Nature, 624 0 (7992): 0 570--578, 2023

work page 2023
[3]

ChemCrow: Augmenting large-language models with chemistry tools

Bran, A. M., Cox, S., Schilter, O., Baldassari, C., White, A. D., and Schwaller, P. Chemcrow: Augmenting large-language models with chemistry tools. arXiv preprint arXiv:2304.05376, 2023

work page internal anchor Pith review arXiv 2023
[4]

El Agente Estructural: An Artificially Intelligent Molecular Editor

Choi, C., Zou, Y., Müller, M., Hao, H., Kang, Y., Pérez-Sánchez, J. B., Gustin, I., Xu, H., Wang, A., Vakili, M. G., Crebolder, C., Aspuru-Guzik, A., and Bernales, V. El agente estructural: An artificially intelligent molecular editor, 2026. URL https://arxiv.org/abs/2602.04849

work page internal anchor Pith review Pith/arXiv arXiv 2026
[5]

A., Fernandez Prada, D

Crucitti, D., Pérez Míguez, C., Díaz Arias, J. A., Fernandez Prada, D. B., and Mosquera Orgueira, A. De novo drug design through artificial intelligence: an introduction. Frontiers in Hematology, Volume 3 - 2024, 2024. ISSN 2813-3935. doi:10.3389/frhem.2024.1305741. URL https://www.frontiersin.org/journals/hematology/articles/10.3389/frhem.2024.1305741

work page doi:10.3389/frhem.2024.1305741 2024
[6]

Dorna, V., Subhalingam, D., Kolluru, K., Tuli, S., Singh, M., Singal, S., Krishnan, N. M. A., and Ranu, S. Tagmol: Target-aware gradient-guided molecule generation, 2024. URL https://arxiv.org/abs/2406.01650

work page arXiv 2024
[7]

Dunn, I., Toft, L., Katz, T., Gupta, J., Shah, R., Hettiarachchi, R., and Koes, D. R. Omtra: A multi-task generative model for structure-based drug design, 2025. URL https://arxiv.org/abs/2512.05080

work page arXiv 2025
[8]

K., and Yu, R

Eckmann, P., Sun, K., Zhao, B., Feng, M., Gilson, M. K., and Yu, R. Limo: Latent inceptionism for targeted molecule generation, 2022. URL https://arxiv.org/abs/2206.09010

work page arXiv 2022
[9]

K., and Yu, R

Eckmann, P., Wu, D., Heinzelmann, G., Gilson, M. K., and Yu, R. Mf-lal: Drug compound generation using multi-fidelity latent space active learning, 2025. URL https://arxiv.org/abs/2410.11226

work page arXiv 2025
[10]

and Schuffenhauer, A

Ertl, P. and Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. Journal of Cheminformatics, 1 0 (1), June 2009. ISSN 1758-2946. doi:10.1186/1758-2946-1-8. URL http://dx.doi.org/10.1186/1758-2946-1-8

work page doi:10.1186/1758-2946-1-8 2009
[11]

Feng, M., Heinzelmann, G., and Gilson, M. K. Absolute binding free energy calculations improve enrichment of actives in virtual compound screening. Scientific Reports, 12 0 (1), August 2022. ISSN 2045-2322. doi:10.1038/s41598-022-17480-w. URL http://dx.doi.org/10.1038/s41598-022-17480-w

work page doi:10.1038/s41598-022-17480-w 2022
[12]

and Aspuru-Guzik, A

Flam-Shepherd, D. and Aspuru-Guzik, A. Language models can generate molecules, materials, and protein binding sites directly in three dimensions as xyz, cif, and pdb files, 2023. URL https://arxiv.org/abs/2305.05708

work page arXiv 2023
[13]

N., Duvenaud, D., Hern \'a ndez-Lobato, J

G \'o mez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hern \'a ndez-Lobato, J. M., S \'a nchez-Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams, R. P., and Aspuru-Guzik, A. Automatic chemical design using a data-driven continuous representation of molecules. ACS central science, 4 0 (2): 0 268--276, 2018

work page 2018
[14]

Decompdiff: Diffusion models with decomposed priors for structure-based drug design, 2024

Guan, J., Zhou, X., Yang, Y., Bao, Y., Peng, J., Ma, J., Liu, Q., Wang, L., and Gu, Q. Decompdiff: Diffusion models with decomposed priors for structure-based drug design, 2024. URL https://arxiv.org/abs/2403.07902

work page arXiv 2024
[15]

V., Wiest, O., and Zhang, X

Guo, T., Guo, K., Nan, B., Liang, Z., Guo, Z., Chawla, N. V., Wiest, O., and Zhang, X. What can large language models do in chemistry? a comprehensive benchmark on eight tasks, 2023. URL https://arxiv.org/abs/2305.18365

work page arXiv 2023
[16]

and Gilson, M

Heinzelmann, G. and Gilson, M. K. Automation of absolute protein-ligand binding free energy calculations for docking refinement and compound evaluation. Scientific Reports, 11 0 (1), January 2021. ISSN 2045-2322. doi:10.1038/s41598-020-80769-1. URL http://dx.doi.org/10.1038/s41598-020-80769-1

work page doi:10.1038/s41598-020-80769-1 2021
[17]

H., Eun, J

Hong, S. H., Eun, J. W., Choi, S. K., Shen, Q., Choi, W. S., Han, J.-W., Nam, S. W., and You, J. S. Epigenetic reader brd4 inhibition as a therapeutic strategy to suppress e2f2-cell cycle regulation circuit in liver cancer. Oncotarget, 7 0 (22): 0 32628–32640, April 2016. ISSN 1949-2553. doi:10.18632/oncotarget.8701. URL http://dx.doi.org/10.18632/oncotarget.8701

work page doi:10.18632/oncotarget.8701 2016
[18]

G., Vignac, C., and Welling, M

Hoogeboom, E., Satorras, V. G., Vignac, C., and Welling, M. Equivariant diffusion for molecule generation in 3d, 2022. URL https://arxiv.org/abs/2203.17003

work page arXiv 2022
[19]

Jensen, J. H. A graph-based genetic algorithm and generative model/monte carlo tree search for the exploration of chemical space. Chemical Science, 10 0 (12): 0 3567–3572, 2019. ISSN 2041-6539. doi:10.1039/c8sc05372c. URL http://dx.doi.org/10.1039/C8SC05372C

work page doi:10.1039/c8sc05372c 2019
[20]

Junction tree variational autoencoder for molecular graph generation

Jin, W., Barzilay, R., and Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. In International conference on machine learning, pp.\ 2323--2332. PMLR, 2018

work page 2018
[21]

K., Fu, X., Liao, Y .-L., Gharakhanyan, V ., Miller, B

Joshi, C. K., Fu, X., Liao, Y.-L., Gharakhanyan, V., Miller, B. K., Sriram, A., and Ulissi, Z. W. All-atom diffusion transformers: Unified generative modelling of molecules and materials, 2025. URL https://arxiv.org/abs/2503.03965

work page arXiv 2025
[22]

ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution

Lange, R. T., Imajuku, Y., and Cetin, E. Shinkaevolve: Towards open-ended and sample-efficient program evolution, 2025. URL https://arxiv.org/abs/2509.19349

work page internal anchor Pith review arXiv 2025
[23]

Lee, S., Jo, J., and Hwang, S. J. Exploring chemical space with score-based out-of-distribution generation, 2023. URL https://arxiv.org/abs/2206.07632

work page arXiv 2023
[24]

Structure-informed machine learning for drug discovery: a task-centric perspective

Li, Y., Zhan, R.-H., Rao, J., Liu, M., Sang, P., Zeng, X., Zheng, M., Li, X., and Yang, L. Structure-informed machine learning for drug discovery: a task-centric perspective. Brief. Bioinform., 27 0 (1), January 2026

work page 2026
[25]

Clifford group equivariant diffusion models for 3d molecular generation, 2025

Liu, C., Vadgama, S., Ruhe, D., Bekkers, E., and Forré, P. Clifford group equivariant diffusion models for 3d molecular generation, 2025. URL https://arxiv.org/abs/2504.15773

work page arXiv 2025
[26]

N., and Gilson, M

Liu, T., Lin, Y., Wen, X., Jorissen, R. N., and Gilson, M. K. Bindingdb: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Research, 35 0 (Database): 0 D198–D201, January 2007. ISSN 1362-4962. doi:10.1093/nar/gkl999. URL http://dx.doi.org/10.1093/nar/gkl999

work page doi:10.1093/nar/gkl999 2007
[27]

A comparative study of deep learning and classical modeling approaches for protein–ligand binding pose and affinity prediction in coronavirus main proteases

Liu, Y., Tang, H., Niu, T., and Wang, J. A comparative study of deep learning and classical modeling approaches for protein–ligand binding pose and affinity prediction in coronavirus main proteases. Journal of Chemical Information and Modeling, 66 0 (1): 0 731--743, 2026. doi:10.1021/acs.jcim.5c02481. URL https://doi.org/10.1021/acs.jcim.5c02481. PMID: 41429653

work page doi:10.1021/acs.jcim.5c02481 2026
[28]

Y-mol: A multiscale biomedical knowledge-guided large language model for drug development, 2024

Ma, T., Lin, X., Li, T., Li, C., Chen, L., Zhou, P., Cai, X., Yang, X., Zeng, D., Cao, D., and Zeng, X. Y-mol: A multiscale biomedical knowledge-guided large language model for drug development, 2024. URL https://arxiv.org/abs/2410.11550

work page arXiv 2024
[29]

Illuminating search spaces by mapping elites

Mouret, J.-B. and Clune, J. Illuminating search spaces by mapping elites, 2015. URL https://arxiv.org/abs/1504.04909

work page internal anchor Pith review Pith/arXiv arXiv 2015
[30]

Path-aware and structure-preserving generation of synthetically accessible molecules

Noh, J., Jeong, D.-W., Kim, K., Han, S., Lee, M., Lee, H., and Jung, Y. Path-aware and structure-preserving generation of synthetically accessible molecules. In Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., and Sabato, S. (eds.), Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Lea...

work page 2022
[31]

L., Piraud, M., and Becker, M

Oestreich, M., Merdivan, E., Lee, M., Schultze, J. L., Piraud, M., and Becker, M. DrugDiff : small molecule diffusion model with flexible guidance towards molecular properties. J. Cheminform., 17 0 (1): 0 23, February 2025

work page 2025
[32]

OpenAI, :, Agarwal, S., Ahmad, L., Ai, J., Altman, S., Applebaum, A., Arbus, E., Arora, R. K., Bai, Y., Baker, B., Bao, H., Barak, B., Bennett, A., Bertao, T., Brett, N., Brevdo, E., Brockman, G., Bubeck, S., Chang, C., Chen, K., Chen, M., Cheung, E., Clark, A., Cook, D., Dukhan, M., Dvorak, C., Fives, K., Fomenko, V., Garipov, T., Georgiev, K., Glaese, M...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[33]

Organ, S. L. and Tsao, M.-S. An overview of the c-met signaling pathway. Therapeutic Advances in Medical Oncology, 3 0 (1 suppl): 0 S7–S19, November 2011. ISSN 1758-8359. doi:10.1177/1758834011422556. URL http://dx.doi.org/10.1177/1758834011422556

work page doi:10.1177/1758834011422556 2011
[34]

URL https://www.biorxiv.org/content/early/2025/06/18/2025.06.14.659707

Passaro, S., Corso, G., Wohlwend, J., Reveiz, M., Thaler, S., Somnath, V. R., Getz, N., Portnoi, T., Roy, J., Stark, H., Kwabi-Addo, D., Beaini, D., Jaakkola, T., and Barzilay, R. Boltz-2: Towards accurate and efficient binding affinity prediction. June 2025. doi:10.1101/2025.06.14.659707. URL http://dx.doi.org/10.1101/2025.06.14.659707

work page doi:10.1101/2025.06.14.659707 2025
[35]

Pocket2mol: Efficient molecular sampling based on 3d protein pockets, 2025

Peng, X., Luo, S., Guan, J., Xie, Q., Peng, J., and Ma, J. Pocket2mol: Efficient molecular sampling based on 3d protein pockets, 2025. URL https://arxiv.org/abs/2205.07249

work page arXiv 2025
[36]

F., Goddard, T

Pettersen, E. F., Goddard, T. D., Huang, C. C., Meng, E. C., Couch, G. S., Croll, T. I., Morris, J. H., and Ferrin, T. E. <scp>ucsf chimerax</scp>: Structure visualization for researchers, educators, and developers. Protein Science, 30 0 (1): 0 70–82, October 2020. ISSN 1469-896X. doi:10.1002/pro.3943. URL http://dx.doi.org/10.1002/pro.3943

work page doi:10.1002/pro.3943 2020
[37]

Druggen enhances drug discovery with large language models and reinforcement learning

Sheikholeslami, M., Mazrouei, N., Gheisari, Y., Fasihi, A., Irajpour, M., and Motahharynia, A. Druggen enhances drug discovery with large language models and reinforcement learning. Scientific Reports, 15 0 (1), 2025. ISSN 2045-2322. doi:10.1038/s41598-025-98629-1. URL http://dx.doi.org/10.1038/s41598-025-98629-1

work page doi:10.1038/s41598-025-98629-1 2025
[38]

A., Mistryukova, L., Avchaciov, K., and Fedichev, P

Shepard, V., Musin, A., Chebykina, K., Zeninskaya, N. A., Mistryukova, L., Avchaciov, K., and Fedichev, P. O. Harvest: Unlocking the dark bioactivity data of pharmaceutical patents via agentic ai. March 2026. doi:10.64898/2026.03.15.711910. URL http://dx.doi.org/10.64898/2026.03.15.711910

work page doi:10.64898/2026.03.15.711910 2026
[39]

URL https: //doi.org/10.1021/acs.jcim.5b00559

Sterling, T. and Irwin, J. J. Zinc 15 – ligand discovery for everyone. Journal of Chemical Information and Modeling, 55 0 (11): 0 2324–2337, November 2015. ISSN 1549-960X. doi:10.1021/acs.jcim.5b00559. URL http://dx.doi.org/10.1021/acs.jcim.5b00559

work page doi:10.1021/acs.jcim.5b00559 2015
[40]

and Olson, A

Trott, O. and Olson, A. J. Autodock vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of Computational Chemistry, 31 0 (2): 0 455–461, June 2009. ISSN 1096-987X. doi:10.1002/jcc.21334. URL http://dx.doi.org/10.1002/jcc.21334

work page doi:10.1002/jcc.21334 2009
[41]

M., Buracas, D., Shewmake, C

Vadgama, S., Islam, M. M., Buracas, D., Shewmake, C. A., Moskalev, A., and Bekkers, E. J. Probing equivariance and symmetry breaking in convolutional networks. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview.net/forum?id=ghyYc7hgSU

work page 2026
[42]

Efficient evolutionary search over chemical space with large language models, 2025

Wang, H., Skreta, M., Ser, C.-T., Gao, W., Kong, L., Strieth-Kalthoff, F., Duan, C., Zhuang, Y., Yu, Y., Zhu, Y., Du, Y., Aspuru-Guzik, A., Neklyudov, K., and Zhang, C. Efficient evolutionary search over chemical space with large language models, 2025. URL https://arxiv.org/abs/2406.16976

work page arXiv 2025
[43]

White, A. D. The future of chemistry is language. Nature Reviews Chemistry, 7 0 (7): 0 457–458, May 2023. ISSN 2397-3358. doi:10.1038/s41570-023-00502-0. URL http://dx.doi.org/10.1038/s41570-023-00502-0

work page doi:10.1038/s41570-023-00502-0 2023
[44]

Prior-guided flow matching for target-aware molecule design with learnable atom number, 2025

Zhou, J., Qian, H., Tu, S., and Xu, L. Prior-guided flow matching for target-aware molecule design with learnable atom number, 2025. URL https://arxiv.org/abs/2509.01486

work page arXiv 2025
[45]

Decompopt: Controllable and decomposed diffusion models for structure-based molecular optimization, 2024

Zhou, X., Cheng, X., Yang, Y., Bao, Y., Wang, L., and Gu, Q. Decompopt: Controllable and decomposed diffusion models for structure-based molecular optimization, 2024. URL https://arxiv.org/abs/2403.13829

work page arXiv 2024
[46]

Sample-efficient multi-objective molecular optimization with gflownets, 2023

Zhu, Y., Wu, J., Hu, C., Yan, J., Hsieh, C.-Y., Hou, T., and Wu, J. Sample-efficient multi-objective molecular optimization with gflownets, 2023. URL https://arxiv.org/abs/2302.04040

work page arXiv 2023