My Chemical Harness: Evolutionary Molecular Design over Synthetic Pathways with Large Language Model Agents

César Ojeda; Darius A. Faroughy; Martín Carballo-Pacheco; Maryam Karimi; Mir Mehdi Seyedebrahimi; Payam Zarrintaj

arXiv: 2606.11256 · v1 · pith:MXSGKKBTnew · submitted 2026-06-08 · ⚛️ physics.chem-ph · cs.LG · cs.NE

César Ojeda, Darius A. Faroughy, Maryam Karimi, Payam Zarrintaj, Mir Mehdi Seyedebrahimi, Martín Carballo-Pacheco

Pith reviewed 2026-06-27 14:23 UTC · model grok-4.3

classification ⚛️ physics.chem-ph · cs.LG · cs.NE

keywords evolutionary molecular design · synthetic pathways · LLM agents · soluble epoxide hydrolase · route-native search · synthetic accessibility · AiZynthFinder

The pith

LLM agents controlling high-level preferences in an evolutionary search over executable synthetic routes outperform both direct LLM generation and deterministic controllers on a soluble epoxide hydrolase proxy task.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework in which the search population consists of complete synthetic pathways built from purchasable building blocks and reaction templates rather than standalone molecules. Large language models act solely as strategy controllers that choose preferences for route length, move type, reaction families, motifs, and exploration pressure; all route construction, validation, deduplication, scoring, and memory management remain deterministic. On the soluble epoxide hydrolase (sEH) proxy task, the resulting agent records higher sEH scores, better synthetic accessibility, and higher AiZynthFinder success rates than single-pass LLM baselines or purely deterministic controllers. This separation is presented as what lets the LLM guide exploration while preventing it from introducing hallucinated products or unsupported reaction steps. The work therefore claims that constrained LLM agents can contribute to molecular discovery without any task-specific training or fine-tuning.

Core claim

By populating an evolutionary algorithm with executable synthetic pathways and restricting the LLM to high-level strategic preferences, the My Chemical Harness framework achieves state-of-the-art results on the sEH proxy task across the sEH score, synthetic accessibility score, and AiZynthFinder success rate, while deterministic chemistry tools guarantee route validity.

What carries the argument

LLM as high-level strategy controller that selects preferences over route length, move type, reaction families, motifs, and exploration pressure; deterministic code executes all route construction, validation, scoring, selection, and memory updates.
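To make this division of labor concrete, here is a minimal Python sketch of how a controller's free-text output might be reduced to bounded preferences before any deterministic code acts on it. The option lists, the JSON field names, `Preferences`, and `parse_preferences` are all hypothetical: the paper names the preference dimensions, but the concrete vocabulary is invented here and motifs are omitted for brevity.

```python
import json
import math
from dataclasses import dataclass

# Hypothetical vocabularies: the paper names these preference dimensions,
# but the concrete options below are invented for illustration.
MOVE_TYPES = ("create", "extend", "mutate_prefix", "insert", "delete",
              "substitute_block", "substitute_step")
REACTION_FAMILIES = ("amide_coupling", "suzuki_coupling", "reductive_amination")


@dataclass(frozen=True)
class Preferences:
    route_length: int
    move_type: str
    reaction_families: tuple
    exploration: float  # 0.0 = pure exploitation, 1.0 = pure exploration


DEFAULT = Preferences(2, "extend", REACTION_FAMILIES, 0.5)


def _is_finite_number(x) -> bool:
    """True for real JSON numbers; rejects booleans, NaN and infinities."""
    if isinstance(x, bool) or not isinstance(x, (int, float)):
        return False
    return isinstance(x, int) or math.isfinite(x)


def parse_preferences(llm_text: str, max_len: int = 5) -> Preferences:
    """Turn raw controller text into a bounded Preferences object.

    Malformed or out-of-vocabulary fields fall back to defaults, so the
    LLM can only bias the sampler; it never supplies reactions or products.
    """
    try:
        raw = json.loads(llm_text)
    except json.JSONDecodeError:
        return DEFAULT
    if not isinstance(raw, dict):
        return DEFAULT

    length = raw.get("route_length")
    length = int(length) if _is_finite_number(length) else DEFAULT.route_length
    length = max(1, min(max_len, length))

    move = raw.get("move_type")
    if move not in MOVE_TYPES:
        move = DEFAULT.move_type

    fams = raw.get("reaction_families")
    fams = tuple(f for f in fams if f in REACTION_FAMILIES) if isinstance(fams, list) else ()
    if not fams:
        fams = DEFAULT.reaction_families

    explore = raw.get("exploration")
    if _is_finite_number(explore):
        explore = float(min(1.0, max(0.0, explore)))
    else:
        explore = DEFAULT.exploration

    return Preferences(length, move, fams, explore)
```

Clamping and falling back, rather than asking the model to retry, keeps every controller call cheap and guarantees the sampler always receives a usable configuration; a real system might instead log rejected fields as feedback for the next strategy prompt.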

If this is right

Molecules discovered by the method come with verified synthetic routes by construction.
The same controller can be swapped across different molecular oracles without retraining.
Performance gains arise from the evolutionary loop over routes rather than from any chemical knowledge inside the LLM.
No dedicated generative model or fine-tuning step is required for the observed improvements.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be tested on additional oracle tasks such as docking scores or ADMET properties to check whether the same preference vocabulary remains effective.
If the preference set proves insufficient for more complex targets, the framework would require either richer controller outputs or additional deterministic heuristics.
Because routes are stored and deduplicated, the memory component may scale to larger design campaigns than molecule-only evolutionary methods.

Load-bearing premise

That high-level preferences chosen by the LLM are sufficient to produce useful search guidance even though the model never proposes or validates any actual chemical steps.

What would settle it

A controlled run on the same sEH task in which the LLM controller is replaced by random or fixed preferences and the performance metrics fall to or below those of the deterministic baseline.
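The settling experiment amounts to swapping the controller while the search machinery and the oracle budget stay fixed. The toy Python harness below illustrates only that protocol, under loud assumptions: routes are tuples of integer step ids, `toy_oracle` is an invented stand-in for the sEH oracle, a simple hill climber stands in for the paper's evolutionary loop, and `FixedController` and `RandomController` are hypothetical baselines. An LLM controller would implement the same `choose` method.

```python
import random
from typing import Callable, List, Protocol, Sequence, Tuple

Route = Tuple[int, ...]
MOVES = ("extend", "delete", "substitute")


class Controller(Protocol):
    def choose(self, history: List[Tuple[str, float]]) -> str: ...


class FixedController:
    """Always requests the same move (a fixed-preference baseline)."""
    def __init__(self, move: str) -> None:
        self.move = move

    def choose(self, history):
        return self.move  # history is ignored; an LLM controller would read it


class RandomController:
    """Requests uniformly random moves (a random-preference baseline)."""
    def __init__(self, moves: Sequence[str], seed: int) -> None:
        self.moves = list(moves)
        self.rng = random.Random(seed)

    def choose(self, history):
        return self.rng.choice(self.moves)


def apply_move(route: Route, move: str, rng: random.Random) -> Route:
    """Deterministic-code stand-in: only this function edits routes."""
    if move == "extend" or not route:
        return route + (rng.randrange(10),)
    if move == "delete":
        return route[:-1] if len(route) > 1 else route
    i = rng.randrange(len(route))  # "substitute": replace one step
    return route[:i] + (rng.randrange(10),) + route[i + 1:]


def toy_oracle(route: Route) -> float:
    # Invented objective: reward large step ids, prefer length 4.
    return sum(route) - 3.0 * abs(len(route) - 4)


def run(controller: Controller, oracle: Callable[[Route], float],
        budget: int, seed: int = 0) -> Tuple[float, int]:
    """Hill-climb under a hard oracle-call budget; returns (best, calls)."""
    rng = random.Random(seed)
    parent: Route = ()
    best = oracle(parent)
    calls = 1
    history: List[Tuple[str, float]] = []
    while calls < budget:
        move = controller.choose(history)
        child = apply_move(parent, move, rng)
        score = oracle(child)
        calls += 1
        history.append((move, score))
        if score > best:
            parent, best = child, score
    return best, calls
```

Comparing `run(...)[0]` across controllers over several seeds, always with the same `budget`, is the matched-budget ablation in miniature: any benefit attributed to LLM preferences would have to appear against both baselines.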

Figures

Figures reproduced from arXiv: 2606.11256 by César Ojeda, Darius A. Faroughy, Martín Carballo-Pacheco, Maryam Karimi, Mir Mehdi Seyedebrahimi, Payam Zarrintaj.

**Figure 1.** Example of an executable synthetic reaction route used as the search object in route-native molecular optimization. … are described through crystallographic representations. In the present work, we bring these capabilities to the search for synthesizable molecules and represent candidates through synthetic routes. Our approach differs from synthesis-aware generative models and post hoc projection methods [6, 13]…

**Figure 2.** System architecture for My Chemical Harness, showing the interaction between LLM-guided strategy generation, route sampling, deterministic execution, scoring, and memory-conditioned evolutionary search. 3. LLM draft strategy. The LLM then proposes an initial plan for where the search should spend its effort. Using the task, memory, parent routes, query results, and previous learning report, it produces a s…
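Deduplication and memory sit on the deterministic side of this architecture, and one plausible reading is that a duplicate route should never cost an extra oracle call. A minimal Python sketch under that assumption: a step is a `(template_id, building_blocks)` pair, the canonical key sorts each step's building blocks (treating reactant order as irrelevant, which is itself an assumption), and `RouteMemory` is a hypothetical name, not the paper's component.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Tuple[str, ...]]  # (template_id, building-block SMILES)
Route = Tuple[Step, ...]


class RouteMemory:
    """Caches oracle scores by canonical route so duplicates are free."""

    def __init__(self, oracle: Callable[[Route], float]) -> None:
        self.oracle = oracle
        self.scores: Dict[Route, float] = {}
        self.oracle_calls = 0

    @staticmethod
    def key(route: Route) -> Route:
        # Assumption: reactant order within a step does not matter.
        return tuple((tmpl, tuple(sorted(blocks))) for tmpl, blocks in route)

    def score(self, route: Route) -> float:
        k = self.key(route)
        if k not in self.scores:
            self.scores[k] = self.oracle(route)
            self.oracle_calls += 1
        return self.scores[k]

    def elites(self, n: int) -> List[Tuple[Route, float]]:
        """Top-n scored routes, e.g. as parents for the next generation."""
        return heapq.nlargest(n, self.scores.items(), key=lambda kv: kv[1])
```

Because only novel canonical routes reach the oracle, the oracle-call counter doubles as the budget meter a matched-budget comparison needs.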

**Figure 3.** sEH case study for reflective route optimization.

**Figure 4.** Create New Route. This control ignores the parent population and builds a fresh executable route from newly instantiated reaction steps. 2. Extend Parent. This is an exploitative growth move. Given a parent route $r = (s_1, \ldots, s_L)$, the full parent is copied exactly and the sampler appends one or more new executable steps: $(s_1, \ldots, s_L) \longrightarrow (s_1, \ldots, s_L, s'_{L+1}, \ldots, s'_{L'}),\ s'_j \in S.$ (10) This p…
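Read as operations on a sequence of steps, Create New Route and Extend Parent are short. The Python sketch below treats each step as an opaque id drawn from a pool standing in for $S$ and skips the executability checks the real sampler applies; `create_new_route` and `extend_parent` are hypothetical names.

```python
import random
from typing import Sequence, Tuple

Route = Tuple[str, ...]  # hypothetical: each step is an opaque step id


def create_new_route(step_pool: Sequence[str], length: int,
                     rng: random.Random) -> Route:
    """Create New Route: ignore parents and sample `length` fresh steps."""
    return tuple(rng.choice(step_pool) for _ in range(length))


def extend_parent(route: Route, step_pool: Sequence[str], n_new: int,
                  rng: random.Random) -> Route:
    """Extend Parent, Eq. (10): copy (s_1..s_L) exactly, append n_new steps."""
    if n_new < 1:
        raise ValueError("Extend Parent must append at least one step")
    return route + tuple(rng.choice(step_pool) for _ in range(n_new))
```

The defining property of Eq. (10) is visible in the return value: the first $L$ entries are the parent, untouched.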

**Figure 5.** Extend Parent. This control preserves the full parent route and appends at least one new executable reaction step. 3. Mutate Parent Prefix. This is a local route repair or route variation move. A route position choice $h_t$ determines the preserved prefix length $k$: an early mutation keeps little or none of the parent, a middle mutation keeps an intermediate prefix, and a final-step mutation keeps most of the …
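The position choice can be sketched as a lookup from $h_t$ to $k$. The exact mapping below (early keeps nothing, middle keeps about half, final keeps all but the last step) is an assumption consistent with the description, as is regrowing to the parent's length; the paper instead reconciles length separately (Eq. 16). Function names are hypothetical.

```python
import random
from typing import Sequence, Tuple

Route = Tuple[str, ...]


def prefix_length(position: str, route_len: int) -> int:
    """Map the route position choice h_t to a kept-prefix length k.

    Assumed mapping: early keeps nothing, middle keeps about half,
    final keeps all but the last step.
    """
    if route_len < 1:
        raise ValueError("need a non-empty parent route")
    table = {"early": 0, "middle": route_len // 2, "final": route_len - 1}
    return table[position]


def mutate_parent_prefix(route: Route, position: str,
                         step_pool: Sequence[str], rng: random.Random) -> Route:
    """Keep route[:k], discard the suffix, regrow to the parent's length."""
    k = prefix_length(position, len(route))
    # k <= L - 1, so at least one freshly sampled step is always added.
    regrown = tuple(rng.choice(step_pool) for _ in range(len(route) - k))
    return route[:k] + regrown
```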

**Figure 6.** Mutate Parent Prefix. This control preserves a prefix of the parent route, discards the suffix, and regrows the remainder using locally compatible SMARTS/SMILES choices. 4. Insert Route Step. This is a route-depth expansion move. Given a parent route, the sampler chooses an insertion position, keeps the unaffected part of the route, and inserts a new executable reaction layer. Because the inserted intermed…
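Structurally, Insert Route Step is a splice followed by validation. In this hedged Python sketch, `is_valid` is a hypothetical callback standing in for the deterministic validation stage; the product-side repair the paper describes is not modelled.

```python
from typing import Callable, Optional, Tuple

Route = Tuple[str, ...]


def insert_route_step(route: Route, position: int, new_step: str,
                      is_valid: Callable[[Route], bool]) -> Optional[Route]:
    """Insert a new reaction layer before index `position` (0..L).

    A candidate that fails validation returns None instead of entering
    the population.
    """
    if not 0 <= position <= len(route):
        raise IndexError("insertion position out of range")
    candidate = route[:position] + (new_step,) + route[position:]
    return candidate if is_valid(candidate) else None
```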

**Figure 7.** Insert Route Step. This control inserts a new reaction layer inside a parent route and repairs the affected product-side connection before validation. 5. Delete Route Step. This is a route-shortening move. The sampler removes one selected route step and keeps the remaining executable subtree: $(s_1, \ldots, s_i, s_{i+1}, \ldots, s_L) \longrightarrow (s_1, \ldots, s_i, \ldots, s_{L-1}).$ (13) No new reaction is added by this opera…
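Eq. (13) says only that the route loses one step and that nothing new is sampled. A minimal Python sketch, assuming a hypothetical `is_valid` callback for the validation of the shorter route and assuming (not stated in the source) that deleting a route's only step is refused:

```python
from typing import Callable, Optional, Tuple

Route = Tuple[str, ...]


def delete_route_step(route: Route, index: int,
                      is_valid: Callable[[Route], bool]) -> Optional[Route]:
    """Eq. (13): drop step `index`; the result has L - 1 steps.

    No new reaction is sampled. Deleting the only step is refused here
    (an assumption), and the shorter route must still pass validation.
    """
    if len(route) < 2:
        return None
    if not 0 <= index < len(route):
        raise IndexError("step index out of range")
    candidate = route[:index] + route[index + 1:]
    return candidate if is_valid(candidate) else None
```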

**Figure 8.** Delete Route Step. This control removes a route step, retains the remaining executable subtree, and validates the shorter route. 6. Substitute Building Block. This is the most local reactant-level move. The reaction context is preserved, but one building block leaf is replaced by a new compatible building block sampled from $B$: $s_i = (R_i, b_1, \ldots, b_m) \longrightarrow s'_i = (R_i, b_1, \ldots, b'_q, \ldots, b_m),\ b'_q$…
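The building-block substitution maps directly onto the tuple notation $s_i = (R_i, b_1, \ldots, b_m)$. In this Python sketch, `compatible` is a hypothetical predicate; in practice compatibility would come from matching the template's reactant pattern (for example a SMARTS substructure check), which is not implemented here.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

Step = Tuple[str, Tuple[str, ...]]  # (template R_i, building blocks b_1..b_m)
Route = Tuple[Step, ...]


def substitute_building_block(route: Route, i: int, q: int,
                              pool: Sequence[str],
                              compatible: Callable[[str, int, str], bool],
                              rng: random.Random) -> Optional[Route]:
    """Replace leaf b_q of step s_i with a compatible b'_q drawn from B.

    The template R_i and every other building block are kept unchanged.
    Returns None when no compatible replacement exists.
    """
    template, blocks = route[i]
    options = [b for b in pool if b != blocks[q] and compatible(template, q, b)]
    if not options:
        return None
    new_blocks = blocks[:q] + (rng.choice(options),) + blocks[q + 1:]
    return route[:i] + ((template, new_blocks),) + route[i + 1:]
```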

**Figure 9.** Substitute Building Block. This control keeps the reaction context and replaces one building block input with a newly sampled compatible building block. 7. Substitute Route Step. This is a step-level replacement move. The sampler selects one step, preserves the unaffected prefix, and replaces the selected reaction with a newly sampled compatible step. If the replaced step changes an intermediate needed dow…
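The downstream repair can be sketched for the simplest case. This Python sketch assumes a linear route in which each later step consumes the previous step's product, represents a step as a hypothetical `(template_id, product)` pair, and delegates regrowth to a hypothetical `regrow` callback; branched routes would need a real dependency graph.

```python
from typing import Callable, Tuple

Step = Tuple[str, str]  # (template_id, product) in a linear route (assumption)
Route = Tuple[Step, ...]


def substitute_route_step(route: Route, i: int, new_step: Step,
                          regrow: Callable[[Route, int], Route]) -> Route:
    """Replace step i, keep route[:i], and repair the downstream part.

    If the new step yields the same product, the suffix still connects and
    is kept; otherwise the suffix is discarded and `regrow` supplies the
    replacement steps.
    """
    prefix = route[:i] + (new_step,)
    n_downstream = len(route) - i - 1
    if new_step[1] == route[i][1]:
        return prefix + route[i + 1:]
    return prefix + regrow(prefix, n_downstream)
```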

**Figure 10.** Substitute Route Step. This control preserves the unaffected prefix, replaces one route step, and repairs or regrows the affected downstream connection. Route length reconciliation. The prefix-preserving control applies an additional route length reconciliation. If $k$ steps are kept, the final route must contain at least one newly sampled step after that prefix: $L_{\text{actual}} = \min(L_{\max}, \max(L_t, k + 1)).$ (16)…
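Eq. (16) is easy to check by hand, and a few cases show how its two clauses interact. The Python function below is a direct transcription; the name `reconcile_length` is hypothetical.

```python
def reconcile_length(target_len: int, kept: int, max_len: int) -> int:
    """Eq. (16): L_actual = min(L_max, max(L_t, k + 1)).

    max(L_t, k + 1) forces at least one newly sampled step after the
    k kept steps; min(L_max, ...) then caps the route length.
    """
    return min(max_len, max(target_len, kept + 1))


print(reconcile_length(2, 3, 6))  # 4: target too short, so k + 1 wins
print(reconcile_length(5, 1, 6))  # 5: target already leaves room
print(reconcile_length(9, 1, 6))  # 6: capped at L_max
print(reconcile_length(2, 6, 6))  # 6: k = L_max leaves no new step
```

The last case shows that the "at least one new step" guarantee holds only when $k < L_{\max}$, so the sampler presumably keeps $k \le L_{\max} - 1$; that constraint is an inference, not something the excerpt states.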

Original abstract

Designing molecules with target properties is most useful when candidate structures are accompanied by feasible synthetic routes. We introduce My Chemical Harness, a route-native evolutionary framework for goal-directed molecular design in which the search population consists of executable synthetic pathways rather than isolated molecular graphs. Each route is built from purchasable building blocks and reaction templates, executed by deterministic chemistry tools, and scored through task-specific molecular oracles. Large language models (LLMs) are used only as strategy controllers that select high-level preferences over route length, move type, reaction families, motifs, and exploration pressure, while local code performs route construction, validation, deduplication, scoring, selection, and memory updates. This separation lets the LLM guide exploration without allowing it to introduce hallucinated products or unsupported reaction steps. On a soluble epoxide hydrolase proxy task, our LLM agent improves over single pass LLM and deterministic controllers, reaching state-of-the-art performance across the sEH score, synthetic accessibility score, and AiZynthFinder success rate metrics. These results suggest that constrained LLM agents can play a significant role in molecular discovery without requiring training, fine-tuning, or dedicated generative models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Restricting the LLM to high-level preferences while deterministic code builds and validates routes is a clean safety move, but the abstract gives no numbers or ablations to show those preferences actually drive the reported gains.

The paper's main idea is to run evolutionary search over full synthetic pathways instead of molecules, with the LLM limited to picking preferences like route length, reaction families, and exploration pressure. Deterministic tools then build, score, and deduplicate the routes. This separation is presented as a way to get LLM guidance without hallucinated chemistry.

It does one thing clearly: it frames the problem so the model never proposes a reaction step or product itself. That matches a real practical constraint in applied molecular design. The proxy task on soluble epoxide hydrolase is a reasonable choice for testing route-aware optimization.

The soft spot is the evidence. The abstract states that the LLM agent reaches SOTA on sEH score, SA score, and AiZynthFinder success rate, beating single-pass LLM and deterministic controllers. No values, error bars, run counts, or ablation isolating the effect of the preference signals appear in the provided text. Without those, the claim that the high-level preferences do chemically useful work, rather than acting as generic search tweaks, cannot be checked. The concerns about matched search budgets and whether the preference signals are informative therefore stand on the information currently available.

This is for labs already running evolutionary or oracle-based molecular design that want to test LLM controllers without full generative models. The citation pattern looks standard and the architecture is internally consistent. It deserves a serious referee read once the Methods and Results sections supply the missing quantitative controls; the idea is worth checking even if the gains turn out to be modest.

Referee Report

3 major / 2 minor

Summary. The paper introduces My Chemical Harness, a route-native evolutionary framework for goal-directed molecular design. The search population consists of executable synthetic pathways built from purchasable building blocks and reaction templates. LLMs are used solely as strategy controllers that select high-level preferences over route length, move type, reaction families, motifs, and exploration pressure; all route construction, validation, deduplication, scoring, selection, and memory updates are performed by deterministic chemistry tools. On a soluble epoxide hydrolase (sEH) proxy task, the LLM agent is claimed to improve over single-pass LLM and deterministic controllers, reaching state-of-the-art performance on the sEH score, synthetic accessibility score, and AiZynthFinder success rate metrics.

Significance. If the performance claims hold under rigorous evaluation, the work offers a practical template for integrating LLMs into molecular discovery while strictly separating high-level strategy from chemical execution. This constrained role for the LLM avoids hallucinated reactions and allows reuse of existing deterministic oracles and route planners. The clean, reproducible separation of concerns and the use of off-the-shelf LLMs without fine-tuning are strengths that could be adopted more broadly if the empirical gains prove robust.

major comments (3)

[Results] Results section: the central claim that the LLM agent reaches SOTA across sEH score, SA score, and AiZynthFinder success rate is presented without any reported numerical values, error bars, number of independent runs, or statistical tests. This absence makes it impossible to assess whether the reported improvements are statistically meaningful or reproducible.
[Methods] Methods section: no ablation is described that replaces the LLM-derived high-level preferences with fixed heuristics or random preferences while keeping the same evolutionary machinery, population size, and evaluation budget. Without this control, it is unclear whether the performance gain is attributable to chemically informative guidance or simply to the evolutionary search itself.
[Experimental Setup] Experimental setup: the comparison to deterministic controllers does not state whether the total number of oracle evaluations, generations, or wall-clock time was matched across conditions. Matched budgets are required to attribute any advantage specifically to the LLM preference signals rather than differences in search effort.

minor comments (2)

[Abstract] Abstract: the phrase 'state-of-the-art performance' is used without citing the specific prior methods or numerical thresholds that define the current SOTA on the sEH proxy task.
[Methods] The description of the preference vocabulary (route length, move type, reaction families, motifs, exploration pressure) would benefit from an explicit enumeration or table of the discrete options available to the LLM at each decision point.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which identify key gaps in reporting and experimental controls. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Referee: [Results] Results section: the central claim that the LLM agent reaches SOTA across sEH score, SA score, and AiZynthFinder success rate is presented without any reported numerical values, error bars, number of independent runs, or statistical tests. This absence makes it impossible to assess whether the reported improvements are statistically meaningful or reproducible.

Authors: We agree that the absence of numerical values, error bars, run counts, and statistical tests prevents proper assessment of the claims. In the revised manuscript, we will add a dedicated results table reporting mean performance metrics with standard deviations across multiple independent runs (minimum of five), along with appropriate statistical comparisons (e.g., t-tests or Wilcoxon tests) against baselines. This will include the sEH score, SA score, and AiZynthFinder success rate. revision: yes
Referee: [Methods] Methods section: no ablation is described that replaces the LLM-derived high-level preferences with fixed heuristics or random preferences while keeping the same evolutionary machinery, population size, and evaluation budget. Without this control, it is unclear whether the performance gain is attributable to chemically informative guidance or simply to the evolutionary search itself.

Authors: This comment correctly identifies a missing control. We will add an ablation study to the Methods and Results sections in the revision. The study will compare the LLM preference controller against (i) random preference selection and (ii) fixed heuristic rules, while holding the evolutionary machinery, population size, and total evaluation budget constant. Performance differences will be quantified and reported. revision: yes
Referee: [Experimental Setup] Experimental setup: the comparison to deterministic controllers does not state whether the total number of oracle evaluations, generations, or wall-clock time was matched across conditions. Matched budgets are required to attribute any advantage specifically to the LLM preference signals rather than differences in search effort.

Authors: We acknowledge that budget matching must be explicitly documented. The original experiments were designed with matched generation counts and oracle evaluation limits, but this was not stated clearly. In the revision, we will add explicit statements confirming that all compared conditions use identical total oracle evaluations and generations; wall-clock times will also be reported. If any prior runs deviated, we will rerun under strictly matched conditions. revision: yes
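The first promised revision (mean ± standard deviation over at least five runs, plus a significance test) can be sketched with the Python standard library alone. The rebuttal names t-tests or Wilcoxon tests; the sketch below swaps in an exact permutation test on the difference of means instead, because it needs no distribution tables. Function names are hypothetical, and any run values fed to it in practice would come from the revised experiments, not from this page.

```python
from itertools import combinations
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize(runs: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across independent runs."""
    return mean(runs), stdev(runs)


def permutation_p_value(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided permutation test on the difference of means.

    Enumerates every way to relabel the pooled runs into groups of the
    original sizes and counts splits at least as extreme as observed.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        xa = [pooled[i] for i in idx]
        xb = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(mean(xa) - mean(xb)) >= observed - 1e-12:  # float tolerance
            extreme += 1
        total += 1
    return extreme / total
```

One design consequence worth knowing before running the study: with five runs per arm there are only C(10, 5) = 252 relabelings, and the observed split and its mirror always count, so the smallest two-sided p-value this test can return is 2/252 ≈ 0.008.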

Circularity Check

0 steps flagged

No circularity: empirical performance claims rest on external benchmarks

Full rationale

The manuscript reports an empirical comparison of an LLM agent versus single-pass LLM and deterministic controllers on a soluble epoxide hydrolase proxy task, measuring sEH score, synthetic accessibility, and AiZynthFinder success rate. No equations, parameter fits, or derivations are presented whose outputs are forced by construction from the inputs. The separation of high-level LLM preferences from deterministic route construction, validation, and scoring is described procedurally rather than as a mathematical reduction; results are obtained by running the system on external oracles and reporting observed metrics. The work is therefore self-contained against independent benchmarks with no load-bearing step that collapses to self-definition, fitted prediction, or self-citation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review; the ledger is populated from the high-level description alone. No free parameters, axioms, or invented entities are explicitly quantified in the abstract.

axioms (1)

Domain assumption: deterministic chemistry tools can reliably execute and validate proposed reaction sequences without error.
The framework depends on the correctness of the external chemistry execution layer.

pith-pipeline@v0.9.1-grok · 5770 in / 1327 out tokens · 19932 ms · 2026-06-27T14:23:09.693952+00:00 · methodology

Reference graph

Works this paper leans on

39 extracted references · 28 canonical work pages · 5 internal anchors

[1] Kojima, T.; Gu, S. S.; Reid, M.; Matsuo, Y.; Iwasawa, Y. Adv. Neural Inf. Process. Syst. 2022, 35, 22199–22213.

[2] Austin, J.; Odena, A.; Nye, M.; Bosma, M.; Michalewski, H.; Dohan, D.; Jiang, E.; Cai, C.; Terry, M.; Le, Q.; Sutton, C. Program Synthesis with Large Language Models. arXiv, Version 1, August 16, 2021; DOI: 10.48550/arXiv.2108.07732.

[3] Chen, M. et al. Evaluating Large Language Models Trained on Code. arXiv, Version 2, July 14, 2021; DOI: 10.48550/arXiv.2107.03374.

[4] Song, Y.; Sohl-Dickstein, J.; Kingma, D. P.; Kumar, A.; Ermon, S.; Poole, B. Score-Based Generative Modeling through Stochastic Differential Equations. arXiv, Version 1, February 10, 2020; DOI: 10.48550/arXiv.2011.13456.

[5] Qiu, H.; Sun, Z.-Y. npj Comput. Mater. 2024, 10, 273, DOI: 10.1038/s41524-024-01466-5.

[6] Gao, W.; Luo, S.; Coley, C. W. Proc. Natl. Acad. Sci. U.S.A. 2025, 122, e2415665122, DOI: 10.1073/pnas.2415665122.

[7] Edwards, C.; Lai, T. M.; Ros, K.; Honke, G.; Cho, K.; Ji, H. Translation between Molecules and Natural Language. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing; Goldberg, Y.; Kozareva, Z.; Zhang, Y., Eds.; Association for Computational Linguistics: Abu Dhabi, United Arab Emirates, 2022; pp 375–413. DOI: 10.18653/v1/2022.emnlp-main.26.

[8] Hoogeboom, E.; Satorras, V. G.; Vignac, C.; Welling, M. Equivariant Diffusion for Molecule Generation in 3D. In Proceedings of the 39th International Conference on Machine Learning; Chaudhuri, K.; Jegelka, S.; Song, L.; Szepesvari, C.; Niu, G.; Sabato, S., Eds.; PMLR, 2022; Proceedings of Machine Learning Research, Vol. 162, pp 8867–8887. URL: https://proc...

[9] Dunn, I.; Koes, D. R. Digit. Discov. 2026, 5, 2052–2066, DOI: 10.1039/D5DD00363F.

[10] Song, Y.; Gong, J.; Xu, M.; Cao, Z.; Lan, Y.; Ermon, S.; Zhou, H.; Ma, W.-Y. Equivariant Flow Matching with Hybrid Probability Transport for 3D Molecule Generation. In Advances in Neural Information Processing Systems 36; Oh, A.; Naumann, T.; Globerson, A.; Saenko, K.; Hardt, M.; Levine, S., Eds.; Curran Associates, Inc., 2023; pp 549–568. URL: https://pro...

[11] Vignac, C.; Krawczuk, I.; Siraudin, A.; Wang, B.; Cevher, V.; Frossard, P. DiGress: Discrete Denoising Diffusion for Graph Generation. The Eleventh International Conference on Learning Representations, 2023; URL: https://openreview.net/forum?id=UaAD-Nu86WX

[12] Swanson, K.; Liu, G.; Catacutan, D. B.; Arnold, A.; Zou, J.; Stokes, J. M. Nat. Mach. Intell. 2024, 6, 338–353, DOI: 10.1038/s42256-024-00809-7.

[13] Lee, S.; Kreis, K.; Veccham, S. P.; Liu, M.; Reidenbach, D.; Paliwal, S. G.; Nie, W.; Vahdat, A. Exploring Synthesizable Chemical Space with Iterative Pathway Refinements. The Fourteenth International Conference on Learning Representations, 2026; URL: https://openreview.net/forum?id=aQKVfKOkR5

[14] Romera-Paredes, B.; Barekatain, M.; Novikov, A.; Balog, M.; Kumar, M. P.; Dupont, E.; Ruiz, F. J. R.; Ellenberg, J. S.; Wang, P.; Fawzi, O.; Kohli, P.; Fawzi, A. Nature 2024, 625, 468–475, DOI: 10.1038/s41586-023-06924-6.

[15] Novikov, A. et al. AlphaEvolve: A Coding Agent for Scientific and Algorithmic Discovery. arXiv, Version 1, June 16, 2025; DOI: 10.48550/arXiv.2506.13131.

[16] Kang, Y.; Kim, J. Nat. Commun. 2024, 15, 4705, DOI: 10.1038/s41467-024-48998-4.

[17] Jia, S.; Zhang, C.; Fung, V. LLMatDesign: Autonomous Materials Discovery with Large Language Models. arXiv, Version 1, June 19, 2024; DOI: 10.48550/arXiv.2406.13163.

[18] Luo, F.; Zhang, J.; Wang, Q.; Yang, C. ACS Cent. Sci. 2025, 11, 511–519, DOI: 10.1021/acscentsci.4c01935.

[19] Zheng, Z.; Zhang, O.; Nguyen, H. L.; Rampal, N.; Alawadhi, A. H.; Rong, Z.; Head-Gordon, T.; Borgs, C.; Chayes, J. T.; Yaghi, O. M. ACS Cent. Sci. 2023, 9, 2161–2170, DOI: 10.1021/acscentsci.3c01087.

[20] Lee, J.; Woo, J.; Kim, Y.; Kim, S.; Paulina, C.; Park, H.; Kim, H.-T.; Park, S.; Kim, J. ACS Cent. Sci. 2026, 12, 484–496, DOI: 10.1021/acscentsci.5c02433.

[21] Caldas Ramos, M.; Michtavy, S. S.; White, A. D.; Porosoff, M. D. ACS Cent. Sci. 2026, DOI: 10.1021/acscentsci.5c02418.

[22] Abhyankar, N.; Kabra, S.; Desai, S.; Reddy, C. K. LLEMA: Evolutionary Search with LLMs for Multi-Objective Materials Discovery. The Fourteenth International Conference on Learning Representations, 2026; URL: https://openreview.net/forum?id=TIqzhBvCNB

[23] Lange, R. T.; Tian, Y.; Tang, Y. Large Language Models as Evolution Strategies. In Proceedings of the Genetic and Evolutionary Computation Conference Companion; Li, X.; Handl, J., Eds.; ACM, 2024; pp 579–582. DOI: 10.1145/3638530.3654238.

[24] Holland, J. H. Sci. Am. 1992, 267, 66–72, DOI: 10.1038/scientificamerican0792-66.

[25] Bengio, E.; Jain, M.; Korablyov, M.; Precup, D.; Bengio, Y. Adv. Neural Inf. Process. Syst. 2021, 34, 27381–27394, URL: https://papers.nips.cc/paper/2021/hash/e614f646836aaed9f89ce58e837e2310-Abstract.html

[26] Cretu, M.; Harris, C.; Igashov, I.; Schneuing, A.; Segler, M.; Correia, B.; Roy, J.; Bengio, E.; Liò, P. SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints. The Thirteenth International Conference on Learning Representations, 2025; URL: https://openreview.net/forum?id=uvHmnahyp1

[27] Argiriadi, M. A.; Morisseau, C.; Goodrow, M. H.; Dowdy, D. L.; Hammock, B. D.; Christianson, D. W. J. Biol. Chem. 2000, 275, 15265–15270, DOI: 10.1074/jbc.M000278200.

[28] Gomez, G. A.; Morisseau, C.; Hammock, B. D.; Christianson, D. W. Protein Sci. 2006, 15, 58–64, DOI: 10.1110/ps.051720206.

[29] Kim, I.-H.; Tsai, H.-J.; Nishi, K.; Kasagami, T.; Morisseau, C.; Hammock, B. D. J. Med. Chem. 2007, 50, 5217–5226, DOI: 10.1021/jm070705c.

[30] Huang, S.-X.; Li, H.-Y.; Liu, J.-Y.; Morisseau, C.; Hammock, B. D.; Long, Y.-Q. J. Med. Chem. 2010, 53, 8376–8386, DOI: 10.1021/jm101087u.

[31] Lee, K. S. S. et al. J. Med. Chem. 2014, 57, 7016–7030, DOI: 10.1021/jm500694p.

[32] Huang, K.; Fu, T.; Gao, W.; Zhao, Y.; Roohani, Y.; Leskovec, J.; Coley, C. W.; Xiao, C.; Sun, J.; Zitnik, M. Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, 2021; URL: https://openreview.net/forum?id=8nvgnORnoWr

[33] Gao, W.; Fu, T.; Sun, J.; Coley, C. W. Adv. Neural Inf. Process. Syst. 2022, 35, 21342–21357, URL: https://proceedings.neurips.cc/paper_files/paper/2022/hash/8644353f7d307baaf29bc1e56fe8e0ec-Abstract-Datasets_and_Benchmarks.html

[34] Sun, M.; Lo, A.; Guo, M.; Chen, J.; Coley, C. W.; Matusik, W. Procedural Synthesis of Synthesizable Molecules. The Thirteenth International Conference on Learning Representations, 2025; URL: https://openreview.net/forum?id=OGfyzExd69

[35] Sun, K.; Bagni, D.; Cavanagh, J. M.; Wang, Y.; Sawyer, J. M.; Zhou, B.; Gritsevskiy, A.; Zhang, O.; Head-Gordon, T. ACS Cent. Sci. 2025, 11, 2108–2120, DOI: 10.1021/acscentsci.5c01285.

[36] Li, T.; Hou, K.; Vinh, T.; Raj, M.; Guo, Z.; Yang, C. Reinforcement Learning with LLM-Guided Action Spaces for Synthesizable Lead Optimization. arXiv, Version 2, May 1, 2026; DOI: 10.48550/arXiv.2604.07669.

[37] Gottweis, J.; Weng, W.-H.; Daryin, A.; Tu, T.; Sirkovic, P.; Myaskovsky, A.; Glowaty, G.; Weissenberger, F.; Orlandi, A.; Natarajan, V. Nature 2026, DOI: 10.1038/s41586-026-10644-y.

[38] Ghareeb, A. E.; Chang, B.; Mitchener, L.; Yiu, A.; Szostkiewicz, C. J.; Shved, D.; Gyimesi, G. J.; Laurent, J. M.; Wright, S. M.; Razzak, M. T.; White, A. D.; Finnemann, S. C.; Hinks, M. M.; Rodriques, S. G. Nature 2026, DOI: 10.1038/s41586-026-10652-y.

[39] Boiko, D. A.; MacKnight, R.; Kline, B.; Gomes, G. Nature 2023, 624, 570–578, DOI: 10.1038/s41586-023-06792-0.

A. Objective and Scoring Details

This appendix gives the implementation-level scoring details that are omitted from the main Methods. The main text treats the objective as a black-box fitness function; here we specify the normalization and scalar aggr...

[1] [1]

S.; Reid, M.; Matsuo, Y.; Iwasawa, Y.Adv

Kojima, T.; Gu, S. S.; Reid, M.; Matsuo, Y.; Iwasawa, Y.Adv. Neural Inf. Process. Syst. 2022,35, 22199–22213

2022

[2] [2]

Program Synthesis with Large Language Models

Austin, J.; Odena, A.; Nye, M.; Bosma, M.; Michalewski, H.; Dohan, D.; Jiang, E.; Cai, C.; Terry, M.; Le, Q.; Sutton, C. Program Synthesis with Large Language Models. arXiv, Version 1, August 16, 2021; DOI: 10.48550/arXiv.2108.07732

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2108.07732 2021

[3] [3]

Chen, M. et al. Evaluating Large Language Models Trained on Code. arXiv, Version 2, July 14, 2021; DOI: 10.48550/arXiv.2107.03374

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2107.03374 2021

[4] [4]

Score-Based Generative Modeling through Stochastic Differential Equations

Song, Y.; Sohl-Dickstein, J.; Kingma, D. P.; Kumar, A.; Ermon, S.; Poole, B. Score-Based Generative Modeling through Stochastic Differential Equations. arXiv, Version 1, February 10, 2020; DOI: 10.48550/arXiv.2011.13456

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2011.13456 2020

[5] [5]

Mater.2024,10, 273, DOI: 10.1038/s41524-024-01466-5

Qiu, H.; Sun, Z.-Y.npj Comput. Mater.2024,10, 273, DOI: 10.1038/s41524-024-01466-5

work page doi:10.1038/s41524-024-01466-5 2024

[6] [6]

Gao, W.; Luo, S.; Coley, C. W.Proc. Natl. Acad. Sci. U.S.A.2025,122, e2415665122, DOI: 10.1073/pnas.2415665122. 23

work page doi:10.1073/pnas.2415665122 2025

[7] Edwards, C.; Lai, T. M.; Ros, K.; Honke, G.; Cho, K.; Ji, H. Translation between Molecules and Natural Language. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing; Goldberg, Y.; Kozareva, Z.; Zhang, Y., Eds.; Association for Computational Linguistics: Abu Dhabi, United Arab Emirates, 2022; pp 375–413. DOI: 10.18653/v1/2022.emnlp-main.26

[8] Hoogeboom, E.; Satorras, V. G.; Vignac, C.; Welling, M. Equivariant Diffusion for Molecule Generation in 3D. In Proceedings of the 39th International Conference on Machine Learning; Chaudhuri, K.; Jegelka, S.; Song, L.; Szepesvari, C.; Niu, G.; Sabato, S., Eds.; PMLR, 2022; Proceedings of Machine Learning Research, Vol. 162, pp 8867–8887. URL: https://proc...

[9] Dunn, I.; Koes, D. R. Digit. Discov. 2026, 5, 2052–2066, DOI: 10.1039/D5DD00363F

[10] Song, Y.; Gong, J.; Xu, M.; Cao, Z.; Lan, Y.; Ermon, S.; Zhou, H.; Ma, W.-Y. Equivariant Flow Matching with Hybrid Probability Transport for 3D Molecule Generation. In Advances in Neural Information Processing Systems 36; Oh, A.; Naumann, T.; Globerson, A.; Saenko, K.; Hardt, M.; Levine, S., Eds.; Curran Associates, Inc., 2023; pp 549–568. URL: https://pro...

[11] Vignac, C.; Krawczuk, I.; Siraudin, A.; Wang, B.; Cevher, V.; Frossard, P. DiGress: Discrete Denoising Diffusion for Graph Generation. The Eleventh International Conference on Learning Representations, 2023; URL: https://openreview.net/forum?id=UaAD-Nu86WX

[12] Swanson, K.; Liu, G.; Catacutan, D. B.; Arnold, A.; Zou, J.; Stokes, J. M. Nat. Mach. Intell. 2024, 6, 338–353, DOI: 10.1038/s42256-024-00809-7

[13] Lee, S.; Kreis, K.; Veccham, S. P.; Liu, M.; Reidenbach, D.; Paliwal, S. G.; Nie, W.; Vahdat, A. Exploring Synthesizable Chemical Space with Iterative Pathway Refinements. The Fourteenth International Conference on Learning Representations, 2026; URL: https://openreview.net/forum?id=aQKVfKOkR5

[14] Romera-Paredes, B.; Barekatain, M.; Novikov, A.; Balog, M.; Kumar, M. P.; Dupont, E.; Ruiz, F. J. R.; Ellenberg, J. S.; Wang, P.; Fawzi, O.; Kohli, P.; Fawzi, A. Nature 2024, 625, 468–475, DOI: 10.1038/s41586-023-06924-6

[15] Novikov, A. et al. AlphaEvolve: A Coding Agent for Scientific and Algorithmic Discovery. arXiv, Version 1, June 16, 2025; DOI: 10.48550/arXiv.2506.13131

[16] Kang, Y.; Kim, J. Nat. Commun. 2024, 15, 4705, DOI: 10.1038/s41467-024-48998-4

[17] Jia, S.; Zhang, C.; Fung, V. LLMatDesign: Autonomous Materials Discovery with Large Language Models. arXiv, Version 1, June 19, 2024; DOI: 10.48550/arXiv.2406.13163

[18] Luo, F.; Zhang, J.; Wang, Q.; Yang, C. ACS Cent. Sci. 2025, 11, 511–519, DOI: 10.1021/acscentsci.4c01935

[19] Zheng, Z.; Zhang, O.; Nguyen, H. L.; Rampal, N.; Alawadhi, A. H.; Rong, Z.; Head-Gordon, T.; Borgs, C.; Chayes, J. T.; Yaghi, O. M. ACS Cent. Sci. 2023, 9, 2161–2170, DOI: 10.1021/acscentsci.3c01087

[20] Lee, J.; Woo, J.; Kim, Y.; Kim, S.; Paulina, C.; Park, H.; Kim, H.-T.; Park, S.; Kim, J. ACS Cent. Sci. 2026, 12, 484–496, DOI: 10.1021/acscentsci.5c02433

[21] Caldas Ramos, M.; Michtavy, S. S.; White, A. D.; Porosoff, M. D. ACS Cent. Sci. 2026, DOI: 10.1021/acscentsci.5c02418

[22] Abhyankar, N.; Kabra, S.; Desai, S.; Reddy, C. K. LLEMA: Evolutionary Search with LLMs for Multi-Objective Materials Discovery. The Fourteenth International Conference on Learning Representations, 2026; URL: https://openreview.net/forum?id=TIqzhBvCNB

[23] Lange, R. T.; Tian, Y.; Tang, Y. Large Language Models as Evolution Strategies. In Proceedings of the Genetic and Evolutionary Computation Conference Companion; Li, X.; Handl, J., Eds.; ACM, 2024; pp 579–582. DOI: 10.1145/3638530.3654238

[24] Holland, J. H. Sci. Am. 1992, 267, 66–72, DOI: 10.1038/scientificamerican0792-66

[25] Bengio, E.; Jain, M.; Korablyov, M.; Precup, D.; Bengio, Y. Adv. Neural Inf. Process. Syst. 2021, 34, 27381–27394, URL: https://papers.nips.cc/paper/2021/hash/e614f646836aaed9f89ce58e837e2310-Abstract.html

[26] Cretu, M.; Harris, C.; Igashov, I.; Schneuing, A.; Segler, M.; Correia, B.; Roy, J.; Bengio, E.; Liò, P. SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints. The Thirteenth International Conference on Learning Representations, 2025; URL: https://openreview.net/forum?id=uvHmnahyp1

[27] Argiriadi, M. A.; Morisseau, C.; Goodrow, M. H.; Dowdy, D. L.; Hammock, B. D.; Christianson, D. W. J. Biol. Chem. 2000, 275, 15265–15270, DOI: 10.1074/jbc.M000278200

[28] Gomez, G. A.; Morisseau, C.; Hammock, B. D.; Christianson, D. W. Protein Sci. 2006, 15, 58–64, DOI: 10.1110/ps.051720206

[29] Kim, I.-H.; Tsai, H.-J.; Nishi, K.; Kasagami, T.; Morisseau, C.; Hammock, B. D. J. Med. Chem. 2007, 50, 5217–5226, DOI: 10.1021/jm070705c

[30] Huang, S.-X.; Li, H.-Y.; Liu, J.-Y.; Morisseau, C.; Hammock, B. D.; Long, Y.-Q. J. Med. Chem. 2010, 53, 8376–8386, DOI: 10.1021/jm101087u

[31] Lee, K. S. S. et al. J. Med. Chem. 2014, 57, 7016–7030, DOI: 10.1021/jm500694p

[32] Huang, K.; Fu, T.; Gao, W.; Zhao, Y.; Roohani, Y.; Leskovec, J.; Coley, C. W.; Xiao, C.; Sun, J.; Zitnik, M. Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, 2021; URL: https://openreview.net/forum?id=8nvgnORnoWr

[33] Gao, W.; Fu, T.; Sun, J.; Coley, C. W. Adv. Neural Inf. Process. Syst. 2022, 35, 21342–21357, URL: https://proceedings.neurips.cc/paper_files/paper/2022/hash/8644353f7d307baaf29bc1e56fe8e0ec-Abstract-Datasets_and_Benchmarks.html

[34] Sun, M.; Lo, A.; Guo, M.; Chen, J.; Coley, C. W.; Matusik, W. Procedural Synthesis of Synthesizable Molecules. The Thirteenth International Conference on Learning Representations, 2025; URL: https://openreview.net/forum?id=OGfyzExd69

[35] Sun, K.; Bagni, D.; Cavanagh, J. M.; Wang, Y.; Sawyer, J. M.; Zhou, B.; Gritsevskiy, A.; Zhang, O.; Head-Gordon, T. ACS Cent. Sci. 2025, 11, 2108–2120, DOI: 10.1021/acscentsci.5c01285

[36] Li, T.; Hou, K.; Vinh, T.; Raj, M.; Guo, Z.; Yang, C. Reinforcement Learning with LLM-Guided Action Spaces for Synthesizable Lead Optimization. arXiv, Version 2, May 1, 2026; DOI: 10.48550/arXiv.2604.07669

[37] Gottweis, J.; Weng, W.-H.; Daryin, A.; Tu, T.; Sirkovic, P.; Myaskovsky, A.; Glowaty, G.; Weissenberger, F.; Orlandi, A.; Natarajan, V. Nature 2026, DOI: 10.1038/s41586-026-10644-y

[38] Ghareeb, A. E.; Chang, B.; Mitchener, L.; Yiu, A.; Szostkiewicz, C. J.; Shved, D.; Gyimesi, G. J.; Laurent, J. M.; Wright, S. M.; Razzak, M. T.; White, A. D.; Finnemann, S. C.; Hinks, M. M.; Rodriques, S. G. Nature 2026, DOI: 10.1038/s41586-026-10652-y

[39] Boiko, D. A.; MacKnight, R.; Kline, B.; Gomes, G. Nature 2023, 624, 570–578, DOI: 10.1038/s41586-023-06792-0

A. Objective and Scoring Details

This appendix gives the implementation-level scoring details that are omitted from the main Methods. The main text treats the objective as a black-box fitness function; here we specify the normalization and scalar aggr...