MolRGen: A Training and Evaluation Setting for De Novo Molecular Generation with Reasonning Models

Ismail Ben Ayed; Maxime Darrin; Pablo Piantanida; Philippe Formont

arXiv: 2603.18256 · v2 · submitted 2026-03-18 · 💻 cs.LG · cs.AI


Philippe Formont, Maxime Darrin, Ismail Ben Ayed, Pablo Piantanida

Pith reviewed 2026-05-15 09:25 UTC · model grok-4.3


keywords: de novo molecular generation · reasoning LLMs · molecular verifier · docking scores · GRPO · multi-objective optimization · diversity metric · reinforcement learning


The pith

MolRGen supplies a real-time verifier so reasoning LLMs can generate molecules from scratch using docking and property rewards.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a new training and evaluation setting called MolRGen for reasoning large language models on de novo molecular generation. It supplies roughly 4,500 protein targets that produce 50,000 multi-objective prompts, each scored by a verifier that calculates docking results and properties such as QED, synthetic accessibility, and logP without any reference molecule. Models propose structures directly, receive immediate rewards, and can be improved through reinforcement learning. The authors benchmark several open-source LLMs and then fine-tune a 128B model with GRPO to demonstrate measurable gains on the benchmark while documenting a resulting loss in molecular diversity. This setup creates a scalable testbed where verifiable outcomes guide step-by-step reasoning toward novel compounds.

Core claim

MolRGen is a benchmark and molecular verifier containing approximately 4,500 protein-pocket targets that yield 50k multi-objective optimization prompts. The verifier computes docking scores together with molecular properties at generation time, enabling training and evaluation of reasoning LLMs on molecules proposed entirely from scratch. Benchmarking general-purpose and chemistry-specialized models reveals performance differences between them, and fine-tuning a 128B LLM via GRPO improves scores but exposes a diversity-exploitation trade-off. The framework supports the study of verifier-based reasoning and reinforcement learning in molecular design.

What carries the argument

The MolRGen molecular verifier, which evaluates each generated molecule in real time by running docking simulations and calculating property scores to supply rewards for reinforcement learning without reference structures.
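The paper passage quoted in the Lean-theorem section of this page writes the aggregate reward as a geometric mean of per-property scores, r(q, ô) = (∏ r(s)(…))^(1/n_props). The sketch below illustrates only that aggregation step. It assumes each per-property score has already been mapped into [0, 1]. The per-property scoring functions (parameterized in the paper by ν, ρ, x, σ) are not reproduced, and the helper name `aggregate_reward` is hypothetical.

```python
import math
from typing import Sequence


def aggregate_reward(property_scores: Sequence[float]) -> float:
    """Geometric mean of per-property scores, each assumed to lie in [0, 1].

    Hypothetical helper illustrating r = (prod_s r_s) ** (1 / n_props).
    """
    if not property_scores:
        raise ValueError("at least one property score is required")
    for score in property_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score {score} is outside [0, 1]")
    # A single zero zeroes the product: failing any one objective vetoes the reward.
    if any(score == 0.0 for score in property_scores):
        return 0.0
    # Averaging logs avoids underflow when many small scores are multiplied.
    mean_log = sum(math.log(score) for score in property_scores) / len(property_scores)
    return math.exp(mean_log)


print(aggregate_reward([0.8, 0.5]))  # about 0.632, versus 0.65 for the arithmetic mean
print(aggregate_reward([1.0, 0.0]))  # 0.0
```

Compared with an arithmetic mean, a geometric mean rewards molecules that are reasonable on every requested objective. A molecule cannot compensate for a failed docking or property target by excelling elsewhere.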

If this is right

Reasoning LLMs can be trained to optimize multiple objectives at once through immediate verifier feedback during generation.
A diversity-aware top-k metric quantifies whether high-scoring outputs come from structurally varied molecules.
GRPO fine-tuning on the verifier improves benchmark scores for a 128B model.
The observed diversity-exploitation trade-off appears when models focus on maximizing verifier rewards.
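The page describes the diversity-aware top-k metric only briefly; the quoted paper passage mentions a Tanimoto s_max constraint. The sketch below shows one plausible reading. Candidates are walked in descending score order, and a molecule is kept only if its Tanimoto similarity to every molecule already kept is at most s_max. Selection stops at k, and the kept scores are averaged with unfilled slots counted as 0. The greedy rule, the zero-padding, and the representation of fingerprints as Python sets of on-bit indices are assumptions, not the paper's definition.

```python
from typing import FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of "on" bits (illustrative representation)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def diversity_aware_top_k(
    candidates: List[Tuple[float, Fingerprint]], k: int, s_max: float
) -> float:
    """Mean score over k slots, filled greedily by mutually dissimilar molecules.

    A candidate is kept only if its similarity to every already-kept molecule is
    at most s_max; unfilled slots count as 0, so near-duplicates cannot pad the score.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    kept: List[Tuple[float, Fingerprint]] = []
    for score, fp in sorted(candidates, key=lambda c: c[0], reverse=True):
        if all(tanimoto(fp, other) <= s_max for _, other in kept):
            kept.append((score, fp))
            if len(kept) == k:
                break
    return sum(score for score, _ in kept) / k


mols = [
    (0.90, frozenset({1, 2, 3})),
    (0.85, frozenset({1, 2, 3})),  # near-duplicate of the best molecule
    (0.60, frozenset({7, 8})),
]
print(diversity_aware_top_k(mols, k=2, s_max=0.5))  # 0.75: the duplicate is skipped
print(diversity_aware_top_k(mols, k=2, s_max=1.0))  # 0.875: no diversity constraint
```

Lowering s_max tightens the constraint, which is the axis varied in Figure 2. A model that only repeats one high-scoring scaffold loses score as the threshold drops.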

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The verifier approach could be applied to other design domains where outcomes are computable but difficult to specify in natural language alone.
Expanding the set of protein targets would allow tests of whether the same reasoning patterns generalize across unrelated biological systems.
Closing the loop with periodic laboratory measurements of top-scoring molecules would show how well the computational rewards predict actual success.

Load-bearing premise

Docking scores and computed molecular properties provide a reliable proxy for real-world binding affinity and synthesizability when molecules are generated without any reference compounds.

What would settle it

An experiment that synthesizes and tests the binding affinity of a set of molecules proposed by the fine-tuned model versus those from the base model would directly test whether the reported performance gains hold in the laboratory.

Figures

Figures reproduced from arXiv: 2603.18256 by Ismail Ben Ayed, Maxime Darrin, Pablo Piantanida, Philippe Formont.

**Figure 2.** Diversity-aware top-k score. Evaluation of the diversity-aware top-k score (y-axis) against varying similarity thresholds (x-axis) between candidate clusters. …performed the evaluation of RL-Mistral on these tasks, although it is worth noting that the model has only seen 10% of the training set of these tasks during its training (see details in Appendix E). Regression Tasks. Overall, all models struggle to…

**Figure 3.** Property prediction performance. Accuracy of the LLMs on classification tasks (left), and normalized Spearman correlation on regression tasks (right). …the chemical space. We evaluated a range of open-source large language models and showed that, on de novo molecular generation tasks, some reasoning-oriented LLMs can achieve performance comparable to chemically specialized models (not trained on de novo ge…

**Figure 4.** Overview of the target proteins. (a) Function of the proteins extracted from the PDB: our dataset comprises 21 molecular functions with at least 10 targets each, the most common being kinases (30%). (b) Annotation score of the proteins on UniProt (from 1 to 5). The vast majority of the target proteins are high-quality proteins with strong evidence of their existence.

**Figure 5.** Task sizes in the molecular property prediction objectives. The vast majority of tasks are regression tasks, and the largest benchmark used is the TDC benchmark. (Plot: task count by origin: novartis, tdcommons, polaris, asap-discovery, biogen.)

**Figure 6.** Scaffold occurrence in the various benchmarks. Occurrences of the most frequent Murcko scaffolds (of at least 6 atoms) in each benchmark, illustrating the chemical diversity across tasks. …scaffold patterns, indicating that the dataset covers chemically diverse molecular spaces rather than being biased towards a single scaffold class. However, data extracted from the asap-discovery dataset are mainly centere…

**Figure 7.** Overview of the molecular reaction dataset generation pipeline. Iterative stochastic process of synthesis generation: initialization with seed reactions, relaxed filtering for early steps, property filtering for later steps, probabilistic product selection, and chain extension up to 5 reaction steps.
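The sampling schedule behind Figure 7 is spelled out in the appendix excerpts reproduced as reference entries [15]–[17] on this page. The number of initial unfiltered steps is drawn as n_nf ∼ U{0, ⌊(n_steps + 1)/2⌋} for multi-step routes. Later steps pick a product with probability proportional to a property-based score. The minimal sketch below covers only that schedule. The `propose` callback, which returns candidate products with scores, is a hypothetical stand-in for the paper's reaction and compatibility-matrix machinery. Stopping on dead ends is also an assumption.

```python
import random
from typing import Callable, List, Sequence, Tuple

MAX_STEPS = 5  # "chain extension up to 5 reaction steps" (Figure 7)

# Hypothetical callback standing in for the reaction machinery: given the current
# molecule and whether property filtering is active, return (product, score) pairs.
ProposeFn = Callable[[str, bool], Sequence[Tuple[str, float]]]


def sample_no_filter_steps(n_steps: int, rng: random.Random) -> int:
    """n_nf ~ U{0, floor((n_steps + 1) / 2)} for multi-step routes, else 0."""
    if n_steps <= 1:
        return 0
    return rng.randint(0, (n_steps + 1) // 2)  # randint includes both endpoints


def extend_chain(seed: str, n_steps: int, propose: ProposeFn, rng: random.Random) -> List[str]:
    if not 1 <= n_steps <= MAX_STEPS:
        raise ValueError(f"n_steps must be in [1, {MAX_STEPS}]")
    n_nf = sample_no_filter_steps(n_steps, rng)
    chain = [seed]
    for step in range(1, n_steps + 1):
        filtered = step > n_nf  # property filtering is re-enabled after n_nf steps
        products = list(propose(chain[-1], filtered))
        if not products:
            break  # dead end: no compatible reaction for the current product
        if filtered:
            names = [name for name, _ in products]
            weights = [score for _, score in products]
            if sum(weights) <= 0:
                break  # no product passes the property-based scoring
            # Probabilistic selection: chance proportional to the property score.
            chosen = rng.choices(names, weights=weights, k=1)[0]
        else:
            # Relaxed step: any allowed product, regardless of drug-likeness.
            chosen = rng.choice(products)[0]
        chain.append(chosen)
    return chain
```

Letting the first few steps ignore drug-likeness allows unusual intermediates. Re-enabling property-weighted selection later steers the final product back toward the target distribution.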

**Figure 9.** Frequency and chemical diversity of reaction templates. …D.3.2 Reaction Template Analysis. The reaction templates form the core vocabulary of the synthesis dataset. We examine both the frequency distribution and chemical diversity of the SMARTS patterns used during generation in…

**Figure 10.** Training curves of RL-Mistral. Evolution of the average reward (left) and average completion length (right) during training. …E.1 Loss Function. We use Group Relative Policy Optimization (GRPO) [Shao et al., 2024] with the modifications to the loss function introduced by Magistral [Mistral-AI et al., 2025]. For each prompt $q$, we generate $G$ completions $\{o_i\}_{i=1}^{G}$ with the policy $\pi_{\theta_{\text{old}}}$, and compute their…
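The loss excerpt in the Figure 10 entry begins with the GRPO setup: for each prompt q, G completions are sampled from the old policy and scored. In the original GRPO formulation [Shao et al., 2024], each completion's advantage is its reward standardized within its group, Â_i = (r_i − mean(r)) / std(r). The sketch below computes only that group-relative advantage. It does not reproduce the Magistral modifications the paper says it adopts, nor the clipped policy-ratio objective. The epsilon guard and the use of the population standard deviation are assumptions.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Standardize the rewards of one prompt's G completions (vanilla GRPO).

    Uses the population standard deviation; implementations differ on this detail.
    """
    if len(rewards) < 2:
        raise ValueError("GRPO compares at least two completions per prompt")
    mean = fmean(rewards)
    std = pstdev(rewards)
    # If every completion earns the same reward the group carries no signal:
    # eps prevents division by zero and all advantages come out as 0.
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one prompt; two hit the verifier's target, two miss.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # about [1.0, -1.0, -1.0, 1.0]
```

Because the baseline is the group mean, no learned value function is needed. Prompts on which every rollout scores identically contribute no gradient.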

**Figure 11.** Validity of the generated completions. Generations can be invalid because no answer was generated in the expected format, no SMILES could be parsed from the answer, no valid SMILES was proposed, or multiple SMILES were proposed. (Plot panel: Uniqueness-Prompt-wise vs. number of rollouts n_r; models: ChemDFM-R, ChemDFM-v2.0, RL-Mistral, RL-Mistral-100, ether0, MiniMax-M2, Qwen3, …)
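The four failure modes listed in Figure 11 suggest a simple classifier for completions. The minimal sketch below rests on several assumptions not given on this page: the `<answer>…</answer>` tag format, the whitespace/comma tokenization, and the order in which the checks are applied. SMILES validity is delegated to a caller-supplied function. With RDKit installed, that function could be `lambda s: Chem.MolFromSmiles(s) is not None`.

```python
import re
from enum import Enum
from typing import Callable, Optional, Tuple


class Validity(Enum):
    NO_ANSWER = "no answer in the expected format"
    NO_SMILES = "no SMILES parsed in the answer"
    MULTIPLE_SMILES = "multiple SMILES proposed"
    INVALID_SMILES = "no valid SMILES"
    VALID = "valid"


# Hypothetical answer format: the final answer is wrapped in <answer>...</answer>.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def classify_completion(
    text: str, is_valid_smiles: Callable[[str], bool]
) -> Tuple[Validity, Optional[str]]:
    """Assign a completion to one of the validity categories of Figure 11."""
    match = ANSWER_RE.search(text)
    if match is None:
        return Validity.NO_ANSWER, None
    # Hypothetical tokenization: candidate SMILES separated by whitespace, ',' or ';'.
    candidates = [tok for tok in re.split(r"[\s,;]+", match.group(1)) if tok]
    if not candidates:
        return Validity.NO_SMILES, None
    if len(set(candidates)) > 1:
        return Validity.MULTIPLE_SMILES, None
    smiles = candidates[0]
    if not is_valid_smiles(smiles):
        return Validity.INVALID_SMILES, None
    return Validity.VALID, smiles
```

Keeping the validity check injectable lets the same parser run against a strict chemistry toolkit in evaluation and a cheap stub in tests.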

**Figure 12.** Uniqueness and diversity evolution with the number of rollouts. We display the uniqueness (left) and diversity (right) of the generated molecules with respect to the number of rollouts. The center panel displays the average number of prompts a given molecule appears in. • Most models struggle to generate valid completions, and only a few models manage to generate more than 80% valid completions. • …
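Figure 12 reports uniqueness and diversity as functions of the number of rollouts, but the caption does not define either quantity. The sketch below assumes two common definitions. Uniqueness is taken as the fraction of distinct (already canonicalized) SMILES among the generations for a prompt. Diversity is taken as one minus the mean pairwise Tanimoto similarity of their fingerprints (internal diversity). Fingerprints are again represented as sets of on-bit indices, and canonicalization is left to the caller.

```python
from itertools import combinations
from typing import FrozenSet, Sequence


def uniqueness(smiles: Sequence[str]) -> float:
    """Fraction of distinct molecules; assumes SMILES were canonicalized upstream."""
    return len(set(smiles)) / len(smiles) if smiles else 0.0


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def internal_diversity(fingerprints: Sequence[FrozenSet[int]]) -> float:
    """1 - mean Tanimoto similarity over all unordered pairs of fingerprints."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Uniqueness over growing numbers of rollouts for one prompt.
rollouts = ["CCO", "CCN", "CCO", "CCO", "CCN", "CCO"]
print([round(uniqueness(rollouts[:n]), 2) for n in (2, 4, 6)])  # [1.0, 0.5, 0.33]
```

A model that keeps resampling the same few molecules shows exactly this falling uniqueness curve as the rollout budget grows.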

**Figure 13.** Evolution of the top-k score with the number of rollouts. Evolution of the top-k score as more molecules are sampled per prompt. The x-axis represents the number of rollouts divided by k.

**Figure 14.** Diversity-aware top-k score for different fingerprints. We display the diversity-aware metric when the similarity between molecules is based on ECFP, MACCS, Gobbi2d, and Avalon fingerprints.

**Figure 15.**

**Figure 16.** Ether0 refusal. Representative examples where Ether0 refuses to generate a molecule, interpreting property-optimization or prediction instructions as requests to produce harmful substances.

Original abstract

Recent reasoning-based large language models have shown strong performance on tasks with verifiable outcomes, but their use in de novo molecular generation remains limited by the lack of training environments where rewards can be computed without reference molecules. We introduce MolRGen, a benchmark and molecular verifier for training and evaluating reasoning LLMs on de novo molecular generation. MolRGen contains approximately 4,500 protein-pocket targets, resulting in 50k multi-objective optimization prompts combining docking scores with molecular properties such as QED, synthetic accessibility, logP, and physicochemical descriptors. Unlike caption-based generation or molecule-editing benchmarks, MolRGen evaluates molecules proposed from scratch by computing rewards at generation time. We benchmark general-purpose and chemistry-specialized open-source LLMs and introduce a diversity-aware top-k metric to measure whether models can generate a diverse set of high-scoring molecules. Finally, we use the verifier to fine-tune a 128B LLM with GRPO, showing improved performance, at the cost of a diversity-exploitation trade-off. MolRGen provides a scalable testbed for studying verifier-based reasoning and reinforcement learning in molecular design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

MolRGen sets up a closed in-silico loop for training reasoning LLMs on de novo molecule design with on-the-fly verifier rewards, and the GRPO fine-tuning demo shows the expected performance-diversity trade-off inside that loop.


The main thing to know is that this paper gives a practical benchmark where you generate molecules from scratch, score them immediately with docking plus QED/SA/logP, and close the loop for reinforcement learning without needing reference structures. That fills a real gap for reasoning models, which usually need verifiable outcomes. They built it from 4,500 protein pockets into 50k prompts, benchmarked a range of open LLMs, added a diversity-aware top-k metric, and ran GRPO on a 128B model to show score gains at the expense of diversity. The construction is internally consistent and the verifier-based training works as described.

What they do well is keep everything scalable and explicit: rewards are computed at generation time, the prompts are multi-objective, and they actually ship the fine-tuning result instead of just proposing the setup. The diversity metric is a reasonable addition to avoid trivial high-score repetition.

Soft spots are mostly about scope rather than errors. All gains are measured against the same verifier, so the lift is real inside the benchmark but says nothing about whether those molecules would hold up in wet-lab assays or with external docking tools. The abstract gives no error analysis or stability checks on the docking scores, and it does not describe how the 4,500 pockets were selected. The trade-off is noted but not explored with ablations on how much diversity can be recovered without losing score. There is no head-to-head with non-LLM generative models either.

This is aimed at groups working on LLM reasoning for molecular design or on computational chemistry toolkits. The benchmark itself looks adoptable, so the paper deserves a serious referee even if the fine-tuning part is mainly a demonstration. I'd send it out for review.

Referee Report

2 major / 2 minor

Summary. The paper introduces MolRGen, a benchmark and molecular verifier for training and evaluating reasoning LLMs on de novo molecular generation. It features approximately 4,500 protein-pocket targets leading to 50k multi-objective prompts involving docking scores, QED, SA, logP, and descriptors. The work benchmarks LLMs, proposes a diversity-aware top-k metric, and shows that fine-tuning a 128B LLM with GRPO using the verifier improves performance, albeit with a diversity-exploitation trade-off.

Significance. If the reported improvements hold and the verifier provides a meaningful proxy, this establishes a valuable testbed for verifier-based RL in molecular design, potentially accelerating the application of reasoning models to chemistry by providing verifiable rewards without reference structures. The introduction of the diversity metric is a positive step toward balanced generation.

major comments (2)

[Fine-tuning results] The claim that GRPO fine-tuning leads to improved performance is central but lacks specific quantitative evidence such as pre- and post-fine-tuning scores on docking, QED, or the top-k metric, as well as details on the number of training steps or reward curves; this undermines assessment of the practical utility.
[Verifier and benchmark construction] The multi-objective reward computation is described at a high level; the paper should specify the exact aggregation method (e.g., weighted sum, Pareto optimization) and any validation against known molecular datasets to ensure the scores are not arbitrary.

minor comments (2)

The title contains 'Reasonning', which is likely a misspelling of 'Reasoning'.
[Abstract] The abstract could benefit from a brief mention of the scale of the benchmark (e.g., number of molecules generated or evaluation protocol) for better context.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to provide the requested details and clarifications.


Referee: [Fine-tuning results] The claim that GRPO fine-tuning leads to improved performance is central but lacks specific quantitative evidence such as pre- and post-fine-tuning scores on docking, QED, or the top-k metric, as well as details on the number of training steps or reward curves; this undermines assessment of the practical utility.

Authors: We agree that quantitative details are necessary to substantiate the fine-tuning claims. In the revised version, we will add a dedicated results table reporting pre- and post-GRPO scores on docking, QED, SA, logP, and the diversity-aware top-k metric. We will also include the number of training steps, training reward curves, and any relevant hyperparameters to allow full assessment of the improvements and the noted diversity-exploitation trade-off. revision: yes
Referee: [Verifier and benchmark construction] The multi-objective reward computation is described at a high level; the paper should specify the exact aggregation method (e.g., weighted sum, Pareto optimization) and any validation against known molecular datasets to ensure the scores are not arbitrary.

Authors: We acknowledge that the aggregation method requires explicit specification. In the revision, we will detail that the multi-objective reward is computed as a weighted sum of normalized individual scores (docking, QED, SA, logP, and physicochemical descriptors) with weights chosen to balance the objectives. We will also add a validation section comparing the verifier outputs against established datasets (e.g., known active compounds from PDBbind or ChEMBL) to demonstrate that the scores align with expected trends and are not arbitrary. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified


The paper introduces a new benchmark (MolRGen) and verifier that defines multi-objective rewards from docking scores, QED, SA, logP and descriptors computed at generation time without reference molecules. It then benchmarks LLMs on this setting and applies GRPO fine-tuning to maximize the same verifier signal, reporting the resulting performance lift and diversity trade-off. This is an expected empirical outcome of closed-loop optimization on a self-defined reward rather than a claimed first-principles derivation that reduces to its inputs by construction. No load-bearing step matches any enumerated circularity pattern; the work is self-contained with newly introduced data, prompts and evaluation metrics.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The benchmark relies on established computational chemistry methods for docking and property calculation, with no new free parameters or entities introduced in the abstract.

axioms (1)

Domain assumption: docking scores and molecular properties such as QED, synthetic accessibility, and logP can be reliably computed for any proposed molecule.
This underpins the reward computation in the verifier at generation time.

pith-pipeline@v0.9.0 · 5506 in / 1243 out tokens · 44005 ms · 2026-05-15T09:25:30.404024+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

$r(q,\hat{o}) = \left(\prod_{s} r^{(s)}(\{\nu_i,\rho_i,x_i,\sigma_i\},\hat{o})\right)^{1/n_{\text{props}}}$ ... diversity-aware top-k with Tanimoto $s_{\max}$ constraint

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery from Law of Logic · unclear

Relation between the paper passage and the cited Recognition theorem.

GRPO loss on 49k de novo prompts; top-1 improves but diversity collapses

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 1 internal anchor

[1]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

URL https://arxiv.org/abs/2501.12948. Vishal Dey, Xiao Hu, and Xia Ning. Gellmo: Generalizing large language models for multi-property molecule optimization, 2025. URL https://arxiv.org/abs/2502.13398. Peter Ertl and Ansgar Schuffenhauer. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributi...

doi:10.1186/1758-2946-1-8 · 2025
[2]

doi: 10.1038/s41598-025-99785-0

ISSN 2045-2322. doi: 10.1038/s41598-025-99785-0. URL https://www.nature.com/articles/s41598-025-99785-0. Publisher: Nature Publishing Group. Nafisa M. Hassan, Amr A. Alhossary, Yuguang Mu, and Chee-Keong Kwoh. Protein-Ligand Blind Docking Using QuickVina-W With Inter-Process Spatio-Temporal Integration. Scientific Reports, 7(1):15451, November 2017. ISSN ...

doi:10.1038/s41598-025-99785-0
[3]

Sihang Li, Zhiyuan Liu, Yanchen Luo, Xiang Wang, Xiangnan He, Kenji Kawaguchi, Tat-Seng Chua, and Qi Tian

URL https://arxiv.org/abs/2508.08401. Sihang Li, Zhiyuan Liu, Yanchen Luo, Xiang Wang, Xiangnan He, Kenji Kawaguchi, Tat-Seng Chua, and Qi Tian. Towards 3d molecule-text interpretation in language models, 2024. URL https://arxiv.org/abs/2401.13923. Hannes H. Loeffler, Jiazhen He, Alessandro Tibo, Jon Paul Janet, Alexey Voronov, Lewis H. Mervin, and Ola En...

doi:10.1186/s13321-024-00812-5 · 2024
[4]

Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, and Gao Huang

URL https://arxiv.org/abs/2402.09391. Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, and Gao Huang. Does reinforcement learning really incentivize reasoning capacity in llms beyond the base model? arXiv preprint arXiv:2504.13837, 2025. Zihan Zhao, Da Ma, Lu Chen, Liangtai Sun, Zihao Li, Yi Xia, Bo Chen, Hongshen Xu, Ziche...

doi:10.1016/j.xcrp.2025.102523 · 2025
[5]

Loads a set of property definitions, docking targets, and pocket metadata from the data directory

[6]

Uses a rule-based prompt generator to sample multi-objective molecular-generation prompts

[7]

Stores the prompts and their metadata in two formats: a JSONL file and a HuggingFace dataset. B.1 Per-prompt sampling loop (inner generator). For each prompt, the following steps are executed: • Property Selection: The number of properties, $n_{\text{props}}$, is sampled from a probability distribution, ensuring that the selection adheres to the constraints defined b...

[8]

Per-sequence potency filter: for each protein sequence, keep only structures whose measured ligand potency (pIC50) is in the top 50% for that sequence

[9]

Per-sequence confidence filter: among the retained structures, keep only those with a confidence score in the top 50% of the retained set. This double filter yields, for each sequence, a subset of CIF files whose ligands are both potent and associated with high-confidence measurements; these files form the input for pocket detection. Pocket identificat...

[10]

Parse the CIF using Biopython’s MMCIFParser and select the first model

[11]

standard residues

For each ligand atom, compute distances to all protein atom coordinates (atoms whose residue id flag equals the blank flag for “standard residues”). Select the top-k closest residues for each ligand atom (k = 3). The union of these residues across all ligand atoms forms the pocket residue set for that CIF. Aggregation across conformations (IoU clustering): Ma...

[12]

For each member structure, extract atomic coordinates for the residues in the aggregated pocket

[13]

Compute pairwise RMSD values (using Biopython) between all structures restricted to the pocket residues

[14]

The chosen structure is then written as a PDB file (ligand removed)

Aggregate pairwise RMSD values into a matrix and select the structure with the smallest mean RMSD relative to the others as the best conformation. The chosen structure is then written as a PDB file (ligand removed). Figure 4: Overview of the target proteins. (a) Function of the proteins extracted from the PDB, our dataset comprises 21 molecular f...

[15]

Initialization: Select a random seed reaction and identify available reactants via the compatibility matrix

[16]

Relaxed Filtering for Early Steps: For multi-step syntheses (i.e., when the total number of steps $n_{\text{steps}} > 1$), we randomly sample a number of initial steps $n_{\text{nf}} \sim \mathcal{U}\{0, \lfloor (n_{\text{steps}}+1)/2 \rfloor\}$ allowed to produce molecules with abnormal properties for drug-like compounds, and products are selected by randomly selecting one allowed reaction given the previous produc...

[17]

Probabilistic Product Selection: After the no-filter steps (i.e., for steps $i > n_{\text{nf}}$), property-based filtering is re-enabled. For each valid product, we compute a probability score based on a target distribution over molecular properties (QED, molecular weight, TPSA, H-bond donors/acceptors, rotatable bonds, aromatic rings). Products are selected propor...

[18]

Final Product: Predict the final product of a multi-step synthesis given all reaction SMARTS. 2. Reactant Prediction: Identify a missing reactant for a single synthesis step

[19]

All Reactants: Given a reaction SMARTS and target product, predict all required reactants

[20]

Building Block Constrained: All reactants task with molecules restricted to a provided set. 5. SMARTS Identification: Predict the SMARTS representation for a reaction step. 6. Full Synthesis Path: Generate a multi-step synthesis pathway to a target molecule

[21]

Path with Building Block Reference: Synthesis design constrained to a provided set of building blocks. 8. Path with SMARTS Reference: Synthesis design using only reactions from a curated set

[22]

Path with Both References: Full pathway design under both building block and reaction constraints

[23]

No building blocks or reaction templates are provided, requiring the model to identify appropriate reactants autonomously

Path with Intermediate Products: Given a target molecule and a shuffled list of intermediate products (i.e., all products of the synthesis route except the final one), determine the correct ordering of intermediates and provide the full synthesis route, including the reactants for each step. No building blocks or reaction templates are provided, requiring...

[24]

impossible

Path with Intermediate Products and Building Blocks: Same as the previous task, but the model is additionally provided with a set of commercially available building blocks (containing the ground-truth reactants mixed with random distractors) to select from when constructing the synthesis route. Each prompt is formatted with a system message establishing c...

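Reference entries [8]–[14] above are in fact appendix excerpts describing how each target pocket is built. The procedure has three selection rules. First, a double per-sequence filter keeps the top 50% by pIC50, then the top 50% of those by confidence. Second, the pocket residues are the union, over ligand atoms, of the k = 3 closest protein residues. Third, the retained conformation is the structure with the smallest mean pairwise RMSD over the pocket residues. The sketch below restates those rules on plain Python data under simplifying assumptions:

- the `Structure` record is hypothetical;
- coordinates are bare tuples;
- "top 50%" keeps ⌈n/2⌉ items;
- a residue's distance is that of its nearest atom;
- RMSD is computed without superposition, which presumes a shared frame.

CIF parsing (Biopython's MMCIFParser in the paper) and the IoU clustering step are omitted.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]


@dataclass
class Structure:
    """Hypothetical record for one CIF entry of a given protein sequence."""
    name: str
    pic50: float
    confidence: float


def top_half(items: List[Structure], key: Callable[[Structure], float]) -> List[Structure]:
    """Keep the ceil(n / 2) items with the highest key (one reading of 'top 50%')."""
    ranked = sorted(items, key=key, reverse=True)
    return ranked[: math.ceil(len(ranked) / 2)]


def double_filter(structures: List[Structure]) -> List[Structure]:
    """Potency filter on pIC50, then confidence filter on the retained set."""
    potent = top_half(structures, key=lambda s: s.pic50)
    return top_half(potent, key=lambda s: s.confidence)


def pocket_residues(
    ligand: Sequence[Coord], protein: Dict[str, Sequence[Coord]], k: int = 3
) -> Set[str]:
    """Union over ligand atoms of the k residues closest to that atom."""
    pocket: Set[str] = set()
    for atom in ligand:
        # Residue distance = distance from the ligand atom to the residue's nearest atom.
        ranked = sorted(
            (min(math.dist(atom, xyz) for xyz in coords), res_id)
            for res_id, coords in protein.items()
        )
        pocket.update(res_id for _, res_id in ranked[:k])
    return pocket


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between matched atom lists, without superposition."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def medoid_index(conformations: List[Sequence[Coord]]) -> int:
    """Index of the conformation with the smallest mean RMSD to all the others."""
    n = len(conformations)
    if n == 1:
        return 0
    mean_rmsd = [
        sum(rmsd(conformations[i], conformations[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    return min(range(n), key=mean_rmsd.__getitem__)
```

Picking the medoid rather than an averaged structure keeps a physically realized conformation while still favoring the most representative pocket geometry.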
