Fine-tuning Pocket-Aware Diffusion Models via Denoising Policy Optimization

Daniel Kudenko; Megha Khosla; Yuan Xue

arxiv: 2605.17693 · v1 · pith:AO4RQVZInew · submitted 2026-05-17 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-20 13:26 UTC · model grok-4.3

keywords: pocket-aware diffusion · structure-based molecule optimization · denoising policy optimization · reinforcement learning for molecules · binding affinity optimization · multi-property molecule generation · CrossDocked2020 benchmark

The pith

Modeling the denoising trajectory as sequential decisions lets reinforcement learning fine-tune pocket-aware diffusion models to optimize multiple drug properties at once.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DEPPA, which applies denoising diffusion policy optimization to fine-tune a pre-trained pocket-aware diffusion model. It treats each step of the reverse denoising process as part of a multi-step decision sequence and assigns rewards based on how well the finished ligand molecule meets desired properties such as binding strength and drug-likeness. This setup lets the model move beyond simply matching a training distribution toward satisfying several practical requirements at the same time. A sympathetic reader would care because structure-based drug design needs molecules that bind tightly to a target pocket while also being drug-like, diverse, and reasonably easy to make.

Core claim

DEPPA formulates the reverse denoising process of the pretrained pocket-aware diffusion model as a multi-step Markov Decision Process, evaluates reward signals only on the final generated ligand, and applies a coarse denoising scheduler during reinforcement learning fine-tuning, resulting in generated molecules that outperform baselines on binding affinity, drug-likeness, and diversity while remaining competitive on synthesizability in the CrossDocked2020 benchmark.

What carries the argument

The central mechanism is the formulation of the entire reverse denoising trajectory as a multi-step Markov Decision Process whose policy is updated by reinforcement learning using rewards computed on the completed ligand molecule.
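As a rough illustration of this framing, the sketch below rolls out a toy one-dimensional "denoising" chain as an MDP episode in which every step is an action and only the last step receives a reward. Everything named here (`rollout`, `policy_mean`, `reward_fn`, the Gaussian toy policy) is a hypothetical stand-in chosen for illustration; it is not the paper's model or reward.

```python
import math
import random


def rollout(num_steps, policy_mean, sigma, reward_fn, rng):
    """Run one reverse-denoising episode; return (states, log_probs, rewards)."""
    x = rng.gauss(0.0, 1.0)  # x_T: start from pure noise
    states, log_probs, rewards = [], [], []
    for t in range(num_steps, 0, -1):
        mean = policy_mean(x, t)          # policy: Gaussian over the next sample
        x_next = rng.gauss(mean, sigma)   # action: the less noisy sample
        log_p = (-0.5 * ((x_next - mean) / sigma) ** 2
                 - math.log(sigma * math.sqrt(2.0 * math.pi)))
        states.append((x, t))
        log_probs.append(log_p)
        # Reward is zero everywhere except on the final, fully denoised sample.
        rewards.append(reward_fn(x_next) if t == 1 else 0.0)
        x = x_next
    return states, log_probs, rewards


rng = random.Random(0)
states, log_probs, rewards = rollout(
    num_steps=10,
    policy_mean=lambda x, t: 0.9 * x,     # toy stand-in for the denoiser
    sigma=0.1,
    reward_fn=lambda x0: -abs(x0 - 1.0),  # toy stand-in for property scoring
    rng=rng,
)
terminal_return = sum(rewards)
# Score-function surrogate: the same terminal return weights every step.
surrogate_loss = -terminal_return * sum(log_probs)
```

The point of the sketch is only the shape of the episode: ten actions, nine zero rewards, one terminal reward that has to be shared across all steps.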

If this is right

Ligands generated after fine-tuning reach an average Vina score of -8.5 kcal/mol, a stronger (more negative) predicted binding affinity than the baselines.
The generated molecules show measurable gains in drug-likeness and diversity metrics.
Synthesizability remains at a level comparable to existing methods.
Multiple molecular properties can be optimized together through the same reward-based fine-tuning procedure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same decision-process framing could be applied to other generative models that produce molecules or materials step by step.
Adding further reward terms for properties such as toxicity or metabolic stability would test whether the approach scales to more realistic drug-design constraints.
The method may shorten the filtering stage in computational drug discovery by producing higher-quality candidates directly from the generator.

Load-bearing premise

Reward signals taken only from the final ligand after the full denoising sequence can supply stable and effective gradients for updating every step of the multi-step reverse process.
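For orientation, the score-function policy gradient from the DDPO line of work that DEPPA builds on can be written as follows. The notation is assumed here for illustration ($c$ for the pocket context, $r$ for the terminal reward, $T$ for the number of denoising steps) and may differ from the paper's own symbols:

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{x_{T:0} \sim p_\theta}
    \left[ r(x_0, c) \sum_{t=1}^{T} \nabla_\theta \log p_\theta(x_{t-1} \mid x_t, c) \right]
```

Every step's log-probability gradient is weighted by the same terminal reward $r(x_0, c)$, which is why the premise matters: the longer the chain, the more steps share one noisy signal.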

What would settle it

An experiment on the CrossDocked2020 benchmark in which the reinforcement learning updates produce no gain in average Vina score or cause a sharp drop in diversity compared with the untuned diffusion model.

Figures

Figures reproduced from arXiv: 2605.17693 by Daniel Kudenko, Megha Khosla, Yuan Xue.

**Figure 2.** Variance of denoising transitions along the denoising process under different step sizes. Larger denoising step size leads to higher variance of the denoising transition, especially within the middle stages of the denoising process. The adoption of a coarser denoising scheduler to efficiently navigate chemical space toward high-reward regions can be justified from the following two perspectives: (1) Mitiga…
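A small numerical sketch can make the caption's qualitative point concrete. It assumes a standard DDPM setup with a linear beta schedule over 1000 steps and uses the usual posterior variance for a strided jump; the paper's actual noise schedule and its definition of transition variance are not given in this excerpt, so the numbers only illustrate the trend at one mid-chain step, not the figure's full shape.

```python
def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta) for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def transition_variance(alpha_bars, t, stride):
    """DDPM-style posterior variance of the jump from step t to step t - stride."""
    a_t = alpha_bars[t]
    a_prev = alpha_bars[t - stride]
    beta_eff = 1.0 - a_t / a_prev  # effective beta of the merged step
    return (1.0 - a_prev) / (1.0 - a_t) * beta_eff


alpha_bars = linear_alpha_bars(1000)
# Compare strides at a mid-chain step (t = 500 is an arbitrary choice).
variances = {s: transition_variance(alpha_bars, 500, s) for s in (1, 10, 50)}
```

Under these assumptions the variance at `t = 500` grows as the stride goes from 1 to 10 to 50, matching the direction described in the caption for moderate step sizes.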

Original abstract

Structure-based drug design has been accelerated by pocket-aware 3D generative models, yet most methods primarily fit the training distribution and may fall short of satisfying multiple properties required in real-world therapeutic drug discovery. Recently, increasing attention has focused on structure-based molecule optimization (SBMO), which targets fine-grained control over multiple specified molecular properties. In this paper, we present DEPPA, a novel SBMO approach building upon Denoising Diffusion Policy Optimization for fine-tuning a pre-trained pocket-aware diffusion model via reinforcement learning. DEPPA enables optimization over multiple properties, including binding affinity, drug-likeness, synthesizability and diversity. We formulate the reverse denoising process of the pretrained pocket-aware diffusion model as a multi-step Markov Decision Process, where the desired properties that serve as reward signals are evaluated on the final generated ligand molecules. DEPPA incorporates a coarse denoising scheduler during the RL fine-tuning to achieve efficient and effective molecule optimization. Experimental results on the CrossDocked2020 benchmark demonstrate that DEPPA outperforms baselines in binding affinity (Vina Score -8.5 kcal/mol), drug-likeness and diversity while exhibiting competitive performance in synthesizability. The source code is available at https://github.com/xy9485/DePPA .

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DEPPA applies RL fine-tuning to pocket-aware diffusion models for multi-property ligand optimization, with plausible benchmark gains but lingering questions on training stability.

This paper's main move is to take a pre-trained pocket-aware diffusion model and fine-tune it with reinforcement learning so the generated molecules hit better binding affinity, drug-likeness, and diversity at the same time. It casts the full denoising trajectory as a multi-step MDP and adds a coarse scheduler to make the RL updates practical on long chains. The reported CrossDocked2020 results show a Vina score around -8.5 plus gains on the other metrics, which is the kind of concrete outcome people in structure-based design actually care about. Code release helps too.

What is new here is the specific combination of denoising diffusion policy optimization with pocket-aware models for simultaneous multi-objective control; earlier papers handled diffusion or RL or pocket conditioning separately, but this framing for SBMO looks like a fresh application rather than a routine extension. The approach is grounded in existing diffusion and RL ideas, and the empirical claims are at least directionally plausible for the subfield.

The soft spot is the credit-assignment issue. Rewards arrive only after the complete denoising run, so the learning signal has to be credited back across hundreds of steps. The coarse scheduler is supposed to ease this, but the abstract and the visible parts of the paper do not show detailed variance measurements or ablations that confirm the updates stay stable instead of collapsing or relying on lucky seeds. Baseline details and statistical tests are also thin in what is shown, which makes the comparative wins harder to weigh right now.

This is for readers already working on 3D generative models for drug design or RL steering of diffusion processes. Someone following that literature would find a usable recipe and benchmark numbers to build from. It has enough substance and a clear empirical hook to deserve a serious referee rather than a desk reject. I would send it out for review; the core method is worth checking in detail even if the stability analysis needs more work.

Referee Report

2 major / 2 minor

Summary. The paper introduces DEPPA, which fine-tunes a pre-trained pocket-aware 3D diffusion model for structure-based molecule optimization via Denoising Policy Optimization. The reverse denoising trajectory is formulated as a multi-step MDP whose only non-zero reward is computed from external property predictors (Vina score, drug-likeness, etc.) on the completed ligand; a coarse denoising scheduler is added for efficiency. On the CrossDocked2020 benchmark the method reports a Vina score of -8.5 kcal/mol together with gains in drug-likeness and diversity while remaining competitive on synthesizability.

Significance. If the terminal-reward RL procedure can be shown to produce stable gradients across long denoising trajectories, the approach would supply a general recipe for multi-objective fine-tuning of diffusion generators in drug design, moving beyond pure distribution matching. Public release of source code is a positive factor for reproducibility.

major comments (2)

[§3] (MDP formulation) The reverse process is defined as an MDP whose reward is zero except at the final ligand after the complete denoising chain. With typical schedules of several hundred steps, the policy-gradient update therefore relies on a single sparse terminal signal. The manuscript provides no explicit value function, advantage estimator, or per-step shaping analysis that would demonstrate mitigation of the high-variance and credit-assignment problems inherent to naïve REINFORCE on long-horizon MDPs; the reported gains could therefore rest on implicit regularization rather than reliable optimization.
[Experimental section] (Results on CrossDocked2020) The central claim of outperformance (Vina -8.5, superior drug-likeness and diversity) is presented without reported details on baseline re-implementations, number of molecules sampled per target, statistical significance tests, or run-to-run variance. These omissions make it impossible to judge whether the numerical improvements are robust or attributable to the proposed coarse scheduler and PPO-style objective.

minor comments (2)

[Abstract] The abstract states 'competitive performance in synthesizability' without a numerical value or explicit baseline comparison; adding a table row or sentence with the actual metric would improve clarity.
[Method] Notation for the coarse denoising scheduler is introduced in prose; a compact algorithm box or equation defining the reduced step schedule would aid readability.
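As a sketch of what a compact definition of the reduced step schedule might look like, the function below picks an evenly spaced, descending subset of timesteps from a full chain. The function name, the choice of 1000 full steps, and the even spacing are all assumptions made for illustration; the paper's actual scheduler is described only in prose in the material shown here.

```python
def coarse_timesteps(total_steps, num_coarse_steps):
    """Evenly spaced, descending subset of timestep indices, from total_steps - 1 down to 0."""
    if num_coarse_steps < 2 or num_coarse_steps > total_steps:
        raise ValueError("need 2 <= num_coarse_steps <= total_steps")
    ascending = [
        round(i * (total_steps - 1) / (num_coarse_steps - 1))
        for i in range(num_coarse_steps)
    ]
    return ascending[::-1]


# Hypothetical sizes: a 1000-step chain reduced to 20 RL decision steps.
schedule = coarse_timesteps(total_steps=1000, num_coarse_steps=20)
```

Keeping both endpoints in the subset means the coarse chain still starts from the noisiest state and ends on the clean sample that the reward is computed on.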

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's thorough and constructive review of our manuscript. We address each major comment point by point below and will revise the paper to improve clarity, detail, and robustness as suggested.

Referee: [§3] (MDP formulation) The reverse process is defined as an MDP whose reward is zero except at the final ligand after the complete denoising chain. With typical schedules of several hundred steps, the policy-gradient update therefore relies on a single sparse terminal signal. The manuscript provides no explicit value function, advantage estimator, or per-step shaping analysis that would demonstrate mitigation of the high-variance and credit-assignment problems inherent to naïve REINFORCE on long-horizon MDPs; the reported gains could therefore rest on implicit regularization rather than reliable optimization.

Authors: We thank the referee for this insightful observation on the challenges of sparse terminal rewards in long-horizon MDPs. In DEPPA, the coarse denoising scheduler explicitly reduces the number of effective denoising steps during RL fine-tuning to a much shorter horizon (typically 10-20 steps), which substantially alleviates credit assignment difficulties. We also employ a PPO-style objective that incorporates an advantage estimator derived from the terminal returns to lower gradient variance. We will revise §3 to explicitly detail the advantage estimator, quantify the horizon reduction from the coarse scheduler, and include a supplementary analysis of gradient variance to demonstrate optimization stability. These additions will clarify that the reported gains arise from the proposed method rather than solely from implicit effects. revision: yes
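
One way to read the "PPO-style objective with an advantage derived from terminal returns" in this response is sketched below. It assumes the advantage is the batch-normalized terminal reward, shared by every denoising step of a trajectory, and uses the standard PPO clipped surrogate; the function names and the normalization choice are illustrative assumptions, not details confirmed by the manuscript.

```python
import math


def normalized_advantages(terminal_rewards):
    """Zero-mean, unit-variance advantages from one batch of terminal rewards."""
    n = len(terminal_rewards)
    mean = sum(terminal_rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in terminal_rewards) / n)
    return [(r - mean) / (std + 1e-8) for r in terminal_rewards]


def ppo_clip_loss(new_log_probs, old_log_probs, advantage, clip_eps=0.2):
    """Clipped surrogate loss averaged over the denoising steps of one trajectory."""
    total = 0.0
    for new_lp, old_lp in zip(new_log_probs, old_log_probs):
        ratio = math.exp(new_lp - old_lp)  # importance ratio for this step
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * advantage, clipped * advantage)
    return -total / len(new_log_probs)


advantages = normalized_advantages([1.0, 2.0, 3.0])
loss_on_policy = ppo_clip_loss([-1.0, -1.0], [-1.0, -1.0], advantage=2.0)
loss_clipped = ppo_clip_loss([0.0], [-1.0], advantage=1.0)
```

With a shorter horizon from the coarse scheduler, fewer per-step terms share each trajectory's advantage, which is the variance argument the response is making.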
Referee: Experimental section (results on CrossDocked2020): the central claim of outperformance (Vina -8.5, superior drug-likeness and diversity) is presented without reported details on baseline re-implementations, number of molecules sampled per target, statistical significance tests, or run-to-run variance. These omissions make it impossible to judge whether the numerical improvements are robust or attributable to the proposed coarse scheduler and PPO-style objective.

Authors: We agree that greater experimental transparency is necessary to substantiate the claims. In the revised manuscript, we will expand the experimental section to include: full details on baseline re-implementations and their hyperparameters, the number of molecules sampled per target (100 ligands per pocket), results of statistical significance tests (e.g., paired t-tests or Wilcoxon tests with p-values), and run-to-run variance reported as mean ± standard deviation across multiple independent runs (e.g., 3-5 random seeds). These additions will allow readers to better evaluate the robustness of the improvements and the specific contributions of the coarse scheduler and PPO-style objective. revision: yes
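
The reporting promised here could be computed along the following lines. The sketch uses only the Python standard library and returns a paired t-statistic with its degrees of freedom; the per-pocket values are made-up placeholders for illustration, not results from the paper, and the helper names are hypothetical.

```python
import math
import statistics


def mean_std(values):
    """Mean and sample standard deviation, for 'mean ± std' reporting."""
    return statistics.mean(values), statistics.stdev(values)


def paired_t_statistic(a, b):
    """Paired t-statistic and degrees of freedom for matched samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Placeholder Vina scores (kcal/mol) on four matched pockets; NOT real data.
baseline_vina = [-7.0, -7.5, -8.0, -6.5]
finetuned_vina = [-8.0, -8.0, -9.0, -7.5]

t_stat, dof = paired_t_statistic(finetuned_vina, baseline_vina)
finetuned_mean, finetuned_std = mean_std(finetuned_vina)
```

A negative t-statistic here means the fine-tuned scores are lower (better) on average; converting it to a p-value needs a t-distribution table or a statistics library.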

Circularity Check

0 steps flagged

External rewards from Vina and property predictors; no definitional reduction

The paper formulates the denoising trajectory as an MDP and applies RL fine-tuning with terminal rewards drawn exclusively from external oracles (Vina docking scores, drug-likeness, synthesizability metrics). These signals are independent of any internal fitted parameters or self-defined quantities. Reported gains on the CrossDocked2020 benchmark are therefore empirical outcomes rather than tautological predictions or self-referential identities. No self-definitional, fitted-input, or uniqueness-imported patterns appear in the derivation; any reference to prior DDPO work is not load-bearing for the central experimental claim. This yields a low but non-zero circularity score reflecting standard methodological self-reference without reduction of results to inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the modeling decision to treat denoising as an MDP and on the availability of external reward functions; no new physical entities are postulated and only a small number of scheduler hyperparameters are introduced.

free parameters (1)

coarse denoising scheduler hyperparameters
Chosen to achieve efficient RL fine-tuning; their specific values are not reported in the abstract.

axioms (1)

Domain assumption: the reverse denoising process of a diffusion model can be formulated as a multi-step Markov Decision Process whose terminal reward is computed on the final ligand.
This modeling step is required to apply policy optimization and is stated directly in the abstract.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · 5 internal anchors

[1] Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning. arXiv preprint arXiv:2305.13301.

[2] Vineeth Dorna, D Subhalingam, Keshav Kolluru, Shreshth Tuli, Mrityunjay Singh, Saurabh Singal, NM Krishnan, and Sayan Ranu. Tagmol: Target-aware gradient-guided molecule generation. arXiv preprint arXiv:2406.01650.

[3] Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H Bermano, Gal Chechik, and Daniel Cohen-Or. An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv preprint arXiv:2208.01618.

[4] URL https://arxiv.org/abs/2303.03543. Jiaqi Guan, Xiangxin Zhou, Yuwei Yang, Yu Bao, Jian Peng, Jianzhu Ma, Qiang Liu, Liang Wang, and Quanquan Gu. Decompdiff: diffusion models with decomposed priors for structure-based drug design. arXiv preprint arXiv:2403.07902.

[5] Charles Harris, Kieran Didi, Arian R Jamasb, Chaitanya K Joshi, Simon V Mathis, Pietro Lio, and Tom Blundell. Benchmarking generated poses: How rational is structure-based drug design with generative models? arXiv preprint arXiv:2308.07413.

[6] Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598.

[7] Xiuyuan Hu, Guoqing Liu, Can Chen, Yang Zhao, Hao Zhang, and Xue Liu. 3dmolformer: A dual-channel framework for structure-based drug discovery. arXiv preprint arXiv:2502.05107, 2025a. Zijing Hu, Fengda Zhang, Long Chen, Kun Kuang, Jiahui Li, Kaifeng Gao, Jun Xiao, Xin Wang, and Wenwu Zhu. Towards better alignment: Training diffusion models with reinforceme...

[8] Kingma, Tim Salimans, Ben Poole, and Jonathan Ho. URL https://arxiv.org/abs/2107.00630. Meng Liu, Youzhi Luo, Kanji Uchino, Koji Maruhashi, and Shuiwang Ji. Generating 3d molecules for target protein binding. arXiv preprint arXiv:2204.09410, 2022a. Nan Liu, Shuang Li, Yilun Du, Antonio Torralba, and Joshua B Tenenbaum. Compositional visual generation with composable diffusion models. In European conference...

[9] Alexander S Powers, Helen H Yu, Patricia Suriana, and Ron O Dror. Fragment-based ligand generation guided by geometric deep learning on protein-ligand structure. BioRxiv, pp. 2022–03.

[10] Keyue Qiu, Yuxuan Song, Jie Yu, Hongbo Ma, Ziyao Cao, Zhilong Zhang, Yushuai Wu, Mingyue Zheng, Hao Zhou, and Wei-Ying Ma. Empower structure-based molecule optimization with gradient guided bayesian flow networks. arXiv preprint arXiv:2411.13280.

[11] Yanru Qu, Keyue Qiu, Yuxuan Song, Jingjing Gong, Jiawei Han, Mingyue Zheng, Hao Zhou, and Wei-Ying Ma. Molcraft: structure-based drug design in continuous parameter space. arXiv preprint arXiv:2404.12141.

[12] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.

[13] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. URL https://arxiv.org/abs/2402.03300. Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International conference on machine learning, pp. 2256–2265. PMLR.

[14] URL https://arxiv.org/abs/2305.13997. Zaixi Zhang, Yaosen Min, Shuxin Zheng, and Qi Liu. Molecule generation for target protein binding with structural motifs. In The eleventh international conference on learning representations.

[15] Xiangxin Zhou, Xiwei Cheng, Yuwei Yang, Yu Bao, Liang Wang, and Quanquan Gu. Decompopt: Controllable and decomposed diffusion models for structure-based molecular optimization. arXiv preprint arXiv:2403.13829.

[16] A. More Implementation Details. Reward Processing. In DePPA, we feed raw continuous values of Vina score and molecule properties to the Gaussian rank transformation, resulting in a zero-centered distribution that resembles a normal distribution. For Vina Score, the evaluated values are reversed before the Gaussian rank transformation. The transformed value...
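
The reward-processing excerpt captured as entry [16] describes a Gaussian rank transformation applied to raw property values, with Vina scores reversed first. A minimal sketch of one common form of that transformation is given below. The rank-to-quantile rule `(rank - 0.5) / n`, the reading of "reversed" as negation, and the placeholder scores are all assumptions; the excerpt is truncated and does not specify these details.

```python
from statistics import NormalDist


def gaussian_rank_transform(values):
    """Map values to standard-normal quantiles of their ranks (assumes no ties)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    transformed = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # Mid-rank quantile keeps the argument strictly inside (0, 1).
        transformed[idx] = std_normal.inv_cdf((rank - 0.5) / n)
    return transformed


# Placeholder Vina scores (kcal/mol); lower is better, so negate before ranking.
vina_scores = [-9.1, -6.3, -8.0, -7.2, -5.5]
vina_rewards = gaussian_rank_transform([-v for v in vina_scores])
```

After negation, the ligand with the most negative Vina score receives the largest transformed reward, and the transformed values are symmetric around zero, consistent with the zero-centered distribution the excerpt mentions.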
