Pushing Biomolecular Utility-Diversity Frontiers with Supergroup Relative Policy Optimization
Pith reviewed 2026-05-20 23:27 UTC · model grok-4.3
The pith
Supergroup Relative Policy Optimization uses set-level diversity rewards to expand the utility-diversity Pareto frontier in biomolecular generation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Supergroup Relative Policy Optimization constructs rewards from set-level diversity by sampling supergroups of candidate sets, comparing their diversity under the same condition, and redistributing the group diversity reward to individual rollouts through leave-one-out diversity contributions before combining it with rollout-level utility. This leads to expanded utility-diversity Pareto frontiers and superior frontier-level metrics in de novo small-molecule design, pocket-based small-molecule design, and de novo protein design.
What carries the argument
Supergroup Relative Policy Optimization (SGRPO), which samples supergroups of candidate sets and redistributes diversity rewards using leave-one-out contributions to balance utility and diversity in policy optimization.
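The paper defines the per-rollout quantity as the leave-one-out contribution c_{m,i} = D(G_m) - D(G_m \ {x_{m,i}}). The sketch below shows how that works on a toy set. It assumes D is the mean pairwise normalized Hamming distance over equal-length sequences. The paper treats D as pluggable, so this metric and the helper names (`mean_pairwise_diversity`, `loo_contributions`) are hypothetical and are not taken from the SGRPO code.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def hamming_distance(a: str, b: str) -> float:
    """Normalized Hamming distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_diversity(items: Sequence[str]) -> float:
    """Hypothetical set diversity D(G): mean distance over all unordered pairs.

    Returns 0.0 for sets with fewer than two items (no pairs to compare).
    """
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(hamming_distance(a, b) for a, b in pairs) / len(pairs)


def loo_contributions(
    items: Sequence[str], diversity: Callable[[Sequence[str]], float]
) -> List[float]:
    """Leave-one-out contributions c_i = D(G) - D(G without item i)."""
    full = diversity(items)
    contributions = []
    for i in range(len(items)):
        rest = [x for j, x in enumerate(items) if j != i]  # G without rollout i
        contributions.append(full - diversity(rest))
    return contributions


# Two identical rollouts and one distinct one.
group = ["AAAA", "AAAA", "CCCC"]
print(loo_contributions(group, mean_pairwise_diversity))
```

Here the two duplicated sequences each receive -1/3 and the distinct one receives +2/3. The set-level diversity signal is therefore turned into a per-rollout reward that penalizes redundancy.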
If this is right
- Across decoding sweeps, SGRPO achieves the best frontier-level metrics relative to pretrained generators, GRPO, and memory-assisted GRPO.
- Direct set-level diversity rewards remain effective with small groups.
- SGRPO helps preserve broader generation-distribution coverage during post-training.
- The method can be instantiated with both GRPO and Coupled-GRPO across autoregressive and discrete diffusion generators.
Where Pith is reading between the lines
- This approach might reduce reliance on memory-assisted techniques for maintaining diversity in generative models.
- Applying SGRPO to other domains, such as image or text generation, could test whether set-level diversity rewards generalize beyond biomolecular tasks.
- Further work could explore how the size of the supergroup affects the stability of the diversity signal.
Load-bearing premise
That leave-one-out diversity contributions provide an unbiased and effective redistribution of the group-level diversity reward to individual rollouts without distorting the overall optimization signal.
What would settle it
A decoding sweep on one of the biomolecular design tasks where SGRPO does not expand the utility-diversity Pareto frontier beyond the baselines.
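A decoding sweep produces one (utility, diversity) point per setting, so whether a method "expands the frontier" can be checked mechanically. The sketch below extracts the non-dominated points and computes their 2-D hypervolume against a reference point. Hypervolume is one common frontier-level metric and is assumed here only for illustration; the paper's own frontier metrics may differ. The sweep values are toy numbers, not results.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (utility, diversity), both to be maximized


def dominates(q: Point, p: Point) -> bool:
    """True if q is at least as good as p on both axes and strictly better on one."""
    return q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Non-dominated points, deduplicated and sorted by increasing utility."""
    front = {p for p in points if not any(dominates(q, p) for q in points)}
    return sorted(front)


def hypervolume_2d(points: List[Point], ref: Point = (0.0, 0.0)) -> float:
    """Area dominated by the front and bounded below by the reference point."""
    front = [p for p in pareto_front(points) if p[0] > ref[0] and p[1] > ref[1]]
    area, prev_u = 0.0, ref[0]
    # Along the sorted front, utility rises while diversity falls,
    # so each point adds one vertical strip of width (u - prev_u).
    for u, d in front:
        area += (u - prev_u) * (d - ref[1])
        prev_u = u
    return area


# Toy (utility, diversity) points from a hypothetical decoding sweep.
sweep = [(0.5, 0.8), (0.7, 0.6), (0.6, 0.5)]
print(pareto_front(sweep))
print(hypervolume_2d(sweep))
```

For a fixed reference point, a sweep whose front lies outside the baseline's front yields a larger hypervolume. The reference point itself changes the numbers, so it should be reported alongside the metric.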
Original abstract
Biomolecular generators are often adapted with reward feedback to improve task-specific utility, but pushing utility alone can concentrate generation on a narrow family of candidates. Maintaining diversity is difficult because sample diversity is a set-level property. We introduce Supergroup Relative Policy Optimization (SGRPO), a flexible GRPO-style framework that directly constructs rewards from set-level diversity. For each condition, SGRPO samples a supergroup of candidate sets, compares their diversity under the same condition, and redistributes the group diversity reward to individual rollouts through leave-one-out diversity contributions before combining it with rollout-level utility. This design decouples SGRPO from a particular generator, utility reward, or diversity metric, and allows instantiation with different GRPO-style approaches. We evaluate SGRPO on de novo small-molecule design, pocket-based small-molecule design, and de novo protein design, instantiating it with both GRPO and Coupled-GRPO across autoregressive and discrete diffusion generators. Across decoding sweeps, SGRPO expands the utility-diversity Pareto frontier and achieves the best frontier-level metrics relative to pretrained generators, GRPO, and memory-assisted GRPO when applicable. Our analyses further show that direct set-level diversity rewards remain effective with small groups and help preserve broader generation-distribution coverage during post-training. The code is available at https://github.com/IDEA-XL/SGRPO.
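The abstract does not spell out how the redistributed diversity term and the utility are combined, or how advantages are normalized. The sketch below fills that gap with the simplest choice, purely as an assumption. It sets r_{m,i} = u_{m,i} + λ·c_{m,i} and standardizes rewards over every rollout in the supergroup for one condition, GRPO-style. The function name, the weight λ (`diversity_weight`), and the normalization scope are all hypothetical, and the paper's actual rule for comparing set diversities may differ.

```python
import statistics
from typing import List


def supergroup_advantages(
    utilities: List[List[float]],
    loo_contribs: List[List[float]],
    diversity_weight: float = 1.0,
    eps: float = 1e-8,
) -> List[List[float]]:
    """Hypothetical GRPO-style advantages for one condition's supergroup.

    utilities[m][i]:    rollout-level utility u_{m,i} (M sets of n rollouts)
    loo_contribs[m][i]: leave-one-out diversity contribution c_{m,i}
    ASSUMPTION: r_{m,i} = u_{m,i} + diversity_weight * c_{m,i}, standardized
    over all rollouts in the supergroup; the paper's exact rule may differ.
    """
    rewards = [
        [u + diversity_weight * c for u, c in zip(us, cs)]
        for us, cs in zip(utilities, loo_contribs)
    ]
    flat = [r for row in rewards for r in row]
    mean = statistics.fmean(flat)
    # Population standard deviation; some implementations use the sample version.
    std = statistics.pstdev(flat)
    return [[(r - mean) / (std + eps) for r in row] for row in rewards]


adv = supergroup_advantages(
    utilities=[[1.0, 1.0], [1.0, 1.0]],
    loo_contribs=[[0.0, 0.0], [0.5, -0.5]],
)
print(adv)
```

With `diversity_weight=0.0`, this reduces to utility-only standardization across the supergroup. Larger weights shift advantage toward rollouts that add to their set's diversity.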
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Supergroup Relative Policy Optimization (SGRPO), a GRPO-style framework that constructs set-level diversity rewards by sampling supergroups of K candidate sets per condition, computing a group diversity score D, and redistributing it to individual rollouts via leave-one-out contributions D_{-i} before combining with rollout-level utility. It instantiates the approach with GRPO and Coupled-GRPO on autoregressive and discrete diffusion generators, and evaluates on de novo small-molecule design, pocket-based small-molecule design, and de novo protein design. The central claim is that SGRPO expands the utility-diversity Pareto frontier and achieves the best frontier-level metrics relative to pretrained generators, GRPO, and memory-assisted GRPO across decoding sweeps, while remaining effective with small groups and preserving broader distribution coverage.
Significance. If the reported Pareto expansions hold after verification that the leave-one-out redistribution does not introduce gradient artifacts, the work would be significant for multi-objective RL fine-tuning of biomolecular generators. It offers a decoupled, metric-agnostic way to handle set-level properties like diversity, with code release supporting reproducibility and extension to other generators or tasks in computational chemistry and biology.
major comments (1)
- [Section 3.2] The leave-one-out redistribution assigns D_{-i} values whose sum is fixed relative to D. This induces linear dependence and correlation among the per-rollout advantages within each supergroup. The GRPO-style loss (Eqs. 4-5) applies these advantages without any covariance correction. That risks distorting the policy-gradient signal and could artificially inflate part of the reported frontier gains, particularly at the small supergroup sizes reported as effective. A covariance analysis, or a control experiment with independent per-rollout diversity sampling, is needed to confirm that the improvements reflect genuine utility-diversity trade-offs.
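The linear-dependence point can be made exact for one common choice of D. The derivation below assumes D is the mean pairwise distance d over a set G of n ≥ 3 rollouts; this is an illustrative choice, since the paper's metric is pluggable. It writes c_i = D(G) - D(G \ {x_i}), following the paper's leave-one-out definition. Whichever of these two quantities D_{-i} denotes, its sum over the set is fixed:

```latex
\begin{aligned}
D(G) &= \frac{2}{n(n-1)}\, S, \qquad S = \sum_{j<k} d(x_j, x_k), \\
D(G \setminus \{x_i\}) &= \frac{2}{(n-1)(n-2)} \Bigl( S - \sum_{j \neq i} d(x_i, x_j) \Bigr), \\
\sum_{i=1}^{n} D(G \setminus \{x_i\}) &= \frac{2}{(n-1)(n-2)} \bigl( nS - 2S \bigr) = \frac{2S}{n-1} = n\, D(G), \\
\sum_{i=1}^{n} c_i &= \sum_{i=1}^{n} \bigl[ D(G) - D(G \setminus \{x_i\}) \bigr] = n\, D(G) - n\, D(G) = 0 .
\end{aligned}
```

Under this metric, the n contributions carry only n - 1 degrees of freedom. Other diversity metrics need not satisfy the identity exactly.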
minor comments (2)
- [Evaluation sections] The manuscript should provide explicit formulas and implementation details for the diversity metrics used in the supergroup comparisons, along with any statistical tests (e.g., confidence intervals or significance levels) supporting the Pareto frontier comparisons.
- [Results] Clarify the exact range of decoding parameters in the sweeps and any controls for confounding factors such as sampling temperature or generator-specific biases when claiming superiority over baselines.
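As an example of the explicit formula requested in the first minor comment, one widely used molecular diversity measure is internal diversity: one minus the mean pairwise Tanimoto similarity of fingerprints. The sketch below assumes fingerprints are given as sets of on-bit indices. In practice these would come from a cheminformatics toolkit such as RDKit, which is not shown. Whether the paper uses this exact metric is not stated in this summary.

```python
from itertools import combinations
from typing import FrozenSet, Sequence

Fingerprint = FrozenSet[int]  # indices of the "on" bits of a binary fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity |A & B| / |A | B| of two bit sets."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical.
    return len(a & b) / union if union else 1.0


def internal_diversity(fps: Sequence[Fingerprint]) -> float:
    """IntDiv = 1 - mean pairwise Tanimoto similarity over unordered pairs."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


mols = [frozenset({1, 2, 3}), frozenset({1, 2, 3}), frozenset({4, 5})]
print(internal_diversity(mols))
```

Some benchmarks also include self-pairs or use a power mean of similarities. Reporting which variant is used makes frontier comparisons reproducible.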
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address the major concern regarding the leave-one-out redistribution and potential gradient effects below.
Referee: [Section 3.2] The leave-one-out redistribution induces linearly dependent, correlated per-rollout advantages. A covariance analysis, or a control with independent per-rollout diversity sampling, is requested.
Authors: We acknowledge that leave-one-out redistribution induces linear dependence among the D_{-i} values by construction, since their sum is fixed relative to the group diversity D. This dependence is intentional: diversity is a set-level property, and the redistribution attributes each rollout's marginal contribution while preserving the total group reward. The GRPO loss normalizes advantages within each supergroup, which accounts for relative comparisons and reduces the impact of within-group correlations on the policy gradient. Our empirical results demonstrate consistent Pareto frontier expansion across multiple generators, tasks, and supergroup sizes (including small K), alongside preserved distribution coverage, which would be unlikely if the gains were primarily artifacts. An independent per-rollout diversity sampling control would not evaluate joint set diversity and thus would not test the core claim. We will add a short discussion of advantage correlations and their effect on the gradient in the revised manuscript.
Revision: partial.
Circularity Check
No significant circularity; SGRPO reward construction is definitional and claims rest on empirical evaluation
The paper introduces SGRPO by explicitly defining a supergroup sampling process. It then computes a group-level diversity score D, redistributes it to individual rollouts via leave-one-out contributions D_{-i}, and combines these with rollout-level utility rewards before applying a GRPO-style loss. This is a direct construction of the optimization objective from external set-level metrics. It is not a derivation that, through the paper's own equations, reduces to a quantity fitted on target utility data or to prior self-citations. The performance claims of Pareto frontier expansion are supported by empirical decoding sweeps on small-molecule and protein design tasks, with comparisons to pretrained generators, GRPO, and memory-assisted baselines. No self-definitional reductions, fitted-input predictions, or load-bearing self-citation chains appear in the derivation. The leave-one-out redistribution is presented as an explicit design choice whose effects are analyzed separately, rather than assumed to be unbiased by construction.
Axiom & Free-Parameter Ledger
free parameters (2)
- supergroup size
- diversity metric
axioms (1)
- Domain assumption: Leave-one-out contributions accurately apportion the group diversity reward to individual samples without bias.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "SGRPO samples a supergroup of candidate sets, compares their diversity under the same condition, and redistributes the group diversity reward to individual rollouts through leave-one-out diversity contributions before combining it with rollout-level utility."
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, theorem embed_injective (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "For each rollout $x_{m,i} \in G_m$, we first compute its leave-one-out contribution $c_{m,i} = D(G_m) - D(G_m \setminus \{x_{m,i}\})$"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.