SGRPO is a GRPO-style framework that constructs set-level diversity rewards via supergroup sampling and leave-one-out redistribution to expand the utility-diversity Pareto frontier in biomolecular design tasks.
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role and citation-polarity summary
years: 2026 (2)
roles: background (1)
polarities: background (1)
representative citing papers
MolReAct uses an LLM agent to dynamically constrain RL action spaces to validated reaction templates, achieving the highest average Top-10 score of 0.571 across 14 drug optimization tasks while providing explicit synthetic pathways.
citing papers explorer
- Pushing Biomolecular Utility-Diversity Frontiers with Supergroup Relative Policy Optimization
- Reinforcement Learning with LLM-Guided Action Spaces for Synthesizable Lead Optimization