SGRPO is a GRPO-style framework that constructs set-level diversity rewards via supergroup sampling and leave-one-out redistribution to expand the utility-diversity Pareto frontier in biomolecular design tasks.
2 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background 1
Citation-polarity summary: background 1
Years: 2026 (2)
Representative citing papers
Reinforcement Learning with LLM-Guided Action Spaces for Synthesizable Lead Optimization
MolReAct uses an LLM agent to dynamically constrain RL action spaces to validated reaction templates, achieving the highest average Top-10 score of 0.571 across 14 drug optimization tasks while providing explicit synthetic pathways.