Venue: International Conference on Learning Representations (ICLR).
2 Pith papers cite this work. Polarity classification is still indexing.

Citation summary:
- Roles: background (1)
- Polarities: unclear (1)
- Years: 2026 (2)
- Verdicts: UNVERDICTED (2)

Representative citing paper: LPDP (Inference-Time Reward Control for Variable-Length DNA Generation with Edit Flows).
Citing papers:

- Pairwise Preference Reward and Group-Based Diversity Enhancement for Superior Open-Ended Generation
  PPR-GDE is a new reinforcement learning (RL) approach that integrates pairwise preference rewards with group-based diversity enhancement in a single unified objective, aiming to improve both alignment quality and expressive diversity in open-ended generation tasks such as role-playing.
- LPDP: Inference-Time Reward Control for Variable-Length DNA Generation with Edit Flows
  LPDP adds a local re-solving operator to edit-flow DNA generators so that reward signals can guide insertions, deletions, and substitutions without retraining.