Advances in Neural Information Processing Systems
3 Pith papers cite this work. Polarity classification is still being indexed.
Fields: cs.AI (3)
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Citing papers
- Pairwise Preference Reward and Group-Based Diversity Enhancement for Superior Open-Ended Generation
PPR-GDE is a new RL approach that integrates pairwise preference rewards with group-based diversity enhancement in a unified objective to improve both alignment quality and expressive diversity in open-ended generation tasks such as role-playing.
- Embeddings for Preferences, Not Semantics
Synthetic training data designed to break the correlation between semantic and preferential signals in text embeddings provably improves preference prediction across 11 online deliberation datasets.
- Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization
SCM-GRPO grounds multi-hop fact verification in structural causal models and uses Group Relative Policy Optimization (GRPO) to optimize reasoning-chain length, outperforming baselines on HoVer and EX-FEVER.