Advances in Neural Information Processing Systems

3 Pith papers cite this work.


representative citing papers

Pairwise Preference Reward and Group-Based Diversity Enhancement for Superior Open-Ended Generation

cs.AI · 2026-05-18 · unverdicted · novelty 7.0 · ref 12

PPR-GDE is a new RL approach that integrates pairwise preference rewards with group-based diversity enhancement in a unified objective to improve both alignment quality and expressive diversity in open-ended generation tasks such as role-playing.

Embeddings for Preferences, Not Semantics

cs.AI · 2026-05-08 · unverdicted · novelty 7.0 · ref 13

Synthetic training data designed to break the correlation between semantic and preferential signals in text embeddings provably improves preference prediction across 11 online deliberation datasets.

Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization

cs.AI · 2026-05-02 · unverdicted · novelty 5.0 · ref 27

SCM-GRPO grounds multi-hop fact verification in structural causal models and applies Group Relative Policy Optimization (GRPO) to optimize reasoning-chain length, outperforming baselines on HoVer and EX-FEVER.
