arXiv, abs/2505.02686

Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards · arXiv 2505.02686

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

PEER: Unified Process-Outcome Reinforcement Learning for Structured Empathetic Reasoning

cs.CL · 2025-08-13 · unverdicted · novelty 6.0

PEER applies GRPO reinforcement learning with a unified process-outcome reward model to structured empathetic reasoning steps on the SER dataset, yielding gains in empathy, strategy alignment, and human-likeness.

citing papers explorer

Showing 1 of 1 citing paper.

PEER: Unified Process-Outcome Reinforcement Learning for Structured Empathetic Reasoning

cs.CL · 2025-08-13 · unverdicted · none · ref 10
