ISBN 979-8-89176-256-5
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers:
- Improving General Role-Playing Agents via Psychology-Grounded Reasoning and Role-Aware Policy Optimization
  Psy-CoT decomposes reasoning into Interaction Perception, Psychological Empathy, and Logical Construction, while RAPO asymmetrically weights role-specific tokens during policy optimization, outperforming prior CoT and GRPO baselines on role-playing benchmarks.
- CRPO: Character-centric Group Relative Policy Optimization for Role-aware Reasoning in Role-playing Agents
  CRPO modifies GRPO with three mechanisms (decoupling task and style rewards, adapting constraints to character complexity, and using generic responses as negative baselines) to improve character fidelity in role-playing agents.