SGPO: Self-generated preference optimization based on self-improver. arXiv preprint arXiv:2507.20181

Lee, H · arXiv 2507.20181

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

CRPO: Character-centric Group Relative Policy Optimization for Role-aware Reasoning in Role-playing Agents

cs.CL · 2026-05-25 · unverdicted · novelty 6.0

CRPO modifies GRPO with three mechanisms—decoupling task and style rewards, adapting constraints to character complexity, and using generic responses as negative baselines—to improve character fidelity in role-playing agents.

citing papers explorer

Showing 1 of 1 citing paper.

CRPO: Character-centric Group Relative Policy Optimization for Role-aware Reasoning in Role-playing Agents

cs.CL · 2026-05-25 · unverdicted · none · ref 11
