HER trains LLMs on reverse-engineered reasoning data and human preference rewards to improve cognitive persona simulation, reporting 30-point gains on CoSER and 15% on Minimax benchmarks over Qwen3-32B.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing