LLM moral robustness under persona role-play is largely determined by model family, with Claude models the most consistent, while susceptibility shows little family dependence.
URL https://aclanthology.org/2024.acl-long.847/
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 2
citation-polarity summary
fields
cs.CL 2
roles
background 2
polarities
background 2
representative citing papers
- Moral Susceptibility and Robustness under Persona Role-Play in Large Language Models
  LLM moral robustness under persona role-play is largely determined by model family, with Claude models the most consistent, while susceptibility shows little family dependence.
- DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories