Large language models that replace human participants can harmfully misportray and flatten identity groups
Cited by 1 Pith paper (field: cs.CL; year: 2026; verdict: unverdicted):
Synthetic Users, Real Differences: an Evaluation Framework for User Simulation in Multi-Turn Conversations
Realsim shows simulated users fail to reproduce communication frictions present in real multi-turn chatbot dialogues, yielding overly optimistic evaluations with domain-dependent variability.