Interventions in LLM-simulated user experiments induce distribution shifts in latent attributes that create confounding bias, diagnosable with negative control outcomes and partially mitigated by adding setting-relevant persona details.
S imulator A rena: Are User Simulators Reliable Proxies for Multi-Turn Evaluation of AI Assistants?
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study
Interventions in LLM-simulated user experiments induce distribution shifts in latent attributes that create confounding bias, diagnosable with negative control outcomes and partially mitigated by adding setting-relevant persona details.