SimulatorArena: Are User Simulators Reliable Proxies for Multi-Turn Evaluation of AI Assistants?

Dou, Yao, Galley, Michel, Peng, Baolin, Kedzie, Chris, Cai, Weixin, Ritter, Alan · 2025 · DOI 10.18653/v1/2025.emnlp-main.1786

1 Pith paper cites this work.

representative citing papers

The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

Interventions in LLM-simulated user experiments induce distribution shifts in latent attributes that create confounding bias, diagnosable with negative control outcomes and partially mitigated by adding setting-relevant persona details.