Interventions in LLM-simulated user experiments induce distribution shifts in latent attributes that create confounding bias, diagnosable with negative control outcomes and partially mitigated by adding setting-relevant persona details.
Carolin Kaiser, Jakob Kaiser, Vladimir Manewitsch, Lea Rau, and Rene Schallner
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
Citing papers:
- The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study
  Interventions in LLM-simulated user experiments induce distribution shifts in latent attributes that create confounding bias, diagnosable with negative control outcomes and partially mitigated by adding setting-relevant persona details.
- Jump Start or False Start? A Theoretical and Empirical Evaluation of LLM-initialized Bandits
  LLM warm-starts for bandits remain better than cold-starts up to roughly 30% random label noise but increase regret under systematic misalignment, with a derived sufficient condition on prior error that predicts when the warm-start helps.
- When Can Digital Personas Reliably Approximate Human Survey Findings?
  LLM digital personas improve alignment with human survey response distributions for stable attributes but remain limited for individual prediction and fail to recover multivariate respondent structure.
- Cross-Cultural Simulation of Citizen Emotional Responses to Bureaucratic Red Tape Using LLM Agents
  LLM agents display limited alignment with human emotional responses to red tape across cultures, performing worse in Eastern contexts, while cultural prompting offers little improvement.