Uncertainty quantification for LLM-based survey simulations

2025 · stat.ME · arXiv 2502.17773

3 Pith papers cite this work.

abstract

Large language models (LLMs) are increasingly used to simulate survey responses, but synthetic data can be misaligned with the human population, leading to unreliable inference. We develop a general framework that converts LLM-simulated responses into reliable confidence sets for population parameters of human responses, quantifying the uncertainty induced by the human-LLM misalignment. The key design choice is the number of simulated responses: too many produce overly narrow sets with poor coverage, while too few yield overly wide and uninformative sets dominated by stochastic noise. We propose a data-driven approach that adaptively selects the simulation sample size to achieve nominal average-case coverage, regardless of the LLM's simulation fidelity or the confidence set construction procedure. The selected sample size is further shown to reflect the effective human population size that the LLM can represent, providing a quantitative measure of its simulation fidelity. Experiments on real survey datasets reveal heterogeneous simulation fidelity across different LLMs and domains.
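The abstract's central idea is choosing the number of simulated responses so that the resulting confidence sets reach nominal average-case coverage. Below is a minimal sketch of that idea under assumptions that are ours, not the paper's: binary (yes/no) survey questions, a Wilson score interval as the confidence-set construction, a calibration set of questions whose human proportions are known, and a rule that keeps the largest candidate k whose empirical coverage is at least 1 - alpha. The names `wilson_interval`, `average_coverage`, `select_sample_size`, `simulate`, and the toy ±0.15 misalignment are all hypothetical and only illustrate the trade-off.

```python
import math
import random
from statistics import NormalDist


def wilson_interval(responses, alpha):
    """Wilson score interval for a proportion from k binary (0/1) simulated responses."""
    k = len(responses)
    p_hat = sum(responses) / k
    z = NormalDist().inv_cdf(1 - alpha / 2)
    denom = 1 + z * z / k
    center = (p_hat + z * z / (2 * k)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / k + z * z / (4 * k * k))
    return center - half, center + half


def average_coverage(calibration, simulate, k, alpha):
    """Fraction of calibration questions whose human proportion lies in the k-sample set."""
    hits = 0
    for qid, human_mean in calibration:
        lo, hi = wilson_interval(simulate(qid, k), alpha)
        hits += lo <= human_mean <= hi  # bool counts as 0 or 1
    return hits / len(calibration)


def select_sample_size(calibration, simulate, candidates, alpha=0.1):
    """Hypothetical rule: largest k with average coverage >= 1 - alpha (None if none qualifies)."""
    best = None
    for k in sorted(candidates):
        if average_coverage(calibration, simulate, k, alpha) >= 1 - alpha:
            best = k
    return best


# Toy setup (hypothetical): the simulator's "yes" rate for each question is the
# human rate shifted by +/-0.15, mimicking a misaligned LLM.
rng = random.Random(0)
n_questions = 200
human = {q: 0.2 + 0.6 * q / (n_questions - 1) for q in range(n_questions)}
shift = {q: 0.15 if q % 2 == 0 else -0.15 for q in range(n_questions)}


def simulate(qid, k):
    """Draw k simulated binary responses for question qid."""
    p = human[qid] + shift[qid]
    return [1 if rng.random() < p else 0 for _ in range(k)]


calibration = list(human.items())
candidates = [1, 2, 5, 10, 20, 50, 100, 1000]
k_hat = select_sample_size(calibration, simulate, candidates, alpha=0.1)
print("selected simulation sample size:", k_hat)
```

In this toy run, k = 1000 gives tight intervals centered on the shifted rate, so coverage collapses, while very small k gives wide intervals that usually contain the human rate; the selected k falls in between. This loosely mirrors the abstract's reading of the selected size as the effective human sample size the simulator can stand in for. Because empirical coverage is noisy and need not decrease monotonically in k, a more conservative variant might stop at the first k that fails.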

representative citing papers

Adaptive Querying with AI Persona Priors

stat.ML · 2026-05-01 · unverdicted · novelty 7.0 · 2 refs

A persona-induced latent variable model with LLM response distributions enables closed-form Bayesian updates and finite-mixture predictions for scalable adaptive querying of user-dependent quantities.

Rectification Difficulty and Optimal Sample Allocation in LLM-Augmented Surveys

cs.AI · 2026-04-19 · unverdicted · novelty 7.0

A method using predicted rectification difficulty for optimal human sample allocation in LLM-augmented surveys captures 61-79% of theoretical efficiency gains and reduces MSE by 11% on two datasets without pilot data.

Evaluating Reliability Gaps in Large Language Model Safety via Repeated Prompt Sampling

cs.AI · 2026-03-10 · conditional · novelty 6.0

Repeated sampling of the same safety prompts reveals substantial differences in LLM failure probabilities across temperatures that conventional single-evaluation benchmarks miss.
