The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study

Alexander D'Amour; Arthur Gretton; John Canny; Maja Matarić; Taedong Yun; Victoria Lin

arxiv: 2605.20767 · v1 · pith:3XSC4SZZnew · submitted 2026-05-20 · 💻 cs.CL · cs.LG · stat.ME

Victoria Lin, Taedong Yun, Maja Matarić, John Canny, Arthur Gretton, Alexander D'Amour

Pith reviewed 2026-05-21 05:18 UTC · model grok-4.3

classification 💻 cs.CL · cs.LG · stat.ME

keywords LLM simulation · user drift · confounding bias · negative control outcomes · synthetic users · observational studies · persona adjustment

The pith

Interventions in LLM-simulated experiments induce unintended shifts in latent user attributes, distorting effect estimates through user drift.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that using large language models to simulate users in experiments with interventions is not equivalent to running a randomized controlled trial. Because LLMs are trained on observational data, specifying an intervention can cause the model to change other implicit characteristics of the simulated user. This user drift means the groups being compared are not comparable, introducing bias similar to that in observational studies. A sympathetic reader would care because this undermines the reliability of a growing body of research that relies on LLM simulations to test interventions at scale without real human subjects.

Core claim

The authors show that intervention-dependent shifts in latent user attributes lead to user drift, where the implicit simulated population differs across treatment conditions. They formalize how this can cause confounding or selection bias that inflates or attenuates observed differences in responses. They propose using negative control outcomes to diagnose such shifts and demonstrate that eliciting additional setting-relevant confounders in persona specifications can reduce the bias in both survey-style and multi-turn agent evaluations.
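
One way to see how drift can inflate or attenuate an observed difference is a standard add-and-subtract decomposition. The notation below is an illustrative assumption, not the paper's own formalization: T is the intervention, U a latent user attribute, Y the response, and p_t(u) the attribute distribution the simulator implicitly uses in arm t.

```latex
% Illustrative notation (an assumption), not taken from the paper.
% T \in \{0,1\}: intervention;  U: latent user attribute;  Y: response.
% \mu_t(u) = E[Y \mid T=t, U=u];   p_t(u) = P(U=u \mid T=t) in the simulation.
\begin{align*}
\Delta_{\text{obs}}
  &= E[Y \mid T=1] - E[Y \mid T=0] \\
  &= \sum_u \mu_1(u)\, p_1(u) - \sum_u \mu_0(u)\, p_0(u) \\
  &= \underbrace{\sum_u \bigl(\mu_1(u) - \mu_0(u)\bigr)\, p_1(u)}_{\text{effect within the treated-arm population}}
   + \underbrace{\sum_u \mu_0(u)\, \bigl(p_1(u) - p_0(u)\bigr)}_{\text{drift term}}
\end{align*}
```

Under randomization p_1 = p_0 and the drift term is zero. When the intervention prompt shifts the attribute distribution, the drift term can take either sign, which is why the observed difference can be larger or smaller than the effect of interest.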

What carries the argument

User drift from intervention-dependent shifts in latent attributes, diagnosed via negative control outcomes that should remain invariant.

If this is right

Negative control outcomes can detect distribution shifts indicative of user drift across intervention conditions.
Adjusting persona specifications with targeted confounders substantially reduces bias in effect estimates.
This holds for both survey-style evaluations and multi-turn agent interactions.
LLM-simulated experiments may require additional controls to approximate true experimental designs.
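
The negative-control check described above can be sketched as a comparison of one attribute's distribution across arms. The figure captions on this page mention TVD (total variation distance), so the sketch uses it; the permutation test, the function names, and the age-group data are hypothetical choices made for illustration, not the paper's procedure.

```python
from collections import Counter
import random


def tvd(sample_a, sample_b):
    """Total variation distance between two empirical categorical distributions."""
    counts_a, counts_b = Counter(sample_a), Counter(sample_b)
    support = set(counts_a) | set(counts_b)
    n_a, n_b = len(sample_a), len(sample_b)
    return 0.5 * sum(abs(counts_a[k] / n_a - counts_b[k] / n_b) for k in support)


def permutation_p_value(sample_a, sample_b, n_permutations=2000, seed=0):
    """Share of random relabelings whose TVD is at least the observed TVD."""
    rng = random.Random(seed)
    observed = tvd(sample_a, sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if tvd(pooled[:n_a], pooled[n_a:]) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_permutations + 1)


# Hypothetical negative control: the age group each simulated user reports.
# An intervention should not change it, so a large TVD is evidence of drift.
control = ["18-29"] * 30 + ["30-49"] * 40 + ["50+"] * 30
treated = ["18-29"] * 55 + ["30-49"] * 30 + ["50+"] * 15

print("TVD:", tvd(control, treated))
print("permutation p-value:", permutation_p_value(control, treated))
```

A small p-value here says only that the negative control moved between arms. It does not say which latent attribute drifted, and the check is informative only if the chosen control really should be invariant under the intervention.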

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Researchers using LLM simulations for causal inference should routinely include negative controls to validate their setups.
This issue may extend to other synthetic data generation methods trained on observational corpora.
Combining LLM simulations with small-scale human validation could help quantify the extent of drift.

Load-bearing premise

That LLMs trained on observational data will produce shifts in latent user attributes when interventions are specified in simulations.

What would settle it

An experiment showing that effect estimates from LLM simulations match those from randomized human trials when negative controls indicate no distribution shift, or diverge when shifts are present.

Figures

Figures reproduced from arXiv: 2605.20767 by Alexander D'Amour, Arthur Gretton, John Canny, Maja Matarić, Taedong Yun, Victoria Lin.

**Figure 2.** DAGs for (a) the real-world observational data-generating process, (b) detecting selection ...

**Figure 3.** TVD over adjustment iterations. Shaded bands indicate 95% CIs. Dashed line indicates ...

**Figure 4.** Observed effects over adjustment iterations. Shaded bands indicate 95% CIs. Dashed ...

**Figure 5.** Negative control outcome distributions (Qwen3-30B, Book Opinions). At each adjustment ...

**Figure 6.** Primary outcome distributions faceted by intervention condition at adjustment iteration 0.

**Figure 7.** Proportion of specified persona attributes correctly reported after intervention dialogue in ...

**Figure 8.** Marginal negative control outcome distributions over iterations of Gemma-4-31B on ...

**Figure 9.** Marginal demographics-based negative control outcome distributions in MovieLens.

**Figure 10.** OpinionQA outcome distributions (Gemma-3 4B-it). At each adjustment iteration, ...

**Figure 11.** OpinionQA outcome distributions (Gemma-4 31B-it). At each adjustment iteration, ...

**Figure 12.** OpinionQA outcome distributions (GPT-OSS 20B). At each adjustment iteration, ...

**Figure 13.** OpinionQA outcome distributions (Qwen3 30B-A3B). At each adjustment iteration, ...

**Figure 14.** OpinionQA outcome distributions (Qwen3 30B-A3B-Instruct-2507). At each adjustment ...

**Figure 15.** OpinionQA outcome distributions (Gemini 3 Flash). At each adjustment iteration, ...

**Figure 16.** Book Opinions outcome distributions (Gemma-3 4B-it). At each adjustment iteration, ...

**Figure 17.** Book Opinions outcome distributions (Gemma-4 31B-it). At each adjustment iteration, ...

**Figure 18.** Book Opinions outcome distributions (GPT-OSS 20B). At each adjustment iteration, ...

**Figure 19.** Book Opinions outcome distributions (Qwen3 30B-A3B). At each adjustment iteration, ...

**Figure 20.** Book Opinions outcome distributions (Qwen3 30B-A3B-Instruct-2507). At each adjust ...

**Figure 21.** Book Opinions outcome distributions (Gemini 3 Flash). At each adjustment iteration, ...

**Figure 22.** MovieLens outcome distributions (Gemma-3 4B-it). At each adjustment iteration, ...

**Figure 23.** MovieLens outcome distributions (Gemma-4 31B-it). At each adjustment iteration, ...

**Figure 24.** MovieLens outcome distributions (GPT-OSS 20B). At each adjustment iteration, ...

**Figure 25.** MovieLens outcome distributions (Qwen3 30B-A3B). At each adjustment iteration, ...

**Figure 26.** MovieLens outcome distributions (Qwen3 30B-A3B-Instruct-2507). At each adjustment ...

**Figure 27.** MovieLens outcome distributions (Gemini 3 Flash). At each adjustment iteration, ...

Original abstract

Large language models (LLMs) show potential as simulators of human behavior, offering a scalable way to study responses to interventions. However, because LLMs are trained largely on observational data, interventions in experiments with LLM-simulated synthetic users can induce unintended shifts in latent user attributes, causing user drift where the implicit simulated population differs across treatment conditions, potentially distorting effect estimates. We formalize the confounding or selection bias that can arise due to user drift and show how intervention-dependent shifts can inflate or attenuate observed differences in user responses under intervention. To diagnose confounding, we propose using negative control outcomes--attributes that should remain invariant under intervention--to identify distribution shifts across intervention conditions, providing evidence of user drift. To mitigate drift, we study adjusting the persona specification by eliciting additional confounders, finding that targeted, setting-relevant confounders can substantially reduce bias across survey-style and multi-turn agent evaluations.
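
A toy simulation can make the mitigation claim concrete. Everything in this sketch is a hypothetical assumption chosen for illustration (a single binary latent attribute, linear effects, the specific probabilities); it is not the paper's experimental setup. In the first run the simulator is free to pick the attribute and picks it more often under treatment. In the second run the persona specifies the attribute with the same distribution in both arms.

```python
import random
from statistics import mean

TRUE_EFFECT = 1.0       # assumed causal effect of the intervention on the response
ATTRIBUTE_EFFECT = 2.0  # assumed effect of the latent attribute on the response


def simulate_arm(treated, p_attribute, n, rng):
    """Draw n simulated responses for one arm (treated is 0 or 1)."""
    responses = []
    for _ in range(n):
        has_attribute = rng.random() < p_attribute  # latent attribute of this user
        noise = rng.gauss(0.0, 1.0)
        responses.append(TRUE_EFFECT * treated + ATTRIBUTE_EFFECT * has_attribute + noise)
    return responses


def observed_effect(p_control, p_treated, n=20000, seed=0):
    """Difference in mean response between the treated and control arms."""
    rng = random.Random(seed)
    control = simulate_arm(0, p_control, n, rng)
    treated = simulate_arm(1, p_treated, n, rng)
    return mean(treated) - mean(control)


# Drift: the attribute is more common under treatment (0.6) than control (0.3).
drifting = observed_effect(p_control=0.3, p_treated=0.6)
# Adjusted persona: the attribute is specified identically (0.3) in both arms.
pinned = observed_effect(p_control=0.3, p_treated=0.3)

print("observed effect with drift:     ", round(drifting, 2))
print("observed effect, attribute pinned:", round(pinned, 2))
```

In expectation the drifting estimate is 1.0 + 2.0 × (0.6 − 0.3) = 1.6, while the pinned estimate is 1.0. Pinning helps here only because the pinned attribute is the one that drifts and that affects the response, which matches the abstract's emphasis on targeted, setting-relevant confounders.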

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

LLM-simulated interventions can create user drift that confounds results, and the paper formalizes this with a negative-control diagnostic plus persona fixes, but the invariance of those controls is still an open question.

The main point is that prompting an LLM with different intervention conditions can shift the latent attributes of the simulated users, so the treatment and control groups are no longer drawn from the same implicit population. The authors call this user drift and show how it introduces confounding or selection bias into what looks like an experiment. They formalize the problem and suggest negative control outcomes (attributes that should stay fixed) to detect when the simulated population has moved. They also test adjusting the persona prompt with extra confounders and report that this cuts the bias in both survey-style and multi-turn agent setups. That framing and the negative-control suggestion are the clearest new pieces. It is useful to have the issue named explicitly for people running these simulations in HCI or evaluation work.

One stress-test concern lands: because the model generates every variable from the same prompt, an intervention could move even the putatively invariant negative controls through prompt sensitivity or training-data correlations. The paper needs to show that its chosen controls actually stay stable, or provide a check that the diagnosis still works when they do not. The abstract gives no effect sizes, standard errors, or details on how drift was quantified, so the practical size of the problem and of the fix is hard to judge from the abstract alone.

This is for researchers who use or review LLM behavioral simulations and want to avoid overclaiming causal effects. A reader who already runs these studies will see immediate value in the diagnostic idea. It deserves a serious referee because the validity threat is real and the proposed tools are concrete enough to test, even if the current evidence is preliminary. I would send it out for review and ask the referees to focus on whether the negative controls hold up under intervention prompts.

Referee Report

2 major / 1 minor

Summary. The paper claims that interventions in LLM-simulated experiments induce unintended shifts in latent user attributes (termed 'user drift'), causing the implicit simulated population to differ across treatment conditions and thereby introducing confounding or selection bias that distorts effect estimates. It formalizes this bias, proposes negative control outcomes to diagnose distribution shifts across intervention conditions as evidence of drift, and reports that targeted persona adjustment by eliciting additional setting-relevant confounders substantially reduces bias in both survey-style and multi-turn agent evaluations.

Significance. If the formalization and empirical results hold, the work identifies a systematic source of bias in LLM-based behavioral simulation that parallels issues in observational studies, providing diagnostic tools (negative controls) and a mitigation strategy (persona adjustment). This could improve the validity of LLM-simulated experiments for studying interventions, particularly as such methods scale. The analogy to observational data and the focus on latent attribute shifts offer a useful conceptual framing, though the absence of quantitative details in the abstract limits immediate assessment of practical impact.

major comments (2)

[Abstract] The statement that targeted persona adjustment 'substantially reduce[s] bias' across two evaluation styles provides no quantitative effect sizes, error bars, or details on how drift magnitude was measured or how the reduction was quantified, which is load-bearing for evaluating whether the mitigation is effective or merely cosmetic.
[Diagnosis of confounding] The diagnostic relying on negative control outcomes assumes these attributes remain invariant under intervention while still detecting shifts in other attributes. Because the LLM generates every attribute jointly from the same prompt, an intervention prompt can alter even putatively invariant attributes via prompt sensitivity or training-data associations, making it impossible to cleanly separate genuine user drift from model behavior; this assumption underpins both the formalization of bias and the reliability of the proposed diagnosis.

minor comments (1)

[Abstract] The abstract refers to 'two evaluation styles' without naming them explicitly (e.g., survey-style vs. multi-turn); adding a brief parenthetical or table reference would improve clarity for readers.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for these constructive comments, which identify key areas for improving the clarity and rigor of our presentation. We address each major comment below and outline the corresponding revisions.

Referee: [Abstract] The statement that targeted persona adjustment 'substantially reduce[s] bias' across two evaluation styles provides no quantitative effect sizes, error bars, or details on how drift magnitude was measured or how the reduction was quantified, which is load-bearing for evaluating whether the mitigation is effective or merely cosmetic.

Authors: We agree that the abstract would be strengthened by including quantitative details. The main text reports specific bias reductions and measurement procedures (via negative control outcome shifts), but these are not summarized in the abstract. In revision we will add concise quantitative results, including effect sizes for bias reduction and a brief description of the drift metric, while remaining within length limits.
revision: yes

Referee: [Diagnosis of confounding] The diagnostic relying on negative control outcomes assumes these attributes remain invariant under intervention while still detecting shifts in other attributes. Because the LLM generates every attribute jointly from the same prompt, an intervention prompt can alter even putatively invariant attributes via prompt sensitivity or training-data associations, making it impossible to cleanly separate genuine user drift from model behavior; this assumption underpins both the formalization of bias and the reliability of the proposed diagnosis.

Authors: This correctly identifies a modeling assumption whose validity is not automatic. Our negative controls were chosen on substantive grounds (attributes whose invariance follows from the intervention definition and domain knowledge). We will add (i) explicit empirical checks confirming that selected negative controls show negligible shifts relative to primary outcomes and (ii) a limitations paragraph acknowledging residual prompt-sensitivity risk inherent to joint generation. We view this as a partial but honest strengthening rather than a full resolution of the joint-generation issue.
revision: partial

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical comparisons and formalization without self-referential reduction

The paper formalizes user drift and bias from LLM-simulated interventions, proposes negative control outcomes for diagnosis, and evaluates mitigation via persona adjustment. No equations, fitted parameters, or derivations are shown that reduce by construction to the target result. Central claims rely on described empirical comparisons across survey-style and multi-turn evaluations rather than tautological inputs or load-bearing self-citations. The premise about observational training data inducing shifts is stated as an assumption, not derived circularly. This is a normal non-finding for a conceptual/empirical paper without mathematical self-reference.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that LLMs inherit observational-data biases and on the new concept of user drift; no explicit free parameters or invented physical entities are introduced.

axioms (1)

domain assumption: LLMs are trained largely on observational data
Explicitly stated in the second sentence of the abstract as the source of unintended shifts.

invented entities (1)

user drift (no independent evidence)
purpose: To name the intervention-dependent shift in latent attributes that produces confounding
Introduced in the abstract as the mechanism that turns simulated experiments into observational studies; no independent falsifiable handle is provided in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "interventions in experiments with LLM-simulated synthetic users can induce unintended shifts in latent user attributes, causing user drift where the implicit simulated population differs across treatment conditions"

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
Paper passage: "we propose using negative control outcomes--attributes that should remain invariant under intervention--to identify distribution shifts"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

