pith. machine review for the scientific record.

arxiv: 2605.12147 · v1 · submitted 2026-05-12 · 💻 cs.CR · cs.LG

Recognition: 2 Lean theorem links

PrivacySIM: Evaluating LLM Simulation of User Privacy Behavior

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 04:51 UTC · model grok-4.3

classification 💻 cs.CR · cs.LG
keywords LLM simulation · privacy behavior · user personas · evaluation benchmark · data sharing · individual decisions · privacy attitudes

The pith

Conditioning LLMs on user personas improves simulation of individual privacy decisions yet tops out at 40.4 percent accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PrivacySIM to test whether frontier LLMs can reproduce the privacy choices of specific people when given information about their demographics, prior experiences, and stated attitudes. It pits nine models against ground-truth answers from one thousand participants drawn from five published studies covering healthcare consultations, conversational agents, and chatbots. Adding persona details raises match rates above the no-persona baseline, but the best model still reaches only 40.4 percent agreement. Stated attitudes alone prove weak predictors because they frequently diverge from actual behavior, and users with high AI experience but low privacy concern are especially hard to simulate. The benchmark is released to support further work on making LLMs better stand-ins for individual privacy decisions.

Core claim

Conditioning nine frontier LLMs on subsets of three persona facets—demographics, previous experiences, and stated privacy attitudes—consistently raises the rate at which their responses to data-sharing scenarios match the ground-truth answers of 1,000 users from five published studies, yet the strongest model achieves only 40.4 percent accuracy and users with high AI experience but low stated privacy attitudes remain the hardest to simulate.

What carries the argument

The PrivacySIM evaluation suite, which conditions LLMs on subsets of persona facets and scores how often each model's output matches an individual user's recorded response to a privacy scenario.
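
To make that machinery concrete, here is a minimal sketch of the scoring loop the suite implies, assuming a caller-supplied query_model function and a flat per-user record; the facet names, record layout, and helpers are illustrative, not the released PrivacySIM code.

```python
# Hypothetical sketch of a PrivacySIM-style facet-subset sweep.
# `query_model(persona_prompt, question) -> str` is assumed to exist.
from itertools import combinations

FACETS = ["demographics", "experiences", "attitudes"]  # the three persona facets

def persona_prompt(user: dict, facets: tuple) -> str:
    """Render the selected persona facets into a conditioning prompt."""
    lines = [f"{f}: {user[f]}" for f in facets]
    return "User persona:\n" + "\n".join(lines) if lines else "No persona provided."

def accuracy(users: list, scenario: str, facets: tuple, query_model) -> float:
    """Fraction of users whose simulated answer matches their recorded one."""
    hits = sum(
        query_model(persona_prompt(u, facets), u["scenarios"][scenario]["question"])
        == u["scenarios"][scenario]["answer"]
        for u in users
    )
    return hits / len(users)

def sweep(users, scenario, query_model):
    """Score every facet subset, from no persona (r=0) to the full persona (r=3)."""
    return {
        subset: accuracy(users, scenario, subset, query_model)
        for r in range(len(FACETS) + 1)
        for subset in combinations(FACETS, r)
    }
```

The no-persona baseline falls out as the empty subset, so the reported persona-conditioning gains correspond to comparing each non-empty subset against the empty-tuple entry of the returned dictionary.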

Load-bearing premise

The responses collected in the five published user studies accurately represent each participant's real privacy behavior in the scenarios.

What would settle it

Collect fresh privacy decisions from the same 1,000 participants on the identical scenarios after a delay of months and measure how many original ground-truth answers no longer match the new responses.
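
A minimal sketch of that test-retest check, assuming original and re-collected answers keyed by (user_id, scenario_id); the names and toy data are illustrative.

```python
# Sketch: measure ground-truth drift between two collection waves.
def retest_drift(original: dict, retest: dict) -> float:
    """Share of ground-truth answers that changed on re-collection."""
    shared = original.keys() & retest.keys()
    changed = sum(original[k] != retest[k] for k in shared)
    return changed / len(shared)

# Toy example: one of two answers flipped between waves.
drift = retest_drift(
    {("u1", "q1"): "share", ("u1", "q2"): "refuse"},
    {("u1", "q1"): "share", ("u1", "q2"): "share"},
)
print(f"{drift:.0%} of answers drifted")  # -> 50% of answers drifted
```

A high drift rate would imply that part of the 40.4 percent ceiling reflects unstable ground truth rather than simulation failure alone.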

Figures

Figures reproduced from arXiv: 2605.12147 by James Flemings, Murali Annavaram.

Figure 1: Overview of PRIVACYSIM, an evaluation suite for simulating user privacy behavior. We collect user responses and questionnaires from existing user studies on privacy behavior in LLM and AI contexts. We then condition LLMs on subsets of a user's privacy persona (demographics, previous experiences with LLMs, and stated privacy attitudes) to simulate their responses to data-sharing questions. Finally, we evalu…
Figure 2: Average accuracy by prompt type and model across user studies.
Figure 3: Per-domain accuracy with two averages (solid black: 5-study average; dashed black: 3-study average).
Figure 4: Per-domain tolerance accuracy across all eight prompt types for every model evaluated in…
Original abstract

Large language models (LLMs) are increasingly used to simulate human behavior, but their ability to simulate individual privacy decisions is not well understood. In this paper, we address the problem of evaluating whether a core set of user persona attributes can drive LLMs to simulate individual-level privacy behavior. We introduce PrivacySIM, an evaluation suite that benchmarks LLM simulation of user privacy behavior against the ground-truth responses of 1,000 users. These users are drawn from five published user studies on privacy spanning LLM healthcare consultations, conversational agents, and chatbots. Drawing on these user studies, we hypothesize three persona facets as plausible predictors of privacy decision-making: demographics, previous experiences, and stated privacy attitudes. We condition nine frontier LLMs on subsets of these three facets and measure how often each model's response to a data-sharing scenario matches the user's actual response. Our findings show that (1) privacy persona conditioning consistently improves simulation quality over no-persona conditioning, but even the strongest model (40.4% accuracy) remains far from faithfully simulating individual privacy decisions. (2) A user's stated privacy attitudes alone may not be the best predictor because they often diverge from the user's actual privacy behavior. (3) Users with high AI/chatbot experience but low stated privacy attitudes are the most challenging to simulate. PrivacySIM is a first step toward understanding and improving the capabilities of LLMs to simulate user privacy decisions. We release PrivacySIM to enable further evaluation of LLM privacy simulation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces PrivacySIM, a benchmark suite for evaluating LLMs' ability to simulate individual-level user privacy decisions. It draws on ground-truth responses from 1,000 users across five published studies involving data-sharing scenarios in healthcare, conversational agents, and chatbots. The authors posit three persona facets (demographics, prior experiences, and stated privacy attitudes) as drivers of decisions, condition nine frontier LLMs on subsets of these facets, and measure how often model outputs match the users' reported choices. Results indicate that persona conditioning improves simulation accuracy over baselines, with the best model reaching 40.4% accuracy, while also noting that attitudes alone are weak predictors and that high-AI-experience/low-attitude users are hardest to simulate. The benchmark is released publicly.

Significance. If the empirical results hold under scrutiny, the work provides a concrete, reusable benchmark for assessing LLM simulation of privacy behavior, an increasingly relevant capability for user modeling and privacy research. The release of PrivacySIM and the identification of specific conditioning effects and failure modes are strengths that enable follow-on work. The paper is measured in its claims, avoiding overstatement of current LLM fidelity.

major comments (3)
  1. [§3] §3 (User Studies and Ground Truth): The central evaluation equates matching LLM outputs to survey responses on hypothetical vignettes with 'simulation of individual privacy behavior.' However, the source studies collect stated intentions rather than observed decisions; the manuscript should explicitly discuss the privacy paradox and social-desirability bias as threats to interpreting the 40.4% ceiling as evidence of (or distance from) faithful simulation of underlying decision processes.
  2. [§4.3] §4.3 (Evaluation Metrics): The exact procedure for determining a 'match' between an LLM response and a user's ground-truth answer is not fully specified (e.g., exact string match, semantic similarity threshold, or LLM judge). This definition is load-bearing for all reported accuracy figures and the claim that persona conditioning 'consistently improves' quality. One way the rule could be pinned down is sketched after the minor comments below.
  3. [§5] §5 (Results and Analysis): No statistical tests (p-values, confidence intervals, or effect sizes) are reported for the accuracy differences between conditioning regimes. Without these, it is difficult to assess whether the observed improvements over no-persona baselines are reliable or could be explained by variance across the 1,000 users or five studies.
minor comments (2)
  1. [Abstract and §1] The abstract and §1 state that 'privacy persona conditioning consistently improves simulation quality,' but the precise subsets of facets tested (e.g., demographics only vs. all three) and their per-model breakdowns should be tabulated for clarity.
  2. [Figures] Figure captions and legends could more explicitly label the conditioning conditions (none, demographics, experiences, attitudes, full persona) to aid quick reading of the accuracy plots.
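
On major comment 2, a sketch of one plausible way the match rule could be specified: normalized exact match for categorical answers plus a plus-or-minus band for Likert items, which would also give the 'tolerance accuracy' of Figure 4 a concrete reading. The normalization and tolerance rule here are assumptions about a possible revision, not the authors' documented procedure.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and strip punctuation so 'Agree.' matches 'agree'."""
    return re.sub(r"[^a-z0-9 ]", "", text.strip().lower())

def is_match(model_answer: str, ground_truth: str, tolerance: int = 0) -> bool:
    """Exact match after normalization; optional +/- band for Likert integers."""
    a, b = normalize(model_answer), normalize(ground_truth)
    if a == b:
        return True
    if a.isdigit() and b.isdigit():  # Likert-scale responses, e.g. "4" vs "5"
        return abs(int(a) - int(b)) <= tolerance
    return False

assert is_match("Agree.", "agree")
assert is_match("4", "5", tolerance=1)  # would count under a tolerance metric
assert not is_match("share", "refuse")
```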

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for their constructive comments, which have helped us identify areas for improvement in the manuscript. We have addressed each major comment by planning revisions to enhance the discussion of limitations, clarify the evaluation methodology, and add statistical analysis. These changes will strengthen the paper's contributions and transparency.

Point-by-point responses
  1. Referee: [§3] §3 (User Studies and Ground Truth): The central evaluation equates matching LLM outputs to survey responses on hypothetical vignettes with 'simulation of individual privacy behavior.' However, the source studies collect stated intentions rather than observed decisions; the manuscript should explicitly discuss the privacy paradox and social-desirability bias as threats to interpreting the 40.4% ceiling as evidence of (or distance from) faithful simulation of underlying decision processes.

    Authors: We thank the referee for highlighting this important distinction. The ground-truth data indeed consists of stated responses to hypothetical scenarios from published user studies, which are subject to the privacy paradox (where stated attitudes differ from actual behavior) and potential social-desirability bias. We will revise §3 to explicitly discuss these limitations and their implications for interpreting the simulation accuracy results. This will clarify that the benchmark evaluates alignment with stated intentions rather than unobserved real-world decisions. revision: yes

  2. Referee: [§4.3] §4.3 (Evaluation Metrics): The exact procedure for determining a 'match' between an LLM response and a user's ground-truth answer is not fully specified (e.g., exact string match, semantic similarity threshold, or LLM judge). This definition is load-bearing for all reported accuracy figures and the claim that persona conditioning 'consistently improves' quality.

    Authors: We agree that the matching procedure requires more precise specification. In the revised manuscript, we will expand §4.3 to detail the exact method used for determining matches, including any parsing of responses, similarity metrics if applicable, or use of automated judges. This will ensure reproducibility and allow readers to assess the robustness of the accuracy figures. revision: yes

  3. Referee: [§5] §5 (Results and Analysis): No statistical tests (p-values, confidence intervals, or effect sizes) are reported for the accuracy differences between conditioning regimes. Without these, it is difficult to assess whether the observed improvements over no-persona baselines are reliable or could be explained by variance across the 1,000 users or five studies.

    Authors: We appreciate this feedback on the statistical rigor of our analysis. We will incorporate appropriate statistical tests in §5, such as paired t-tests or Wilcoxon signed-rank tests across users for the accuracy differences, along with confidence intervals and effect sizes. This will provide evidence for the reliability of the improvements observed with persona conditioning; a sketch of such a paired test follows these responses. revision: yes
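
A sketch of the per-user paired test the rebuttal proposes, run here on synthetic accuracies: a Wilcoxon signed-rank test over each user's accuracy with and without persona conditioning, plus a paired effect size. The accuracy distributions are invented for illustration; only the procedure is the point.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
n_users = 1000
# Synthetic per-user accuracies over 10 scenarios each.
acc_no_persona = rng.binomial(10, 0.30, n_users) / 10
acc_full_persona = rng.binomial(10, 0.38, n_users) / 10

stat, p = wilcoxon(acc_full_persona, acc_no_persona)  # paired; zero diffs dropped
diff = acc_full_persona - acc_no_persona
effect = diff.mean() / diff.std(ddof=1)  # paired Cohen's d
print(f"Wilcoxon p={p:.2e}, mean gain={diff.mean():.3f}, d={effect:.2f}")
```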

Circularity Check

0 steps flagged

No circularity: direct empirical comparison to external ground-truth data

Full rationale

The paper evaluates LLM simulation quality via direct match rates between model outputs and user responses drawn from five independent published studies (1,000 users total). No equations, fitted parameters, or derivations are present that reduce the reported accuracy (e.g., 40.4%) to a self-referential definition or input by construction. Persona facets serve as experimental conditioning variables whose effects are measured against the external benchmark, keeping the chain self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The evaluation rests on the domain assumption that the selected persona facets drive privacy decisions and that the published studies supply valid ground truth; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: Demographics, previous experiences, and stated privacy attitudes are plausible predictors of individual privacy decision-making.
    Explicitly hypothesized in the abstract as the basis for conditioning the LLMs.

pith-pipeline@v0.9.0 · 5560 in / 1213 out tokens · 70654 ms · 2026-05-13T04:51:36.754543+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 3 internal anchors

  1. [1] Noura Abdi, Xiao Zhan, Kopo M. Ramokapane, and Jose Such. Privacy norms for smart home personal assistants. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pages 1–14, 2021.

  2. [2] Alessandro Acquisti. Privacy in electronic commerce and the economics of immediate gratification. In Proceedings of the 5th ACM Conference on Electronic Commerce, pages 21–29, 2004.

  3. [3] Alessandro Acquisti and Jens Grossklags. Privacy and rationality in individual decision making. IEEE Security & Privacy, 3(1):26–33, 2005.

  4. [4] Alessandro Acquisti, Allan Friedman, and Rahul Telang. Is there a cost to privacy breaches? An event study. ICIS 2006 Proceedings, page 94, 2006.

  5. [5] Gati V. Aher, Rosa I. Arriaga, and Adam Tauman Kalai. Using large language models to simulate multiple humans and replicate human subject studies. In International Conference on Machine Learning, pages 337–371. PMLR, 2023.

  6. [6] Noah Apthorpe, Yan Shvartzshnaider, Arunesh Mathur, Dillon Reisman, and Nick Feamster. Discovering smart home internet of things privacy norms using contextual integrity. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2(2):1–23, 2018.

  7. [7] Lisa P. Argyle, Ethan C. Busby, Nancy Fulda, Joshua R. Gubler, Christopher Rytting, and David Wingate. Out of one, many: Using language models to simulate human samples. Political Analysis, 31(3):337–351, 2023.

  8. [8] James Bisbee, Joshua D. Clinton, Cassy Dorff, Brenton Kenkel, and Jennifer M. Larson. Synthetic replacements for human survey data? The perils of large language models. Political Analysis, 32(4):401–416, 2024. doi: 10.1017/pan.2024.5.

  9. [9] Hanbyul Choi, Jonghwa Park, and Yoonhyuk Jung. The role of privacy fatigue in online privacy behavior. Computers in Human Behavior, 81:42–51, 2018.

  10. [10] Kai-Hsiang Chou, Yi-An Wang, Chong Kai Lau, Mahmood Sharif, and Hsu-Chun Hsiao. Bot among us: Exploring user awareness and privacy concerns about chatbots in group chats. Proceedings on Privacy Enhancing Technologies, 2026.

  11. [11] Mary J. Culnan and Pamela K. Armstrong. Information privacy concerns, procedural fairness, and impersonal trust: An empirical investigation. Organization Science, 10(1):104–115, 1999.

  12. [12] Ricardo Dominguez-Olmedo, Moritz Hardt, and Celestine Mendler-Dünner. Questioning the survey responses of large language models. In Advances in Neural Information Processing Systems (NeurIPS), 2024.

  13. [13] Janna Lynn Dupree, Richard Devries, Daniel M. Berry, and Edward Lank. Privacy personas: Clustering users via attitudes and behaviors toward security practices. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pages 5228–5239, 2016.

  14. [14] Kassem Fawaz, Ren Yi, Octavian Suciu, Rishabh Khandelwal, Hamza Harkous, Nina Taft, and Marco Gruteser. Text-based personas for simulating user privacy decisions. arXiv preprint arXiv:2603.19791, 2026.

  15. [15] Adrienne Porter Felt, Elizabeth Ha, Serge Egelman, Ariel Haney, Erika Chin, and David Wagner. Android permissions: User attention, comprehension, and behavior. In Proceedings of the Eighth Symposium on Usable Privacy and Security, pages 1–14, 2012.

  16. [16] James Flemings, Ren Yi, Octavian Suciu, Kassem Fawaz, Murali Annavaram, and Marco Gruteser. Personalizing agent privacy decisions via logical entailment. arXiv preprint arXiv:2512.05065, 2025.

  17. [17] Tiancheng Hu, Joachim Baumann, Lorenzo Lupo, Nigel Collier, Dirk Hovy, and Paul Röttger. SimBench: Benchmarking the ability of large language models to simulate human behaviors. arXiv preprint arXiv:2510.17516, 2025.

  18. [18] EunJeong Hwang, Bodhisattwa Majumder, and Niket Tandon. Aligning language models to user opinions. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 5906–5919, 2023.

  19. [19] Brihi Joshi, Xiang Ren, Swabha Swayamdipta, Rik Koncel-Kedziorski, and Tim Paek. Improving language model personas via rationalization with psychological scaffolds. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Suzhou, China. Association for Computational Linguistics, 2025.

  20. [20] Spyros Kokolakis. Privacy attitudes and privacy behaviour: A review of current research on the privacy paradox phenomenon. Computers & Security, 64:122–134, 2017.

  21. [21] Ponnurangam Kumaraguru and Lorrie Faith Cranor. Privacy indexes: A survey of Westin's studies. 2005.

  22. [22] Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with PagedAttention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles, 2023.

  23. [23] Yuxuan Li, Leyang Li, Hao-Ping Lee, and Sauvik Das. How well can LLM agents simulate end-user security and privacy attitudes and behaviors? arXiv preprint arXiv:2602.18464, 2026.

  24. [24] Zhihuang Liu, Ling Hu, Tongqing Zhou, Yonghao Tang, and Zhiping Cai. Prevalence overshadows concerns? Understanding Chinese users' privacy awareness and expectations towards LLM-based healthcare consultation. In 2025 IEEE Symposium on Security and Privacy (SP), pages 2716–2734. IEEE, 2025.

  25. [25] Lisa Mekioussa Malki, Akhil Polamarasetty, Majid Hatamian, Mark Warner, and Enrico Costanza. Hoovered up as a data point: Exploring privacy behaviours, awareness, and concerns among UK users of LLM-based conversational agents. Proceedings on Privacy Enhancing Technologies, 2025.

  26. [26] Kirsten Martin and Helen Nissenbaum. Measuring privacy: An empirical test using context to expose confounding variables. Columbia Science & Technology Law Review, 18:176, 2016.

  27. [27] Niloofar Mireshghallah, Hyunwoo Kim, Xuhui Zhou, Yulia Tsvetkov, Maarten Sap, Reza Shokri, and Yejin Choi. Can LLMs keep a secret? Testing privacy implications of language models via contextual integrity theory. In The Twelfth International Conference on Learning Representations, 2024.

  28. [28] Helen Nissenbaum. Privacy as contextual integrity. Washington Law Review, 79:119, 2004.

  29. [29] Patricia A. Norberg, Daniel R. Horne, and David A. Horne. The privacy paradox: Personal information disclosure intentions versus behaviors. Journal of Consumer Affairs, 41(1):100–126, 2007.

  30. [30] NVIDIA. NVIDIA Nemotron 3: Efficient and open intelligence, 2025. URL https://arxiv.org/abs/2512.20856. White paper.

  31. [31] Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, pages 1–22, 2023.

  32. [32] Joon Sung Park, Carolyn Q. Zou, Aaron Shaw, Benjamin Mako Hill, Carrie Cai, Meredith Ringel Morris, Robb Willer, Percy Liang, and Michael S. Bernstein. Generative agent simulations of 1,000 people. arXiv preprint arXiv:2411.10109, 2024.

  33. [33] Qwen Team. Qwen3.5: Towards native multimodal agents, February 2026. URL https://qwen.ai/blog?id=qwen3.5.

  34. [34] Ronald W. Rogers. A protection motivation theory of fear appeals and attitude change. The Journal of Psychology, 91(1):93–114, 1975.

  35. [35] Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. Whose opinions do language models reflect? In Proceedings of the 40th International Conference on Machine Learning (ICML), 2023.

  36. [36] Yijia Shao, Tianshi Li, Weiyan Shi, Yanchen Liu, and Diyi Yang. PrivacyLens: Evaluating privacy norm awareness of language models in action. Advances in Neural Information Processing Systems, 37:89373–89407, 2024.

  37. [37] Sarah Tran, Hongfan Lu, Isaac Slaughter, Bernease Herman, Aayushi Dangol, Yue Fu, Lufei Chen, Biniyam Gebreyohannes, Bill Howe, Alexis Hiniker, et al. Understanding privacy norms around LLM-based chatbots: A contextual integrity perspective. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, volume 8, pages 2522–2534, 2025.

  38. [38] Sarah Tran, Robert Wolfe, and Nicholas Weber. Replication data for: Understanding privacy norms around LLM-based chatbots: A contextual integrity perspective, 2025. URL https://doi.org/10.7910/DVN/M6ABJ3.

  39. [39] Pranav Narayanan Venkit, Yu Li, Yada Pruksachatkun, and Chien-Sheng Wu. The need for a socially-grounded persona framework for user simulation. arXiv preprint arXiv:2601.07110, 2026.

  40. [40] Yang Wang, Gregory Norcie, Saranga Komanduri, Alessandro Acquisti, Pedro Giovanni Leon, and Lorrie Faith Cranor. "I regretted the minute I pressed share": A qualitative study of regrets on Facebook. In Proceedings of the Seventh Symposium on Usable Privacy and Security, pages 1–16, 2011.

  41. [41] Primal Wijesekera, Arjun Baokar, Lynn Tsai, Joel Reardon, Serge Egelman, David Wagner, and Konstantin Beznosov. The feasibility of dynamically granted permissions: Aligning mobile privacy with user preferences. In 2017 IEEE Symposium on Security and Privacy (SP), pages 1077–1093. IEEE, 2017.

  42. [42] Yuhao Wu, Ke Yang, Franziska Roesner, Tadayoshi Kohno, Ning Zhang, and Umar Iqbal. Towards automating data access permissions in AI agents. arXiv preprint arXiv:2511.17959, 2025.

  43. [43] Zhiping Zhang, Michelle Jia, Hao-Ping Lee, Bingsheng Yao, Sauvik Das, Ada Lerner, Dakuo Wang, and Tianshi Li. "It's a fair game", or is it? Examining how users navigate disclosure risks and benefits when using LLM-based conversational agents. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pages 1–26, 2024.

  44. [44] Noé Zufferey, Sarah Abdelwahab Gaballah, Karola Marky, and Verena Zimmermann. "AI is from the devil." Behaviors and concerns toward personal data sharing with LLM-based conversational agents. Proceedings on Privacy Enhancing Technologies, 2025(3):5–28, 2025.
