Recognition: unknown
Profiling for Pennies: Unveiling the Privacy Iceberg of LLM Agents
Pith reviewed 2026-05-08 09:08 UTC · model grok-4.3
The pith
LLM agents can reconstruct detailed personal profiles from minimal PII seeds with over 90 percent factual accuracy for under three dollars.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
IcebergExplorer uses minimal personally identifiable information as an initial search seed and leverages LLM web access plus reasoning to reconstruct profiles that reach over 90 percent factual accuracy in under 10 minutes for less than three dollars. The companion PrivacyIceberg framework divides real-world privacy exposure into explicitly searched, contextually inferred, and deeply aggregated tiers according to the depth of LLM exploitation.
What carries the argument
IcebergExplorer, a tool that starts from minimal PII seeds and applies LLM-driven web search and reasoning to aggregate and verify profile details across the three tiers of the PrivacyIceberg model; a minimal sketch of that loop follows below.
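To make the claimed machinery concrete, here is a minimal sketch of the seed-to-profile loop as the review describes it. Every helper name (`make_query`, `extract_facts`, `infer_context`, `aggregate`, `search_web`) is a hypothetical placeholder, not the authors' implementation.

```python
# Hypothetical sketch of the seed-to-profile loop described above. The
# helper names stand in for an LLM agent's web-search and reasoning
# calls; they are not from the paper.

def reconstruct_profile(seed_pii: dict, llm, search_web, max_rounds: int = 5) -> dict:
    """Iteratively expand a profile from a minimal PII seed."""
    profile = dict(seed_pii)  # tier 1 starts from the explicit seed
    for _ in range(max_rounds):
        query = llm.make_query(profile)           # turn known facts into a search query
        pages = search_web(query)                 # retrieve public web content
        profile.update(llm.extract_facts(pages))  # tier 1: facts stated verbatim
        profile.update(llm.infer_context(pages, profile))  # tier 2: contextual inference
    profile.update(llm.aggregate(profile))  # tier 3: deep aggregation across all evidence
    return profile
```

The design point the paper's tiers imply is that each pass can feed later, more inferential passes: facts found verbatim become the context that makes inference and cross-source aggregation possible.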
If this is right
- Platforms fail to address privacy concerns either technically or through policy, creating a gap between practice and public awareness.
- Six root causes drive the observed privacy disclosures in LLM-integrated systems.
- Multi-stakeholder countermeasures are required involving LLM vendors, individuals, and data publishers.
- Privacy risks scale with the sophistication of LLM exploitation from basic searches to deep aggregation.
Where Pith is reading between the lines
- Widespread availability of such tools could normalize low-cost profiling and increase the chilling effect on online behavior.
- Individuals may need to reduce public data footprints more aggressively as LLM agents improve in aggregation speed.
- Regulators could treat LLM agent profiling capabilities as a distinct category when updating data protection rules.
Load-bearing premise
Minimal PII seeds combined with current LLM web-access and reasoning capabilities suffice to produce high-fidelity, generalizable profiles across diverse real-world individuals without substantial additional data or platform cooperation.
What would settle it
A controlled test on a diverse set of 100 real-world individuals in which the method achieves factual accuracy below 70 percent or requires costs above 10 dollars on average; the sketch below turns this criterion into code.
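The criterion is mechanical enough to state directly. A minimal sketch, assuming per-subject accuracy scores and dollar costs have already been measured via external verification (function and parameter names are illustrative, not from the paper):

```python
def is_claim_refuted(accuracies: list[float], costs_usd: list[float],
                     min_subjects: int = 100) -> bool:
    """Apply the refutation criterion above: mean factual accuracy below 70%
    or mean cost above $10 across a diverse, externally verified sample."""
    if len(accuracies) < min_subjects or len(accuracies) != len(costs_usd):
        raise ValueError("need matched accuracy/cost data for >= 100 subjects")
    mean_accuracy = sum(accuracies) / len(accuracies)
    mean_cost = sum(costs_usd) / len(costs_usd)
    return mean_accuracy < 0.70 or mean_cost > 10.0
```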
Original abstract
Large Language Models (LLMs) have revolutionized how information is collected, aggregated, and reasoned over. However, this enables a novel and accessible vector of privacy intrusion: automated, in-depth personal profiling, which engenders a chilling effect of "peepers everywhere". Existing research primarily unfolds from the training pipeline of LLMs, emphasizing the exposure of Personally Identifiable Information (PII) through memorization, while privacy studies from a human-centric perspective remain underexplored. To fill this void, we empirically investigate privacy perception in the real world through the lens of human awareness and the practices of LLM-integrated platforms, revealing a significant dissonance: platforms fail to address public privacy concerns either technically or through policy. To facilitate a systematic and quantifiable study of privacy risk, we propose the PrivacyIceberg, which categorizes real-world human privacy risks into three tiers: explicitly searched, contextually inferred, and deeply aggregated, based on the sophistication of LLM exploitation. We developed IcebergExplorer to audit privacy exposure, using minimal PII as a search seed to reconstruct high-fidelity profiles, achieving over 90% factual accuracy within 10 minutes at a cost under $3 in real-world scenarios. Additionally, we identify six root causes contributing to such privacy disclosures and propose multi-stakeholder countermeasures for LLM vendors, individuals, and data publishers.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper empirically investigates privacy risks from LLM agents, highlighting a gap between public concerns and platform practices. It introduces the PrivacyIceberg framework categorizing risks into explicitly searched, contextually inferred, and deeply aggregated tiers, and presents IcebergExplorer, a tool that reconstructs high-fidelity personal profiles from minimal PII seeds, claiming over 90% factual accuracy within 10 minutes at under $3 cost in real-world scenarios. It identifies six root causes of disclosures and proposes multi-stakeholder countermeasures.
Significance. If the empirical claims hold under rigorous validation, the work provides a concrete, low-cost demonstration of accessible profiling risks enabled by current LLM web-access and reasoning capabilities. This could usefully inform discussions on LLM platform responsibilities, user awareness, and data publisher practices, particularly by quantifying practical attack surfaces that prior memorization-focused studies have not emphasized.
major comments (2)
- [IcebergExplorer evaluation] The headline claim of >90% factual accuracy for IcebergExplorer-reconstructed profiles (abstract and IcebergExplorer section) lacks any description of evaluation methodology, including test subject count and diversity, sources of independent ground-truth facts, controls for selection bias, or whether accuracy was measured via external verification rather than LLM self-scoring or limited author inspection. This is load-bearing for the central empirical result. (A sketch of what external-verification scoring could look like appears after this list.)
- [Empirical investigation of privacy perception] The reported dissonance between human privacy awareness and LLM platform practices (introduction and empirical investigation sections) is presented without details on survey or data-collection methods, sample sizes, or analysis approach, making it impossible to assess the strength of this supporting observation.
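To illustrate the kind of external verification the first major comment asks for, here is a minimal sketch in which accuracy is computed from independent human reviewer verdicts rather than LLM self-scoring. The `Claim` structure and scoring rule are hypothetical editorial aids, not the paper's protocol.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str                      # one profile fact emitted by the tool
    reviewer_verdicts: list[bool]  # one verdict per independent human reviewer

def factual_accuracy(claims: list[Claim]) -> float:
    """Count a claim as correct only when every independent reviewer
    confirms it against ground truth, so the headline accuracy figure
    cannot come from LLM self-assessment."""
    if not claims:
        return 0.0
    confirmed = sum(
        1 for c in claims if c.reviewer_verdicts and all(c.reviewer_verdicts)
    )
    return confirmed / len(claims)
```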
minor comments (2)
- [Abstract] The abstract states that six root causes are identified but does not enumerate them; including a brief list or table would improve readability and allow readers to connect them to the proposed countermeasures.
- [PrivacyIceberg framework] The three tiers of the PrivacyIceberg are introduced conceptually but would benefit from a clarifying diagram or concrete examples to distinguish 'contextually inferred' from 'deeply aggregated' risks; a toy illustration follows after this list.
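As an editorial aid to that second minor comment, here is a toy illustration of the three tiers. The tier definitions paraphrase the paper; the example facts are invented.

```python
from enum import Enum

class Tier(Enum):
    EXPLICITLY_SEARCHED = 1    # stated verbatim in a single retrieved source
    CONTEXTUALLY_INFERRED = 2  # deduced from the context of one source
    DEEPLY_AGGREGATED = 3      # emerges only by joining several sources

# Invented example facts, one per tier:
examples = {
    Tier.EXPLICITLY_SEARCHED: "employer listed on a public CV page",
    Tier.CONTEXTUALLY_INFERRED: "home city deduced from commute complaints in forum posts",
    Tier.DEEPLY_AGGREGATED: "daily routine reconstructed by joining gym check-ins, "
                            "conference talks, and code-commit timestamps",
}
```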
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help improve the clarity and rigor of our empirical findings. We address each major comment below and have made revisions to incorporate the suggested details.
Point-by-point responses
- Referee: [IcebergExplorer evaluation] The headline claim of >90% factual accuracy for IcebergExplorer-reconstructed profiles (abstract and IcebergExplorer section) lacks any description of evaluation methodology, including test subject count and diversity, sources of independent ground-truth facts, controls for selection bias, or whether accuracy was measured via external verification rather than LLM self-scoring or limited author inspection. This is load-bearing for the central empirical result.
  Authors: We acknowledge that the original manuscript did not provide sufficient details on the evaluation methodology for the accuracy claim. This was an oversight in the presentation. In the revised version, we have expanded the IcebergExplorer section with a dedicated 'Evaluation Setup' subsection. It now includes the number of test subjects and their diversity, the sources used for independent ground-truth facts (such as public records and verified self-reports), measures taken to control for selection bias, and confirmation that accuracy was assessed through external verification by independent reviewers rather than LLM self-assessment. We believe this addresses the concern and strengthens the central result. Revision: yes.
- Referee: [Empirical investigation of privacy perception] The reported dissonance between human privacy awareness and LLM platform practices (introduction and empirical investigation sections) is presented without details on survey or data-collection methods, sample sizes, or analysis approach, making it impossible to assess the strength of this supporting observation.
  Authors: We agree that the methods for the empirical investigation of privacy perception were not described in adequate detail. The revised manuscript now includes an 'Empirical Investigation Methodology' subsection in the relevant section. This details the survey design, sample size and recruitment approach, data collection procedures, and the analysis methods used to identify the dissonance between public concerns and platform practices. These additions allow for proper evaluation of the supporting observation. Revision: yes.
Circularity Check
No significant circularity: empirical demonstration without derivation chain
Full rationale
The paper is an empirical study proposing the PrivacyIceberg categorization and IcebergExplorer tool for auditing LLM privacy risks. It reports an observed >90% factual accuracy from minimal PII seeds in real-world scenarios but contains no equations, fitted parameters, predictions, or self-citations that reduce the central claims to inputs by construction. The accuracy metric is presented as a direct experimental outcome rather than a self-referential or fitted result. No load-bearing steps rely on renaming known results, smuggling ansatzes, or uniqueness theorems from prior self-work. The work is therefore self-contained as a tool proposal and demonstration.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: LLM-integrated platforms fail to address public privacy concerns technically or through policy
- domain assumption: Minimal PII seeds enable reconstruction of high-fidelity profiles via LLM reasoning over web data
invented entities (2)
- PrivacyIceberg (no independent evidence)
- IcebergExplorer (no independent evidence)
Reference graph
Works this paper leans on
- [1] General Data Protection Regulation (GDPR), Article 4. https://gdpr-info.eu/art-4-gdpr/, 2016.
- [2] California Consumer Privacy Act (CCPA). https://oag.ca.gov/privacy/ccpa, 2018.
- [3] AIGCDaily. Netizens claim their WeChat IDs were searched by a stranger using Doubao; lawyer: possibly infringing on rights. https://www.aigcdaily.cn/news/a24qmwowb6jx7d6/, 2024.
- [4] AITNTNews. The moment I found my student number, my heart stopped. https://www.aitntnews.com/newDetail.html?newId=9702/, 2024.
- [5] Mohammad Alaa and contributors. Social Analyzer: An OSINT tool for analyzing and correlating profiles across social media platforms. https://github.com/qeeqbox/social-analyzer, 2020. Accessed: Oct. 20, 2025.
- [6] Michael Bailey, David Dittrich, Erin Kenneally, and Doug Maughan. The Menlo Report. IEEE Security & Privacy, 10(2):71–75, 2012.
- [7] Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, and Colin Raffel. Extracting training data from large language models. In USENIX Security Symposium, 2021.
- [8] Kaiyuan Chen, Yixin Ren, Yang Liu, Xiaobo Hu, Haotong Tian, Tianbao Xie, Fangfu Liu, Haoye Zhang, Hongzhang Liu, Yuan Gong, et al. xbench: Tracking agents productivity scaling with profession-aligned real-world evaluations. arXiv preprint arXiv:2506.13651, 2025.
- [9] Min Chen, Zhikun Zhang, Tianhao Wang, Michael Backes, Mathias Humbert, and Yang Zhang. Graph unlearning. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 499–513, 2022.
- [10] Xiaoyi Chen, Siyuan Tang, Rui Zhu, Shijun Yan, Lei Jin, Zihao Wang, Liya Su, Zhikun Zhang, XiaoFeng Wang, and Haixu Tang. The Janus interface: How fine-tuning in large language models amplifies the privacy risks. In Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security, pages 1285–1299, 2024.
- [11] Zhaoyang Chu, Yao Wan, Zhikun Zhang, Di Wang, Zhou Yang, Hongyu Zhang, Pan Zhou, Xuanhua Shi, Hai Jin, and David Lo. Scrub it out! Erasing sensitive memorization in code language models via machine unlearning. arXiv preprint arXiv:2509.13755, 2025.
- [12] Yuntao Du, Zitao Li, Bolin Ding, Yaliang Li, Hanshen Xiao, Jingren Zhou, and Ninghui Li. Automated profile inference with language model agents. arXiv preprint arXiv:2505.12402, 2025.
- [14] Yuntao Du, Zitao Li, Ninghui Li, and Bolin Ding. Beyond data privacy: New privacy risks for large language models. arXiv preprint arXiv:2509.14278, 2025.
- [15] Fabio Duarte. Top 35 social media platforms. https://explodingtopics.com/blog/top-social-media-platforms#top-35-most-popular-social-media-websites, 2026.
- [16] Hai Huang, Zhikun Zhang, Yun Shen, Michael Backes, Qi Li, and Yang Zhang. On the privacy risks of cell-based NAS architectures. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 1427–1441, 2022.
- [17] Yue Huang, Lichao Sun, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, et al. TrustLLM: Trustworthiness in large language models. arXiv preprint arXiv:2401.05561, 2024.
- [18] Hanna Kim, Minkyoo Song, Seung Ho Na, Seungwon Shin, and Kimin Lee. When LLMs go online: The emerging threat of web-enabled LLMs. In 34th USENIX Security Symposium (USENIX Security 25), pages 1729–1748, 2025.
- [19] Tadayoshi Kohno, Yasemin Acar, and Wulf Loh. Ethical frameworks and computer security trolley problems: Foundations for conversations. In 32nd USENIX Security Symposium (USENIX Security 23), pages 5145–5162, 2023.
- [20] Gary LaFever. Beyond GDPR: Unauthorized reidentification and the mosaic effect in the EU AI Act. https://iapp.org/news/a/beyond-gdpr-unauthorized-reidentification-and-the-mosaic-effect-in-the-eu-ai-act/, 2023.
- [21] Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. Formalizing and benchmarking prompt injection attacks and defenses. In 33rd USENIX Security Symposium (USENIX Security 24), pages 1831–1847, 2024.
- [22] Yupei Liu, Yuqi Jia, Jinyuan Jia, and Neil Zhenqiang Gong. Evaluating LLM-based personal information extraction and countermeasures. In USENIX Security Symposium (to appear), 2025. arXiv:2408.07291.
- [23] Matt Burgess. This prompt can make an AI chatbot identify and extract personal details from your chats. WIRED. https://www.wired.com/story/ai-imprompter-malware-llm/, 2024.
- [24] Millie Turner. Meta watch out! Modified Meta AI glasses used to 'reveal anyone's personal info' in seconds just by looking at them. The Sun. https://www.thesun.ie/tech/13939493/meta-ai-ray-ban-glasses-reveal-personal-information/, 2024.
- [25] Jiupai News. Netizens said their WeChat accounts were searched using AI, and the person in charge responded. https://news.qq.com/rain/a/20241211A052NQ00/, 2024.
- [26] Helen Nissenbaum. Privacy as contextual integrity. Washington Law Review, 79:119, 2004.
- [27] Devjeet Roy, Xuchao Zhang, Rashi Bhave, Chetan Bansal, Pedro Las-Casas, Rodrigo Fonseca, and Saravan Rajmohan. Exploring LLM-based agents for root cause analysis. In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, pages 208–219, 2024.
- [28] Sofia Eleni Spatharioti, David M. Rothschild, Daniel G. Goldstein, and Jake M. Hofman. Comparing traditional and LLM-based search for consumer choice: A randomized experiment. arXiv preprint arXiv:2307.03744, 2023.
- [29] Robin Staab et al. Beyond memorization: Violating privacy via inference with large language models. arXiv preprint arXiv:2310.07298, 2023.
- [30] Robin Staab, Mark Vero, Mislav Balunovic, and Martin T. Vechev. Beyond memorization: Violating privacy via inference with large language models. In The Twelfth International Conference on Learning Representations (ICLR 2024), Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024.
- [31] Zhen Tan, Dawei Li, Song Wang, Alimohammad Beigi, Bohan Jiang, Amrita Bhattacharjee, Mansooreh Karami, Jundong Li, Lu Cheng, and Huan Liu. Large language models for data annotation and synthesis: A survey. arXiv preprint arXiv:2402.13446, 2024.
- [32] Batuhan Tömekçe, Mark Vero, Robin Staab, and Martin T. Vechev. Private attribute inference from images with vision-language models. In Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems (NeurIPS 2024), 2024.
- [33] Zihao Wang, Rui Zhu, Zhikun Zhang, Haixu Tang, and XiaoFeng Wang. Rigging the foundation: Manipulating pre-training for advanced membership inference attacks. In 2025 IEEE Symposium on Security and Privacy (SP), pages 2509–2526. IEEE, 2025.
- [34] Wikipedia. Mosaic effect. https://en.wikipedia.org/wiki/Mosaic_effect, 2025.
- [35] Will Knight. AI chatbots can guess your personal information from what you type. WIRED. https://www.wired.com/story/ai-chatbots-can-guess-your-personal-information/, 2023.
- [36] Wei Xu, Jue Xiao, and Jianlong Chen. Leveraging large language models to enhance personalized recommendations in e-commerce. In 2024 International Conference on Electrical, Communication and Computer Engineering (ICECCE), pages 1–6. IEEE, 2024.
- [37] Hanna Yukhymenko, Robin Staab, Mark Vero, and Martin T. Vechev. A synthetic dataset for personal attribute inference. In Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems (NeurIPS 2024), 2024.