pith. machine review for the scientific record.

arxiv: 2604.27438 · v2 · submitted 2026-04-30 · 💻 cs.CR · cs.CY

Recognition: no theorem link

Tracking Conversations: Measuring Content and Identity Exposure on AI Chatbots

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 22:12 UTC · model grok-4.3

classification 💻 cs.CR cs.CY
keywords AI chatbots · web tracking · privacy leakage · third-party sharing · session replay · content exposure · identity exposure

The pith

Seventeen of twenty popular AI chatbots share conversation data with third parties.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper measures network traffic on 20 AI chatbots during controlled chats that include sensitive prompts. It finds that 17 chatbots send user data to outside companies, and that three of them transmit plaintext prompt and response text to an analytics service. The measurements also track how chat URLs, identifiers, and some user details reach advertising and analytics endpoints. This shows that even private chat modes often fail to block third-party access to what users type.

Core claim

Under controlled settings using a sensitive prompt, network traffic capture shows that 17 of 20 chatbots share information with at least one third party. Three chatbots transmit plaintext conversation text, including prompt and response snippets, to Microsoft Clarity through session replay. Fifteen chatbots forward conversation URLs or chat identifiers to third-party advertising, analytics, or social endpoints, and several expose user identity details such as hashed emails or account identifiers.

What carries the argument

Network traffic capture during normal and private chat sessions that identifies exposure of content (prompts, URLs, identifiers) and identity (cookies, emails, IP fields) to third-party endpoints.
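The core measurement step can be sketched as a scan over captured requests for third-party exposure. This is a minimal illustration, not the authors' pipeline: the domain names, request shapes, and prompt are all hypothetical, standing in for traffic exported from a proxy or browser HAR file.

```python
from urllib.parse import urlsplit, unquote

# Hypothetical captured requests; in practice these would come from a
# proxy capture or HAR export. All names here are illustrative.
SENSITIVE_PROMPT = "I was recently diagnosed with diabetes"
FIRST_PARTY = "examplechat.com"  # assumed first-party chatbot domain

requests = [
    {"url": "https://examplechat.com/api/chat", "body": SENSITIVE_PROMPT},
    {"url": "https://clarity.example/collect?content="
            "I%20was%20recently%20diagnosed%20with%20diabetes", "body": ""},
    {"url": "https://ads.example/pixel?chat_id=abc123", "body": ""},
]

def third_party_exposures(reqs, first_party, prompt):
    """Flag third-party requests carrying plaintext content or chat IDs."""
    findings = []
    for r in reqs:
        host = urlsplit(r["url"]).hostname or ""
        if host.endswith(first_party):
            continue  # first-party traffic is expected, not an exposure
        decoded = unquote(r["url"]) + " " + r.get("body", "")
        if prompt in decoded:
            findings.append((host, "plaintext prompt"))
        elif "chat_id" in decoded:
            findings.append((host, "chat identifier"))
    return findings

print(third_party_exposures(requests, FIRST_PARTY, SENSITIVE_PROMPT))
# → [('clarity.example', 'plaintext prompt'), ('ads.example', 'chat identifier')]
```

A real audit would also decode POST bodies, cookies, and encoded payload formats, but the classification logic, first-party versus third-party destination, content versus identifier, follows this shape.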

If this is right

  • Conversation text can reach analytics providers without explicit user consent.
  • Chat URLs and identifiers allow third parties to associate separate conversations with the same user.
  • Private chat modes do not reliably prevent third-party receipt of content or identifiers.
  • Identity details such as hashed emails can leak through support widgets or analytics tags.
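On the last point: hashing an email does not anonymize it, because the same input always yields the same digest, so any two third parties receiving the hash can join their records on it. A minimal sketch (the email address is invented):

```python
import hashlib

def hashed_email(email: str) -> str:
    # Normalize then hash, the common pattern in ad-tech identity matching.
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

# Two different services normalize and hash the same user's email...
seen_by_analytics = hashed_email("Alice@Example.com")
seen_by_advertiser = hashed_email("alice@example.com ")

# ...and end up with an identical, stable cross-site identifier.
assert seen_by_analytics == seen_by_advertiser
```

This is why the FTC and others treat hashed emails as personal data: the hash is a pseudonymous identifier, not an anonymization step.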

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Users who discuss medical or financial topics may expose details to advertisers even when they choose private mode.
  • Providers could limit exposure by auditing or removing session-replay and analytics scripts from chatbot pages.
  • Regulators might examine whether current privacy notices cover third-party sharing of live chat content.

Load-bearing premise

That controlled lab tests with chosen sensitive prompts and network capture represent all tracking that occurs for typical users in real production settings.

What would settle it

Run the same sensitive prompts on one of the 20 chatbots from an ordinary browser, inspect the live network requests, and check whether any third-party payloads contain matching plaintext or chat identifiers.
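A check like this can be run on payloads copied straight from browser developer tools. The sketch below, with an invented prompt and invented payloads, tests whether a payload contains the prompt either URL-encoded or base64-encoded, two common tracker encodings:

```python
import base64
from urllib.parse import unquote_plus

PROMPT = "chest pain after exercise"  # hypothetical test prompt

# Illustrative third-party payloads as they might appear in dev tools.
payloads = [
    "event=pageview&title=chest+pain+after+exercise",
    base64.b64encode(b"session replay: chest pain after exercise").decode(),
    "event=click&x=10&y=20",
]

def contains_prompt(payload: str, prompt: str) -> bool:
    # Check the URL-decoded form first.
    if prompt in unquote_plus(payload):
        return True
    # Some trackers base64-encode bodies; try that decoding too.
    try:
        return prompt in base64.b64decode(payload, validate=True).decode("utf-8", "ignore")
    except Exception:
        return False

print([contains_prompt(p, PROMPT) for p in payloads])
# → [True, True, False]
```

Any positive match in a request to a non-first-party host would directly corroborate, or a consistent absence would undercut, the paper's plaintext-sharing claim for that chatbot.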

Figures

Figures reproduced from arXiv: 2604.27438 by Ethan Wang, Muhammad Jazlan, Yash Vekaria, Zubair Shafiq.

Figure 1. Sankey of data flows from chatbots to adver… (view at source ↗)
Figure 2. Sankey diagram of data flow from chatbots… (view at source ↗)
Figure 3. Sankey diagram of data flow from chatbots to… (view at source ↗)
read the original abstract

AI chatbots are becoming a primary interface for seeking information. As their popularity grows, chatbot providers are starting to deploy advertising and analytics. Despite this, tracking on AI chatbots has not been systematically studied. We present a systematic measurement of web tracking on 20 popular AI chatbots. Under controlled settings using a sensitive prompt, we capture and compare network traffic in normal chats and, where supported, private chats. We search for exposure of two categories of information: content, including prompts, prompt-derived titles, chat URLs, and chat identifiers; and identity, including names, emails, account identifiers, first-party cookies, and explicit IP/User-Agent fields in payloads. We find that 17 of 20 chatbots share information with at least one third party. Three chatbots share plaintext conversation text, including both prompt and response snippets, with Microsoft Clarity through session replay. Fifteen chatbots share conversation URLs or chat identifiers with third-party advertising, analytics, or social endpoints. Several chatbots expose user identity through support widgets, analytics, advertising, and session replay tags; in some cases, hashed emails are shared.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript reports the results of a measurement study that examines web tracking and data exposure on 20 popular AI chatbots. Using controlled experiments with sensitive prompts, the authors capture network traffic to identify sharing of content (prompts, responses, chat URLs, identifiers) and identity information (names, emails, cookies, IPs) with third parties. Key findings include that 17 of the 20 chatbots share information with at least one third party, with three transmitting plaintext conversation text to Microsoft Clarity, and fifteen sharing conversation URLs or identifiers with advertising, analytics, or social endpoints.

Significance. This study is significant as it provides the first systematic empirical evidence of tracking practices in AI chatbots, a rapidly growing user interface. The direct observation of network traffic under controlled conditions yields concrete, falsifiable results about data exposure, including specific examples of plaintext sharing. These findings highlight privacy vulnerabilities that could inform better design practices, user awareness, and regulatory efforts in the field of AI privacy and security.

minor comments (2)
  1. The methodology would benefit from an explicit list or table enumerating the 20 chatbots and the criteria used for their selection to improve reproducibility.
  2. Results on private-chat mode (mentioned in the abstract) should be presented with the same level of per-chatbot granularity as the normal-chat results to clarify any differences in tracking exposure.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our measurement study and recognition of its significance as the first systematic empirical evidence on tracking in AI chatbots. No major comments were raised. On the two minor comments, a revision would enumerate the 20 chatbots with their selection criteria and report the private-chat results at the same per-chatbot granularity as the normal-chat results.

Circularity Check

0 steps flagged

No significant circularity: pure empirical traffic measurement

full rationale

The paper conducts a controlled measurement of network traffic from 20 AI chatbots using sensitive prompts, directly observing payload contents sent to third-party endpoints. No equations, derivations, fitted parameters, or load-bearing self-citations appear in the reported claims. Central results (17/20 chatbots share data; three transmit plaintext snippets to Microsoft Clarity) rest solely on captured traffic observations under the stated lab conditions, with no reduction to any constructed model or prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The study relies on standard assumptions of web measurement without introducing free parameters, new axioms beyond domain norms, or invented entities.

axioms (1)
  • domain assumption Captured network requests during controlled sessions represent the complete set of tracking activity performed by the chatbots.
    The measurement depends on traffic capture being exhaustive under the test conditions described.

pith-pipeline@v0.9.0 · 5499 in / 1165 out tokens · 32354 ms · 2026-05-14T22:12:41.709050+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · 2 internal anchors
