pith. machine review for the scientific record.

arxiv: 2604.27438 · v2 · submitted 2026-04-30 · 💻 cs.CR · cs.CY

Recognition: no theorem link

Tracking Conversations: Measuring Content and Identity Exposure on AI Chatbots

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 22:12 UTC · model grok-4.3

classification 💻 cs.CR cs.CY
keywords AI chatbots · web tracking · privacy leakage · third-party sharing · session replay · content exposure · identity exposure

The pith

Seventeen of twenty popular AI chatbots share conversation data with third parties.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper measures network traffic on 20 AI chatbots during controlled chats that include sensitive prompts. It finds that 17 chatbots send user data to outside companies, and that three of them transmit plaintext prompt and response text to an analytics service. The measurements also track how chat URLs, identifiers, and some user details reach advertising and analytics endpoints. This shows that even private chat modes often fail to block third-party access to what users type.

Core claim

Under controlled settings using a sensitive prompt, network traffic capture shows that 17 of 20 chatbots share information with at least one third party. Three chatbots transmit plaintext conversation text, including prompt and response snippets, to Microsoft Clarity through session replay. Fifteen chatbots forward conversation URLs or chat identifiers to third-party advertising, analytics, or social endpoints, and several expose user identity details such as hashed emails or account identifiers.

What carries the argument

Network traffic capture during normal and private chat sessions that identifies exposure of content (prompts, URLs, identifiers) and identity (cookies, emails, IP fields) to third-party endpoints.
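The core measurement step can be sketched as a scan over captured requests for third-party exposure. This is a minimal illustration, not the authors' pipeline: the domain names, request shapes, and prompt are all hypothetical, standing in for traffic exported from a proxy or browser HAR file.

```python
from urllib.parse import urlsplit, unquote

# Hypothetical captured requests; in practice these would come from a
# proxy capture or HAR export. All names here are illustrative.
SENSITIVE_PROMPT = "I was recently diagnosed with diabetes"
FIRST_PARTY = "examplechat.com"  # assumed first-party chatbot domain

requests = [
    {"url": "https://examplechat.com/api/chat", "body": SENSITIVE_PROMPT},
    {"url": "https://clarity.example/collect?content="
            "I%20was%20recently%20diagnosed%20with%20diabetes", "body": ""},
    {"url": "https://ads.example/pixel?chat_id=abc123", "body": ""},
]

def third_party_exposures(reqs, first_party, prompt):
    """Flag third-party requests carrying plaintext content or chat IDs."""
    findings = []
    for r in reqs:
        host = urlsplit(r["url"]).hostname or ""
        if host.endswith(first_party):
            continue  # first-party traffic is expected, not an exposure
        decoded = unquote(r["url"]) + " " + r.get("body", "")
        if prompt in decoded:
            findings.append((host, "plaintext prompt"))
        elif "chat_id" in decoded:
            findings.append((host, "chat identifier"))
    return findings

print(third_party_exposures(requests, FIRST_PARTY, SENSITIVE_PROMPT))
# → [('clarity.example', 'plaintext prompt'), ('ads.example', 'chat identifier')]
```

A real audit would also decode POST bodies, cookies, and encoded payload formats, but the classification logic, first-party versus third-party destination, content versus identifier, follows this shape.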

If this is right

  • Conversation text can reach analytics providers without explicit user consent.
  • Chat URLs and identifiers allow third parties to associate separate conversations with the same user.
  • Private chat modes do not reliably prevent third-party receipt of content or identifiers.
  • Identity details such as hashed emails can leak through support widgets or analytics tags.
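On the last point: hashing an email does not anonymize it, because the same input always yields the same digest, so any two third parties receiving the hash can join their records on it. A minimal sketch (the email address is invented):

```python
import hashlib

def hashed_email(email: str) -> str:
    # Normalize then hash, the common pattern in ad-tech identity matching.
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

# Two different services normalize and hash the same user's email...
seen_by_analytics = hashed_email("Alice@Example.com")
seen_by_advertiser = hashed_email("alice@example.com ")

# ...and end up with an identical, stable cross-site identifier.
assert seen_by_analytics == seen_by_advertiser
```

This is why the FTC and others treat hashed emails as personal data: the hash is a pseudonymous identifier, not an anonymization step.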

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Users who discuss medical or financial topics may expose details to advertisers even when they choose private mode.
  • Providers could limit exposure by auditing or removing session-replay and analytics scripts from chatbot pages.
  • Regulators might examine whether current privacy notices cover third-party sharing of live chat content.

Load-bearing premise

That controlled lab tests with chosen sensitive prompts and network capture represent all tracking that occurs for typical users in real production settings.

What would settle it

Run the same sensitive prompts on one of the 20 chatbots from an ordinary browser, inspect the live network requests, and check whether any third-party payloads contain matching plaintext or chat identifiers.
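A check like this can be run on payloads copied straight from browser developer tools. The sketch below, with an invented prompt and invented payloads, tests whether a payload contains the prompt either URL-encoded or base64-encoded, two common tracker encodings:

```python
import base64
from urllib.parse import unquote_plus

PROMPT = "chest pain after exercise"  # hypothetical test prompt

# Illustrative third-party payloads as they might appear in dev tools.
payloads = [
    "event=pageview&title=chest+pain+after+exercise",
    base64.b64encode(b"session replay: chest pain after exercise").decode(),
    "event=click&x=10&y=20",
]

def contains_prompt(payload: str, prompt: str) -> bool:
    # Check the URL-decoded form first.
    if prompt in unquote_plus(payload):
        return True
    # Some trackers base64-encode bodies; try that decoding too.
    try:
        return prompt in base64.b64decode(payload, validate=True).decode("utf-8", "ignore")
    except Exception:
        return False

print([contains_prompt(p, PROMPT) for p in payloads])
# → [True, True, False]
```

Any positive match in a request to a non-first-party host would directly corroborate, or a consistent absence would undercut, the paper's plaintext-sharing claim for that chatbot.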

Figures

Figures reproduced from arXiv: 2604.27438 by Ethan Wang, Muhammad Jazlan, Yash Vekaria, Zubair Shafiq.

Figure 1. Sankey of data flows from chatbots to adver… (view at source ↗)
Figure 2. Sankey diagram of data flow from chatbots… (view at source ↗)
Figure 3. Sankey diagram of data flow from chatbots to… (view at source ↗)
read the original abstract

AI chatbots are becoming a primary interface for seeking information. As their popularity grows, chatbot providers are starting to deploy advertising and analytics. Despite this, tracking on AI chatbots has not been systematically studied. We present a systematic measurement of web tracking on 20 popular AI chatbots. Under controlled settings using a sensitive prompt, we capture and compare network traffic in normal chats and, where supported, private chats. We search for exposure of two categories of information: content, including prompts, prompt-derived titles, chat URLs, and chat identifiers; and identity, including names, emails, account identifiers, first-party cookies, and explicit IP/User-Agent fields in payloads. We find that 17 of 20 chatbots share information with at least one third party. Three chatbots share plaintext conversation text, including both prompt and response snippets, with Microsoft Clarity through session replay. Fifteen chatbots share conversation URLs or chat identifiers with third-party advertising, analytics, or social endpoints. Several chatbots expose user identity through support widgets, analytics, advertising, and session replay tags; in some cases, hashed emails are shared.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript reports the results of a measurement study that examines web tracking and data exposure on 20 popular AI chatbots. Using controlled experiments with sensitive prompts, the authors capture network traffic to identify sharing of content (prompts, responses, chat URLs, identifiers) and identity information (names, emails, cookies, IPs) with third parties. Key findings include that 17 of the 20 chatbots share information with at least one third party, with three transmitting plaintext conversation text to Microsoft Clarity, and fifteen sharing conversation URLs or identifiers with advertising, analytics, or social endpoints.

Significance. This study is significant as it provides the first systematic empirical evidence of tracking practices in AI chatbots, a rapidly growing user interface. The direct observation of network traffic under controlled conditions yields concrete, falsifiable results about data exposure, including specific examples of plaintext sharing. These findings highlight privacy vulnerabilities that could inform better design practices, user awareness, and regulatory efforts in the field of AI privacy and security.

minor comments (2)
  1. The methodology would benefit from an explicit list or table enumerating the 20 chatbots and the criteria used for their selection to improve reproducibility.
  2. Results on private-chat mode (mentioned in the abstract) should be presented with the same level of per-chatbot granularity as the normal-chat results to clarify any differences in tracking exposure.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our measurement study and recognition of its significance as the first systematic empirical evidence on tracking in AI chatbots. No major comments were raised. On the two minor comments, a revision would enumerate the 20 chatbots with their selection criteria and report the private-chat results at the same per-chatbot granularity as the normal-chat results.

Circularity Check

0 steps flagged

No significant circularity: pure empirical traffic measurement

full rationale

The paper conducts a controlled measurement of network traffic from 20 AI chatbots using sensitive prompts, directly observing payload contents sent to third-party endpoints. No equations, derivations, fitted parameters, or load-bearing self-citations appear in the reported claims. Central results (17/20 chatbots share data; three transmit plaintext snippets to Microsoft Clarity) rest solely on captured traffic observations under the stated lab conditions, with no reduction to any constructed model or prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The study relies on standard assumptions of web measurement without introducing free parameters, new axioms beyond domain norms, or invented entities.

axioms (1)
  • domain assumption Captured network requests during controlled sessions represent the complete set of tracking activity performed by the chatbots.
    The measurement depends on traffic capture being exhaustive under the test conditions described.

pith-pipeline@v0.9.0 · 5499 in / 1165 out tokens · 32354 ms · 2026-05-14T22:12:41.709050+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · 2 internal anchors
