Users' Activity Logs: the Good, the Bad, the Misconception, and the Disastrous
Pith reviewed 2026-07-01 07:59 UTC · model grok-4.3
The pith
Saudi Google users describe activity logs as useful for tracking yet prone to misconceptions and severe privacy harms.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Template analysis of the interview data, read through the four themes, surfaces new themes and use cases for activity logs. Together these present a balanced picture of how users perceive the logs' benefits, risks, misconceptions, and potential for extreme negative outcomes.
What carries the argument
Four themes (good, bad, misconception, disastrous) applied as an analytical lens via template analysis to secondary interview data on Google's activity controls.
If this is right
- Service providers can incorporate the identified perceptions when designing or updating privacy controls for activity logs.
- The work supplies practical recommendations for users, privacy researchers, and service providers.
- New use cases for activity logs emerge that prior studies had not emphasized.
- Later research on related privacy topics gains a source of balanced themes to build upon.
Where Pith is reading between the lines
- Designers of log interfaces could directly test features that correct the misconceptions the users expressed.
- The same four-theme lens might reveal different patterns if applied to activity logs from services other than Google.
- Cultural or regional factors could shift the balance among the four themes, suggesting value in targeted follow-up samples.
Load-bearing premise
Secondary analysis of a convenience sample of 30 users from one country can surface themes that apply more broadly to user perceptions of activity logs.
What would settle it
A new study collecting primary data from a larger, demographically broader sample across multiple countries that finds no evidence for the reported positive use cases or misconceptions would show the themes do not generalize.
Original abstract
Most service providers, such as Google, save logs from data generated by users while using the service. Many service providers provide users with privacy controls to manage whether, how, and for how long the data is saved and used by the service provider. While most prior studies focused on the negative side of users' activity logs, such as users' lack of awareness about the logs' privacy controls and users' privacy concerns toward their data, this work aims to provide a balanced view of users' perceptions regarding activity logs by considering the positive, negative, and extremely negative (hence disastrous) sides, as well as the misconceptions of activity logs. In this work, we present a case study of Google's Activity controls by conducting a secondary analysis of interview data from 30 Google personal account holders in Saudi Arabia. Using template analysis, we analyzed the data from the lens of four main themes: the good, the bad, the misconception, and the disastrous aspects of users' activity logs from the users' perspective. Our findings uncover new themes and use cases, offering a balanced view of users' perceptions of activity logs, and provide a better understanding and a useful source for subsequent studies on related topics. We conclude with practical recommendations for service providers, privacy researchers and experts, and users alike.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a case study of Google's Activity controls via secondary template analysis of existing interview data from 30 Google personal account holders in Saudi Arabia. It applies a four-theme lens (the good, the bad, the misconception, and the disastrous aspects of users' activity logs) to provide a balanced view of perceptions, claiming to uncover new themes and use cases that extend beyond prior work's negative focus, and concludes with practical recommendations for providers, researchers, and users.
Significance. If the analysis demonstrates rigorous, independent theme identification not constrained by the original protocol, the work could usefully expand HCI/privacy literature by documenting positive perceptions and misconceptions alongside negatives, offering a source for subsequent studies. The secondary-analysis approach and small convenience sample from one country, however, constrain broader claims of a general 'balanced view.'
major comments (3)
- [Methods section] The description of the template analysis supplies no details on the theme derivation process, inter-coder agreement, how the secondary dataset of 30 interviews was chosen, or exclusion criteria. This information is required to assess whether the four themes represent independent findings or are shaped by the original study's questions and Saudi-specific context.
- [Sample and data description] The central claim of uncovering new themes and a balanced view rests on secondary analysis of a convenience sample of 30 users without primary data collection or explicit checks for selection bias; the manuscript must address how this sample supports generalizable insights rather than protocol artifacts.
- [Findings/Discussion] The assertion that the analysis surfaces 'new themes and use cases' requires evidence that these categories are not simply re-labelings of responses elicited by the prior interview protocol; without such comparison or saturation discussion, the novelty and independence of the four-theme lens cannot be evaluated.
minor comments (2)
- [Abstract] The abstract could briefly note the secondary-analysis limitations to calibrate reader expectations about generalizability.
- [Recommendations section] Some practical suggestions for service providers would be strengthened by explicit mapping back to specific themes identified in the analysis.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We have carefully considered each major comment and provide point-by-point responses below, indicating planned revisions where appropriate. Our response aims to clarify the secondary analysis approach while acknowledging its inherent limitations.
- Referee: [Methods section] The description of the template analysis supplies no details on the theme derivation process, inter-coder agreement, how the secondary dataset of 30 interviews was chosen, or exclusion criteria. This information is required to assess whether the four themes represent independent findings or are shaped by the original study's questions and Saudi-specific context.
  Authors: We agree that the methods section requires expansion for transparency. The dataset comprises the complete set of 30 interviews collected in our prior study on privacy perceptions among Saudi Google users, selected without additional exclusion criteria for this secondary analysis. We will revise the methods to detail the template analysis process, including how the four themes were used as an initial template and how sub-themes were derived iteratively from the data. As the analysis was led by one author with team discussion for consensus rather than formal inter-coder reliability calculation, we will explicitly describe this approach and its implications. We will also summarize the original interview protocol to allow evaluation of potential influence.
  Revision: yes
- Referee: [Sample and data description] The central claim of uncovering new themes and a balanced view rests on secondary analysis of a convenience sample of 30 users without primary data collection or explicit checks for selection bias; the manuscript must address how this sample supports generalizable insights rather than protocol artifacts.
  Authors: The manuscript frames the work as a case study of Saudi users rather than claiming broad generalizability. We will add explicit language in the introduction, methods, and limitations sections to emphasize that insights are specific to this population and context, and do not extend to all users. Regarding selection bias, we will discuss the convenience sampling from the original recruitment and note it as a limitation, while arguing that the balanced themes identified still contribute novel perspectives to the literature even within this scope.
  Revision: yes
- Referee: [Findings/Discussion] The assertion that the analysis surfaces 'new themes and use cases' requires evidence that these categories are not simply re-labelings of responses elicited by the prior interview protocol; without such comparison or saturation discussion, the novelty and independence of the four-theme lens cannot be evaluated.
  Authors: We will incorporate a new subsection in the methods or findings that maps the original interview questions to the current themes, demonstrating that elements of 'the good', 'misconceptions', and 'the disastrous' emerged from unprompted discussions or tangential responses. However, as this is secondary analysis without the ability to pursue data saturation through additional interviews, we cannot fully address saturation and will instead highlight this as a limitation of the approach. We maintain that the four-theme lens offers a structured, balanced framing that extends prior negative-focused work.
  Revision: partial
- Quantitative inter-coder agreement statistics, as the template analysis did not involve multiple independent coders calculating agreement metrics.
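For readers unfamiliar with the statistic at issue in this objection, the following is a minimal sketch of Cohen's kappa, one common inter-coder agreement measure, applied to two coders who each assign every interview excerpt to one of the four themes. The function name, theme labels, and example codings are hypothetical illustrations and are not drawn from the paper, whose authors state that no such metric was calculated.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same sequence of excerpts.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement rate
    and p_e is the agreement expected by chance from each coder's label
    frequencies.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of excerpts")

    n = len(coder_a)
    # Observed agreement: share of excerpts given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement: sum over labels of the product of marginal proportions.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)

    # Assumed convention: if both coders used one identical label throughout,
    # the formula is undefined (0/0); this sketch reports full agreement.
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codings of four excerpts by two coders.
coder_a = ["good", "good", "bad", "bad"]
coder_b = ["good", "bad", "bad", "bad"]
kappa = cohens_kappa(coder_a, coder_b)  # observed 0.75, expected 0.5
```

A value near 1 indicates agreement well beyond chance and a value near 0 indicates agreement no better than chance; which threshold counts as acceptable is a judgment call that varies by field.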
Circularity Check
No circularity: qualitative secondary analysis with no derivations or self-referential reductions
full rationale
The paper reports a secondary template analysis of existing interview data from 30 Saudi users, identifying four themes (good, bad, misconception, disastrous) from users' perspectives on activity logs. No equations, predictions, fitted parameters, or mathematical derivations appear. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The claims rest on analysis of external data rather than reducing by construction to the paper's own inputs or prior author definitions. This matches the default non-circular case for qualitative work without self-referential structure.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: template analysis is a valid and unbiased method for extracting themes from interview data on privacy perceptions.
Received 2025-09-12; accepted 2025-12-22 Proc. ACM Softw. Eng.,...