Do Privacy Policies Match with the Logs? An Empirical Study of Privacy Disclosure in Android Application Logs
Pith reviewed 2026-05-10 04:15 UTC · model grok-4.3
The pith
Privacy policies align with actual Android app logs for only 0.4% of the apps studied, and 67.6% of apps leak sensitive data not declared in their policies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that only 0.4% of the examined Android applications exhibit consistent alignment between the contents of their privacy policies and the sensitive information disclosed through their application logs. While 88% of apps provide policies and 28.5% reference logging, 27.7% of those references are vague, and 67.6% of apps leak sensitive data absent from their policies. The study concludes that current privacy policies frequently offer incomplete or ambiguous accounts of logging practices that do not match actual app behavior.
What carries the argument
Direct comparison of log-related statements extracted from privacy policies against sensitive data detected in generated application logs across 1,000 apps.
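As a rough illustration of what such a comparison involves, the sketch below treats each app as two sets of data-type labels: those its policy says are logged and those detected in its logs. The record type, the field names, and the alignment rule are assumptions made for illustration; the paper's own matching procedure is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppObservation:
    """Hypothetical per-app record; field names are illustrative only."""
    declared: frozenset  # data types the policy says are logged
    logged: frozenset    # sensitive data types detected in the app's logs


def undisclosed(app: AppObservation) -> frozenset:
    """Data types that appear in the logs but not in the policy."""
    return app.logged - app.declared


def is_aligned(app: AppObservation) -> bool:
    """Assumed rule: the policy mentions logging and covers every logged type."""
    return bool(app.declared) and app.logged <= app.declared


example = AppObservation(
    declared=frozenset({"email"}),
    logged=frozenset({"email", "device_id"}),
)
print(sorted(undisclosed(example)), is_aligned(example))
```

Under this assumed rule an app counts as aligned only when nothing it logs falls outside its policy, so a single undetected or wrongly flagged data type can change the app's classification.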
If this is right
- Most Android apps do not fully disclose their logging of sensitive information to users through privacy policies.
- Current policies leave users with limited accurate information about what data apps actually record.
- Widespread undisclosed leaks suggest that policy text alone does not reliably indicate real data collection behavior.
- App developers often fail to update or detail logging practices in policies to match implementation.
- Regulatory focus may need to shift toward verifying actual log contents rather than relying on policy statements.
Where Pith is reading between the lines
- Users relying on privacy policies for decisions about app installation may be making choices based on incomplete information.
- The mismatch could stem from policies not being revised when app features change, creating a maintenance gap.
- Similar studies on iOS or web applications might reveal whether this alignment problem is specific to Android logging.
- Automated tools that scan both policies and logs could help developers or regulators spot discrepancies before release.
Load-bearing premise
The methods used to extract relevant statements from policies and to identify sensitive information in logs are accurate and complete without significant false positives or missed items.
What would settle it
A manual audit of a random sample of the apps that finds the automated policy extraction missed many log mentions or that the log analysis flagged data as sensitive when it was not would undermine the reported mismatch rates.
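One way to gauge how much detection error such an audit would need to find before the headline rates move is the standard Rogan-Gladen correction, which adjusts an observed prevalence for a detector's sensitivity and specificity. The sketch below is illustrative only: the sensitivity and specificity values are hypothetical, not measurements from the paper, and the function name is an assumption.

```python
def corrected_prevalence(observed: float, sensitivity: float, specificity: float) -> float:
    """Rogan-Gladen estimate of true prevalence from an observed rate."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("detector must perform better than chance")
    estimate = (observed + specificity - 1.0) / denom
    # Clamp to [0, 1] because sampling noise can push the estimate outside.
    return min(1.0, max(0.0, estimate))


observed_leak_rate = 0.676  # reported share of apps with undisclosed leaks

# Hypothetical (sensitivity, specificity) pairs for the app-level detector.
for sens, spec in [(1.0, 1.0), (0.9, 0.9), (0.8, 0.95)]:
    print(sens, spec, round(corrected_prevalence(observed_leak_rate, sens, spec), 3))
```

The correction assumes error rates measured at the same level as the statistic (per app here), which is what an audit of a random sample of apps would provide.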
Original abstract
Privacy policies are intended to inform users about how software systems collect and handle data, yet they often remain vague or incomplete. This paper presents an empirical study of patterns in log-related statements within privacy policies and their alignment with privacy disclosures observed in Android application logs. We analyzed 1,000 Android apps across multiple categories, generating 86,836,964 log entries. Our findings reveal that while most applications (88.0%) provide privacy policies, only 28.5% explicitly mention logging practices. Among those that reference logging, most clearly describe what information is logged; however, 27.7% of log-related statements remain overly simplistic or vague, offering limited insight into actual data collection. We further observed widespread privacy leakages in application logs, with 67.6% of apps leaking sensitive information not mentioned in their policies. Alarmingly, only 0.4% of applications demonstrated consistent alignment between declared policy contents and actual logged data. These findings highlight that current privacy policies provide incomplete or ambiguous descriptions of logging practices, which frequently do not align with actual logging behaviors.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports an empirical study of 1,000 Android apps that generated 86,836,964 log entries. It finds that 88.0% of apps supply privacy policies but only 28.5% mention logging; among those, 27.7% of log-related statements are vague. The study claims 67.6% of apps leak sensitive data not disclosed in their policies and that only 0.4% of apps exhibit consistent alignment between policy statements and observed logs.
Significance. If the automated detection pipelines are reliable, the work supplies large-scale evidence that privacy policies frequently fail to describe actual logging behavior, with direct relevance to mobile privacy, app-store auditing, and regulatory enforcement. The dataset size (1,000 apps, 86 M logs) is a clear strength and supports statistical claims about prevalence.
major comments (3)
- [§3.2] §3.2 (Log Analysis): The criteria and patterns used to flag sensitive information in the 86 M log entries are described only at a high level; no explicit keyword list, regex set, or classifier is supplied, and no manual validation or inter-annotator agreement on a sample is reported. This directly undermines the 67.6% leakage statistic.
- [§3.3] §3.3 (Policy Extraction): The method for identifying and classifying log-related statements in privacy policies is not accompanied by accuracy metrics, ground-truth sampling, or false-negative analysis. If the extractor relies on keyword matching, nuanced statements may be missed, skewing the reported 0.4% alignment figure.
- [§4.2] §4.2 (Alignment Results): The 0.4% consistent-alignment claim assumes exhaustive and error-free detection in both logs and policies; the paper provides neither error bounds nor a sensitivity analysis showing how classification mistakes would affect the headline percentage.
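To make concrete what an explicit regex set for log analysis might look like, and why validation matters, here is a minimal sketch. The two patterns and the function name are hypothetical stand-ins, not the authors' rules, which the manuscript does not list.

```python
import re

# Hypothetical patterns for illustration; the paper's actual rules are not published.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def flag_sensitive(line: str) -> set:
    """Return the names of all patterns that match somewhere in a log line."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(line)}


print(flag_sensitive("login ok user=a@b.com ip=10.0.0.1"))
print(flag_sensitive("MainActivity onCreate called"))
print(flag_sensitive("loaded libfoo version 1.2.3.4"))
```

The last call shows the kind of false positive the first comment is concerned with: a four-part version string matches the naive IPv4 pattern, so unvalidated rules of this sort can inflate a leakage rate.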
minor comments (2)
- [Table 2] Table 2 caption should explicitly state the total number of apps and logs underlying each percentage.
- [§2] The paper cites prior log-analysis studies but does not compare its detection rules or leakage rates against those earlier works.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our empirical study. We address each major comment below and indicate the revisions we will make to improve methodological transparency and the robustness of our claims.
- Referee: [§3.2] §3.2 (Log Analysis): The criteria and patterns used to flag sensitive information in the 86 M log entries are described only at a high level; no explicit keyword list, regex set, or classifier is supplied, and no manual validation or inter-annotator agreement on a sample is reported. This directly undermines the 67.6% leakage statistic.
  Authors: We agree that §3.2 would benefit from greater specificity. In the revised manuscript we will append the complete keyword list and regular-expression patterns used to detect sensitive information across the 86,836,964 log entries. We will also report a manual validation performed on a stratified random sample of 500 log entries, including inter-annotator agreement (Cohen’s κ). These additions will directly support the reliability of the 67.6% leakage figure. revision: yes
- Referee: [§3.3] §3.3 (Policy Extraction): The method for identifying and classifying log-related statements in privacy policies is not accompanied by accuracy metrics, ground-truth sampling, or false-negative analysis. If the extractor relies on keyword matching, nuanced statements may be missed, skewing the reported 0.4% alignment figure.
  Authors: We acknowledge the absence of quantitative evaluation for the policy extractor. We will revise §3.3 to include precision, recall, and F1 scores obtained on a manually annotated ground-truth set of 200 privacy policies. We will further add a false-negative analysis that examines the risk of overlooking nuanced or context-dependent logging statements. This will allow readers to assess the potential effect on the 0.4% alignment statistic. revision: yes
- Referee: [§4.2] §4.2 (Alignment Results): The 0.4% consistent-alignment claim assumes exhaustive and error-free detection in both logs and policies; the paper provides neither error bounds nor a sensitivity analysis showing how classification mistakes would affect the headline percentage.
  Authors: We concur that error bounds and sensitivity analysis are needed. In the revision we will insert a sensitivity analysis in §4.2 that varies detection thresholds and propagates estimated error rates from the log and policy validation studies. We will also report approximate confidence bounds on the 0.4% alignment figure. These changes will clarify the robustness of the headline result. revision: yes
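For a sense of what sampling-only confidence bounds on the alignment figure could look like, the sketch below computes a Wilson score interval. It assumes the 0.4% corresponds to 4 of the 1,000 apps, which is inferred from the reported percentages rather than stated in the text, and it ignores classification error entirely.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Assumption: 0.4% of 1,000 apps means 4 aligned apps.
low, high = wilson_interval(4, 1000)
print(f"{low:.2%} to {high:.2%}")
```

The Wilson interval is used here instead of the simpler normal approximation because the latter behaves poorly for proportions this close to zero.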
Circularity Check
No significant circularity in empirical study
This is a direct empirical comparison of privacy policies and application logs across 1,000 Android apps and 86M log entries. The reported statistics (88.0% have policies, 28.5% mention logging, 67.6% leak sensitive info, 0.4% alignment) are computed from observed data via automated extraction and matching. No mathematical derivations, fitted parameters, predictions, self-citations, or ansatzes appear in the derivation chain; the claims rest on external data collection rather than reducing to the paper's own inputs by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- criteria for classifying log entries as sensitive leaks
axioms (2)
- domain assumption: The 1,000 selected apps form a representative sample of Android applications
- domain assumption: Dynamic execution logs capture all relevant privacy-related data collection behaviors