PrivSTRUCT: Untangling Data Purpose Compliance of Privacy Policies in Google Play Store
Pith reviewed 2026-05-08 11:41 UTC · model grok-4.3
The pith
PrivSTRUCT extracts more than twice as many data-purpose excerpts from privacy policies by preserving their section structure, showing global statements overstate purposes 20 percent more often.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PrivSTRUCT is a systematic encoder-decoder framework that untangles complex privacy disclosures by retaining developer-defined structural cues from section headings. It extracts more than twice the number of data item and purpose excerpts compared with prior tools while preserving the intended scope of each disclosure. Large-scale application to 3,756 apps shows the probability of overstating a data purpose rises by 20.4 percent for first-party collection and 9.7 percent for third-party sharing when developers rely on globally defined rather than locally scoped purposes. Sensitive third-party data flows are frequently diluted into generic or unrelated categories.
What carries the argument
The PrivSTRUCT encoder-decoder framework processes privacy-policy text while preserving the logical hierarchy given by section headings, so that specific data items are linked to their intended purposes without entanglement.
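To make the role of heading structure concrete, the following is a minimal sketch that splits a toy policy into sections and labels each section as locally or globally scoped. It is not PrivSTRUCT's method: the heading convention (lines starting with `#`), the `GENERIC_HEADINGS` set, and the names `Section`, `split_sections`, and `scope_of` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical set of headings treated as "global" purpose statements.
GENERIC_HEADINGS = {"how we use your information", "purposes"}


@dataclass
class Section:
    heading: str
    lines: list[str] = field(default_factory=list)


def split_sections(policy_text: str) -> list[Section]:
    """Group body lines under the most recent heading (lines starting with '#')."""
    sections: list[Section] = []
    current = Section(heading="")
    for raw in policy_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Close the previous section before opening a new one.
            if current.heading or current.lines:
                sections.append(current)
            current = Section(heading=line.lstrip("#").strip())
        else:
            current.lines.append(line)
    if current.heading or current.lines:
        sections.append(current)
    return sections


def scope_of(section: Section) -> str:
    """Label a section 'global' if its heading is generic, else 'local'."""
    return "global" if section.heading.lower() in GENERIC_HEADINGS else "local"


# Toy policy text, invented for this example.
policy = """
# Location data
We collect your location to show nearby stores.

# How we use your information
We use your information for analytics and advertising.
"""

scoped = [(s.heading, scope_of(s), s.lines) for s in split_sections(policy)]
for heading, scope, lines in scoped:
    print(f"{scope:6} | {heading} | {lines[0]}")
```

A flat-text extractor would see both purpose sentences with no indication that the second one applies to every data item, which is the entanglement the framework is said to avoid.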
Load-bearing premise
The encoder-decoder framework accurately identifies overstated purposes, and structural headings reliably indicate the correct scope for each data practice, with no extraction errors or misclassifications.
What would settle it
A manual review of extracted excerpts from a representative sample of policies that finds no measurable difference in overstating rates between global and local purpose statements, or that PrivSTRUCT recovers a similar number of excerpts to existing flat-text tools.
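One way such a manual review could be scored is a two-proportion comparison of overstating rates between global and local statements. The sketch below is a hedged illustration only: the counts are placeholders that do not come from the paper, the function name `two_proportion_z` is hypothetical, and the abstract does not say whether its 20.4 and 9.7 percent figures are absolute or relative differences, so only an absolute difference is computed here.

```python
import math


def two_proportion_z(over_a: int, n_a: int, over_b: int, n_b: int) -> tuple[float, float]:
    """Return (difference in proportions, z statistic) for two independent samples."""
    p_a, p_b = over_a / n_a, over_b / n_b
    # Pooled proportion under the null hypothesis of equal rates.
    pooled = (over_a + over_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return p_a - p_b, (p_a - p_b) / se


# Placeholder counts only, NOT taken from the paper:
# overstated / reviewed excerpts for global (a) and local (b) purpose statements.
diff, z = two_proportion_z(over_a=60, n_a=200, over_b=40, n_b=200)
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, normal approximation
print(f"difference={diff:.3f}, z={z:.2f}, p={p_value:.4f}")
```

A difference near zero with a large p-value on a representative sample would be the kind of result that undercuts the reported gap.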
Original abstract
Existing research typically treats privacy policies as flat, uniform text, extracting information without regard for the document's logical hierarchy. Disregard for the structural cues of section headings, which are designed to guide the reader, often leads automated methods to entangle distinct data practices, particularly when linking sensitive data items to their specific purposes. To address this, we introduce PrivSTRUCT, a novel and systematic encoder and decoder combined framework to untangle complex privacy disclosures. Benchmarking against the state-of-the-art tool PoliGrapher reveals that PrivSTRUCT robustly extracts more than x2 the number of data item and purpose excerpts while retaining developer-defined structural cues. By applying PrivSTRUCT to a large-scale dataset of 3,756 Android apps, we uncover a critical transparency gap: the probability of developers overstating a data purpose is 20.4% higher for first-party collection and 9.7% higher for third-party sharing when they rely on globally defined purposes rather than specific, locally scoped disclosures. Alarmingly, we find that sensitive third-party data flows, such as sharing financial data for analytics, are frequently diluted and entangled into generic or unrelated categories, highlighting a persistent failure in the current purpose disclosure landscape.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PrivSTRUCT, a novel encoder-decoder framework for extracting data items and purposes from privacy policies while preserving structural hierarchy via section headings to distinguish local versus global scopes. Benchmarking against PoliGrapher shows extraction of more than twice as many data-purpose excerpts. Application to 3,756 Android apps from the Google Play Store reveals that reliance on globally defined purposes increases the probability of overstating data purposes by 20.4% for first-party collection and 9.7% for third-party sharing, with additional observations on entanglement of sensitive third-party flows into generic categories.
Significance. If the extraction accuracy and overstatement classification are robust, this work would usefully demonstrate the value of incorporating document structure into privacy policy analysis tools and provide large-scale empirical evidence of transparency gaps in purpose disclosures. The scale of the 3,756-app study and the >2× improvement over an existing baseline are potentially impactful for both research and regulatory contexts. The significance depends on rigorous validation of the core pipeline, particularly the operationalization of 'overstating'.
major comments (2)
- [Abstract] The central quantitative claims (>2× extraction of data item and purpose excerpts relative to PoliGrapher, plus the 20.4% and 9.7% probability increases) are load-bearing for the paper's contribution, yet the abstract provides no details on benchmarking dataset size, exact extraction counts, precision/recall/F1 for the additional pairs, or how false positives were ruled out. This prevents assessment of whether the reported robustness holds.
- [Abstract] The transparency-gap findings rest on the encoder-decoder correctly identifying 'overstating' and using headings to determine local/global scope. No information is given on the definition of overstating, annotation protocol, inter-annotator agreement, or error analysis for scope assignment and classification; these are required to support the specific probability deltas on the 3,756-app corpus.
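The first comment turns on the difference between extraction volume and extraction quality. As a minimal sketch of the metrics it asks for, the code below computes precision, recall, and F1 over sets of (data item, purpose) pairs. The pairs are invented toy examples and `precision_recall_f1` is a hypothetical helper, not part of PrivSTRUCT or PoliGrapher.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Score extracted pairs against manually annotated gold pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy (data item, purpose) pairs, invented for illustration.
gold = {
    ("location", "nearby stores"),
    ("email", "account creation"),
    ("contacts", "friend suggestions"),
}
predicted = {
    ("location", "nearby stores"),
    ("email", "account creation"),
    ("email", "advertising"),      # false positive
    ("device id", "analytics"),    # false positive
}

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case the extractor returns more pairs than the gold set contains, yet half of them are wrong, which is why a raw excerpt count cannot stand in for accuracy.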
minor comments (2)
- [Abstract] The shorthand 'x2' should be written as '2×' or 'twice' for formal precision.
- Clarify early (e.g., in the introduction or methods) the exact criteria used to label a purpose disclosure as 'overstating' when comparing global versus local statements.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We agree that the abstract can be strengthened by incorporating key supporting details from the main text on benchmarking and the operationalization of overstating. We address each major comment below and will revise the abstract accordingly.
-
Referee: [Abstract] The central quantitative claims (>2× extraction of data item and purpose excerpts relative to PoliGrapher, plus the 20.4% and 9.7% probability increases) are load-bearing for the paper's contribution, yet the abstract provides no details on benchmarking dataset size, exact extraction counts, precision/recall/F1 for the additional pairs, or how false positives were ruled out. This prevents assessment of whether the reported robustness holds.
Authors: We agree that the abstract is too concise and omits important context for the >2× claim. Section 4 of the manuscript reports the benchmarking dataset size, exact extraction counts for PrivSTRUCT versus PoliGrapher, and the corresponding precision, recall, and F1 scores. False positives were addressed via manual review of sampled extractions, with the procedure and results described in the error analysis. We will revise the abstract to include these key figures and metrics.
revision: yes
-
Referee: [Abstract] The transparency-gap findings rest on the encoder-decoder correctly identifying 'overstating' and using headings to determine local/global scope. No information is given on the definition of overstating, annotation protocol, inter-annotator agreement, or error analysis for scope assignment and classification; these are required to support the specific probability deltas on the 3,756-app corpus.
Authors: The definition of overstating, the heading-based scope assignment algorithm, the annotation protocol for validation, inter-annotator agreement, and error analysis are all provided in Sections 3 and 5. The 20.4% and 9.7% probability increases are computed on the full 3,756-app corpus after applying the validated extraction and classification pipeline. We will add a concise summary of the overstating definition and validation approach to the abstract.
revision: yes
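Inter-annotator agreement of the kind discussed in this exchange is commonly summarized with Cohen's kappa. The sketch below shows that calculation on invented labels; whether the authors used kappa or another statistic is not stated here, and `cohens_kappa` is a hypothetical helper written for this example.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels for six excerpts, for illustration only.
annotator_1 = ["overstated", "ok", "ok", "overstated", "ok", "ok"]
annotator_2 = ["overstated", "ok", "overstated", "overstated", "ok", "ok"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa={kappa:.3f}")
```

Kappa corrects raw agreement for chance, which matters when one label (for example, "ok") dominates the sample.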
Circularity Check
No significant circularity in PrivSTRUCT derivation
The paper introduces an encoder-decoder extraction framework, benchmarks it empirically against the independent PoliGrapher baseline on extraction volume, and applies the resulting tool to an external dataset of 3,756 apps to compute statistical overstatement probabilities. No load-bearing step reduces by construction to self-definition, fitted parameters renamed as predictions, or self-citation chains; the transparency-gap findings derive directly from application to new data rather than tautological equivalence with the framework's design.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Privacy policies contain a logical hierarchy indicated by section headings that can be leveraged to correctly scope data items to purposes.