pith. machine review for the scientific record.

arxiv: 2604.22157 · v1 · submitted 2026-04-24 · 💻 cs.CR · cs.AI

Recognition: unknown

PrivSTRUCT: Untangling Data Purpose Compliance of Privacy Policies in Google Play Store

Authors on Pith · no claims yet

Pith reviewed 2026-05-08 11:41 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords privacy policies · data purpose compliance · Android apps · Google Play Store · structural hierarchy · encoder-decoder · transparency gap · first-party collection

The pith

PrivSTRUCT extracts more than twice as many data-purpose excerpts from privacy policies by preserving their section structure, and shows that globally scoped purpose statements overstate purposes 20.4 percent more often for first-party collection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PrivSTRUCT, an encoder-decoder framework that reads privacy policies while keeping their logical hierarchy of headings and sections intact instead of flattening the text. This prevents distinct data practices from becoming entangled when linking specific data items to their declared purposes. When tested on 3,756 Android apps, the method finds that developers using broad global purpose definitions overstate the actual purpose 20.4 percent more often for first-party collection and 9.7 percent more often for third-party sharing than when they use narrow, locally scoped statements. Sensitive flows such as sharing financial data for analytics are often mixed into generic categories. The result highlights a structural transparency problem in how current policies disclose data use.

Core claim

PrivSTRUCT is a systematic encoder-decoder framework that untangles complex privacy disclosures by retaining developer-defined structural cues from section headings. It extracts more than twice the number of data item and purpose excerpts compared with prior tools while preserving the intended scope of each disclosure. Large-scale application to 3,756 apps shows the probability of overstating a data purpose rises by 20.4 percent for first-party collection and 9.7 percent for third-party sharing when developers rely on globally defined rather than locally scoped purposes. Sensitive third-party data flows are frequently diluted into generic or unrelated categories.
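The 20.4% and 9.7% figures compare overstatement rates between globally and locally scoped disclosures; the abstract alone does not say whether the deltas are absolute or relative. A toy calculation with made-up counts (not the paper's data) shows the shape of the comparison either way:

```python
# Toy illustration (hypothetical counts, not the paper's data) of comparing
# overstatement rates for globally vs locally scoped purpose disclosures.
# Both absolute and relative deltas are computed, since the abstract does
# not specify which the reported 20.4%/9.7% figures are.

def overstatement_rates(global_over, global_total, local_over, local_total):
    p_global = global_over / global_total  # overstatement rate, global scope
    p_local = local_over / local_total     # overstatement rate, local scope
    return {
        "p_global": p_global,
        "p_local": p_local,
        "absolute_delta": p_global - p_local,
        "relative_delta": (p_global - p_local) / p_local,
    }

# Hypothetical first-party collection counts:
stats = overstatement_rates(global_over=450, global_total=1000,
                            local_over=300, local_total=1000)
```

Here the global-scope rate (0.45) exceeds the local-scope rate (0.30) by 0.15 absolute, or 50% relative; the paper's reported deltas would correspond to one of these two readings.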

What carries the argument

PrivSTRUCT: an encoder-decoder framework that processes privacy-policy text while preserving the logical hierarchy signalled by section headings, so that specific data items can be linked to their intended purposes without entanglement.
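The heading-preserving idea can be sketched minimally. The snippet below is a hypothetical illustration, not the paper's pipeline: each excerpt stays attached to its section heading, so a purpose declared under a specific data-practice heading is scoped locally, while one declared under a generic heading is treated as global (the heading names are invented for the example).

```python
# Minimal sketch of heading-scoped purpose linking (hypothetical, not the
# paper's implementation). Purposes under a generic heading are treated as
# globally scoped; purposes under a specific data-practice heading stay
# locally scoped to that section's data items.

GENERIC_HEADINGS = {"how we use your data", "purposes", "general"}

def scope_purposes(sections):
    """sections: list of (heading, purposes, data_items) tuples."""
    linked = []
    for heading, purposes, data_items in sections:
        scope = "global" if heading.lower() in GENERIC_HEADINGS else "local"
        for item in data_items:
            for purpose in purposes:
                linked.append({"data_item": item, "purpose": purpose,
                               "scope": scope, "heading": heading})
    return linked

policy = [
    ("How we use your data", ["analytics", "advertising"], ["email"]),
    ("Location data", ["app functionality"], ["precise location"]),
]
links = scope_purposes(policy)
```

A flat-text extractor would lose the heading boundary and could attach "advertising" to "precise location"; keeping the hierarchy is what prevents that entanglement.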

Load-bearing premise

That the encoder-decoder framework accurately identifies overstated purposes, and that structural headings reliably indicate the correct scope for each data practice, without extraction errors or misclassifications.

What would settle it

A manual review of extracted excerpts from a representative sample of policies that finds no measurable difference in overstatement rates between global and local purpose statements, or evidence that PrivSTRUCT recovers a similar number of excerpts to existing flat-text tools.

Figures

Figures reproduced from arXiv: 2604.22157 by Anirban Mahanti, Aruna Seneviratne, Bhanuka Silva, Suranga Seneviratne.

Figure 1
Figure 1: Left: privacy policy portion of the 'HD Pro Walls' developer related to first-party collection details. Right: the data safety declaration of the developer's app '4K Wallpapers - Auto Changer', with more than 10M downloads, contradicting the policy's stated data practices. Extracted in December 2025. view at source ↗
Figure 2
Figure 2: PrivSTRUCT framework. view at source ↗
Figure 3
Figure 3: Synthesising of the DPO dataset. view at source ↗
Figure 4
Figure 4: Analysing 750 privacy policies based on semantic (text encoder) classification versus structural (heading-content basis) labels. view at source ↗
Figure 5
Figure 5: Comparison with PoliGraph for the four classifiers utilised in PrivSTRUCT. view at source ↗
Figure 6
Figure 6: Locally-defined, globally-defined or un-defined/floating purposes. view at source ↗
Figure 7
Figure 7: Results for Purpose Compliance. view at source ↗
Figure 8
Figure 8: Results for Purpose Dilution. view at source ↗
Figure 9
Figure 9: DPO training results for heading extraction. view at source ↗
read the original abstract

Existing research typically treats privacy policies as flat, uniform text, extracting information without regard for the document's logical hierarchy. Disregard for structural cues of section headings designed to guide the reader, often leads automated methods to entangle distinct data practices, particularly when linking sensitive data items to their specific purposes. To address this, we introduce PrivSTRUCT, a novel and systematic encoder and decoder combined framework that to untangle complex privacy disclosures. Benchmarking against the state-of-the-art tool PoliGrapher reveals that PrivSTRUCT robustly extracts more than x2 the number of data item and purpose excerpts while retaining developer-defined structural cues. By applying PrivSTRUCT to a large-scale dataset of 3,756 Android apps, we uncover a critical transparency gap: the probability of developers overstating a data purpose is 20.4% higher for first-party collection and 9.7% higher for third-party sharing when they rely on globally defined purposes rather than specific, locally scoped disclosures. Alarmingly, we find that sensitive third-party data flows such as sharing financial data for analytics are frequently diluted and entangled into generic or unrelated categories, highlighting a persistent failure in the current purpose disclosure landscape.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces PrivSTRUCT, a novel encoder-decoder framework for extracting data items and purposes from privacy policies while preserving structural hierarchy via section headings to distinguish local versus global scopes. Benchmarking against PoliGrapher shows extraction of more than twice as many data-purpose excerpts. Application to 3,756 Android apps from the Google Play Store reveals that reliance on globally defined purposes increases the probability of overstating data purposes by 20.4% for first-party collection and 9.7% for third-party sharing, with additional observations on entanglement of sensitive third-party flows into generic categories.

Significance. If the extraction accuracy and overstatement classification are robust, this work would usefully demonstrate the value of incorporating document structure into privacy policy analysis tools and provide large-scale empirical evidence of transparency gaps in purpose disclosures. The scale of the 3,756-app study and the >2× improvement over an existing baseline are potentially impactful for both research and regulatory contexts. The significance depends on rigorous validation of the core pipeline, particularly the operationalization of 'overstating'.

major comments (2)
  1. [Abstract] Abstract: The central quantitative claims (>2× extraction of data item and purpose excerpts relative to PoliGrapher, plus the 20.4% and 9.7% probability increases) are load-bearing for the paper's contribution, yet the abstract provides no details on benchmarking dataset size, exact extraction counts, precision/recall/F1 for the additional pairs, or how false positives were ruled out. This prevents assessment of whether the reported robustness holds.
  2. [Abstract] Abstract: The transparency-gap findings rest on the encoder-decoder correctly identifying 'overstating' and using headings to determine local/global scope. No information is given on the definition of overstating, annotation protocol, inter-annotator agreement, or error analysis for scope assignment and classification; these are required to support the specific probability deltas on the 3,756-app corpus.
minor comments (2)
  1. [Abstract] The shorthand 'x2' should be written as '2×' or 'twice' for formal precision.
  2. Clarify early (e.g., in the introduction or methods) the exact criteria used to label a purpose disclosure as 'overstating' when comparing global versus local statements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We agree that the abstract can be strengthened by incorporating key supporting details from the main text on benchmarking and the operationalization of overstating. We address each major comment below and will revise the abstract accordingly.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central quantitative claims (>2× extraction of data item and purpose excerpts relative to PoliGrapher, plus the 20.4% and 9.7% probability increases) are load-bearing for the paper's contribution, yet the abstract provides no details on benchmarking dataset size, exact extraction counts, precision/recall/F1 for the additional pairs, or how false positives were ruled out. This prevents assessment of whether the reported robustness holds.

    Authors: We agree that the abstract is too concise and omits important context for the >2× claim. Section 4 of the manuscript reports the benchmarking dataset size, exact extraction counts for PrivSTRUCT versus PoliGrapher, and the corresponding precision, recall, and F1 scores. False positives were addressed via manual review of sampled extractions, with the procedure and results described in the error analysis. We will revise the abstract to include these key figures and metrics. revision: yes

  2. Referee: [Abstract] Abstract: The transparency-gap findings rest on the encoder-decoder correctly identifying 'overstating' and using headings to determine local/global scope. No information is given on the definition of overstating, annotation protocol, inter-annotator agreement, or error analysis for scope assignment and classification; these are required to support the specific probability deltas on the 3,756-app corpus.

    Authors: The definition of overstating, the heading-based scope assignment algorithm, the annotation protocol for validation, inter-annotator agreement, and error analysis are all provided in Sections 3 and 5. The 20.4% and 9.7% probability increases are computed on the full 3,756-app corpus after applying the validated extraction and classification pipeline. We will add a concise summary of the overstating definition and validation approach to the abstract. revision: yes

Circularity Check

0 steps flagged

No significant circularity in PrivSTRUCT derivation

full rationale

The paper introduces an encoder-decoder extraction framework, benchmarks it empirically against the independent PoliGrapher baseline on extraction volume, and applies the resulting tool to an external dataset of 3,756 apps to compute statistical overstatement probabilities. No load-bearing step reduces by construction to self-definition, fitted parameters renamed as predictions, or self-citation chains; the transparency-gap findings derive directly from application to new data rather than tautological equivalence with the framework's design.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review is based on abstract only; the central approach rests on the assumption that section headings provide reliable scoping for data purposes. No free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption Privacy policies contain a logical hierarchy indicated by section headings that can be leveraged to correctly scope data items to purposes.
    This is the core premise enabling the untangling of disclosures and the comparison of global versus local statements.

pith-pipeline@v0.9.0 · 5523 in / 1247 out tokens · 64886 ms · 2026-05-08T11:41:28.653895+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

32 extracted references · 7 canonical work pages · 1 internal anchor

  1. [1] Providing a safe and trusted experience for everyone (2022), https://play.google.com/about/developer-content-policy/

  2. [2] Adhikari, A., Das, S., Dewri, R.: Evolution of composition, readability, and structure of privacy policies over two decades. Proceedings on Privacy Enhancing Technologies (2023)

  3. [3] Amos, R., Acar, G., Lucherini, E., Kshirsagar, M., Narayanan, A., Mayer, J.: Privacy policies over time: Curation and analysis of a million-document dataset. In: Proceedings of the Web Conference (2021)

  4. [4] Andow, B., Mahmud, S.Y., Wang, W., Whitaker, J., Enck, W., Reaves, B., Singh, K., Xie, T.: PolicyLint: Investigating internal privacy policy contradictions on Google Play. In: 28th USENIX Security Symposium (USENIX Security 19). pp. 585–602. USENIX Association, Santa Clara, CA (Aug 2019), https://www.usenix.org/conference/usenixsecurity19/presentation/andow

  5. [5] Andow, B., Mahmud, S.Y., Whitaker, J., Enck, W., Reaves, B., Singh, K., Egelman, S.: Actions speak louder than words: Entity-sensitive privacy policy and data flow analysis with PoliCheck. In: 29th USENIX Security Symposium (USENIX Security 20). pp. 985–1002 (2020)

  6. [6] Bannihatti Kumar, V., Iyengar, R., Nisal, N., Feng, Y., Habib, H., Story, P., Cherivirala, S., Hagan, M., Cranor, L., Wilson, S., Schaub, F., Sadeh, N.: Finding a choice in a haystack: Automatic extraction of opt-out statements from privacy policy text. In: Proceedings of The Web Conference 2020. pp. 1943–1954. WWW '20, Association for Computing Machinery (2020)

  7. [7] Bui, D., Yao, Y., Shin, K.G., Choi, J.M., Shin, J.: Consistency analysis of data-usage purposes in mobile apps. In: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. pp. 2824–2843 (2021)

  8. [8] Chanenson, J., Pickering, M., Apthorpe, N.: Automating governing knowledge commons and contextual integrity (GKC-CI) privacy policy annotations with large language models. Proceedings on Privacy Enhancing Technologies (2025)

  9. [9] Cui, H., Trimananda, R., Markopoulou, A., Jordan, S.: PoliGraph: Automated privacy policy analysis using knowledge graphs. In: 32nd USENIX Security Symposium (USENIX Security 23). pp. 1037–1054 (2023)

  10. [10] Degeling, M., Utz, C., Lentzsch, C., Hosseini, H., Schaub, F., Holz, T.: We value your privacy... now take some cookies: Measuring the GDPR's impact on web privacy. arXiv preprint arXiv:1808.05096 (2018)

  11. [11] Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., Fan, A., et al.: The Llama 3 herd of models. arXiv e-prints pp. arXiv–2407 (2024)

  12. [12] Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, H., Wang, H.: Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997 (2023)

  13. [13] Gopinath, A.A.M., Kumar, V.B., Wilson, S., Sadeh, N.: Automatic section title generation to improve the readability of privacy policies. USENIX SOUPS (2020)

  14. [14] Harkous, H., Fawaz, K., Lebret, R., Schaub, F., Aberer, K.: Polisis: Automated analysis and presentation of privacy policies using deep learning. In: Proceedings of the 27th USENIX Security Symposium. pp. 531–548 (2018)

  15. [15] Khandelwal, R., Nayak, A., Chung, P., Fawaz, K.: Unpacking privacy labels: A measurement and developer perspective on Google's data safety section. In: 33rd USENIX Security Symposium (USENIX Security 24). pp. 2831–2848 (2024)

  16. [16] Linden, T., Khandelwal, R., Harkous, H., Fawaz, K.: The privacy policy landscape after the GDPR. arXiv preprint arXiv:1809.08396 (2018)

  17. [17] McDonald, A.M., Cranor, L.F.: The cost of reading privacy policies. I/S: A Journal of Law and Policy for the Information Society 4, 543 (2008)

  18. [18] Nokhbeh Zaeem, R., Anya, S., Issa, A., Nimergood, J., Rogers, I., Shah, V., Srivastava, A., Barber, K.S.: PrivacyCheck v2: A tool that recaps privacy policies for you. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management. pp. 3441–3444 (2020)

  19. [19] Rafailov, R., Sharma, A., Mitchell, E., Manning, C.D., Ermon, S., Finn, C.: Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems 36, 53728–53741 (2023)

  20. [20] Robles, P.: GDPR: What future for first, second and third-party data (2018), https://econsultancy.com/gdpr-what-future-for-first-second-and-third-party-data//

  21. [21] Rodriguez, D., Yang, I., Alamo, J.M.D., Sadeh, N.: Large language models: A new approach for privacy policy analysis at scale (2024), https://arxiv.org/abs/2405.20900

  22. [22] Rudolph, M., Feth, D., Polst, S.: Why users ignore privacy policies – a survey and intention model for explaining user privacy behavior. In: HCI International 2018, Las Vegas, USA. pp. 587–598. Springer (2018)

  23. [23] Sathyendra, K.M., Schaub, F., Wilson, S., Sadeh, N.M.: Automatic extraction of opt-out choices from privacy policies. In: AAAI Fall Symposia (2016)

  24. [24] Sathyendra, K.M., Wilson, S., Schaub, F., Zimmeck, S., Sadeh, N.: Identifying the provision of choices in privacy policy text. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. pp. 2774–2779 (2017)

  25. [25] Silva, B., Denipitiyage, D., Mahanti, A., Seneviratne, A., Seneviratne, S.: PrivPrism: Automatically detecting discrepancies between Google Play data safety declarations and developer privacy policies. arXiv preprint arXiv:2603.09214 (2026)

  26. [26] Silva, B., Denipitiyage, D., Seneviratne, S., Mahanti, A., Seneviratne, A.: Entailment-driven privacy policy classification with LLMs. In: 2024 Conference on Building a Secure & Empowered Cyberspace (BuildSEC). pp. 8–15. IEEE (2024)

  27. [27] Srinath, M., Wilson, S., Giles, C.L.: Privacy at scale: Introducing the PrivaSeer corpus of web privacy policies. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 6829–6839 (2021)

  28. [28] Story, P., Zimmeck, S., Ravichander, A., Smullen, D., Wang, Z., Reidenberg, J., Russell, N.C., Sadeh, N.: Natural language processing for mobile app privacy compliance. In: AAAI Spring Symposium on Privacy-Enhancing Artificial Intelligence and Language Technologies. vol. 2, p. 4 (2019)

  29. [29] Tang, C., Liu, Z., Ma, C., Wu, Z., Li, Y., Liu, W., Zhu, D., Li, Q., Li, X., Liu, T., et al.: PolicyGPT: Automated analysis of privacy policies with large language models. arXiv preprint arXiv:2309.10238 (2023)

  30. [30] Wilson, S., Schaub, F., Dara, A.A., Liu, F., Cherivirala, S., Leon, P.G., Andersen, M.S., Zimmeck, S., Sathyendra, K.M., Russell, N.C., et al.: The creation and analysis of a website privacy policy corpus. In: Proceedings of the 54th ACL (2016)

  31. [31] Zhou, L., Wei, C., Zhu, T., Chen, G., Zhang, X., Du, S., Cao, H., Zhu, H.: POLICYCOMP: Counterpart comparison of privacy policies uncovers over-broad personal data collection practices. In: 32nd USENIX Security Symposium (USENIX Security 23). pp. 1073–1090 (2023)

  32. [32] Zimmeck, S., Bellovin, S.M.: Privee: An architecture for automatically analyzing web privacy policies. In: 23rd USENIX Security Symposium (USENIX Security 14). pp. 1–16 (2014)