4 Pith papers cite this work, all published in 2026.
Citing papers

- PolicyGapper: Automated Detection of Inconsistencies Between Google Play Data Safety Sections and Privacy Policies Using LLMs
  PolicyGapper uses LLMs to detect 2,689 omitted disclosures across 330 top apps, reaching an average F1 score of 0.76 when a 10% subset was manually validated.

- PrivacyAkinator: Articulating Key Privacy Design Decisions by Answering LLM-Generated Multiple-choice Questions
  PrivacyAkinator helps developers surface privacy design decisions through LLM-generated questions grounded in data-flow representations and in a design space mined from news. In a 24-person study, it yielded 47% more identified decisions in 73% less time than PRAM.

- Challenges in Android Data Disclosure: An Empirical Study
  A survey and forum analysis of 683 Android developers finds that they either classify app data for Google's Data Safety Section manually or skip it, feel confident in spotting collected data but not in translating it into the form, and worry about rejection.

- Toward User Comprehension Supports for LLM Agent Skill Specifications
  An analysis of 878 skill specifications shows that operational basis cues are common, but example tasks appear in only 19% of specifications and all four comprehension anchors appear together in just 2.3%. A small DNS/C2 case study illustrates the practical impact of missing examples.