Challenges in Android Data Disclosure: An Empirical Study
Pith reviewed 2026-05-16 10:56 UTC · model grok-4.3
The pith
Android developers confidently recognize data their apps collect but struggle to turn that into accurate, compliant disclosures on Google's Data Safety Section form.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Through survey responses from 41 developers and analysis of discussions by 642 more, the paper establishes that developers can identify privacy-related data in their apps yet encounter repeated obstacles when translating that knowledge into Data Safety Section disclosures, including manual categorization difficulties, incomplete form comprehension, and worries over rejection for non-compliance with Google's rules.
What carries the argument
An empirical combination of a targeted developer survey and large-scale forum thread analysis that surfaces patterns in how developers map app data to the fixed DSS categories.
If this is right
- Clearer official guidance on data categorization would reduce reliance on ad-hoc manual work and external forums.
- Tooling that links code scanning directly to DSS categories could lower the chance of incomplete or mismatched disclosures.
- Reduced rejection risk from disclosure errors would encourage more developers to report data accurately rather than omit entries.
- Widespread adoption of better support materials would shrink the gap between data recognition and compliant reporting.
Where Pith is reading between the lines
- Similar disclosure challenges likely appear in other app-store privacy sections that use fixed category lists.
- Integrating automated data-flow analysis into developer IDEs could test whether the reported confidence gap shrinks in practice.
- Training resources focused on form navigation rather than data detection might address the specific confidence drop the study identifies.
Load-bearing premise
The 41 survey participants and 172 forum threads speak for the wider group of Android developers who must fill out the Data Safety Section.
What would settle it
A larger random sample of active Android developers who report high confidence and low effort when completing the DSS form without the listed difficulties would contradict the central finding.
Figures
read the original abstract
Current legal frameworks enforce that Android developers accurately report the data their apps collect. However, large codebases can make this reporting challenging. This paper employs an empirical approach to understand developers' experience with Google Play Store's Data Safety Section (DSS) form. We first survey 41 Android developers to understand how they categorize privacy-related data into DSS categories and how confident they feel when completing the DSS form. To gain a broader and more detailed view of the challenges developers encounter during the process, we complement the survey with an analysis of 172 online developer discussions, capturing the perspectives of 642 additional developers. Together, these two data sources represent insights from 683 developers. Our findings reveal that developers often manually classify the privacy-related data their apps collect into the data categories defined by Google-or, in some cases, omit classification entirely-and rely heavily on existing online resources when completing the form. Moreover, developers are generally confident in recognizing the data their apps collect, yet they lack confidence in translating this knowledge into DSS-compliant disclosures. Key challenges include issues in identifying privacy-relevant data to complete the form, limited understanding of the form, and concerns about app rejection due to discrepancies with Google's privacy requirements. These results underscore the need for clearer guidance and more accessible tooling to support developers in meeting privacy-aware reporting obligations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports an empirical study of Android developers' experiences completing Google's Data Safety Section (DSS) form. It combines a survey of 41 developers with thematic analysis of 172 online forum threads (involving 642 developers) to examine how developers categorize privacy-related data, their confidence levels, and the challenges they encounter. The central claims are that developers are generally confident in recognizing the data their apps collect but lack confidence in mapping this knowledge to DSS categories; they frequently perform manual classification or omit data and rely on online resources, with key difficulties including identifying privacy-relevant data, understanding the form, and concerns over app rejection due to mismatches with Google's requirements.
Significance. If the reported patterns hold for the broader population, the study supplies useful observational evidence on the practical friction points in Android privacy disclosure. The mixed-methods design (survey plus forum corpus) provides convergent qualitative support for the stated challenges and could usefully inform the design of clearer documentation or automated tooling. The work is grounded in real developer artifacts rather than purely theoretical analysis.
major comments (3)
- [Survey methodology] Survey methodology section: no information is supplied on recruitment channels, response rate, or participant demographics for the n=41 sample. Because the headline claim about developers being 'generally confident' in data recognition rests on generalizing self-reported attitudes from this convenience sample, the absence of these details leaves self-selection bias unaddressed and weakens the external validity of the confidence and challenge findings.
- [Forum analysis] Forum analysis section: selection criteria for the 172 threads (search terms, date range, exclusion rules) are not stated. Without them, it is impossible to evaluate whether the 642 developers represented in the corpus are systematically biased toward privacy-aware or frustrated individuals, directly affecting the reliability of the broader view offered as complementary evidence.
- [Results] Results on confidence (survey and forum findings): the paper reports self-reported confidence levels but provides no quantitative validation (e.g., comparison against actual disclosure accuracy or external audit). This limits how strongly the distinction between 'confident recognition' and 'low translation confidence' can be asserted as a general phenomenon.
minor comments (2)
- [Abstract] Abstract: the total of 683 developers is presented as the combined insight base, but any possible overlap between survey respondents and forum posters is not addressed; a brief clarification would improve transparency.
- [Discussion or Limitations] The paper would benefit from an explicit limitations subsection that directly discusses sample representativeness and the implications for generalizing the 'generally confident yet translation-challenged' characterization.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive feedback on our manuscript. We address each of the major comments below and indicate the revisions we will make to improve the paper.
read point-by-point responses
-
Referee: [Survey methodology] Survey methodology section: no information is supplied on recruitment channels, response rate, or participant demographics for the n=41 sample. Because the headline claim about developers being 'generally confident' in data recognition rests on generalizing self-reported attitudes from this convenience sample, the absence of these details leaves self-selection bias unaddressed and weakens the external validity of the confidence and challenge findings.
Authors: We agree with this assessment. The current manuscript does not provide these details, which is an oversight. In the revised version, we will add a new subsection under Methodology detailing the recruitment channels (posts on Reddit's r/androiddev and XDA Developers), the number of invitations sent and response rate, and participant demographics including experience levels and app categories. We will also explicitly discuss the limitations of the convenience sample and potential self-selection bias in the Discussion section, adjusting our claims about generalizability accordingly. revision: yes
-
Referee: [Forum analysis] Forum analysis section: selection criteria for the 172 threads (search terms, date range, exclusion rules) are not stated. Without them, it is impossible to evaluate whether the 642 developers represented in the corpus are systematically biased toward privacy-aware or frustrated individuals, directly affecting the reliability of the broader view offered as complementary evidence.
Authors: We acknowledge that the selection criteria were not explicitly stated. For the revision, we will expand the 'Forum Analysis' section to specify the search terms (such as 'Data Safety Section', 'DSS form', 'Google Play data safety'), the time period covered (2021-2023), the sources (Reddit, XDA Developers, Stack Overflow), and the exclusion criteria (e.g., threads not in English or unrelated to data disclosure). This will help readers assess any potential biases in the forum data. revision: yes
-
Referee: [Results] Results on confidence (survey and forum findings): the paper reports self-reported confidence levels but provides no quantitative validation (e.g., comparison against actual disclosure accuracy or external audit). This limits how strongly the distinction between 'confident recognition' and 'low translation confidence' can be asserted as a general phenomenon.
Authors: The referee is correct that our findings on confidence are based on self-reported data without quantitative validation against actual disclosure accuracy. As an exploratory study using mixed methods, we did not conduct external audits. In the revision, we will add a dedicated Limitations subsection clarifying that the reported distinction between recognition confidence and translation confidence is based on self-reports and may not correspond to objective measures. We will frame the claims more cautiously and propose future research directions for validation studies. No such validation data exists in our current dataset. revision: partial
Circularity Check
No circularity: purely observational empirical study with no derivations or fitted predictions
full rationale
The paper reports results from a survey of 41 Android developers and thematic analysis of 172 forum threads involving 642 additional developers. No equations, mathematical models, parameter fitting, or first-principles derivations appear anywhere in the manuscript. Claims about developer confidence and challenges are presented as direct summaries of self-reported survey responses and forum content, without any step that reduces a 'prediction' or result back to the input data by construction. Sample representativeness is a validity threat but does not constitute circularity under the defined patterns. The study is self-contained as an empirical report.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
The European parliament and the council of the European union
2018. The European parliament and the council of the European union. General Data Protection Regulation (GDPR). Retrieved Oct 13, 2025 from https://eur- lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679
work page 2018
-
[2]
2018. GDPR Article 13. Retrieved Oct 13, 2025 from https://gdpr-info.eu/art-13- gdpr/
work page 2018
-
[3]
2018. GDPR Article 4. Retrieved Oct 13, 2025 from https://gdpr-info.eu/art-4- gdpr/
work page 2018
-
[4]
2018. GDPR penalties. Retrieved Oct 13, 2025 from https://gdpr-info.eu/issues/ fines-penalties/
work page 2018
-
[5]
2022. AdMob. Retrieved Oct 13, 2025 from https://admob.google.com/home/
work page 2022
- [6]
-
[7]
2022. Data Safety Section. Retrieved Oct 13, 2025 from https://blog.google/ products/google-play/data-safety/
work page 2022
-
[8]
Get more information about your apps in Google Play
2022. Get more information about your apps in Google Play. Retrieved Oct 20, 2025 from https://blog.google/products/google-play/data-safety/ Challenges in Android Data Disclosure: An Empirical Study MOBILESoft ’26, April 12, 2026, Rio de Janeiro, Brazil
work page 2022
-
[9]
2022. Google’s data types for DSS. Retrieved Oct 13, 2025 from https://support.google.com/googleplay/android-developer/answer/10787469# zippy=%2Cdata-types
-
[10]
2022. Privado.ai. Retrieved Oct 13, 2025 from https://www.privado.ai/data- safety-report
work page 2022
-
[11]
2022. Signal Private Messenger. Retrieved Oct 13, 2025 from https://play.google. com/store/apps/datasafety?id=org.thoughtcrime.securesms&hl=en
work page 2022
-
[12]
2022. Unity. Retrieved Oct 13, 2025 from https://unity.com/
work page 2022
-
[13]
2023. See No Evil: Loopholes in Google’s Data Safety Labels Keep Companies in the Clear and Consumers in the Dark. Retrieved Oct 13, 2025 from https: //foundation.mozilla.org/en/campaigns/googles-data-safety-labels/
work page 2023
-
[14]
Google Play’s Security: 2.36 Million Apps Blocked For Violations In 2024
2024. Google Play’s Security: 2.36 Million Apps Blocked For Violations In 2024. Retrieved Oct 13, 2025 from https://www.forbes.com/sites/alexvakulov/2025/02/ 02/google-plays-security-236m-apps-blocked-for-violations-in-2024/
work page 2024
- [15]
-
[16]
Retrieved Oct 13, 2025 from https://www.codeproject.com/
2025.Code Project. Retrieved Oct 13, 2025 from https://www.codeproject.com/
work page 2025
-
[17]
Retrieved Oct 13, 2025 from https://dzone.com/
2025.DZone. Retrieved Oct 13, 2025 from https://dzone.com/
work page 2025
-
[18]
2025. Firebase Analytics. Retrieved Oct 13, 2025 from https://firebase.google. com/docs/analytics/get-started?platform=web
work page 2025
-
[19]
2025. GitHub API. Retrieved Oct 13, 2025 from https://docs.github.com/en/rest? apiVersion=2022-11-28
work page 2025
-
[20]
2025. Google Play Console. Retrieved Oct 13, 2025 from https://play.google. com/console/signup
work page 2025
-
[21]
2025. Google Play Store. Retrieved Oct 13, 2025 from https://play.google.com/ store/
work page 2025
-
[22]
Retrieved Oct 13, 2025 from https://news.ycombinator.com/
2025.Hacker news. Retrieved Oct 13, 2025 from https://news.ycombinator.com/
work page 2025
-
[23]
New Android app releases per month
2025. New Android app releases per month. Retrieved Oct 13, 2025 from https://www.appbrain.com/stats/number-of-android-apps
work page 2025
-
[24]
2025. Notifee. Retrieved Oct 13, 2025 from https://notifee.app/
work page 2025
-
[25]
2025. Paper artifacts. Retrieved Oct 22, 2025 from https://doi.org/10.5281/zenodo. 17416187
-
[26]
2025. Reddit API. Retrieved Oct 13, 2025 from https://www.reddit.com/dev/api/
work page 2025
-
[27]
Retrieved Oct 13, 2025 from https://www.sitepoint.com/
2025.Site point. Retrieved Oct 13, 2025 from https://www.sitepoint.com/
work page 2025
-
[28]
2025. Stack Exchange API. Retrieved Oct 13, 2025 from https://api.stackexchange. com/
work page 2025
-
[29]
Retrieved Oct 13, 2025 from https://xdaforums.com/
2025.XDA forums. Retrieved Oct 13, 2025 from https://xdaforums.com/
work page 2025
-
[30]
Alessia Antelmi, Gennaro Cordasco, Daniele De Vinco, and Carmine Spagnuolo
-
[31]
InCompanion Proceedings of the ACM Web Conference 2023(Austin, TX, USA)(WWW ’23 Companion)
The Age of Snippet Programming: Toward Understanding Developer Communities in Stack Overflow and Reddit. InCompanion Proceedings of the ACM Web Conference 2023(Austin, TX, USA)(WWW ’23 Companion). Association for Computing Machinery, New York, NY, USA, 1218–1224. https://doi.org/10. 1145/3543873.3587673
-
[32]
Nia Castelly and Fergus Hurley. 2022. Introducing Checks: simplifying privacy for app developers - Google: The Keyword. Retrieved April 10, 2025 from https://blog.google/technology/area-120/checks/
work page 2022
-
[33]
Victoria Clarke and Virginia Braun. 2014.Thematic Analysis. Springer New York, New York, NY, 1947–1952. https://doi.org/10.1007/978-1-4614-5583-7_311
-
[34]
Juliet Corbin and Anselm Strauss. 1990. Grounded Theory Research: Procedures, Canons and Evaluative Criteria.Zeitschrift für Soziologie19, 6 (1990), 418–427. https://doi.org/doi:10.1515/zfsoz-1990-0602
-
[35]
Lucas Franke, Huayu Liang, Sahar Farzanehpour, Aaron Brantly, James C. Davis, and Chris Brown. 2024. An Exploratory Mixed-methods Study on General Data Protection Regulation (GDPR) Compliance in Open-Source Software. In Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement(Barcelona, Spain)(ESEM ’24). A...
- [36]
-
[37]
Sandra Höltervennhoff, Noah Wöhler, Arne Möhle, Marten Oltrogge, Yasemin Acar, Oliver Wiese, and Sascha Fahl. 2024. A Mixed-Methods Study on User Experiences and Challenges of Recovery Codes for an End-to-End Encrypted Service. In33rd USENIX Security Symposium (USENIX Security 24). USENIX As- sociation, Philadelphia, PA, 7267–7284. https://www.usenix.org/...
work page 2024
-
[38]
Hiroki Inayoshi, Shohei Kakei, and Shoichi Saito. 2024. Detection of Incon- sistencies between Guidance Pages and Actual Data Collection of Third-party SDKs in Android Apps. InProceedings of the IEEE/ACM 11th International Con- ference on Mobile Software Engineering and Systems(Lisbon, Portugal)(MOBILE- Soft ’24). Association for Computing Machinery, New ...
- [39]
-
[40]
Patrick Gage Kelley, Joanna Bresee, Lorrie Faith Cranor, and Robert W. Reeder
-
[41]
A "nutrition label" for privacy. InProceedings of the 5th Symposium on Usable Privacy and Security(Mountain View, California, USA)(SOUPS ’09). Association for Computing Machinery, New York, NY, USA, Article 4, 12 pages. https: //doi.org/10.1145/1572532.1572538
-
[42]
Patrick Gage Kelley, Lorrie Faith Cranor, and Norman Sadeh. 2013. Privacy as part of the app decision-making process. InProceedings of the SIGCHI Conference on Human Factors in Computing Systems(Paris, France)(CHI ’13). Association for Computing Machinery, New York, NY, USA, 3393–3402. https://doi.org/10.1145/ 2470654.2466466
- [43]
-
[44]
Mugdha Khedkar, Ambuj Kumar Mondal, and Eric Bodden. 2026. A Study of Privacy-Related Data Collected by Android Apps.Automated Software Engineer- ing33, 2 (2026), 45. https://doi.org/10.1007/s10515-025-00589-3
-
[45]
Mugdha Khedkar, Ambuj Kumar Mondal, and Eric Bodden. 2024. Do Android App Developers Accurately Report Collection of Privacy-Related Data?. InProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering Workshops(Sacramento, CA, USA)(ASEW ’24). Association for Computing Ma- chinery, New York, NY, USA, 176–186. https://doi.or...
-
[46]
Ruiyin Li, Peng Liang, Mohamed Soliman, and Paris Avgeriou. 2021. Under- standing Architecture Erosion: The Practitioners’ Perceptive. In2021 IEEE/ACM 29th International Conference on Program Comprehension (ICPC). 311–322. https: //doi.org/10.1109/ICPC52881.2021.00037
-
[47]
Tianshi Li, Lorrie Faith Cranor, Yuvraj Agarwal, and Jason I. Hong. 2024. Matcha: An IDE Plugin for Creating Accurate Privacy Nutrition Labels.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies8, 1 (March 2024), 1–38. https://doi.org/10.1145/3643544
-
[48]
Tianshi Li, Kayla Reiman, Yuvraj Agarwal, Lorrie Faith Cranor, and Jason I. Hong
-
[49]
Understanding Challenges for Developers to Create Accurate Privacy Nutrition Labels. InProceedings of the 2022 CHI Conference on Human Factors in Computing Systems(New Orleans, LA, USA)(CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 588, 24 pages. https://doi.org/10.1145/ 3491102.3502012
- [50]
-
[51]
David Rodriguez, Akshath Jain, Jose M. Del Alamo, and Norman Sadeh. 2023. Comparing Privacy Label Disclosures of Apps Published in both the App Store and Google Play Stores. In2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). 150–157. https://doi.org/10.1109/EuroSPW59978.2023. 00022
-
[52]
Per Runeson and Martin Höst. 2009. Guidelines for conducting and reporting case study research in software engineering.Empirical Software Engineering14 (2009), 131–164. https://api.semanticscholar.org/CorpusID:207144526
work page 2009
-
[53]
Yusei Sakuraba, Hiroki Inayoshi, Shoichi Saito, and Akito Monden. 2025. Plaintext in the Wild: Investigating Secure Connection Label Accuracy for Android Apps. In2025 IEEE International Conference on Source Code Analysis & Manipulation (SCAM). 145–156. https://doi.org/10.1109/SCAM67354.2025.00022
- [54]
-
[55]
Grishma Shrestha, Shristi Shrestha, and Anas Mahmoud. 2025. No Country for Indie Developers: A Study of Google Play’s Closed Testing Requirements for New Personal Developer Accounts.ACM Trans. Softw. Eng. Methodol.(May 2025). https://doi.org/10.1145/3736578 Just Accepted
-
[56]
Rocky Slavin, Xiaoyin Wang, Mitra Bokaei Hosseini, James Hester, Ram Krishnan, Jaspreet Bhatia, Travis D. Breaux, and Jianwei Niu. 2016. Toward a framework for detecting privacy policy violations in android application code. InProceedings of the 38th International Conference on Software Engineering(Austin, Texas)(ICSE ’16). Association for Computing Machi...
-
[57]
Mohammad Tahaei, Kami Vaniea, and Naomi Saphra. 2020. Understanding Privacy-Related Questions on Stack Overflow. InProceedings of the 2020 CHI Conference on Human Factors in Computing Systems(Honolulu, HI, USA)(CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–14. https: //doi.org/10.1145/3313831.3376768
- [58]
-
[59]
Xiaoyin Wang, Xue Qin, Mitra Bokaei Hosseini, Rocky Slavin, Travis D. Breaux, and Jianwei Niu. 2018. GUILeak: Tracing Privacy Policy Claims on User Input Data for Android Applications. InProceedings of the 40th International Conference on Software Engineering(Gothenburg, Sweden)(ICSE ’18). Association for Com- puting Machinery, New York, NY, USA, 37–47. h...
-
[60]
Le Yu, Xiapu Luo, Xule Liu, and Tao Zhang. 2016. Can We Trust the Privacy Policies of Android Apps?. In2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). 538–549. https://doi.org/10.1109/ DSN.2016.55
work page 2016
-
[61]
Sebastian Zimmeck, Ziqi Wang, Lieyong Zou, Roger Iyengar, Bin Liu, Florian Schaub, Shomir Wilson, Norman Sadeh, Steven M. Bellovin, and Joel Reidenberg. 2017.Automated Analysis of Privacy Requirements for Mobile Apps. Korea Society of Internet Information, Korea, Republic of. https://doi.org/10.14722/ndss.2017. 23034 A Codebook In this section, we present...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.