arXiv: 2606.02960 · v1 · pith:6H7NNNME · submitted 2026-06-01 · 💻 cs.SE

Many a Little Makes a Mickle: A Code-Centric Empirical Study of Data Minimization Principle in Android App Development

Pith reviewed 2026-06-28 13:13 UTC · model grok-4.3

classification 💻 cs.SE
keywords: data minimization · Android apps · privacy compliance · coding guidelines · LLM code generation · empirical study · APK analysis · developer practices

The pith

Thirty-one coding guidelines drawn from roughly 11,000 Android apps steer both developers and LLMs away from data minimization violations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors first study 1,114 open-source Android apps to isolate ten recurring data minimization scenarios across five stages of data handling. They then analyze 9,875 real-world APKs and distill 31 concrete coding guidelines from what they find. When these guidelines are supplied to state-of-the-art language models, the models stop reproducing the risky patterns they otherwise copy from existing code. The work matters because it supplies ready-to-use, code-level steps that address privacy rules at the point where apps are actually written or generated.

Core claim

Empirical examination of code in nearly eleven thousand Android applications (1,114 open-source apps and 9,875 real-world APKs) yields ten common data minimization scenarios and thirty-one actionable coding guidelines. State-of-the-art LLMs reproduce the same risky patterns found in real-world code, yet supplying the guidelines eliminates the issues in every model evaluated.

What carries the argument

The ten recurring data minimization scenarios across five data-handling stages, used to derive the 31 actionable coding guidelines.

If this is right

  • Apps that follow the 31 guidelines satisfy data minimization at the implementation level.
  • LLM-generated Android code becomes free of the identified risky practices once the guidelines are incorporated.
  • Privacy regulation can be met by changing code practices rather than only updating privacy policies.
  • The same guidelines serve both human developers and AI-assisted programming workflows.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The guidelines might transfer to other mobile platforms that use similar data-handling stages.
  • Embedding the rules in static analysis tools could catch violations during routine code review.
  • Widespread use could lower the aggregate amount of personal data collected by mobile apps over time.
  • New data categories such as health or location records could be checked against the same five-stage framework.
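
The static-analysis idea in the list above can be illustrated with one narrow check. The Python sketch below flags permissions that a manifest declares but that the app is not known to use, which is the kind of over-claiming Figure 2 counts. It is not the authors' detector: the `USED` set is a hypothetical stand-in for the output of a real API-to-permission analysis, and the manifest is an invented example.

```python
import xml.etree.ElementTree as ET
from typing import Set

# Namespace that Android manifests use for "android:" attributes.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def declared_permissions(manifest_xml: str) -> Set[str]:
    """Return every permission named in a <uses-permission> element."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{%s}name" % ANDROID_NS
    return {
        el.attrib[name_attr]
        for el in root.iter("uses-permission")
        if name_attr in el.attrib
    }


def over_claimed(manifest_xml: str, used: Set[str]) -> Set[str]:
    """Permissions declared in the manifest but absent from the 'used' set."""
    return declared_permissions(manifest_xml) - used


# Invented manifest for illustration only.
MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
    <uses-permission android:name="android.permission.CAMERA"/>
    <uses-permission android:name="android.permission.READ_CONTACTS"/>
    <uses-permission android:name="android.permission.ACCESS_FINE_LOCATION"/>
</manifest>"""

# Hypothetical result of a separate code analysis: only the camera is used.
USED = {"android.permission.CAMERA"}

if __name__ == "__main__":
    for permission in sorted(over_claimed(MANIFEST, USED)):
        print("over-claimed:", permission)
```

In a code-review setting, a check like this would only be as good as the `USED` set fed into it, so the hard part is the permission-usage analysis rather than the set difference.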

Load-bearing premise

The ten scenarios found in the open-source sample capture the main patterns present in the larger population of real-world Android apps.

What would settle it

Discovery of many real-world apps that systematically violate data minimization in ways outside the ten identified scenarios would show the guidelines do not cover the space.

Figures

Figures reproduced from arXiv: 2606.02960 by Dianshu Liao, Shidong Pan, Xiaoyu Sun, Zhenchang Xing.

Figure 1: Per-indicator breakdown of data minimization prac…
Figure 2: Occurrences of over-claimed permissions.
Figure 3: Include/exclude rules across backup domain types.
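
For readers unfamiliar with what Figure 3 tallies: Android Auto Backup rules are XML files whose `<include>` and `<exclude>` elements each carry a `domain` attribute such as `sharedpref`, `database`, or `file`. The Python sketch below counts rules per domain and rule type. It is an assumed reconstruction of the kind of tally the caption describes, not the authors' script, and the rules file is an invented example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tally_backup_rules(rules_xml: str) -> Counter:
    """Count (domain, rule type) pairs in an Auto Backup rules document."""
    root = ET.fromstring(rules_xml)
    counts = Counter()
    for el in root.iter():
        if el.tag in ("include", "exclude"):
            # A missing domain attribute is recorded as "unknown".
            counts[(el.get("domain", "unknown"), el.tag)] += 1
    return counts


# Invented rules file for illustration only.
RULES = """<full-backup-content>
    <include domain="sharedpref" path="."/>
    <exclude domain="sharedpref" path="session.xml"/>
    <include domain="database" path="catalog.db"/>
</full-backup-content>"""

if __name__ == "__main__":
    for (domain, rule), n in sorted(tally_backup_rules(RULES).items()):
        print(domain, rule, n)
```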

Original abstract

Modern mobile applications consume large amounts of data to function, raising significant privacy concerns and regulatory challenges. While prior work has primarily focused on detecting compliance gaps through policy analysis, there remains a lack of actionable guidance for developers to implement privacy principles at the code level. In this paper, we focus on data minimization as a developer-operationalizable principle and investigate its realization in Android applications. We conduct a formative study on 1,114 open-source Android apps to identify ten recurring data minimization scenarios across five data-handling stages. Building on this, we perform a large-scale analysis of 9,875 real-world APKs and distill 31 actionable coding guidelines to support privacy-compliant development. We further examine LLM-based code generation in Android development and find that state-of-the-art models consistently reproduce data minimization-risky practices, indicating that they inherit and amplify patterns from real-world code. Encouragingly, incorporating our guidelines eliminates these issues across all evaluated models. Our work advocates a shift toward responding to privacy regulatory requirements at their code-level root causes, enabling better compliance in both human and AI-assisted programming.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper conducts a formative study on 1,114 open-source Android apps to identify ten recurring data minimization scenarios across five data-handling stages, performs a large-scale analysis of 9,875 real-world APKs to distill 31 actionable coding guidelines, and evaluates state-of-the-art LLMs, finding that they reproduce data minimization-risky practices which are eliminated when the guidelines are incorporated.

Significance. If the methodological gaps are addressed, the work would provide concrete code-level guidance for data minimization in Android development and demonstrate a practical way to improve LLM-assisted code generation for privacy compliance. The scale of the APK analysis combined with the LLM evaluation represents a useful empirical contribution to software engineering and privacy research.

major comments (3)
  1. [Formative study methodology] Formative study section: no details are provided on inter-rater reliability, validation of scenario detection, or agreement metrics for deriving the ten scenarios from the 1,114 open-source apps; this directly affects the reliability of the scenarios used to frame the subsequent guideline distillation.
  2. [Large-scale analysis] Large-scale APK analysis section: the description of the 9,875-APK inspection provides no information on how false positives are avoided or mitigated when detecting data minimization scenarios, which is load-bearing for the accuracy and soundness of the 31 distilled guidelines.
  3. [Guideline applicability and LLM evaluation] Guideline derivation and evaluation: the ten scenarios originate from an open-source sample, yet the 9,875-APK analysis is used only to distill guidelines rather than to validate coverage or representativeness for closed-source/commercial apps (e.g., permission handling or third-party SDK patterns); this leaves the central claim that the guidelines eliminate risky practices across real-world Android development on an untested assumption.
minor comments (1)
  1. [Abstract] The abstract could more explicitly state the number and identities of the LLMs evaluated and the precise risky practices observed before claiming elimination across all models.
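
Major comment 1 asks for an agreement metric without naming one. Cohen's kappa is a common choice when two coders assign one label per item, so the Python sketch below shows how it could be computed. The choice of kappa is an assumption rather than something the paper or the report specifies, and the labels `S1` to `S3` are invented placeholders, not the paper's scenarios.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters who each label the same items once."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(labels_a)

    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented labels for eight apps; not data from the paper.
RATER_A = ["S1", "S1", "S2", "S2", "S3", "S3", "S1", "S2"]
RATER_B = ["S1", "S1", "S2", "S3", "S3", "S3", "S1", "S1"]

if __name__ == "__main__":
    print(round(cohen_kappa(RATER_A, RATER_B), 3))
```

Here the raters agree on 6 of 8 items (0.75) while chance agreement is 21/64, so kappa is 27/43, noticeably lower than the raw agreement rate. That gap is why a chance-corrected metric is usually preferred over a plain percentage.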

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We are grateful for the referee's constructive feedback, which identifies key areas where additional methodological transparency would strengthen the paper. We address each major comment below and indicate the corresponding revisions.

  1. Referee: [Formative study methodology] Formative study section: no details are provided on inter-rater reliability, validation of scenario detection, or agreement metrics for deriving the ten scenarios from the 1,114 open-source apps; this directly affects the reliability of the scenarios used to frame the subsequent guideline distillation.

    Authors: We thank the referee for this observation. The formative study was conducted by two authors who independently coded a random subset of 200 apps from the 1,114-app corpus, iteratively refined the scenario definitions through discussion, and validated the final ten scenarios against the full set. Although formal agreement metrics were not reported in the original manuscript, we agree that documenting the process is essential. We will add a dedicated paragraph in the formative study section describing the coding procedure, the pre-discussion agreement rate, and how discrepancies were resolved to derive the scenarios.
    Revision: yes

  2. Referee: [Large-scale analysis] Large-scale APK analysis section: the description of the 9,875-APK inspection provides no information on how false positives are avoided or mitigated when detecting data minimization scenarios, which is load-bearing for the accuracy and soundness of the 31 distilled guidelines.

    Authors: We agree that explicit details on false-positive mitigation are required. Our analysis combined automated detection via static analysis scripts with manual verification on a stratified random sample of 500 APKs to compute an empirical false-positive rate and refine the detection rules before guideline distillation. We will expand the large-scale analysis section with a new subsection that reports the sampling strategy, the observed false-positive rate from manual review, and how these checks informed the final set of 31 guidelines.
    Revision: yes

  3. Referee: [Guideline applicability and LLM evaluation] Guideline derivation and evaluation: the ten scenarios originate from an open-source sample, yet the 9,875-APK analysis is used only to distill guidelines rather than to validate coverage or representativeness for closed-source/commercial apps (e.g., permission handling or third-party SDK patterns); this leaves the central claim that the guidelines eliminate risky practices across real-world Android development on an untested assumption.

    Authors: The ten scenarios were derived from open-source apps to enable source-level inspection, but the 9,875-APK corpus consists primarily of closed-source commercial applications and was used both to quantify scenario prevalence and to surface the concrete code patterns that became the 31 guidelines. Patterns involving permission handling and third-party SDKs were directly observed in the APK dataset. We will revise the discussion and threats-to-validity sections to clarify this distinction, explicitly state the generalizability claim, and acknowledge that a dedicated closed-source validation corpus was not employed. This addresses the concern without altering the empirical grounding of the guidelines.
    Revision: partial
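
Response 2 promises an observed false-positive rate from a manually reviewed sample. A point estimate from a few hundred items is more informative with an interval around it, so the Python sketch below computes a Wilson score interval for a proportion. The counts are hypothetical (no rate is reported in the text above), and the formula assumes simple random sampling; a stratified sample, as the response describes, would need per-stratum weighting that this sketch omits.

```python
import math
from typing import Tuple


def wilson_interval(hits: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion hits / n.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0 or not 0 <= hits <= n:
        raise ValueError("need n > 0 and 0 <= hits <= n")
    p = hits / n
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p + z2 / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z2 / (4.0 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical review outcome: 20 of 500 sampled detections judged wrong.
FALSE_POSITIVES = 20
REVIEWED = 500

if __name__ == "__main__":
    low, high = wilson_interval(FALSE_POSITIVES, REVIEWED)
    print("point estimate: %.3f" % (FALSE_POSITIVES / REVIEWED))
    print("interval: [%.3f, %.3f]" % (low, high))
```

The Wilson form is used here instead of the simpler normal approximation because it stays inside the range from 0 to 1 and behaves better when the rate is close to zero, which is the regime a well-tuned detector should be in.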

Circularity Check

0 steps flagged

No circularity: purely empirical derivation from external app data

The paper's chain consists of a formative study on 1,114 open-source apps to identify 10 scenarios, followed by APK analysis of 9,875 real-world apps to distill 31 guidelines, then LLM evaluation. All steps rely on independent external code artifacts rather than any self-referential definitions, fitted parameters renamed as predictions, or load-bearing self-citations. No equations or uniqueness theorems are invoked. The representativeness concern is a validity issue rather than a circularity issue.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on domain assumptions about how data minimization translates to detectable code patterns and that open-source apps plus APK static analysis capture representative practices; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: Data minimization violations can be reliably identified through recurring scenarios in code across five data-handling stages.
    Invoked when moving from the formative study to guideline creation and APK analysis.
