
arxiv: 2511.20284 · v2 · submitted 2025-11-25 · 💻 cs.CR · cs.AI

Can LLMs Make (Personalized) Access Control Decisions?

Pith reviewed 2026-05-17 05:29 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords large language models · access control · privacy preferences · smartphone permissions · personalization · user decisions · permission requests

The pith

Large language models can make access control decisions aligned with users' privacy preferences for smartphone apps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Users struggle with frequent and complex permission requests on their phones, which leads to poor security choices. This paper examines whether LLMs can use short user-written privacy statements to decide on app permissions in specific contexts. The authors built a dataset from real users and tested both general and personalized LLMs against actual user decisions. The models often match what most users want and sometimes encourage safer choices. This suggests LLMs could help automate these decisions while respecting individual preferences.

Core claim

The central claim is that LLMs tasked with reasoning about apps and request contexts generally reflect users' preferences, agreeing with the majority decision in up to 86% of cases. Incorporating user-specific privacy preferences into the model improves agreement with individual decisions but can lead to less safe outcomes because users tend to grant excessive permissions.
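
The majority-agreement metric behind the 86% figure can be illustrated with a small sketch. It assumes a hypothetical data layout (pairs of request ID and decision for users, a dictionary for the model) and hypothetical `allow`/`deny` labels. The paper's actual pipeline and tie handling are not described on this page, so skipping tied requests is an illustrative choice only.

```python
from collections import Counter, defaultdict


def majority_agreement(user_decisions, llm_decisions):
    """Fraction of requests where the LLM matches the most common user decision.

    user_decisions: iterable of (request_id, decision) pairs, one per user response.
    llm_decisions: dict mapping request_id -> decision.
    Both layouts are hypothetical and not taken from the paper.
    """
    votes = defaultdict(Counter)
    for request_id, decision in user_decisions:
        votes[request_id][decision] += 1

    matched = total = 0
    for request_id, counter in votes.items():
        ranked = counter.most_common(2)
        # Assumption: requests without a clear majority are skipped.
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            continue
        if request_id not in llm_decisions:
            continue
        total += 1
        matched += llm_decisions[request_id] == ranked[0][0]
    return matched / total if total else float("nan")
```

Agreement with individual decisions, as opposed to the majority, would instead compare each user response directly with the model output for that user.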

What carries the argument

A personalized LLM that incorporates the user's natural-language privacy statement to evaluate each permission request alongside the app and context details.
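
A rough sketch of how the generic and personalized variants might differ at the input level follows. The prompt wording, the function names `build_prompt` and `parse_decision`, and the fail-closed default are all assumptions made for illustration; the page does not reproduce the authors' prompts, and the referee report notes that the context encoding is not documented.

```python
def build_prompt(app_name, permission, context, privacy_statement=None):
    """Assemble a hypothetical prompt for one permission request.

    With privacy_statement=None this corresponds to a generic model;
    passing the user's statement corresponds to a personalized one.
    """
    lines = [
        "Decide whether to allow or deny the following permission request.",
        f"App: {app_name}",
        f"Permission: {permission}",
        f"Context: {context}",
    ]
    if privacy_statement:
        lines.append(f"User privacy statement: {privacy_statement}")
    lines.append("Answer with exactly one word: ALLOW or DENY.")
    return "\n".join(lines)


def parse_decision(reply):
    """Map a free-text model reply to 'allow' or 'deny'.

    Assumption: anything that is not a clear ALLOW is treated as a denial.
    """
    word = reply.strip().upper()
    if word.startswith("ALLOW"):
        return "allow"
    return "deny"
```

The prompt string would then be sent to whichever model is under test; that call is omitted here because it depends on the provider.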

If this is right

  • LLMs can lower the burden on users by handling dynamic permission decisions during app use.
  • Personalization increases alignment with a single user's choices compared to a general model.
  • Models can influence users toward safer permission grants by defaulting to more restrictive options.
  • Strict following of user preferences may reduce overall security levels in practice.
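
The per-user comparison of generic (G) and personalized (P) decisions implied by the second bullet can be sketched as below. The record layout and the function name `per_user_agreement` are hypothetical; only the G, P, and P-G quantities come from the paper's Figure 4 caption.

```python
from collections import defaultdict


def per_user_agreement(records):
    """Compute per-user agreement for generic and personalized decisions.

    records: iterable of
        (user_id, user_decision, generic_decision, personalized_decision)
    Returns {user_id: {"G": ..., "P": ..., "P-G": ...}} with agreement rates
    in [0, 1] and their difference.
    """
    # Per user: [number of decisions, generic matches, personalized matches]
    counts = defaultdict(lambda: [0, 0, 0])
    for user_id, user, generic, personalized in records:
        entry = counts[user_id]
        entry[0] += 1
        entry[1] += generic == user
        entry[2] += personalized == user
    return {
        user_id: {"G": g / n, "P": p / n, "P-G": (p - g) / n}
        for user_id, (n, g, p) in counts.items()
    }
```

A positive P-G value means personalization brought the model closer to that user's own choices; it says nothing about whether those choices were safe.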

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such LLMs might be deployed in operating systems to pre-filter permission requests before showing them to users.
  • Hybrid approaches could combine LLM suggestions with user overrides to balance safety and personalization.
  • Testing these models in live environments with actual app interactions would reveal real-world effectiveness beyond survey data.

Load-bearing premise

The user privacy statements and permission decisions collected in the online study represent how people would actually behave when using their devices in daily life.

What would settle it

Running the LLMs on permission decisions made by the same users during actual smartphone usage over time and measuring agreement rates against the survey results.

Figures

Figures reproduced from arXiv: 2511.20284 by Aritra Dhar, Daniele Lain, Friederike Groschupp, Lara Magdalena Lazier, Srdjan Čapkun.

Figure 1
Figure 1 shows a user who, following an access control request, makes a decision on it ①. Making access control decisions is known to have a huge cognitive burden for users [4], who then often make decisions that are inconsistent with their internal beliefs, especially under (time) pressure or when engaging automatic thinking processes [14]. Further, the complexity and misunderstanding of access control and permi…
Figure 2
Figure 2. LLM-based access control decision making.
Figure 3
Figure 3. Confusion matrices of generic and personalized …
Figure 4
Figure 4. Per-user agreement of generic (G) vs. personalized (P) decisions and the difference per user (P-G). 5.2.2. Per-user Performance. Next, we look at Hypothesis 2.2: Personalization of LLM decisions improves agreement for all users. For this, we computed the accuracy of generic and personalized LLMs for each individual user based on all decisions each user had made …
Figure 5
Figure 5. Privacy Statement length versus P4o agreement. 5.3.1. User Input. We test H3.1, a longer and contextually relevant statement of preferences leads to better personalization of LLM decisions, by looking at the correlation of the agreement and length of the statement, and the influence of different input types. Length of User Input. We show the length of the privacy statements (measured in characters) and the…
Figure 6
Figure 6. Distribution of confidence for GPT-4o and 4.1.
Figure 7
Figure 7. Heatmaps of LLM decision performance for model GPT-4o. All values are percentages. … to recent advancements in LLMs, studies have begun replacing traditional NLP techniques with them. Such as a two-step pipeline that uses LLMs and knowledge distillation to convert natural-language access control policies (NLACPs) into machine-enforceable ABAC rules, with evaluation on realistic policy texts [44]. Similarly,…
Original abstract

Precise access control decisions are crucial for the security of both traditional applications and emerging agent-based systems. Typically, these decisions are made by users during app installation or at runtime. However, due to the increasing complexity and automation of systems, making access control decisions can impose a significant cognitive burden on users, often overwhelming them and leading to suboptimal or even arbitrary choices. To address this problem, we investigate the ability of LLMs to make dynamic, context-aware decisions aligned with users' security preferences, expressed during a lightweight setup phase. As a case study, we analyze smartphone application permission requests, given their ubiquity and users' familiarity with them. We curated a dataset comprising 307 user privacy statements (short, natural-language descriptions of user preferences) and 14,682 corresponding permission decisions, gathered from smartphone users in an online data collection. We compare these decisions with those made by two versions of LLMs that are tasked with reasoning about the app and the request context: a general model and a personalized one (which incorporates user preferences). For the latter, we also collected user feedback on 1,298 of its decisions. Our results show that LLMs generally reflect users' preferences well, agreeing with the majority decision in up to 86% of cases, and can steer users toward safer behavior. However, the results also reveal a key trade-off in personalization: while incorporating user-specific privacy preferences improves agreement with individual decisions, strict adherence to these preferences may lead to less safe outcomes, as users tend to over-permission.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper investigates whether LLMs can make dynamic, context-aware access control decisions aligned with users' security preferences, using smartphone app permissions as a case study. It curates a dataset of 307 user privacy statements and 14,682 permission decisions collected online, then compares decisions from a general LLM and a personalized LLM (incorporating user preferences) against user majority decisions, reporting agreement rates up to 86% and a trade-off where personalization improves individual match but can reduce safety as users over-permission.

Significance. If the empirical results hold under more rigorous validation, the work could inform LLM-assisted mechanisms to reduce cognitive load in access control for apps and emerging agent systems. The scale of the decision dataset and the explicit identification of a personalization-safety trade-off are notable strengths for a security-focused empirical study.

major comments (3)
  1. [Data Collection] Data Collection section: The 307 privacy statements and 14,682 decisions were obtained via online survey rather than instrumented field deployment. Self-selection among privacy-interested participants and hypothetical reporting (without real app consequences) introduce selection and reporting biases that directly affect the validity of the reported 86% majority-agreement rates and the personalization trade-off finding.
  2. [Results] Results section: Agreement percentages (including the 86% figure) are presented without error bars, confidence intervals, or statistical significance tests. Details on how app/request context was encoded as input to the LLMs are also omitted, undermining reproducibility and the ability to assess robustness of the general vs. personalized comparison.
  3. [User Feedback] User Feedback subsection: Post-hoc feedback collected on 1,298 personalized decisions is referenced without stated evaluation criteria or quantitative analysis linking it to the claim that LLMs steer users toward safer behavior.
minor comments (2)
  1. [Abstract] Abstract: Specify the exact LLM models or sizes used for the general and personalized variants and clarify the precise definition of 'majority decision' used for the agreement metric.
  2. [Tables/Figures] Tables/Figures: Ensure all reported agreement rates are accompanied by the number of decisions or statements underlying each percentage.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive comments on our manuscript. We address each major comment point by point below, indicating where we will revise the paper.

  1. Referee: [Data Collection] Data Collection section: The 307 privacy statements and 14,682 decisions were obtained via online survey rather than instrumented field deployment. Self-selection among privacy-interested participants and hypothetical reporting (without real app consequences) introduce selection and reporting biases that directly affect the validity of the reported 86% majority-agreement rates and the personalization trade-off finding.

    Authors: We acknowledge that an online survey with hypothetical scenarios can introduce self-selection and reporting biases, as participants may not face real consequences for their choices. This approach was selected to gather a large-scale dataset of 14,682 decisions efficiently while respecting privacy constraints that would complicate instrumented field deployments. In the revised manuscript, we will expand the limitations discussion to explicitly address these biases and their potential impact on the reported agreement rates and trade-off findings, including suggestions for future field studies. revision: partial

  2. Referee: [Results] Results section: Agreement percentages (including the 86% figure) are presented without error bars, confidence intervals, or statistical significance tests. Details on how app/request context was encoded as input to the LLMs are also omitted, undermining reproducibility and the ability to assess robustness of the general vs. personalized comparison.

    Authors: We agree that statistical rigor and reproducibility details are needed. We will add error bars, confidence intervals, and statistical significance tests (such as paired t-tests or chi-squared tests for model comparisons) to the agreement percentages in the Results section. We will also include a dedicated subsection describing the prompt structure and context encoding for both the general and personalized LLMs, with example inputs, to support reproducibility and evaluation of the comparisons. revision: yes

  3. Referee: [User Feedback] User Feedback subsection: Post-hoc feedback collected on 1,298 personalized decisions is referenced without stated evaluation criteria or quantitative analysis linking it to the claim that LLMs steer users toward safer behavior.

    Authors: We will revise the User Feedback subsection to clearly state the evaluation criteria (e.g., Likert-scale questions on alignment with preferences and perceived safety) and provide quantitative results, such as the proportion of participants indicating that the LLM decisions promoted safer behavior compared to their initial choices. This will strengthen the linkage to the claim about steering users toward safer outcomes. revision: yes
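
The confidence intervals promised in response 2 could, for instance, be Wilson score intervals on each agreement proportion. The sketch below is one such option under stated assumptions: the rebuttal does not name an interval type, the function name `wilson_interval` is hypothetical, and the counts in the usage line are illustrative, not the paper's.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    successes: number of decisions where the LLM agreed with the reference.
    n: total number of decisions evaluated.
    z: normal quantile (1.96 corresponds to roughly 95% coverage).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Illustrative counts only: 86 agreements out of 100 decisions.
low, high = wilson_interval(86, 100)
```

Reporting the interval next to each percentage would also address the minor comment asking for the number of decisions behind every agreement rate, since `n` is an explicit input.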

Circularity Check

0 steps flagged

No circularity: direct empirical comparison to independently collected user dataset


The paper reports agreement rates (up to 86%) between LLM outputs and a dataset of 307 privacy statements plus 14,682 user permission decisions collected via online survey. No equations, derivations, fitted parameters, or predictions appear in the abstract or described methodology. The central results are straightforward empirical matches against this external user-provided benchmark rather than any self-referential reduction, self-citation chain, or ansatz smuggled in via prior work. This is a standard self-contained empirical evaluation with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that short user privacy statements capture stable preferences that LLMs can reason over, and that the online-collected decisions form a valid ground truth for measuring agreement and safety.

axioms (1)
  • domain assumption: Users' privacy statements accurately and stably represent their security preferences.
    Used directly to condition the personalized LLM and to interpret agreement rates.

pith-pipeline@v0.9.0 · 5595 in / 1245 out tokens · 41298 ms · 2026-05-17T05:29:49.857057+00:00 · methodology



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Text-Based Personas for Simulating User Privacy Decisions

    cs.CR 2026-03 unverdicted novelty 7.0

    Narriva generates behavior-grounded text personas from survey data that achieve up to 87% accuracy in predicting privacy decisions, improve 6-17 points over baselines, cut tokens by 80-95%, and reproduce aggregate dis...

  2. An AI Agent Execution Environment to Safeguard User Data

    cs.CR 2026-04 unverdicted novelty 6.0

    GAAP guarantees confidentiality of private user data for AI agents by enforcing user-specified permissions deterministically through persistent information flow tracking, without trusting the agent or requiring attack...

Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · cited by 2 Pith papers · 3 internal anchors

  1. [1]

    Large language model firewall for aigc protection with intelligent detection policy,

    T. Huang, L. You, N. Cai, and T. Huang, “Large language model firewall for aigc protection with intelligent detection policy,” in 2024 2nd International Conference On Mobile Internet, Cloud Computing and Information Security (MICCIS). IEEE, 2024, pp. 247–252

  2. [2]

    Can LLMs Find Bugs in Code? An Evaluation from Beginner Errors to Security Vulnerabilities in Python and C++

    A. Mhatre, N. Nader, P. Diehl, and D. Gupta, “Llm-guard: Large language model-based detection and repair of bugs and security vulnerabilities in c++ and python,” arXiv preprint arXiv:2508.16419, 2025

  3. [3]

    Using LLMs to facilitate formal verification of RTL,

    M. Orenes-Vera, M. Martonosi, and D. Wentzlaff, “Using llms to facilitate formal verification of rtl,” arXiv preprint arXiv:2309.09437, 2023

  4. [4]

    Android permissions: User attention, comprehension, and behavior,

    A. P. Felt, E. Ha, S. Egelman, A. Haney, E. Chin, and D. Wagner, “Android permissions: User attention, comprehension, and behavior,” in Proceedings of the eighth symposium on usable privacy and security, 2012, pp. 1–14

  5. [5]

    Choice architecture and smartphone privacy: There’s a price for that,

    S. Egelman, A. P. Felt, and D. Wagner, “Choice architecture and smartphone privacy: There’s a price for that,” in The economics of information security and privacy. Springer, 2013, pp. 211–236

  6. [6]

    Privacy wizards for social networking sites,

    L. Fang and K. LeFevre, “Privacy wizards for social networking sites,” in Proceedings of the 19th international conference on World wide web, 2010, pp. 351–360

  7. [7]

    Does prompt formatting have any impact on llm performance?

    J. He, M. Rungta, D. Koleczek, A. Sekhon, F. X. Wang, and S. Hasan, “Does prompt formatting have any impact on llm performance?” [Online]. Available: https://arxiv.org/abs/2411.10541

  9. [9]

    Social-desirability bias and the validity of self-reported values,

    R. J. Fisher and J. E. Katz, “Social-desirability bias and the validity of self-reported values,” Psychology & marketing, vol. 17, no. 2, pp. 105–120, 2000

  10. [10]

    The privacy paradox: Personal information disclosure intentions versus behaviors,

    P. A. Norberg, D. R. Horne, and D. A. Horne, “The privacy paradox: Personal information disclosure intentions versus behaviors,” Journal of consumer affairs, vol. 41, no. 1, pp. 100–126, 2007

  11. [11]

    Privacy and human behavior in the age of information,

    A. Acquisti, L. Brandimarte, and G. Loewenstein, “Privacy and human behavior in the age of information,” Science, vol. 347, no. 6221, pp. 509–514, 2015

  12. [12]

    Folk models of home computer security,

    R. Wash, “Folk models of home computer security,” in Proceedings of the Sixth Symposium on Usable Privacy and Security, 2010, pp. 1–16

  13. [13]

    A Survey of Hallucination in Large Foundation Models

    V. Rawte, A. Sheth, and A. Das, “A survey of hallucination in large foundation models,” arXiv preprint arXiv:2309.05922, 2023

  14. [14]

    Bias and unfairness in information retrieval systems: New challenges in the llm era,

    S. Dai, C. Xu, S. Xu, L. Pang, Z. Dong, and J. Xu, “Bias and unfairness in information retrieval systems: New challenges in the llm era,” ser. KDD ’24. New York, NY, USA: Association for Computing Machinery, 2024, p. 6437–6447. [Online]. Available: https://doi.org/10.1145/3637528.3671458

  15. [15]

    The fog of warnings: How non-essential notifications blur with security warnings,

    A. Vance, D. Eargle, J. L. Jenkins, C. B. Kirwan, and B. B. Anderson, “The fog of warnings: How non-essential notifications blur with security warnings,” in Fifteenth Symposium on Usable Privacy and Security, SOUPS 2019, Santa Clara, CA, USA, August 11-13, 2019, H. R. Lipford, Ed. USENIX Association, 2019. [Online]. Available: https://www.usenix.org/confer...

  16. [16]

    The feasibility of dynamically granted permissions: Aligning mobile privacy with user preferences,

    P. Wijesekera, A. Baokar, L. Tsai, J. Reardon, S. Egelman, D. Wagner, and K. Beznosov, “The feasibility of dynamically granted permissions: Aligning mobile privacy with user preferences,” in 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 2017, pp. 1077–1093

  17. [17]

    Android permissions remystified: A field study on contextual integrity,

    P. Wijesekera, A. Baokar, A. Hosseini, S. Egelman, D. Wagner, and K. Beznosov, “Android permissions remystified: A field study on contextual integrity,” in 24th USENIX Security Symposium (USENIX Security 15), 2015, pp. 499–514

  18. [18]

    Android permissions demystified,

    A. P. Felt, E. Chin, S. Hanna, D. Song, and D. Wagner, “Android permissions demystified,” in Proceedings of the 18th ACM conference on Computer and communications security, 2011, pp. 627–638

  19. [19]

    Chain-of-thought prompting elicits reasoning in large language models,

    J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou et al., “Chain-of-thought prompting elicits reasoning in large language models,” Advances in neural information processing systems, vol. 35, pp. 24 824–24 837, 2022

  20. [20]

    Introducing ChatGPT agent: bridging research and action — openai.com,

    OpenAI, “Introducing ChatGPT agent: bridging research and action — openai.com,” https://openai.com/index/introducing-chatgpt-agent/, 2025, [Accessed 10-11-2025]

  21. [21]

    Isolategpt: An execution isolation architecture for llm-based agentic systems,

    Y. Wu, F. Roesner, T. Kohno, N. Zhang, and U. Iqbal, “Isolategpt: An execution isolation architecture for llm-based agentic systems,” in NDSS, 2025

  22. [22]

    Learning to reason with LLMs — openai.com,

    OpenAI, “Learning to reason with LLMs — openai.com,” https://openai.com/index/learning-to-reason-with-llms/, 2024, [Accessed 07-11-2025]

  23. [23]

    A survey on in-context learning,

    Q. Dong, L. Li, D. Dai, C. Zheng, J. Ma, R. Li, H. Xia, J. Xu, Z. Wu, B. Chang et al., “A survey on in-context learning,” in Proceedings of the 2024 conference on empirical methods in natural language processing, 2024, pp. 1107–1128

  24. [24]

    What is the model context protocol (mcp)?

    Model Context Protocol, “What is the model context protocol (mcp)?” [Online]. Available: https://modelcontextprotocol.io/docs/getting-started/intro

  25. [25]

    Judging llm-as-a-judge with mt-bench and chatbot arena,

    L. Zheng, W.-L. Chiang, Y. Sheng, S. Zhuang, Z. Wu, Y. Zhuang, Z. Lin, Z. Li, D. Li, E. Xing et al., “Judging llm-as-a-judge with mt-bench and chatbot arena,” Advances in neural information processing systems, vol. 36, pp. 46 595–46 623, 2023

  26. [26]

    Self-evaluation improves selective generation in large language models,

    J. Ren, Y. Zhao, T. Vu, P. J. Liu, and B. Lakshminarayanan, “Self-evaluation improves selective generation in large language models,” in Proceedings on. PMLR, 2023, pp. 49–64

  27. [27]

    Fact-checking the output of large language models via token-level uncertainty quantification,

    E. Fadeeva, A. Rubashevskii, A. Shelmanov, S. Petrakov, H. Li, H. Mubarak, E. Tsymbalov, G. Kuzmin, A. Panchenko, T. Baldwin et al., “Fact-checking the output of large language models via token-level uncertainty quantification,” arXiv preprint arXiv:2403.04696, 2024

  28. [28]

    Token probabilities to mitigate large language models overconfidence in answering medical questions: Quantitative study,

    R. Bentegeac, B. Le Guellec, G. Kuchcinski, P. Amouyel, and A. Hamroun, “Token probabilities to mitigate large language models overconfidence in answering medical questions: Quantitative study,” Journal of medical Internet research, vol. 27, p. e64348, 2025

  29. [29]

    Benchmarking uncertainty quantification methods for large language models with lm-polygraph,

    R. Vashurin, E. Fadeeva, A. Vazhentsev, L. Rvanova, D. Vasilev, A. Tsvigun, S. Petrakov, R. Xing, A. Sadallah, K. Grishchenkov et al., “Benchmarking uncertainty quantification methods for large language models with lm-polygraph,” Transactions of the Association for Computational Linguistics, vol. 13, pp. 220–248, 2025

  30. [30]

    Permission issues in open-source android apps: An exploratory study,

    G. L. Scoccia, A. Peruma, V. Pujols, I. Malavolta, and D. E. Krutz, “Permission issues in open-source android apps: An exploratory study,” in 2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM). IEEE, 2019, pp. 238–249

  31. [31]

    An empirical history of permission requests and mistakes in open source android apps,

    G. L. Scoccia, A. Peruma, V. Pujols, B. Christians, and D. Krutz, “An empirical history of permission requests and mistakes in open source android apps,” in 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR). IEEE, 2019, pp. 597–601

  32. [32]

    An android application risk evaluation framework based on minimum permission set identification,

    J. Xiao, S. Chen, Q. He, Z. Feng, and X. Xue, “An android application risk evaluation framework based on minimum permission set identification,” Journal of Systems and Software, vol. 163, p. 110533, 2020

  33. [33]

    A large scale study of user behavior, expectations and engagement with android permissions,

    W. Cao, C. Xia, S. T. Peddinti, D. Lie, N. Taft, and L. M. Austin, “A large scale study of user behavior, expectations and engagement with android permissions,” in 30th USENIX Security Symposium, USENIX Security 2021, August 11-13, 2021, M. D. Bailey and R. Greenstadt, Eds. USENIX Association, 2021, pp. 803–820. [Online]. Available: https://www.usenix.org/...

  34. [34]

    Permissions on Android — Privacy — Android Developers — developer.android.com,

    “Permissions on Android — Privacy — Android Developers — developer.android.com,” https://developer.android.com/guide/topics/permissions/overview, [Accessed 07-11-2025]

  35. [35]

    Exploring decision making with android’s runtime permission dialogs using in-context surveys,

    B. Bonné, S. T. Peddinti, I. Bilogrevic, and N. Taft, “Exploring decision making with android’s runtime permission dialogs using in-context surveys,” in Thirteenth Symposium on Usable Privacy and Security, SOUPS 2017, Santa Clara, CA, USA, July 12-14, 2017. USENIX Association, 2017, pp. 195–210. [Online]. Available: https://www.usenix.org/conference/soup...

  36. [36]

    Prolific,

    “Prolific,” https://www.prolific.com/

  37. [37]

    Using Commitment Requests Instead of Attention Checks

    E. Geisen, “Using Commitment Requests Instead of Attention Checks.” [Online]. Available: https://www.qualtrics.com/articles/strategy-research/attention-checks-and-data-quality/

  38. [38]

    Non-determinism of “deterministic” llm settings,

    B. Atil, S. Aykent, A. Chittams, L. Fu, R. J. Passonneau, E. Radcliffe, G. R. Rajagopal, A. Sloan, T. Tudrej, F. Ture, Z. Wu, L. Xu, and B. Baldwin, “Non-determinism of ‘deterministic’ llm settings.” [Online]. Available: https://arxiv.org/abs/2408.04667

  40. [40]

    Automated extraction of security policies from natural-language software documents,

    X. Xiao, A. Paradkar, S. Thummalapenta, and T. Xie, “Automated extraction of security policies from natural-language software documents,” in Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, 2012, pp. 1–11

  41. [41]

    A controlled natural language interface for authoring access control policies,

    L. Shi and D. W. Chadwick, “A controlled natural language interface for authoring access control policies,” in Proceedings of the 2011 ACM Symposium on Applied Computing, 2011, pp. 1524–1530

  42. [42]

    Using natural language policies for privacy control in social platforms

    J. De Coi, P. Kärger, D. Olmedilla, and S. Zerr, “Using natural language policies for privacy control in social platforms.” CEUR Workshop Proceedings, ISSN 1613-0073, 2009

  43. [43]

    Identification of access control policy sentences from natural language policy documents,

    M. Narouei, H. Khanpour, and H. Takabi, “Identification of access control policy sentences from natural language policy documents,” in IFIP Annual Conference on Data and Applications Security and Privacy. Springer, 2017, pp. 82–100

  44. [44]

    Automatic policy enforcement on semantic social data,

    T.-V. T. Nguyen, N. Fornara, and F. Marfia, “Automatic policy enforcement on semantic social data,” Multiagent and Grid Systems, vol. 11, no. 3, pp. 121–146, 2015

  45. [45]

    Modeling and enforcing access control policies in conversational user interfaces,

    E. Planas, S. Martínez, M. Brambilla, and J. Cabot, “Modeling and enforcing access control policies in conversational user interfaces,” Software and Systems Modeling, vol. 22, no. 6, pp. 1925–1944, 2023

  46. [46]

    Extraction of machine enforceable abac policies from natural language text using llm knowledge distillation,

    M. Yang, V. Atluri, S. Sural, and A. Kundu, “Extraction of machine enforceable abac policies from natural language text using llm knowledge distillation,” in Proceedings of the 30th ACM Symposium on Access Control Models and Technologies, 2025, pp. 157–168

  47. [47]

    Lmn: A tool for generating machine enforceable policies from natural language access control rules using llms,

    P. Sonune, R. Rai, S. Sural, V. Atluri, and A. Kundu, “Lmn: A tool for generating machine enforceable policies from natural language access control rules using llms,” arXiv preprint arXiv:2502.12460, 2025

  48. [48]

    Say what you mean: Natural language access control with large language models for internet of things,

    Y. Cheng, M. Xu, Y. Zhang, K. Li, H. Wu, Y. Zhang, S. Guo, W. Qiu, D. Yu, and X. Cheng, “Say what you mean: Natural language access control with large language models for internet of things,” arXiv preprint arXiv:2505.23835, 2025

  49. [49]

    Translating natural language specifications into access control policies by leveraging large language models,

    S. Lawal, X. Zhao, A. Rios, R. Krishnan, and D. Ferraiolo, “Translating natural language specifications into access control policies by leveraging large language models,” in 2024 IEEE 6th International Conference on Trust, Privacy and Security in Intelligent Systems, and Applications (TPS-ISA). IEEE, 2024, pp. 361–370

  50. [50]

    Deploi: Applying nl2sql to synthesize and audit database access control,

    P. Subramaniam and S. Krishnan, “Deploi: Applying nl2sql to synthesize and audit database access control,” 2025. [Online]. Available: https://arxiv.org/abs/2402.07332

  51. [51]

    Automatic generation of attribute-based access control policies from natural language documents

    F. Shan, Z. Wang, M. Liu, and M. Zhang, “Automatic generation of attribute-based access control policies from natural language documents.” Computers, Materials & Continua, vol. 80, no. 3, 2024

  52. [52]

    Toward deep learning based access control,

    M. N. Nobi, R. Krishnan, Y. Huang, M. Shakarami, and R. Sandhu, “Toward deep learning based access control,” in Proceedings of the Twelfth ACM Conference on Data and Application Security and Privacy, ser. CODASPY ’22. New York, NY, USA: Association for Computing Machinery, 2022, p. 143–154

  53. [53]

    A large scale study of user behavior, expectations and engagement with android permissions,

    W. Cao, C. Xia, S. T. Peddinti, D. Lie, N. Taft, and L. M. Austin, “A large scale study of user behavior, expectations and engagement with android permissions,” in 30th USENIX Security Symposium (USENIX Security 21), 2021, pp. 803–820

  54. [54]

    Do llms exhibit human-like response biases? a case study in survey design,

    L. Tjuatja, V. Chen, T. Wu, A. Talwalkwar, and G. Neubig, “Do llms exhibit human-like response biases? a case study in survey design,” Transactions of the Association for Computational Linguistics, vol. 12, pp. 1011–1026, 2024

  55. [55]

    Do llms exhibit human-like cognitive biases? a large-scale systematic evaluation,

    T. Geva, A. Goldstein, E. Lary, and C. Levy, “Do llms exhibit human-like cognitive biases? a large-scale systematic evaluation,” A Large-Scale Systematic Evaluation (September 17, 2025), 2025

  56. [56]

    Humans or llms as the judge? a study on judgement biases,

    G. H. Chen, S. Chen, Z. Liu, F. Jiang, and B. Wang, “Humans or llms as the judge? a study on judgement biases,” arXiv preprint arXiv:2402.10669, 2024

  57. [57]

    Dissecting human and llm preferences,

    J. Li, F. Zhou, S. Sun, Y. Zhang, H. Zhao, and P. Liu, “Dissecting human and llm preferences,” arXiv preprint arXiv:2402.11296, 2024

  58. [58]

    Beavertails: Towards improved safety alignment of llm via a human-preference dataset,

    J. Ji, M. Liu, J. Dai, X. Pan, C. Zhang, C. Bian, B. Chen, R. Sun, Y. Wang, and Y. Yang, “Beavertails: Towards improved safety alignment of llm via a human-preference dataset,” Advances in Neural Information Processing Systems, vol. 36, pp. 24 678–24 704, 2023

  59. [59]

    Preference ranking optimization for human alignment,

    F. Song, B. Yu, M. Li, H. Yu, F. Huang, Y. Li, and H. Wang, “Preference ranking optimization for human alignment,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 17, 2024, pp. 18 990–18 998

  60. [60]

    A survey on human preference learning for large language models,

    R. Jiang, K. Chen, X. Bai, Z. He, J. Li, M. Yang, T. Zhao, L. Nie, and M. Zhang, “A survey on human preference learning for large language models,” arXiv preprint arXiv:2406.11191, 2024

  61. [61]

    Can llms capture human preferences?

    A. Goli and A. Singh, “Can llms capture human preferences?” arXiv preprint arXiv:2305.02531, 2023

  62. [62]

    Can llms keep a secret? testing privacy implications of language models via contextual integrity theory,

    N. Mireshghallah, H. Kim, X. Zhou, Y. Tsvetkov, M. Sap, R. Shokri, and Y. Choi, “Can llms keep a secret? testing privacy implications of language models via contextual integrity theory,” in The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024. [Online]. Available: https://openr...

  63. [63]

    Llm-as-a-judge for privacy evaluation? exploring the alignment of human and LLM perceptions of privacy in textual data,

    S. Meisenbacher, A. Klymenko, and F. Matthes, “Llm-as-a-judge for privacy evaluation? exploring the alignment of human and LLM perceptions of privacy in textual data,” CoRR, vol. abs/2508.12158,

  64. [64]
  65. [65]

    Information flow control in machine learning through modular model architecture,

    T. Tiwari, S. Gururangan, C. Guo, W. Hua, S. Kariyappa, U. Gupta, W. Xiong, K. Maeng, H.-H. S. Lee, and G. E. Suh, “Information flow control in machine learning through modular model architecture,” in 33rd USENIX Security Symposium (USENIX Security 24), 2024, pp. 6921–6938

  66. [66]

    Permissive information-flow analysis for large language models,

    S. A. Siddiqui, R. Gaonkar, B. Köpf, D. Krueger, A. Paverd, A. Salem, S. Tople, L. Wutschitz, M. Xia, and S. Zanella-Béguelin, “Permissive information-flow analysis for large language models,” arXiv preprint arXiv:2410.03055, 2024

  67. [67]

    Securing AI Agents with Information-Flow Control

    M. Costa, B. Köpf, A. Kolluri, A. Paverd, M. Russinovich, A. Salem, S. Tople, L. Wutschitz, and S. Zanella-Béguelin, “Securing AI agents with information-flow control,” arXiv preprint arXiv:2505.23643, 2025

  68. [68]

    Taintdroid: an information-flow tracking system for realtime privacy monitoring on smartphones,

    W. Enck, P. Gilbert, S. Han, V. Tendulkar, B.-G. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth, “Taintdroid: an information-flow tracking system for realtime privacy monitoring on smartphones,” ACM Transactions on Computer Systems (TOCS), vol. 32, no. 2, pp. 1–29, 2014

  69. [69]

    Apex: extending android permission model and enforcement with user-defined runtime constraints,

    M. Nauman, S. Khan, and X. Zhang, “Apex: extending android permission model and enforcement with user-defined runtime constraints,” in Proceedings of the 5th ACM symposium on information, computer and communications security, 2010, pp. 328–332

  70. [70]

    Flexible and fine-grained mandatory access control on android for diverse security and privacy policies,

    S. Bugiel, S. Heuser, and A.-R. Sadeghi, “Flexible and fine-grained mandatory access control on android for diverse security and privacy policies,” in 22nd USENIX Security Symposium (USENIX Security 13), 2013, pp. 131–146

  71. [71]

    The flask security architecture: System support for diverse security policies,

    R. Spencer, S. Smalley, P. Loscocco, M. Hibler, D. Andersen, and J. Lepreau, “The flask security architecture: System support for diverse security policies,” in 8th USENIX Security Symposium (USENIX Security 99), 1999

  72. [72]

    SEApp: Bringing mandatory access control to android apps,

    M. Rossi, D. Facchinetti, E. Bacis, M. Rosa, and S. Paraboschi, “SEApp: Bringing mandatory access control to android apps,” in 30th USENIX Security Symposium (USENIX Security 21), 2021, pp. 3613–3630

  73. [73]

    Finedroid: Enforcing permissions with system-wide application execution context,

    Y. Zhang, M. Yang, G. Gu, and H. Chen, “Finedroid: Enforcing permissions with system-wide application execution context,” in International Conference on Security and Privacy in Communication Systems. Springer, 2015, pp. 3–22.

Appendix A. Study Material

A.1. Privacy Statement Questions

1) How comfortable are you with sharing personal information? Some peo...