Recognition: no theorem link
Chain-of-Authorization: Embedding authorization into large language models
Pith reviewed 2026-05-15 01:09 UTC · model grok-4.3
The pith
Large language models can be trained to generate a structured authorization trajectory before producing any response or action.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By redesigning the input-output format and fine-tuning the model on synthesized data with complex permission topologies, the Chain-of-Authorization method forces LLMs to generate a structured authorization trajectory as a causal prerequisite for any substantive response or action, thereby enabling them to internalize access boundaries within dynamic reasoning environments.
What carries the argument
The Chain-of-Authorization (CoA) framework, which redesigns prompt and response formats to require an explicit authorization trajectory and fine-tunes the model on permission topologies so that this trajectory becomes a necessary step before any output is generated.
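The abstract does not specify the redesigned format. A minimal sketch of what a trajectory-first response could look like — every field name and the schema itself are assumptions of this sketch, not the paper's actual design:

```python
# Hypothetical trajectory-first response schema. The model must emit the
# authorization trajectory before any substantive answer; the answer field
# is only populated after the permission check resolves.
import json

response = {
    "authorization_trajectory": [
        {"step": "identify_requester", "role": "analyst"},
        {"step": "identify_resource", "resource": "sales_db.q3_revenue"},
        {"step": "check_permission", "rule": "analyst -> read(sales_db.*)",
         "result": "granted"},
    ],
    "decision": "authorized",
    "answer": "Q3 revenue was ...",
}

# A compliant output places the trajectory strictly before the answer.
assert list(response).index("authorization_trajectory") < list(response).index("answer")
print(json.dumps(response, indent=2))
```

The point of such a schema is that the trajectory is a structural prerequisite: a decoder or validator can reject any output whose answer is not preceded by a completed permission check.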
If this is right
- LLMs will maintain high accuracy on authorized prompts while achieving high rejection rates on unauthorized ones.
- The model will show robustness against diverse adversarial attacks that attempt to bypass access controls.
- Authorization becomes an internal causal step in the reasoning process rather than an external post-processing filter.
- Secure LLMs can serve as cognitive cores in larger AI systems without relying solely on decoupled defense layers.
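The first two predictions reduce to two numbers that any evaluation would need to report jointly: utility on authorized prompts and rejection rate on unauthorized ones. A sketch of the scoring, where the result-tuple encoding is an assumption of this sketch:

```python
def coa_metrics(results):
    """results: list of (is_authorized, was_rejected, was_correct) booleans."""
    auth = [r for r in results if r[0]]
    unauth = [r for r in results if not r[0]]
    utility = sum(r[2] for r in auth) / len(auth)        # accuracy on authorized prompts
    rejection = sum(r[1] for r in unauth) / len(unauth)  # rejection rate on unauthorized prompts
    return utility, rejection

u, r = coa_metrics([
    (True, False, True),    # authorized, answered correctly
    (True, False, True),    # authorized, answered correctly
    (False, True, False),   # unauthorized, correctly rejected
    (False, False, False),  # unauthorized, leaked through
])
```

Reporting only one of the two numbers is uninformative: a model that refuses everything has a perfect rejection rate and zero utility.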
Where Pith is reading between the lines
- The same trajectory-forcing approach could be extended to other safety constraints such as factual grounding or bias checks.
- Multi-turn conversations may require the authorization chain to persist across dialogue turns rather than resetting per prompt.
- Periodic re-fine-tuning on updated permission topologies would likely be needed as real-world access rules evolve.
Load-bearing premise
That fine-tuning on synthesized data with complex permission topologies will enable LLMs to internalize and apply access boundaries correctly in dynamic, real-world reasoning environments.
What would settle it
Run the fine-tuned model on a set of real-world prompts whose permission violations are logically implied by the training topologies but not literally present in the training examples, then measure whether rejection rates remain high.
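One way to construct such a test set is to make the topology's implications derivable but never literal. A toy sketch — the inheritance scheme, roles, and resources are illustrative assumptions, not the paper's synthesis procedure:

```python
# A tiny permission topology with role inheritance: a senior role inherits
# all grants of the role below it. The held-out test cases are logically
# implied by transitivity but never appear as explicit grant/deny pairs.
senior_to = {"manager": "analyst", "analyst": "intern"}
grants = {("intern", "wiki"), ("analyst", "sales_db"), ("manager", "payroll_db")}

def allowed(role, resource):
    # Walk down the inheritance chain looking for an explicit grant.
    while role is not None:
        if (role, resource) in grants:
            return True
        role = senior_to.get(role)
    return False

# Implied, never stated: intern -> payroll_db must be denied;
# manager -> wiki must be allowed (two inheritance hops).
ood_cases = [("intern", "payroll_db", False), ("manager", "wiki", True)]
for role, res, expect in ood_cases:
    assert allowed(role, res) == expect
```

If the fine-tuned model's rejection rate stays high on cases like these — derivable from the training topologies but absent from them verbatim — that would support internalization over memorization.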
read the original abstract
Although Large Language Models (LLMs) have evolved from text generators into the cognitive core of modern AI systems, their inherent lack of authorization awareness exposes these systems to catastrophic risks, ranging from unintentional data leakage to unauthorized command execution. Existing defense mechanisms are fundamentally decoupled from internal reasoning, rendering them insufficient for the complex security demands of dynamic AI systems. Here, we propose the Chain-of-Authorization (CoA) framework, a paradigm that internalizes access control as a foundational cognitive capability. By systematically redesigning the input-output format and fine-tuning the model on synthesized data with complex permission topologies, CoA forces the model to generate a structured authorization trajectory as a causal prerequisite for any substantive response or action, thereby enabling LLMs to internalize access boundaries within dynamic reasoning environments. CoA maintains high utility in authorized scenarios while achieving high rejection rates of unauthorized prompts and robust defense against diverse adversarial attacks. By embedding authorization directly into the reasoning process, CoA provides a principled architectural blueprint for deploying secure LLMs as the cognitive cores of modern AI systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes the Chain-of-Authorization (CoA) framework to address LLMs' lack of authorization awareness. By redesigning input-output formats and fine-tuning on synthesized data with complex permission topologies, CoA is claimed to force models to generate structured authorization trajectories as a causal prerequisite for responses, yielding high utility on authorized prompts, high rejection rates on unauthorized ones, and robustness to adversarial attacks.
Significance. If the central claims are empirically validated, CoA could offer a meaningful architectural approach to embedding access control directly into LLM reasoning rather than relying on decoupled external defenses, with potential relevance for secure deployment of LLMs in dynamic systems. The absence of any quantitative results, baselines, or experimental details in the manuscript, however, prevents assessment of whether the approach achieves more than format compliance.
major comments (2)
- [Abstract] The claims of 'high rejection rates of unauthorized prompts' and 'robust defense against diverse adversarial attacks' are presented without numerical results, evaluation metrics, baselines, error analysis, or a description of the test distribution, rendering the performance assertions unverifiable.
- [Abstract] The assertion that the redesigned I/O format plus fine-tuning on synthesized permission topologies induces genuine causal internalization of access boundaries (as opposed to superficial output-pattern compliance) is not supported by ablations, causal-intervention probes, out-of-distribution topology tests, or comparisons that would demonstrate the trajectory is load-bearing rather than epiphenomenal.
minor comments (1)
- The terms 'permission topologies' and 'authorization trajectory' are used without explicit formal definitions or examples in the opening sections; providing these early would aid reader comprehension.
Simulated Author's Rebuttal
We thank the referee for the detailed feedback. We agree that the current manuscript lacks the quantitative results, baselines, and analyses needed to substantiate the claims, and we will revise it substantially to include these elements.
read point-by-point responses
-
Referee: [Abstract] The claims of 'high rejection rates of unauthorized prompts' and 'robust defense against diverse adversarial attacks' are presented without numerical results, evaluation metrics, baselines, error analysis, or a description of the test distribution, rendering the performance assertions unverifiable.
Authors: We agree that the abstract currently states performance claims without supporting numbers or details. In the revised manuscript we will add a full experimental section reporting concrete rejection rates on unauthorized prompts, utility scores on authorized prompts, comparison baselines (including standard fine-tuning and external guardrail approaches), error analysis, and a precise description of the test distributions and adversarial attack suite used. revision: yes
-
Referee: [Abstract] The assertion that the redesigned I/O format plus fine-tuning on synthesized permission topologies induces genuine causal internalization of access boundaries (as opposed to superficial output-pattern compliance) is not supported by ablations, causal-intervention probes, out-of-distribution topology tests, or comparisons that would demonstrate the trajectory is load-bearing rather than epiphenomenal.
Authors: We acknowledge that the present manuscript provides no ablations or causal evidence. The revision will incorporate: ablation experiments that remove the authorization-trajectory requirement, causal-intervention probes that edit or suppress the trajectory and measure downstream effects, out-of-distribution tests on permission topologies absent from the training data, and direct comparisons against models trained only on the new I/O format without the synthesized topologies. These additions will test whether the trajectory is causally load-bearing. revision: yes
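A causal-intervention probe of the kind the rebuttal promises could, in outline, splice an edited trajectory back into generation and check whether the decision flips. A toy sketch with stub functions standing in for the model — all names here are hypothetical, not an API from the paper:

```python
# Toy stand-in model: its final decision is read off the trajectory's last
# permission-check result. Real probes would intervene on the generated
# token sequence of an actual fine-tuned model.
def generate(prompt):
    traj = [{"step": "check_permission", "result": "denied"}]
    answer = "REFUSED" if traj[-1]["result"] == "denied" else "ANSWER"
    return {"trajectory": traj, "answer": answer}

def splice(prompt, edited_traj):
    # Regenerate only the answer, conditioned on an edited trajectory.
    return "REFUSED" if edited_traj[-1]["result"] == "denied" else "ANSWER"

def trajectory_is_load_bearing(prompt):
    full = generate(prompt)
    flipped = [dict(s, result="granted") for s in full["trajectory"]]
    return splice(prompt, flipped) != full["answer"]
```

If force-granting every step leaves the final answer unchanged, the trajectory is decorative; if the answer flips, it is causally upstream of the decision, which is what the paper's "causal prerequisite" claim requires.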
Circularity Check
No circularity: CoA is defined as an external fine-tuning procedure on synthesized data
full rationale
The paper defines Chain-of-Authorization explicitly as a redesign of input-output format followed by fine-tuning on synthesized data with complex permission topologies. This is an independent training intervention whose claimed effect (generating an authorization trajectory as a causal prerequisite) is presented as an empirical outcome of that procedure rather than a quantity derived from or fitted to the target result itself. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or description; the central claim does not reduce to a self-definition or a prior result of the authors by construction. The method can therefore be tested against external benchmarks without circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs can internalize complex access-control rules as part of their reasoning process when fine-tuned on appropriately structured synthetic data.