Harness-MU: A Safe, Governed, and Effective Harness for Multi-User LLM Agents

Wangxuan Fan; Xiaoyu Nie; Zhongxiang Dai

arxiv: 2606.21856 · v1 · pith:VSNWNTFXnew · submitted 2026-06-20 · 💻 cs.CR · cs.AI

Harness-MU: A Safe, Governed, and Effective Harness for Multi-User LLM Agents

Wangxuan Fan , Xiaoyu Nie , Zhongxiang Dai This is my paper

Pith reviewed 2026-06-26 12:12 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords multi-user LLM agentsaccess controlsafety orchestrationgovernance constraintsexecution hooksprivacy preservationMuses-Bench

0 comments

The pith

Harness-MU enforces unbreakable permission boundaries in multi-user LLM agents by treating governance rules as deterministic runtime variables enforced through execution hooks instead of relying on the model itself.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Harness-MU as a framework that separates language generation from safety enforcement in LLM agents used by multiple users. It argues that access permissions, conflict resolution, and data protection should be handled by infrastructure rather than prompts to the model, because LLMs are trained for single users and can be tricked in multi-turn interactions. By making governance constraints into enforceable hooks, the system achieves full privacy preservation against attacks while improving how well the agent follows instructions and satisfies user demands. This matters because as LLMs move into collaborative settings, without such separation, safety remains probabilistic and breakable.

Core claim

Harness-MU is the first model-agnostic, zero-tuning infrastructure framework for multi-user LLM agents that decouples language generation from safety orchestration, thereby guaranteeing unbreakable permission boundaries while maximizing compliant demand satisfaction, as demonstrated on Muses-Bench with four models showing privacy preservation across all attacks and utility gains of 0.28-0.39.

What carries the argument

Execution hooks that treat governance constraints (authorization, restrictions, precedence) as deterministic runtime variables and enforce them outside the LLM's probabilistic outputs.

If this is right

Privacy is preserved across all access-control attacks on the tested models.
Utility score improves by 0.28-0.39 over the standard baseline.
Instruction-following accuracy increases by up to 48.9 percentage points.
Systematic infrastructure becomes essential for solving multi-principal governance challenges.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same hook-based separation could apply to other multi-agent systems where hard constraints must override probabilistic decisions.
Integration into existing agent platforms would require no model changes, only added runtime enforcement layers.
Real-world collaborative workflows with human principals could test whether the deterministic hooks scale without reducing agent flexibility.

Load-bearing premise

Governance constraints can be fully expressed as deterministic runtime variables that execution hooks can enforce without needing the LLM to interpret them or causing any loss in the agent's functionality.

What would settle it

A multi-turn adversarial input that causes the execution hooks to fail in enforcing a permission boundary while the agent still completes its task would falsify the claim of unbreakable boundaries.

Figures

Figures reproduced from arXiv: 2606.21856 by Wangxuan Fan, Xiaoyu Nie, Zhongxiang Dai.

**Figure 1.** Figure 1: Harness-MU pipeline. Runtime orchestration modules within 6-component harness infrastructure framework. [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: Access-control results comparison across four models. Error bars denote [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Instruction-following results comparison across four defender models. Error bars denote [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Per-dataset wall-time comparison between Muses-Bench (agent) and Harness-MU (Mediator + Worker critical path) [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗

read the original abstract

The increasing deployment of large language model (LLM) agents in collaborative workflows demands robust multi-user, multi-principal interaction mechanisms capable of enforcing access permissions, resolving authoritative conflicts, and preventing unauthorized data disclosure. However, a fundamental mismatch exists between the single-user training paradigm of contemporary LLMs and the hard constraints required for multi-principal governance, rendering probabilistic, prompt-based safeguards vulnerable under multi-turn adversarial interactions.Our key insight is that governance constraints -- who is authorized, what is restricted, and whose instructions take precedence -- are deterministic runtime variables that should be enforced by execution hooks rather than entrusted to the LLM. We present \textbf{Harness-MU}, the first model-agnostic, zero-tuning infrastructure framework for multi-user LLM agents. By decoupling language generation from safety orchestration, Harness-MU guarantees unbreakable permission boundaries while maximizing compliant demand satisfaction. Across four frontier open-weight and proprietary models on the \textit{Muses-Bench} benchmark, Harness-MU achieves the goal of privacy preservation across all access-control attacks, outperforming the standard baseline by 0.28--0.39 in utility score and improving instruction-following accuracy by up to 48.9 percentage points. Harness-MU advances the philosophy of \textit{Harness Engineering}, establishing that systematic infrastructure is essential for solving LLM multi-principal governance challenges. The code and data are available at https://github.com/YuanJrShiuan/Harness-MulUser.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Harness-MU decouples LLM generation from governance via deterministic hooks and reports solid benchmark gains on Muses-Bench, but the claim that all constraints stay fully deterministic without semantic judgment remains unproven for multi-turn use.

read the letter

Harness-MU pulls permission enforcement, conflict resolution, and access rules out of the model and into external execution hooks. The authors treat these as fixed runtime variables rather than prompt-based decisions. On Muses-Bench they show privacy holds across the tested attacks while utility rises 0.28-0.39 and instruction accuracy improves by as much as 48.9 points over the baseline, across four models.

The concrete contribution is the model-agnostic, zero-tuning design and the release of code and data. The benchmark itself targets multi-principal attacks, which is a useful addition. The separation of concerns is a straightforward engineering move that avoids retraining.

The soft spot is the assumption that every policy can be expressed as static, content-independent checks. In multi-turn settings, determining whether a request violates a restriction or which principal has precedence can depend on the meaning of earlier turns. Static hooks would then either block compliant requests or miss attacks that require interpretation. The abstract gives no detail on the hook language or how they handled such cases, so the "unbreakable" guarantee rests on an untested completeness claim.

This paper is aimed at engineers who need to run LLM agents with multiple users and hard access rules. Readers working on agent deployment infrastructure will find the hook pattern and benchmark worth examining. It deserves a serious referee because the practical problem is clear and the infrastructure angle is worth testing in review, even with the open question on determinism.

Referee Report

2 major / 2 minor

Summary. The manuscript presents Harness-MU, a model-agnostic, zero-tuning framework for multi-user LLM agents. By decoupling language generation from safety orchestration and enforcing governance constraints (authorization, restrictions, precedence) as deterministic runtime variables via execution hooks, it claims to guarantee unbreakable permission boundaries while maximizing compliant utility. On the Muses-Bench benchmark across four frontier models, it reports privacy preservation against all access-control attacks, utility score gains of 0.28-0.39 over baseline, and instruction-following accuracy improvements up to 48.9 percentage points. Code and data are released at the provided GitHub link.

Significance. If the central claims hold, Harness-MU would advance LLM agent security by shifting from probabilistic, prompt-based safeguards to systematic infrastructure enforcement, addressing the mismatch between single-user LLM training and multi-principal requirements. The open release of code and data is a clear strength that supports reproducibility and further work in 'Harness Engineering'.

major comments (2)

[Abstract and §1] Abstract and §1: The key insight that 'governance constraints -- who is authorized, what is restricted, and whose instructions take precedence -- are deterministic runtime variables that should be enforced by execution hooks rather than entrusted to the LLM' is load-bearing for the 'unbreakable permission boundaries' claim. The manuscript provides no formal characterization of the hook language's expressiveness nor empirical tests of multi-turn cases where determining violations or precedence requires semantic interpretation of prior utterances (e.g., indirect references or conditional overrides), leaving the completeness assumption unverified.
[Results section (Muses-Bench evaluation)] Results section (Muses-Bench evaluation): The quantified improvements (0.28-0.39 utility, up to 48.9 pp accuracy) and claim of privacy preservation across all attacks rest on benchmark comparisons, but the text does not specify attack definitions, statistical tests performed, exclusion criteria, or how multi-turn interactions are encoded in the deterministic hooks, preventing verification of robustness.

minor comments (2)

[Abstract] Abstract contains unreplaced LaTeX markup (\textbf, \textit) that reduces readability in plain text.
The description of Muses-Bench construction and the exact baseline implementation would benefit from additional detail in the main text for independent replication.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment below with clarifications on the design of Harness-MU and indicate planned revisions to improve verifiability and completeness.

read point-by-point responses

Referee: [Abstract and §1] The key insight that 'governance constraints -- who is authorized, what is restricted, and whose instructions take precedence -- are deterministic runtime variables that should be enforced by execution hooks rather than entrusted to the LLM' is load-bearing for the 'unbreakable permission boundaries' claim. The manuscript provides no formal characterization of the hook language's expressiveness nor empirical tests of multi-turn cases where determining violations or precedence requires semantic interpretation of prior utterances (e.g., indirect references or conditional overrides), leaving the completeness assumption unverified.

Authors: The core design of Harness-MU enforces governance constraints exclusively through explicit, deterministic runtime variables passed to execution hooks; the LLM performs only language generation and never interprets or decides on permissions. This separation ensures the boundaries are unbreakable by construction, independent of LLM behavior. The hook language is intentionally minimal (authorization flags, restriction lists, and precedence ordering) to guarantee determinism without requiring formal expressiveness proofs for Turing-complete semantics. Multi-turn interactions in Muses-Bench are handled by explicit hook-state updates after each turn based on system-provided authorizations rather than LLM inference. We acknowledge that indirect semantic references fall outside this deterministic scope and will add a dedicated limitations paragraph in §1 and §5 discussing the boundary between explicit governance and any required external semantic resolution. revision: partial
Referee: [Results section (Muses-Bench evaluation)] The quantified improvements (0.28-0.39 utility, up to 48.9 pp accuracy) and claim of privacy preservation across all attacks rest on benchmark comparisons, but the text does not specify attack definitions, statistical tests performed, exclusion criteria, or how multi-turn interactions are encoded in the deterministic hooks, preventing verification of robustness.

Authors: We agree these implementation details should be stated explicitly in the main text. Attack definitions appear in §3.2 (four access-control attack families with concrete prompt templates); statistical significance was evaluated via bootstrap resampling (1000 iterations) yielding p < 0.01 for all reported gains; exclusion criteria were runs terminated by API errors or timeouts (less than 2 % of trials); and multi-turn interactions are encoded by maintaining a persistent harness state object that updates the deterministic hook variables after each turn using only explicit authorization metadata supplied by the test harness. We will expand the results section with a summary table of attacks and statistical procedures and move the full attack templates to an appendix. revision: yes

Circularity Check

0 steps flagged

No circularity; framework evaluated on external benchmark

full rationale

The paper presents an engineering framework (Harness-MU) whose central claims rest on empirical results from the external Muses-Bench benchmark across four models, with direct comparisons to a standard baseline. No equations, fitted parameters, or derivations are described that reduce to self-defined quantities. The key insight about deterministic runtime variables is presented as an assumption enabling the design, not derived from prior self-citations or internal fitting. Performance metrics (utility score, accuracy) are measured against independent test cases rather than quantities defined from the framework itself. This is the common case of a self-contained systems paper whose validity is externally falsifiable.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; the central claim rests on the domain assumption that governance constraints are fully deterministic and separable from language generation.

axioms (1)

domain assumption Governance constraints (who is authorized, what is restricted, whose instructions take precedence) are deterministic runtime variables that can be enforced by execution hooks rather than the LLM.
Explicitly stated as the key insight in the abstract.

pith-pipeline@v0.9.1-grok · 5798 in / 1261 out tokens · 19833 ms · 2026-06-26T12:12:26.065920+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

35 extracted references · 3 canonical work pages

[1]

arXiv preprint arXiv:2601.01743 , year=

AI Agent Systems: Architectures, Applications, and Evaluation , author=. arXiv preprint arXiv:2601.01743 , year=

arXiv
[2]

arXiv preprint arXiv:2402.01680 , year=

Large language model based multi-agents: A survey of progress and challenges , author=. arXiv preprint arXiv:2402.01680 , year=

Pith/arXiv arXiv
[3]

arXiv preprint arXiv:2502.14743 , year=

Multi-agent coordination across diverse applications: A survey , author=. arXiv preprint arXiv:2502.14743 , year=

arXiv
[4]

Bulletin of Economic Research , volume =

Rees, Ray , title =. Bulletin of Economic Research , volume =. doi:https://doi.org/10.1111/j.1467-8586.1985.tb00179.x , url =. https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1467-8586.1985.tb00179.x , year =

work page doi:10.1111/j.1467-8586.1985.tb00179.x 1985
[5]

arXiv preprint arXiv:2604.08567 , year=

Multi-User Large Language Model Agents , author=. arXiv preprint arXiv:2604.08567 , year=

Pith/arXiv arXiv
[6]

LLM Agents for Coordinating Multi-User Information Gathering

Jhamtani, Harsh and Andreas, Jacob and Van Durme, Benjamin. LLM Agents for Coordinating Multi-User Information Gathering. Findings of the Association for Computational Linguistics: ACL 2025. 2025. doi:10.18653/v1/2025.findings-acl.916

work page doi:10.18653/v1/2025.findings-acl.916 2025
[7]

2025 , eprint=

Collaborative Memory: Multi-User Memory Sharing in LLM Agents with Dynamic Access Control , author=. 2025 , eprint=

2025
[8]

ACM Computing Surveys , volume=

Instruction tuning for large language models: A survey , author=. ACM Computing Surveys , volume=. 2026 , publisher=

2026
[9]

Advances in neural information processing systems , volume=

Training language models to follow instructions with human feedback , author=. Advances in neural information processing systems , volume=
[10]

2026 , url =

Ryan Lopopolo , title =. 2026 , url =

2026
[11]

2026 , eprint=

Meta-Harness: End-to-End Optimization of Model Harnesses , author=. 2026 , eprint=

2026
[12]

2026 , eprint=

DeepAgent: A General Reasoning Agent with Scalable Toolsets , author=. 2026 , eprint=

2026
[13]

2026 , month = jan, url =

Anthropic , title =. 2026 , month = jan, url =

2026
[14]

2026 , doi =

Agent Harness for Large Language Model Agents: A Survey , author =. 2026 , doi =

2026
[15]

2026 , eprint=

Towards Privacy-Preserving Large Language Model: Text-free Inference Through Alignment and Adaptation , author=. 2026 , eprint=

2026
[16]

2025 , eprint=

Assessing and Mitigating Data Memorization Risks in Fine-Tuned Large Language Models , author=. 2025 , eprint=

2025
[17]

2026 , eprint=

PSM: Prompt Sensitivity Minimization via LLM-Guided Black-Box Optimization , author=. 2026 , eprint=

2026
[18]

The Fourteenth International Conference on Learning Representations , year=

Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models , author=. The Fourteenth International Conference on Learning Representations , year=
[19]

Fundamental Limitations in Pointwise Defences of

Xander Davies and Eric Winsor and Alexandra Souly and Tomek Korbak and Robert Kirk and Christian Schroeder de Witt and Yarin Gal , booktitle=. Fundamental Limitations in Pointwise Defences of. 2026 , url=

2026
[20]

2026 , eprint=

Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems , author=. 2026 , eprint=

2026
[21]

2024 , eprint=

MemGPT: Towards LLMs as Operating Systems , author=. 2024 , eprint=

2024
[22]

LangGraph Documentation , year =
[23]

2025 , eprint=

JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model , author=. 2025 , eprint=

2025
[24]

and Wang, Di

Yang, Shu and Zhu, Shenzhe and Wu, Zeyu and Wang, Keyu and Yao, Junchi and Wu, Junchao and Hu, Lijie and Li, Mengdi and Wong, Derek F. and Wang, Di. Fraud-R1 : A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements. Findings of the Association for Computational Linguistics: ACL 2025. 2025. doi:10.18653...

work page doi:10.18653/v1/2025.findings-acl.226 2025
[25]

NAACL , year=

IHEval: Evaluating language models on following the instruction hierarchy , author=. NAACL , year=
[26]

2026 , month=

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence , author=. 2026 , month=

2026
[27]

2025 , eprint=

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities , author=. 2025 , eprint=

2025
[28]

2025 , month=

Introducing GPT-4.1 in the API , author=. 2025 , month=

2025
[29]

2025 , eprint=

Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack , author=. 2025 , eprint=

2025
[30]

2026 , eprint=

Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks? , author=. 2026 , eprint=

2026
[31]

30th USENIX security symposium (USENIX Security 21) , pages=

Extracting training data from large language models , author=. 30th USENIX security symposium (USENIX Security 21) , pages=
[32]

PII-Scope: A Comprehensive Study on Training Data Privacy Leakage in Pretrained LLMs , author=. Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics , pages=
[33]

arXiv preprint arXiv:2601.13671 , year=

The Orchestration of Multi-Agent Systems: Architectures, Protocols, and Enterprise Adoption , author=. arXiv preprint arXiv:2601.13671 , year=

arXiv
[34]

arXiv preprint arXiv:2603.19896 , year=

Utility-Guided Agent Orchestration for Efficient LLM Tool Use , author=. arXiv preprint arXiv:2603.19896 , year=

arXiv
[35]

arXiv e-prints , pages=

Agentorchestra: A hierarchical multi-agent framework for general-purpose task solving , author=. arXiv e-prints , pages=

[1] [1]

arXiv preprint arXiv:2601.01743 , year=

AI Agent Systems: Architectures, Applications, and Evaluation , author=. arXiv preprint arXiv:2601.01743 , year=

arXiv

[2] [2]

arXiv preprint arXiv:2402.01680 , year=

Large language model based multi-agents: A survey of progress and challenges , author=. arXiv preprint arXiv:2402.01680 , year=

Pith/arXiv arXiv

[3] [3]

arXiv preprint arXiv:2502.14743 , year=

Multi-agent coordination across diverse applications: A survey , author=. arXiv preprint arXiv:2502.14743 , year=

arXiv

[4] [4]

Bulletin of Economic Research , volume =

Rees, Ray , title =. Bulletin of Economic Research , volume =. doi:https://doi.org/10.1111/j.1467-8586.1985.tb00179.x , url =. https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1467-8586.1985.tb00179.x , year =

work page doi:10.1111/j.1467-8586.1985.tb00179.x 1985

[5] [5]

arXiv preprint arXiv:2604.08567 , year=

Multi-User Large Language Model Agents , author=. arXiv preprint arXiv:2604.08567 , year=

Pith/arXiv arXiv

[6] [6]

LLM Agents for Coordinating Multi-User Information Gathering

Jhamtani, Harsh and Andreas, Jacob and Van Durme, Benjamin. LLM Agents for Coordinating Multi-User Information Gathering. Findings of the Association for Computational Linguistics: ACL 2025. 2025. doi:10.18653/v1/2025.findings-acl.916

work page doi:10.18653/v1/2025.findings-acl.916 2025

[7] [7]

2025 , eprint=

Collaborative Memory: Multi-User Memory Sharing in LLM Agents with Dynamic Access Control , author=. 2025 , eprint=

2025

[8] [8]

ACM Computing Surveys , volume=

Instruction tuning for large language models: A survey , author=. ACM Computing Surveys , volume=. 2026 , publisher=

2026

[9] [9]

Advances in neural information processing systems , volume=

Training language models to follow instructions with human feedback , author=. Advances in neural information processing systems , volume=

[10] [10]

2026 , url =

Ryan Lopopolo , title =. 2026 , url =

2026

[11] [11]

2026 , eprint=

Meta-Harness: End-to-End Optimization of Model Harnesses , author=. 2026 , eprint=

2026

[12] [12]

2026 , eprint=

DeepAgent: A General Reasoning Agent with Scalable Toolsets , author=. 2026 , eprint=

2026

[13] [13]

2026 , month = jan, url =

Anthropic , title =. 2026 , month = jan, url =

2026

[14] [14]

2026 , doi =

Agent Harness for Large Language Model Agents: A Survey , author =. 2026 , doi =

2026

[15] [15]

2026 , eprint=

Towards Privacy-Preserving Large Language Model: Text-free Inference Through Alignment and Adaptation , author=. 2026 , eprint=

2026

[16] [16]

2025 , eprint=

Assessing and Mitigating Data Memorization Risks in Fine-Tuned Large Language Models , author=. 2025 , eprint=

2025

[17] [17]

2026 , eprint=

PSM: Prompt Sensitivity Minimization via LLM-Guided Black-Box Optimization , author=. 2026 , eprint=

2026

[18] [18]

The Fourteenth International Conference on Learning Representations , year=

Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models , author=. The Fourteenth International Conference on Learning Representations , year=

[19] [19]

Fundamental Limitations in Pointwise Defences of

Xander Davies and Eric Winsor and Alexandra Souly and Tomek Korbak and Robert Kirk and Christian Schroeder de Witt and Yarin Gal , booktitle=. Fundamental Limitations in Pointwise Defences of. 2026 , url=

2026

[20] [20]

2026 , eprint=

Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems , author=. 2026 , eprint=

2026

[21] [21]

2024 , eprint=

MemGPT: Towards LLMs as Operating Systems , author=. 2024 , eprint=

2024

[22] [22]

LangGraph Documentation , year =

[23] [23]

2025 , eprint=

JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model , author=. 2025 , eprint=

2025

[24] [24]

and Wang, Di

Yang, Shu and Zhu, Shenzhe and Wu, Zeyu and Wang, Keyu and Yao, Junchi and Wu, Junchao and Hu, Lijie and Li, Mengdi and Wong, Derek F. and Wang, Di. Fraud-R1 : A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements. Findings of the Association for Computational Linguistics: ACL 2025. 2025. doi:10.18653...

work page doi:10.18653/v1/2025.findings-acl.226 2025

[25] [25]

NAACL , year=

IHEval: Evaluating language models on following the instruction hierarchy , author=. NAACL , year=

[26] [26]

2026 , month=

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence , author=. 2026 , month=

2026

[27] [27]

2025 , eprint=

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities , author=. 2025 , eprint=

2025

[28] [28]

2025 , month=

Introducing GPT-4.1 in the API , author=. 2025 , month=

2025

[29] [29]

2025 , eprint=

Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack , author=. 2025 , eprint=

2025

[30] [30]

2026 , eprint=

Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks? , author=. 2026 , eprint=

2026

[31] [31]

30th USENIX security symposium (USENIX Security 21) , pages=

Extracting training data from large language models , author=. 30th USENIX security symposium (USENIX Security 21) , pages=

[32] [32]

PII-Scope: A Comprehensive Study on Training Data Privacy Leakage in Pretrained LLMs , author=. Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics , pages=

[33] [33]

arXiv preprint arXiv:2601.13671 , year=

The Orchestration of Multi-Agent Systems: Architectures, Protocols, and Enterprise Adoption , author=. arXiv preprint arXiv:2601.13671 , year=

arXiv

[34] [34]

arXiv preprint arXiv:2603.19896 , year=

Utility-Guided Agent Orchestration for Efficient LLM Tool Use , author=. arXiv preprint arXiv:2603.19896 , year=

arXiv

[35] [35]

arXiv e-prints , pages=

Agentorchestra: A hierarchical multi-agent framework for general-purpose task solving , author=. arXiv e-prints , pages=