Harness-MU: A Safe, Governed, and Effective Harness for Multi-User LLM Agents
Pith reviewed 2026-06-26 12:12 UTC · model grok-4.3
The pith
Harness-MU enforces unbreakable permission boundaries in multi-user LLM agents by treating governance rules as deterministic runtime variables enforced through execution hooks instead of relying on the model itself.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Harness-MU is the first model-agnostic, zero-tuning infrastructure framework for multi-user LLM agents that decouples language generation from safety orchestration, thereby guaranteeing unbreakable permission boundaries while maximizing compliant demand satisfaction, as demonstrated on Muses-Bench with four models showing privacy preservation across all attacks and utility gains of 0.28-0.39.
What carries the argument
Execution hooks that treat governance constraints (authorization, restrictions, precedence) as deterministic runtime variables and enforce them outside the LLM's probabilistic outputs.
If this is right
- Privacy is preserved across all access-control attacks on the tested models.
- Utility score improves by 0.28-0.39 over the standard baseline.
- Instruction-following accuracy increases by up to 48.9 percentage points.
- Systematic infrastructure becomes essential for solving multi-principal governance challenges.
Where Pith is reading between the lines
- The same hook-based separation could apply to other multi-agent systems where hard constraints must override probabilistic decisions.
- Integration into existing agent platforms would require no model changes, only added runtime enforcement layers.
- Real-world collaborative workflows with human principals could test whether the deterministic hooks scale without reducing agent flexibility.
Load-bearing premise
Governance constraints can be fully expressed as deterministic runtime variables that execution hooks can enforce without needing the LLM to interpret them or causing any loss in the agent's functionality.
What would settle it
A multi-turn adversarial input that causes the execution hooks to fail in enforcing a permission boundary while the agent still completes its task would falsify the claim of unbreakable boundaries.
Figures
read the original abstract
The increasing deployment of large language model (LLM) agents in collaborative workflows demands robust multi-user, multi-principal interaction mechanisms capable of enforcing access permissions, resolving authoritative conflicts, and preventing unauthorized data disclosure. However, a fundamental mismatch exists between the single-user training paradigm of contemporary LLMs and the hard constraints required for multi-principal governance, rendering probabilistic, prompt-based safeguards vulnerable under multi-turn adversarial interactions.Our key insight is that governance constraints -- who is authorized, what is restricted, and whose instructions take precedence -- are deterministic runtime variables that should be enforced by execution hooks rather than entrusted to the LLM. We present \textbf{Harness-MU}, the first model-agnostic, zero-tuning infrastructure framework for multi-user LLM agents. By decoupling language generation from safety orchestration, Harness-MU guarantees unbreakable permission boundaries while maximizing compliant demand satisfaction. Across four frontier open-weight and proprietary models on the \textit{Muses-Bench} benchmark, Harness-MU achieves the goal of privacy preservation across all access-control attacks, outperforming the standard baseline by 0.28--0.39 in utility score and improving instruction-following accuracy by up to 48.9 percentage points. Harness-MU advances the philosophy of \textit{Harness Engineering}, establishing that systematic infrastructure is essential for solving LLM multi-principal governance challenges. The code and data are available at https://github.com/YuanJrShiuan/Harness-MulUser.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents Harness-MU, a model-agnostic, zero-tuning framework for multi-user LLM agents. By decoupling language generation from safety orchestration and enforcing governance constraints (authorization, restrictions, precedence) as deterministic runtime variables via execution hooks, it claims to guarantee unbreakable permission boundaries while maximizing compliant utility. On the Muses-Bench benchmark across four frontier models, it reports privacy preservation against all access-control attacks, utility score gains of 0.28-0.39 over baseline, and instruction-following accuracy improvements up to 48.9 percentage points. Code and data are released at the provided GitHub link.
Significance. If the central claims hold, Harness-MU would advance LLM agent security by shifting from probabilistic, prompt-based safeguards to systematic infrastructure enforcement, addressing the mismatch between single-user LLM training and multi-principal requirements. The open release of code and data is a clear strength that supports reproducibility and further work in 'Harness Engineering'.
major comments (2)
- [Abstract and §1] Abstract and §1: The key insight that 'governance constraints -- who is authorized, what is restricted, and whose instructions take precedence -- are deterministic runtime variables that should be enforced by execution hooks rather than entrusted to the LLM' is load-bearing for the 'unbreakable permission boundaries' claim. The manuscript provides no formal characterization of the hook language's expressiveness nor empirical tests of multi-turn cases where determining violations or precedence requires semantic interpretation of prior utterances (e.g., indirect references or conditional overrides), leaving the completeness assumption unverified.
- [Results section (Muses-Bench evaluation)] Results section (Muses-Bench evaluation): The quantified improvements (0.28-0.39 utility, up to 48.9 pp accuracy) and claim of privacy preservation across all attacks rest on benchmark comparisons, but the text does not specify attack definitions, statistical tests performed, exclusion criteria, or how multi-turn interactions are encoded in the deterministic hooks, preventing verification of robustness.
minor comments (2)
- [Abstract] Abstract contains unreplaced LaTeX markup (\textbf, \textit) that reduces readability in plain text.
- The description of Muses-Bench construction and the exact baseline implementation would benefit from additional detail in the main text for independent replication.
Simulated Author's Rebuttal
We thank the referee for their constructive comments. We address each major comment below with clarifications on the design of Harness-MU and indicate planned revisions to improve verifiability and completeness.
read point-by-point responses
-
Referee: [Abstract and §1] The key insight that 'governance constraints -- who is authorized, what is restricted, and whose instructions take precedence -- are deterministic runtime variables that should be enforced by execution hooks rather than entrusted to the LLM' is load-bearing for the 'unbreakable permission boundaries' claim. The manuscript provides no formal characterization of the hook language's expressiveness nor empirical tests of multi-turn cases where determining violations or precedence requires semantic interpretation of prior utterances (e.g., indirect references or conditional overrides), leaving the completeness assumption unverified.
Authors: The core design of Harness-MU enforces governance constraints exclusively through explicit, deterministic runtime variables passed to execution hooks; the LLM performs only language generation and never interprets or decides on permissions. This separation ensures the boundaries are unbreakable by construction, independent of LLM behavior. The hook language is intentionally minimal (authorization flags, restriction lists, and precedence ordering) to guarantee determinism without requiring formal expressiveness proofs for Turing-complete semantics. Multi-turn interactions in Muses-Bench are handled by explicit hook-state updates after each turn based on system-provided authorizations rather than LLM inference. We acknowledge that indirect semantic references fall outside this deterministic scope and will add a dedicated limitations paragraph in §1 and §5 discussing the boundary between explicit governance and any required external semantic resolution. revision: partial
-
Referee: [Results section (Muses-Bench evaluation)] The quantified improvements (0.28-0.39 utility, up to 48.9 pp accuracy) and claim of privacy preservation across all attacks rest on benchmark comparisons, but the text does not specify attack definitions, statistical tests performed, exclusion criteria, or how multi-turn interactions are encoded in the deterministic hooks, preventing verification of robustness.
Authors: We agree these implementation details should be stated explicitly in the main text. Attack definitions appear in §3.2 (four access-control attack families with concrete prompt templates); statistical significance was evaluated via bootstrap resampling (1000 iterations) yielding p < 0.01 for all reported gains; exclusion criteria were runs terminated by API errors or timeouts (less than 2 % of trials); and multi-turn interactions are encoded by maintaining a persistent harness state object that updates the deterministic hook variables after each turn using only explicit authorization metadata supplied by the test harness. We will expand the results section with a summary table of attacks and statistical procedures and move the full attack templates to an appendix. revision: yes
Circularity Check
No circularity; framework evaluated on external benchmark
full rationale
The paper presents an engineering framework (Harness-MU) whose central claims rest on empirical results from the external Muses-Bench benchmark across four models, with direct comparisons to a standard baseline. No equations, fitted parameters, or derivations are described that reduce to self-defined quantities. The key insight about deterministic runtime variables is presented as an assumption enabling the design, not derived from prior self-citations or internal fitting. Performance metrics (utility score, accuracy) are measured against independent test cases rather than quantities defined from the framework itself. This is the common case of a self-contained systems paper whose validity is externally falsifiable.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Governance constraints (who is authorized, what is restricted, whose instructions take precedence) are deterministic runtime variables that can be enforced by execution hooks rather than the LLM.
Reference graph
Works this paper leans on
-
[1]
arXiv preprint arXiv:2601.01743 , year=
AI Agent Systems: Architectures, Applications, and Evaluation , author=. arXiv preprint arXiv:2601.01743 , year=
-
[2]
arXiv preprint arXiv:2402.01680 , year=
Large language model based multi-agents: A survey of progress and challenges , author=. arXiv preprint arXiv:2402.01680 , year=
-
[3]
arXiv preprint arXiv:2502.14743 , year=
Multi-agent coordination across diverse applications: A survey , author=. arXiv preprint arXiv:2502.14743 , year=
-
[4]
Bulletin of Economic Research , volume =
Rees, Ray , title =. Bulletin of Economic Research , volume =. doi:https://doi.org/10.1111/j.1467-8586.1985.tb00179.x , url =. https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1467-8586.1985.tb00179.x , year =
-
[5]
arXiv preprint arXiv:2604.08567 , year=
Multi-User Large Language Model Agents , author=. arXiv preprint arXiv:2604.08567 , year=
-
[6]
LLM Agents for Coordinating Multi-User Information Gathering
Jhamtani, Harsh and Andreas, Jacob and Van Durme, Benjamin. LLM Agents for Coordinating Multi-User Information Gathering. Findings of the Association for Computational Linguistics: ACL 2025. 2025. doi:10.18653/v1/2025.findings-acl.916
-
[7]
2025 , eprint=
Collaborative Memory: Multi-User Memory Sharing in LLM Agents with Dynamic Access Control , author=. 2025 , eprint=
2025
-
[8]
ACM Computing Surveys , volume=
Instruction tuning for large language models: A survey , author=. ACM Computing Surveys , volume=. 2026 , publisher=
2026
-
[9]
Advances in neural information processing systems , volume=
Training language models to follow instructions with human feedback , author=. Advances in neural information processing systems , volume=
-
[10]
2026 , url =
Ryan Lopopolo , title =. 2026 , url =
2026
-
[11]
2026 , eprint=
Meta-Harness: End-to-End Optimization of Model Harnesses , author=. 2026 , eprint=
2026
-
[12]
2026 , eprint=
DeepAgent: A General Reasoning Agent with Scalable Toolsets , author=. 2026 , eprint=
2026
-
[13]
2026 , month = jan, url =
Anthropic , title =. 2026 , month = jan, url =
2026
-
[14]
2026 , doi =
Agent Harness for Large Language Model Agents: A Survey , author =. 2026 , doi =
2026
-
[15]
2026 , eprint=
Towards Privacy-Preserving Large Language Model: Text-free Inference Through Alignment and Adaptation , author=. 2026 , eprint=
2026
-
[16]
2025 , eprint=
Assessing and Mitigating Data Memorization Risks in Fine-Tuned Large Language Models , author=. 2025 , eprint=
2025
-
[17]
2026 , eprint=
PSM: Prompt Sensitivity Minimization via LLM-Guided Black-Box Optimization , author=. 2026 , eprint=
2026
-
[18]
The Fourteenth International Conference on Learning Representations , year=
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models , author=. The Fourteenth International Conference on Learning Representations , year=
-
[19]
Fundamental Limitations in Pointwise Defences of
Xander Davies and Eric Winsor and Alexandra Souly and Tomek Korbak and Robert Kirk and Christian Schroeder de Witt and Yarin Gal , booktitle=. Fundamental Limitations in Pointwise Defences of. 2026 , url=
2026
-
[20]
2026 , eprint=
Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems , author=. 2026 , eprint=
2026
-
[21]
2024 , eprint=
MemGPT: Towards LLMs as Operating Systems , author=. 2024 , eprint=
2024
-
[22]
LangGraph Documentation , year =
-
[23]
2025 , eprint=
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model , author=. 2025 , eprint=
2025
-
[24]
Yang, Shu and Zhu, Shenzhe and Wu, Zeyu and Wang, Keyu and Yao, Junchi and Wu, Junchao and Hu, Lijie and Li, Mengdi and Wong, Derek F. and Wang, Di. Fraud-R1 : A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements. Findings of the Association for Computational Linguistics: ACL 2025. 2025. doi:10.18653...
-
[25]
NAACL , year=
IHEval: Evaluating language models on following the instruction hierarchy , author=. NAACL , year=
-
[26]
2026 , month=
DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence , author=. 2026 , month=
2026
-
[27]
2025 , eprint=
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities , author=. 2025 , eprint=
2025
-
[28]
2025 , month=
Introducing GPT-4.1 in the API , author=. 2025 , month=
2025
-
[29]
2025 , eprint=
Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack , author=. 2025 , eprint=
2025
-
[30]
2026 , eprint=
Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks? , author=. 2026 , eprint=
2026
-
[31]
30th USENIX security symposium (USENIX Security 21) , pages=
Extracting training data from large language models , author=. 30th USENIX security symposium (USENIX Security 21) , pages=
-
[32]
PII-Scope: A Comprehensive Study on Training Data Privacy Leakage in Pretrained LLMs , author=. Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics , pages=
-
[33]
arXiv preprint arXiv:2601.13671 , year=
The Orchestration of Multi-Agent Systems: Architectures, Protocols, and Enterprise Adoption , author=. arXiv preprint arXiv:2601.13671 , year=
-
[34]
arXiv preprint arXiv:2603.19896 , year=
Utility-Guided Agent Orchestration for Efficient LLM Tool Use , author=. arXiv preprint arXiv:2603.19896 , year=
-
[35]
arXiv e-prints , pages=
Agentorchestra: A hierarchical multi-agent framework for general-purpose task solving , author=. arXiv e-prints , pages=
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.