Many-Tier Instruction Hierarchy in LLM Agents

2026 · cs.CL · arXiv 2604.09443

1 Pith paper cites this work. Polarity classification is still being indexed.

abstract

Large language model agents receive instructions from many sources (system messages, user prompts, tool outputs, other agents, and more), each carrying different levels of trust and authority. When these instructions conflict, agents must reliably follow the highest-privilege instruction to remain safe and effective. The dominant paradigm, instruction hierarchy (IH), assumes a fixed, small set of privilege levels (typically fewer than five) defined by rigid role labels (e.g., system > user). This is inadequate for real-world agentic settings, where conflicts can arise across far more sources and contexts. In this work, we propose Many-Tier Instruction Hierarchy (ManyIH), a paradigm for resolving instruction conflicts among instructions with arbitrarily many privilege levels. We introduce ManyIH-Bench, the first benchmark for ManyIH. ManyIH-Bench requires models to navigate up to 12 levels of conflicting instructions with varying privileges, comprising 853 agentic tasks (427 coding and 426 instruction-following). ManyIH-Bench composes constraints developed by LLMs and verified by humans to create realistic and difficult test cases spanning 46 real-world agents. Our experiments show that even the current frontier models perform poorly (~40% accuracy) when instruction conflict scales. This work underscores the urgent need for methods that explicitly target fine-grained, scalable instruction conflict resolution in agentic settings.

representative citing papers

Evaluating using Mock Tool Calls to Quarantine Untrusted Prompt Inputs

cs.CL · 2026-05-28 · unverdicted · novelty 5.0

Mock tool call wrapping does not broadly improve and sometimes reduces robustness to attacks on untrusted inputs across seven models and three LLM-as-a-judge tasks.
