IHEval : Evaluating language models on following the instruction hierarchy

Zhihan Zhang, Shiyang Li, Zixuan Zhang, Xin Liu, Haoming Jiang, Xianfeng Tang, Yifan Gao, Zheng Li, Haodong Wang, Zhaoxuan Tan, Yichuan Li, Qingyu Yin, Bing Yin, Meng Jiang · 2025 · arXiv 2502.08745

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

read on arXiv browse 1 citing papers

representative citing papers

Many-Tier Instruction Hierarchy in LLM Agents

cs.CL · 2026-04-10 · unverdicted · novelty 7.0

ManyIH and ManyIH-Bench address instruction conflicts in LLM agents with up to 12 privilege levels across 853 tasks, revealing frontier models achieve only ~40% accuracy.

citing papers explorer

Showing 1 of 1 citing paper.

Many-Tier Instruction Hierarchy in LLM Agents cs.CL · 2026-04-10 · unverdicted · none · ref 36
ManyIH and ManyIH-Bench address instruction conflicts in LLM agents with up to 12 privilege levels across 853 tasks, revealing frontier models achieve only ~40% accuracy.

IHEval : Evaluating language models on following the instruction hierarchy

fields

years

verdicts

representative citing papers

citing papers explorer