Jailbreak distillation: Renewable safety benchmarking

Jingyu Zhang, Ahmed Elgohary, Xiawei Wang, A S M Iftekhar, Ahmed Magooda, Benjamin Van Durme, Daniel Khashabi, Kyle Jackson · 2025 · arXiv 2505.22037

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

representative citing papers

Many-Tier Instruction Hierarchy in LLM Agents

cs.CL · 2026-04-10 · unverdicted · novelty 7.0

ManyIH and ManyIH-Bench address instruction conflicts in LLM agents with up to 12 privilege levels across 853 tasks, revealing frontier models achieve only ~40% accuracy.

GraphRAG-IRL: Personalized Recommendation with Graph-Grounded Inverse Reinforcement Learning and LLM Re-ranking

cs.IR · 2026-04-21 · unverdicted · novelty 6.0

GraphRAG-IRL fuses graph-grounded MaxEnt IRL pre-ranking with persona-guided LLM re-ranking to deliver up to 16.8% NDCG@10 gains over IRL-only baselines on MovieLens and consistent 4-6% gains on KuaiRand.

citing papers explorer

Showing 2 of 2 citing papers.

Many-Tier Instruction Hierarchy in LLM Agents cs.CL · 2026-04-10 · unverdicted · none · ref 34
ManyIH and ManyIH-Bench address instruction conflicts in LLM agents with up to 12 privilege levels across 853 tasks, revealing frontier models achieve only ~40% accuracy.
GraphRAG-IRL: Personalized Recommendation with Graph-Grounded Inverse Reinforcement Learning and LLM Re-ranking cs.IR · 2026-04-21 · unverdicted · none · ref 27
GraphRAG-IRL fuses graph-grounded MaxEnt IRL pre-ranking with persona-guided LLM re-ranking to deliver up to 16.8% NDCG@10 gains over IRL-only baselines on MovieLens and consistent 4-6% gains on KuaiRand.

Jailbreak distillation: Renewable safety benchmarking

fields

years

verdicts

representative citing papers

citing papers explorer