PrivacyPeek is a benchmark with 1,182 cases across 7 acquisition behaviors and 16 domains that evaluates acquisition-stage privacy leakage in LLM agents, finding it widespread with limited prompt mitigation.
Mpci-bench: A benchmark for multimodal pairwise contextual integrity evaluation of language model agents
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
Symbolic guardrails enforce 74% of specified safety policies in agent benchmarks and boost safety without hurting utility.
SELFCI uses complementary self-distillation with two reverse KL divergences to align LLMs to contextual integrity while preserving utility, outperforming RL baselines like GRPO in agentic settings.
citing papers explorer
-
PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say
PrivacyPeek is a benchmark with 1,182 cases across 7 acquisition behaviors and 16 domains that evaluates acquisition-stage privacy leakage in LLM agents, finding it widespread with limited prompt mitigation.
-
Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility
Symbolic guardrails enforce 74% of specified safety policies in agent benchmarks and boost safety without hurting utility.
-
It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs
SELFCI uses complementary self-distillation with two reverse KL divergences to align LLMs to contextual integrity while preserving utility, outperforming RL baselines like GRPO in agentic settings.