CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data
Advances in generative AI point towards a new era of personalized applications that perform diverse tasks on behalf of users. While general AI assistants have yet to fully emerge, their potential to share personal data raises significant privacy challenges. This paper introduces CI-Bench, a comprehensive synthetic benchmark for evaluating the ability of AI assistants to protect personal information during model inference. Leveraging the Contextual Integrity framework, our benchmark enables systematic assessment of information flow across important context dimensions, including roles, information types, and transmission principles. Unlike previous work with smaller, narrowly focused evaluations, we present a novel, scalable, multi-step data pipeline that synthetically generates natural communications, including dialogues and emails, which we use to generate 44 thousand test samples across eight domains. Additionally, we formulate and evaluate a naive AI assistant to demonstrate the need for further study and careful training towards personal assistant tasks. We envision CI-Bench as a valuable tool for guiding future language model development, deployment, system design, and dataset construction, ultimately contributing to the development of AI assistants that align with users' privacy expectations.
Forward citations
Cited by 11 Pith papers
- MuPPET: A Benchmark for Contextual Privacy of LLM Assistants in Multi-Party Conversations
  MuPPET benchmark shows LLM assistants leak substantially more private information in multi-party conversations than one-to-one evaluations indicate.
- Need to Know: Contextual-Integrity-Grounded Query Rewriting for Privacy-Conscious LLM Delegation
  Introduces DelegateCI-Bench (3167 samples) and a CI-guided RL query rewriter that improves privacy-utility tradeoff by up to +10.1 utility over on-device baselines.
- Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents
  Memory-equipped LLM agents exhibit increasing safety violation rates as memory accumulates across independent tasks, termed temporal memory contamination, detected via a new trigger-probe protocol.
- PrivScope: Task-scoped Disclosure Control for Hybrid Agentic Systems
  PrivScope enforces task-scoped disclosure at the local-cloud boundary in hybrid agents, eliminating profile leakage and halving re-identification risk on medical workflows while preserving task success.
- CI-Work: Benchmarking Contextual Integrity in Enterprise LLM Agents
  A new benchmark shows enterprise LLM agents violate contextual integrity at rates of 15.8-50.9% with leakage up to 26.7%, and higher task performance correlates with more privacy breaches that model scaling does not fix.
- ContextLens: Modeling Imperfect Privacy and Safety Context for Legal Compliance
  ContextLens improves LLM compliance assessment for GDPR and EU AI Act by grounding imperfect contexts through targeted questions on applicability, principles, and provisions while identifying missing factors, without ...
- Can Large Language Models Really Recognize Your Name?
  LLMs exhibit 20-40% lower recall on ambiguous human names for PII detection, worsening under prompt injections, as shown via the new AmBench benchmark.
- Agents That Know Too Much: A Data-Centric Survey of Privacy in LLM Agents
  A data-centric survey finds that only information-flow control covers compositional and cross-session leakage in LLM agents and that no single benchmark tests an agent across all its data surfaces under one policy.
- It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs
  SELFCI uses complementary self-distillation with two reverse KL divergences to align LLMs to contextual integrity while preserving utility, outperforming RL baselines like GRPO in agentic settings.
- Reinforcement Learning for Scalable and Trustworthy Intelligent Systems
  Reinforcement learning is advanced for communication-efficient federated optimization and for preference-aligned, contextually safe policies in large language models.
- Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security
  A survey that maps risks along the agent workflow and consolidates metrics and benchmarks for safety, robustness, privacy, and security in agentic AI.