CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data

Borja Balle; Diane Wan; Eugene Bagdasarian; Matthew Abueg; Ren Yi; Sahra Ghalebikesabi; Shawn O'Banion; Stefan Mellem; Zhao Cheng

arxiv: 2409.13903 · v1 · pith:XW34DAD6new · submitted 2024-09-20 · 💻 cs.AI

classification 💻 cs.AI

keywords assistants · data · ci-bench · including · information · personal · synthetic · across

Advances in generative AI point towards a new era of personalized applications that perform diverse tasks on behalf of users. While general AI assistants have yet to fully emerge, their potential to share personal data raises significant privacy challenges. This paper introduces CI-Bench, a comprehensive synthetic benchmark for evaluating the ability of AI assistants to protect personal information during model inference. Leveraging the Contextual Integrity framework, our benchmark enables systematic assessment of information flow across important context dimensions, including roles, information types, and transmission principles. Unlike previous work with smaller, narrowly focused evaluations, we present a novel, scalable, multi-step data pipeline that synthetically generates natural communications, including dialogues and emails, which we use to generate 44 thousand test samples across eight domains. Additionally, we formulate and evaluate a naive AI assistant to demonstrate the need for further study and careful training towards personal assistant tasks. We envision CI-Bench as a valuable tool for guiding future language model development, deployment, system design, and dataset construction, ultimately contributing to the development of AI assistants that align with users' privacy expectations.

Forward citations

Cited by 11 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MuPPET: A Benchmark for Contextual Privacy of LLM Assistants in Multi-Party Conversations
cs.CL 2026-06 unverdicted novelty 7.0

MuPPET benchmark shows LLM assistants leak substantially more private information in multi-party conversations than one-to-one evaluations indicate.

Need to Know: Contextual-Integrity-Grounded Query Rewriting for Privacy-Conscious LLM Delegation
cs.CR 2026-06 unverdicted novelty 7.0

Introduces DelegateCI-Bench (3167 samples) and a CI-guided RL query rewriter that improves privacy-utility tradeoff by up to +10.1 utility over on-device baselines.

Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents
cs.AI 2026-05 unverdicted novelty 6.0

Memory-equipped LLM agents exhibit increasing safety violation rates as memory accumulates across independent tasks, termed temporal memory contamination, detected via a new trigger-probe protocol.

PrivScope: Task-scoped Disclosure Control for Hybrid Agentic Systems
cs.CR 2026-05 unverdicted novelty 6.0

PrivScope enforces task-scoped disclosure at the local-cloud boundary in hybrid agents, eliminating profile leakage and halving re-identification risk on medical workflows while preserving task success.

CI-Work: Benchmarking Contextual Integrity in Enterprise LLM Agents
cs.CR 2026-04 unverdicted novelty 6.0

A new benchmark shows enterprise LLM agents violate contextual integrity at rates of 15.8-50.9% with leakage up to 26.7%, and higher task performance correlates with more privacy breaches that model scaling does not fix.

ContextLens: Modeling Imperfect Privacy and Safety Context for Legal Compliance
cs.CL 2026-04 unverdicted novelty 6.0

ContextLens improves LLM compliance assessment for GDPR and EU AI Act by grounding imperfect contexts through targeted questions on applicability, principles, and provisions while identifying missing factors, without ...

Can Large Language Models Really Recognize Your Name?
cs.CR 2025-05 unverdicted novelty 6.0

LLMs exhibit 20-40% lower recall on ambiguous human names for PII detection, worsening under prompt injections, as shown via the new AmBench benchmark.

Agents That Know Too Much: A Data-Centric Survey of Privacy in LLM Agents
cs.CR 2026-06 unverdicted novelty 5.0

A data-centric survey finds that only information-flow control covers compositional and cross-session leakage in LLM agents and that no single benchmark tests an agent across all its data surfaces under one policy.

It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs
cs.LG 2026-05 unverdicted novelty 4.0

SELFCI uses complementary self-distillation with two reverse KL divergences to align LLMs to contextual integrity while preserving utility, outperforming RL baselines like GRPO in agentic settings.

Reinforcement Learning for Scalable and Trustworthy Intelligent Systems
cs.LG 2026-05 unverdicted novelty 3.0

Reinforcement learning is advanced for communication-efficient federated optimization and for preference-aligned, contextually safe policies in large language models.

Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security
cs.AI 2026-05 unverdicted novelty 2.0

A survey that maps risks along the agent workflow and consolidates metrics and benchmarks for safety, robustness, privacy, and security in agentic AI.