Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory

Niloofar Mireshghallah, Hyunwoo Kim, Xuhui Zhou, Yulia Tsvetkov, Maarten Sap, Reza Shokri, Yejin Choi · 2024 · arXiv 2310.17884

9 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background: 3 · method: 1

citation-polarity summary

background: 3 · use method: 1

representative citing papers

Whose Side Is Your Agent On? Multi-Party Principal Loyalty in LLM Agents

cs.AI · 2026-06-29 · unverdicted · novelty 7.0

PrincipalBench exposes a sharp split in frontier LLMs between selective and over-refusing behavior on multi-party loyalty, with prompt scaffolding and KL distillation reducing harm rates but only along an existing leak/over-refusal trade-off.

Can You Keep a Secret? Involuntary Information Leakage in Language Model Writing

cs.CR · 2026-05-11 · unverdicted · novelty 7.0

Frontier LLMs leak prompted secret information thematically in generated stories at rates up to 79% above chance in binary discrimination tests, even when told to hide it, with leakage scaling by model size and vanishing for short-form outputs.

When Are LLM Inferences Acceptable? User Reactions and Control Preferences for Inferred Personal Information

cs.HC · 2026-05-11 · unverdicted · novelty 7.0

Users show curiosity over concern toward LLM inferences of personal information, with acceptability depending on context, alignment with expectations, and who uses the inferences rather than just the content.

CAMP: Cumulative Agentic Masking and Pruning for Privacy Protection in Multi-Turn LLM Conversations

cs.CR · 2026-04-16 · unverdicted · novelty 7.0

CAMP formalizes Cumulative PII Exposure and uses a session registry, co-occurrence graph, and CPE score to trigger retroactive masking in multi-turn LLM conversations, neutralizing re-identifiable profiles in synthetic tests while keeping utility intact.

Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents

cs.AI · 2026-05-18 · unverdicted · novelty 6.0

Memory-equipped LLM agents exhibit increasing safety violation rates as memory accumulates across independent tasks, termed temporal memory contamination, detected via a new trigger-probe protocol.

PrivScope: Task-scoped Disclosure Control for Hybrid Agentic Systems

cs.CR · 2026-05-15 · unverdicted · novelty 6.0

PrivScope enforces task-scoped disclosure at the local-cloud boundary in hybrid agents, eliminating profile leakage and halving re-identification risk on medical workflows while preserving task success.

AgentCollabBench: Diagnosing When Good Agents Make Bad Collaborators

cs.CL · 2026-05-09 · unverdicted · novelty 6.0

AgentCollabBench shows that multi-agent reliability is limited by communication topology, with converging-DAG nodes causing synthesis bottlenecks that discard constraints and explain 7-40% of information loss variance.

How Far Are VLMs from Privacy Awareness in the Physical World? An Empirical Study

cs.CR · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

Vision-language models exhibit perceptual fragility and fail to consistently respect privacy constraints when operating in simulated physical environments, with performance declining in cluttered scenes and under conflicting commands.

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

cs.CV · 2024-02-27 · unverdicted · novelty 2.0

The paper reviews the background, technology, applications, limitations, and future directions of OpenAI's Sora text-to-video generative model based on public information.

