Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 3, method 1

years: 2026 (8), 2024 (1)

verdicts: unverdicted 9

representative citing papers

Whose Side Is Your Agent On? Multi-Party Principal Loyalty in LLM Agents

cs.AI · 2026-06-29 · unverdicted · novelty 7.0

PrincipalBench exposes a sharp split in frontier LLMs between selective and over-refusing behavior on multi-party loyalty, with prompt scaffolding and KL distillation reducing harm rates but only along an existing leak/over-refusal trade-off.
