Extracting prompts by inverting LLM outputs
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Can You Keep a Secret? Involuntary Information Leakage in Language Model Writing
Frontier LLMs leak prompted secret information thematically in generated stories at rates up to 79% above chance in binary discrimination tests, even when told to hide it, with leakage scaling by model size and vanishing for short-form outputs.