The Corpus of Contemporary American English (COCA): One billion words, 1990–present. https://www.english-corpora.org/coca/
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Can You Keep a Secret? Involuntary Information Leakage in Language Model Writing
Frontier LLMs leak prompted secret information thematically in generated stories at rates up to 79% above chance in binary discrimination tests, even when told to hide it, with leakage scaling by model size and vanishing for short-form outputs.