A shared polarity-flipping encoding subspace in LLM residual streams supports covert encoding and enables real-time detection of agentic data exfiltration via internal probes.
arXiv preprint arXiv:2307.11507
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
MIRAGE: A Polarity-Flipping Encoding Subspace in LLM Agents