arXiv: 2604.18901 · 2 revisions
Harmful Intent as a Geometrically Recoverable Feature of LLM Residual Streams