Conversations exceeding θ are flagged for review.

Streaming probe: the XGBoost classifier evaluates each turn in real time, computing $P_{\mathrm{adv}}(t)$ from the activation and trajectory scalars.
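The two snippets above describe a per-turn streaming check: each turn yields a score $P_{\mathrm{adv}}(t)$, and a conversation is flagged once that score exceeds θ. The sketch below illustrates only that flagging loop. It assumes that "exceeding θ" means any single turn's score is strictly greater than θ. The `stream_flag` function and `toy_scorer` are hypothetical names, and `toy_scorer` is a placeholder logistic function, not the paper's trained XGBoost model or its actual features.

```python
from __future__ import annotations

import math
from typing import Callable, Iterable, Sequence

# Hypothetical type: maps one turn's scalar feature vector to P_adv(t) in [0, 1].
Scorer = Callable[[Sequence[float]], float]


def stream_flag(
    turn_features: Iterable[Sequence[float]], scorer: Scorer, theta: float
) -> tuple[bool, list[float]]:
    """Score turns as they arrive; flag once any P_adv(t) exceeds theta.

    Assumption: a conversation "exceeds theta" as soon as one turn does.
    """
    scores: list[float] = []
    for features in turn_features:
        p_adv = scorer(features)  # per-turn adversarial probability
        scores.append(p_adv)
        if p_adv > theta:
            return True, scores  # stop early: flag for review
    return False, scores


def toy_scorer(features: Sequence[float]) -> float:
    # Placeholder logistic score over summed features; NOT the paper's model.
    return 1.0 / (1.0 + math.exp(-sum(features)))


# Usage: three turns of made-up two-feature vectors; the third turn crosses theta.
flagged, scores = stream_flag(
    [[0.1, -0.5], [0.4, 0.3], [2.0, 1.5]], toy_scorer, theta=0.9
)
print(flagged, [round(s, 3) for s in scores])
```

Returning early on the first turn above θ matches the real-time framing, since a streaming monitor need not wait for the conversation to end before escalating it.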

1 Pith paper cites this work.

representative citing papers

Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection

cs.CR · 2026-04-30 · unverdicted · novelty 6.0

The paper reports that "adversarial restlessness" in LLM activations lets five scalar features detect multi-turn prompt injections with 93.8% accuracy on synthetic data. The result replicates across models, but generalization to real-world chats depends on the data source.

Cited in that paper as ref 5.