At λ=5.0, VPI becomes the most vulnerable (76.5% residual), suggesting that this trigger type has a sharper transition between the detectable and evasive regimes.

Partial evasion requires aggressive regularization and is attack-dependent.

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Layerwise Convergence Fingerprints for Runtime Misbehavior Detection in Large Language Models

cs.CR · 2026-04-27 · unverdicted · novelty 6.0

LCF detects multiple LLM runtime threats by computing aggregated diagonal Mahalanobis distances on layer-wise hidden-state differences, calibrated on clean examples, achieving high detection rates with low overhead across several model architectures.

