SentGuard achieves 90.5% detection of unsafe cases within two sentences at 7.41% false positive rate by operating at sentence boundaries during LLM streaming generation.
arXiv preprint arXiv:2510.09694 , year=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CL 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
AERIC uses a 387-parameter head on LLM hidden states for same-pass anticipatory detection of implicit harm, reporting AUROC gains on DiaSafety and Harmful Advice plus low-latency trigger rates on HarmBench and SocialHarmBench.
citing papers explorer
-
SentGuard: Sentence-Level Streaming Guardrails for Large Language Models
SentGuard achieves 90.5% detection of unsafe cases within two sentences at 7.41% false positive rate by operating at sentence boundaries during LLM streaming generation.
-
AERIC: Anticipatory Hidden-State Monitoring for Implicit Harmful Dialogue
AERIC uses a 387-parameter head on LLM hidden states for same-pass anticipatory detection of implicit harm, reporting AUROC gains on DiaSafety and Harmful Advice plus low-latency trigger rates on HarmBench and SocialHarmBench.