From judgment to interference: Early stopping LLM harmful outputs via streaming content monitoring

URLhttp://arxiv · 2025 · arXiv 2506.09996

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning

cs.CR · 2026-04-30 · unverdicted · novelty 6.0

TwinGate deploys a stateful dual-encoder system with asymmetric contrastive learning to detect decompositional jailbreaks in untraceable LLM traffic at high recall and low false-positive rate with negligible latency.

Beyond Linear Probes: Dynamic Safety Monitoring for Language Models

cs.LG · 2025-09-30 · unverdicted · novelty 6.0

TPCs allow term-by-term progressive polynomial evaluation on LLM activations for flexible safety monitoring that supports both stronger guardrails and low-cost adaptive cascades.

From Static Inference to Dynamic Interaction: A Survey of Streaming Large Language Models

cs.CL · 2026-03-04 · unverdicted · novelty 5.0

The paper supplies a unified definition based on data flow and dynamic interaction plus a systematic taxonomy to organize fragmented work on streaming large language models.

AERIC: Anticipatory Hidden-State Monitoring for Implicit Harmful Dialogue

cs.CL · 2026-05-13 · unverdicted · novelty 3.0

AERIC uses a 387-parameter head on LLM hidden states for same-pass anticipatory detection of implicit harm, reporting AUROC gains on DiaSafety and Harmful Advice plus low-latency trigger rates on HarmBench and SocialHarmBench.

citing papers explorer

Showing 4 of 4 citing papers.

TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning cs.CR · 2026-04-30 · unverdicted · none · ref 16
TwinGate deploys a stateful dual-encoder system with asymmetric contrastive learning to detect decompositional jailbreaks in untraceable LLM traffic at high recall and low false-positive rate with negligible latency.
Beyond Linear Probes: Dynamic Safety Monitoring for Language Models cs.LG · 2025-09-30 · unverdicted · none · ref 39
TPCs allow term-by-term progressive polynomial evaluation on LLM activations for flexible safety monitoring that supports both stronger guardrails and low-cost adaptive cascades.
From Static Inference to Dynamic Interaction: A Survey of Streaming Large Language Models cs.CL · 2026-03-04 · unverdicted · none · ref 7
The paper supplies a unified definition based on data flow and dynamic interaction plus a systematic taxonomy to organize fragmented work on streaming large language models.
AERIC: Anticipatory Hidden-State Monitoring for Implicit Harmful Dialogue cs.CL · 2026-05-13 · unverdicted · none · ref 2
AERIC uses a 387-parameter head on LLM hidden states for same-pass anticipatory detection of implicit harm, reporting AUROC gains on DiaSafety and Harmful Advice plus low-latency trigger rates on HarmBench and SocialHarmBench.

From judgment to interference: Early stopping LLM harmful outputs via streaming content monitoring

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer