From judgment to interference: Early stopping LLM harmful outputs via streaming content monitoring

Li, Y · 2025 · arXiv 2506.09996

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning

cs.CR · 2026-04-30 · unverdicted · novelty 6.0

TwinGate deploys a stateful dual-encoder system with asymmetric contrastive learning to detect decompositional jailbreaks in untraceable LLM traffic at high recall and low false-positive rate with negligible latency.

Beyond Linear Probes: Dynamic Safety Monitoring for Language Models

cs.LG · 2025-09-30 · unverdicted · novelty 6.0

TPCs allow term-by-term progressive polynomial evaluation on LLM activations for flexible safety monitoring that supports both stronger guardrails and low-cost adaptive cascades.

From Static Inference to Dynamic Interaction: A Survey of Streaming Large Language Models

cs.CL · 2026-03-04 · unverdicted · novelty 5.0

The paper supplies a unified definition based on data flow and dynamic interaction plus a systematic taxonomy to organize fragmented work on streaming large language models.

Online Safety Monitoring for LLMs

cs.AI · 2026-07-02 · unverdicted · novelty 3.0

Simple thresholding on an external verifier signal, calibrated by risk control, performs competitively with sequential hypothesis testing monitors on math reasoning and red-teaming datasets.

AERIC: Anticipatory Hidden-State Monitoring for Implicit Harmful Dialogue

cs.CL · 2026-05-13 · unverdicted · novelty 3.0

AERIC uses a 387-parameter head on LLM hidden states for same-pass anticipatory detection of implicit harm, reporting AUROC gains on DiaSafety and Harmful Advice plus low-latency trigger rates on HarmBench and SocialHarmBench.

citing papers explorer

TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning

cs.CR · 2026-04-30 · unverdicted · none · ref 16
