In: Proceedings 2024 Network and Distributed System Security Symposium

Deng, G · 2024 · arXiv 2024.24188

3 Pith papers cite this work.

citation-role summary: method (1)

citation-polarity summary: use method (1)

representative citing papers

Adaptive Instruction Composition for Automated LLM Red-Teaming

cs.CR · 2026-04-22 · unverdicted · novelty 7.0 · ref 56

Adaptive Instruction Composition uses a neural contextual bandit with RL to adaptively combine crowdsourced texts, generating more effective and diverse LLM jailbreaks than random or prior adaptive methods on HarmBench.

One Word at a Time: Incremental Completion Decomposition Breaks LLM Safety

cs.CL · 2026-04-01 · unverdicted · novelty 7.0 · ref 1

Incremental Completion Decomposition (ICD) jailbreaks LLMs by eliciting a sequence of single-word continuations before the full harmful response, outperforming existing methods on AdvBench, JailbreakBench, and StrongREJECT, with supporting mechanistic analysis.

Uncovering Logit Suppression Vulnerabilities in LLM Safety Alignment

cs.CR · 2024-05-20 · unverdicted · novelty 6.0 · ref 7

SSAG bypasses logit suppression in five LLMs, producing harmful responses with a 95% success rate and 86% lower latency; VulMine reaches a 77% attack success rate against defenses.

