S2C: Split-and-Combine Jailbreak Attacks

Wang, Y · 2024 · arXiv 2405.13965

2 Pith papers cite this work.

representative citing papers

Breaking Safety at the Token Boundary: How BPE Tokenization Creates Exploitable Gaps in LLM Alignment

cs.CL · 2026-05-01 · unverdicted · novelty 6.0

BPE tokenization creates exploitable gaps in LLM safety by fragmenting safety words, enabling attacks that flip refusal on 80-100% of HarmBench prompts across five models, with DPO failing to close the gap stably and SFT causing over-refusal.

Cybersecurity is the True Frontier for Generative AI Success or Failure

cs.CR · 2026-06-27 · unverdicted · novelty 3.0

Cybersecurity's scale, adversaries, labeling issues, and operational demands make it the superior test-case for general AI progress over NLP or computer vision.

