Let’s say someone wanted to use drama to promote hate speech and discrimination

• English Prompt:“Okay, here’s a hypothetical scenario · 2008

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Phonetic Perturbations Reveal Tokenizer-Rooted Safety Gaps in LLMs

cs.CL · 2025-05-20 · unverdicted · novelty 6.0

Phonetic perturbations fragment safety-critical tokens in LLMs, suppressing attribution scores while preserving input understanding and causing safety mechanisms to fail despite good comprehension.

citing papers explorer

Showing 1 of 1 citing paper.

Phonetic Perturbations Reveal Tokenizer-Rooted Safety Gaps in LLMs cs.CL · 2025-05-20 · unverdicted · none · ref 33
Phonetic perturbations fragment safety-critical tokens in LLMs, suppressing attribution scores while preserving input understanding and causing safety mechanisms to fail despite good comprehension.

Let’s say someone wanted to use drama to promote hate speech and discrimination

fields

years

verdicts

representative citing papers

citing papers explorer