The SpeechJBB benchmark shows high jailbreak success rates for large audio-language models (LALMs) on code-switched harmful audio, with the highest rates in non-English cases; pseudo-word insertion lowers refusal rates further.
Curran Associates Inc
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
When Surface Form Changes Moderation Decisions: A Paired Study of Code-Mixed Workflow Instability
Paired evaluation shows 26.5% decision instability for code-mixed inputs, with review rates rising from 0.138 to 0.297 and non-hate false-flag rates from 0.069 to 0.104.