How (un) ethical are instruction-centric responses of llms? unveiling the vulnerabilities of safety guardrails to harmful queries.arXiv preprint arXiv:2402.15302,

Somnath Banerjee, Sayan Layek, Rima Hazra, Animesh Mukherjee · 2024 · arXiv 2402.15302

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

citation-role summary

dataset 1

citation-polarity summary

use dataset 1

representative citing papers

Phonetic Perturbations Reveal Tokenizer-Rooted Safety Gaps in LLMs

cs.CL · 2025-05-20 · unverdicted · novelty 6.0

Phonetic perturbations fragment safety-critical tokens in LLMs, suppressing attribution scores while preserving input understanding and causing safety mechanisms to fail despite good comprehension.

Jailbreak Attacks and Defenses Against Large Language Models: A Survey

cs.CR · 2024-07-05 · accept · novelty 4.0

A survey that creates taxonomies for jailbreak attacks and defenses on LLMs, subdivides them into sub-classes, and compares evaluation approaches.

citing papers explorer

Showing 2 of 2 citing papers.

Phonetic Perturbations Reveal Tokenizer-Rooted Safety Gaps in LLMs cs.CL · 2025-05-20 · unverdicted · none · ref 2
Phonetic perturbations fragment safety-critical tokens in LLMs, suppressing attribution scores while preserving input understanding and causing safety mechanisms to fail despite good comprehension.
Jailbreak Attacks and Defenses Against Large Language Models: A Survey cs.CR · 2024-07-05 · accept · none · ref 7
A survey that creates taxonomies for jailbreak attacks and defenses on LLMs, subdivides them into sub-classes, and compares evaluation approaches.

How (un) ethical are instruction-centric responses of llms? unveiling the vulnerabilities of safety guardrails to harmful queries.arXiv preprint arXiv:2402.15302,

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer