Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Unraveling LLM jailbreaks through safety knowledge neurons

1 Pith paper cites this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

A Single Neuron Is Sufficient to Bypass Safety Alignment in Large Language Models

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

Suppressing one refusal neuron or amplifying one concept neuron bypasses safety alignment in LLMs from 1.7B to 70B parameters without training or prompt engineering.

citing papers explorer

A Single Neuron Is Sufficient to Bypass Safety Alignment in Large Language Models cs.CL · 2026-05-08 · unverdicted · none · ref 5
