Pruning for protection: Increasing jailbreak resistance in aligned LLMs without fine-tuning

Adib Hasan, Ileana Rugina, Alex Wang · 2024 · DOI 10.18653/v1/2024.blackboxnlp-1.26

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

cs.LG · 2026-06-23 · unverdicted · novelty 6.0

Pruning attention layers in five LLMs across eight datasets maintains accuracy but degrades faithfulness and calibration.

Showing 1 of 1 citing paper.

Don't Go Breaking My LLM: The Impact of Pruning Attention Layers on Explanation Faithfulness and Confidence Calibration
cs.LG · 2026-06-23 · unverdicted · none · ref 21