arXiv preprint arXiv:2312.03718, 2023.
2 Pith papers cite this work.
Fields: cs.CL (2)
Citing papers
- Measuring & Mitigating Over-Alignment for LLMs in Multilingual Criminal Law Courts
  Creates TF-RefusalBench to quantify over-alignment in LLMs on criminal-law tasks across four languages and shows that abliteration mitigates refusals with little performance loss.
- A Survey on Knowledge Distillation of Large Language Models
  A comprehensive survey of knowledge distillation for LLMs, structured around algorithms, skill enhancement, and vertical applications, that highlights data augmentation as a key enabler.