FLIPS identifies LLM instances with 96% closed-set and 90% open-set accuracy by exploiting biases in generated binary random sequences across 237 instances.
Investigating the impact of quantization methods on the safety and reliability of large language models
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: background 1
citation-polarity summary
polarities: support 1
representative citing papers
No continuous utility-preserving input wrapper can eliminate all prompt injection risks in connected prompt spaces for language models.
Across 51 quantized checkpoints, quality metrics fail to predict safety drops in 36 pairings and 10 hidden-danger cases, while a new RTSI screen routes all 10 dangerous rows to testing at matched bucket size.
3-bit quantization induces new stereotypical biases in 6-21% of previously unbiased BBQ items across three LLMs, undetected by perplexity increases under 3%, with models declining in 'unknown' responses by 17.4%.
Small LLMs under 2B parameters achieve better economic break-even, energy efficiency, and hardware density than larger models on legacy GPUs for industrial tasks.
Activation-aware pruning preserves perplexity but amplifies bias in LLMs, with 47-59% of previously neutral items developing new stereotypical responses at 70% sparsity.
8:16 sparsity with variance correction and outlier handling lets compressed LLMs match or exceed dense-model accuracy under fixed memory limits, outperforming the common 2:4 pattern in flexibility.
citing papers explorer
- Quality Is Not a Safety Proxy Under Quantization
  Across 51 quantized checkpoints, quality metrics fail to predict safety drops in 36 pairings and 10 hidden-danger cases, while a new RTSI screen routes all 10 dangerous rows to testing at matched bucket size.
- Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels
  3-bit quantization induces new stereotypical biases in 6-21% of previously unbiased BBQ items across three LLMs, undetected by perplexity increases under 3%, with models declining in 'unknown' responses by 17.4%.
- Weight Pruning Amplifies Bias: A Multi-Method Study of Compressed LLMs for Edge AI
  Activation-aware pruning preserves perplexity but amplifies bias in LLMs, with 47-59% of previously neutral items developing new stereotypical responses at 70% sparsity.