3-bit quantization induces new stereotypical biases in 6-21% of previously unbiased BBQ items across three LLMs. Perplexity does not reveal the shift, since it rises by less than 3%, while the models' 'unknown' responses decline by 17.4%.
Accuracy is not all you need
3 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background 1
Citation polarities: support 1
Years: 2026 (3)
Representative citing papers
Cassandra is a self-speculative decoding system that builds a draft model via fine-grained data selection and optimized pruning/mantissa truncation, achieving up to 2.41x speedup over BF16 and 1.81x more tokens than Eagle-3 on Llama 3 8B without training.
Activation-aware pruning preserves perplexity but amplifies bias in LLMs, with 47-59% of previously neutral items developing new stereotypical responses at 70% sparsity.
citing papers explorer
- Weight Pruning Amplifies Bias: A Multi-Method Study of Compressed LLMs for Edge AI