FairyFuse enables multiplication-free ternary LLM inference on CPUs via fused AVX-512 kernels, achieving 29.6x kernel speedup and 32.4 tokens/s on Xeon with near-lossless quality.
EfficientQAT: Efficient quantization-aware training for large language models. ACL, 2025
1 Pith paper cites this work.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
FairyFuse: Multiplication-Free LLM Inference on CPUs via Fused Ternary Kernels