FairyFuse enables multiplication-free ternary LLM inference on CPUs via fused AVX-512 kernels, achieving 29.6x kernel speedup and 32.4 tokens/s on Xeon with near-lossless quality.
GPTQ: Accurate post-training quantization for generative pre-trained transformers
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2representative citing papers
citing papers explorer
-
FairyFuse: Multiplication-Free LLM Inference on CPUs via Fused Ternary Kernels
FairyFuse enables multiplication-free ternary LLM inference on CPUs via fused AVX-512 kernels, achieving 29.6x kernel speedup and 32.4 tokens/s on Xeon with near-lossless quality.
- Measuring Maximum Activations in Open Large Language Models