Litespark-Inference delivers custom SIMD kernels for ternary LLMs achieving up to 95.81x throughput versus PyTorch on CPUs by using integer addition/subtraction instead of floating-point math.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Litespark Inference For CPUs: Ultra-Fast SIMD Framework for Ternary (1.58-bit) Language Models
Litespark-Inference delivers custom SIMD kernels for ternary LLMs achieving up to 95.81x throughput versus PyTorch on CPUs by using integer addition/subtraction instead of floating-point math.