Microscopiq: Accelerating foundational models through outlier-aware microscaling quantization
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.AR (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- SPARQLe: Sub-Precision Activation Representation for Quantized LLM Inference
  SPARQLe is a hardware-software co-design that splits quantized activations into dense low bits and sparse high bits to run inference on narrower datapaths while claiming to preserve full-precision accuracy.
- Cassandra: Enabling Reasoning LLMs at Edge via Self-Speculative Decoding
  Cassandra is a self-speculative decoding system that builds a draft model via fine-grained data selection and optimized pruning/mantissa truncation, achieving up to 2.41x speedup over BF16 and 1.81x more tokens than Eagle-3 on Llama 3 8B without training.