Microscopiq: Accelerating foundational models through outlier-aware microscaling quantization

· 2025 · arXiv 5053.373098

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

SPARQLe: Sub-Precision Activation Representation for Quantized LLM Inference

cs.AR · 2026-05-29 · unverdicted · novelty 6.0

SPARQLe is a hardware-software co-design that splits quantized activations into dense low bits and sparse high bits to run inference on narrower datapaths while claiming to preserve full-precision accuracy.

Cassandra: Enabling Reasoning LLMs at Edge via Self-Speculative Decoding

cs.AR · 2026-05-26 · unverdicted · novelty 5.0

Cassandra is a self-speculative decoding system that builds a draft model via fine-grained data selection and optimized pruning/mantissa truncation, achieving up to 2.41x speedup over BF16 and 1.81x more tokens than Eagle-3 on Llama 3 8B without training.

citing papers explorer

SPARQLe: Sub-Precision Activation Representation for Quantized LLM Inference · cs.AR · 2026-05-29 · unverdicted · none · ref 19
Cassandra: Enabling Reasoning LLMs at Edge via Self-Speculative Decoding · cs.AR · 2026-05-26 · unverdicted · none · ref 51
