“A is B” fail to learn “B is A”
3 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: unclear (1)
citing papers explorer
- MODE: Modality-Decomposed Expert-Level Mixed-Precision Quantization for MoE Multimodal LLMs
  MODE decomposes expert selection frequency by modality, filters redundant vision tokens, adds per-modality sensitivity, and uses ILP to assign bit-widths, limiting average loss to 2.9% at W3A16 on MoE-MLLMs.
- Layer Collapse in Diffusion Language Models
  Diffusion language models develop early-layer collapse around an indispensable super-outlier due to overtraining, resulting in higher compressibility and reversed optimal sparsity patterns versus autoregressive models.
- SPARQLe: Sub-Precision Activation Representation for Quantized LLM Inference
  SPARQLe is a hardware-software co-design that splits quantized activations into dense low bits and sparse high bits to run inference on narrower datapaths while claiming to preserve full-precision accuracy.