Dynamo: Runtime switchable quantization for MoE with cross-dataset adaptation. arXiv preprint arXiv:2503.21135

Zihao Zheng, Xiuping Cui, Size Zheng, Maoliang Li, Jiayu Chen, Yun Liang, Xiang Chen · arXiv 2503.21135

4 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

AlphaQ: Calibration-Free Bit Allocation for Mixture-of-Experts Quantization

cs.LG · 2026-06-03 · unverdicted · novelty 6.0

AlphaQ performs calibration-free mixed-precision quantization of MoE models by allocating higher bits to experts whose weight spectra exhibit stronger heavy-tailed structure according to HT-SR theory, outperforming calibration-based methods and reaching near full-precision accuracy at 3.5 average bi

GEMQ: Global Expert-Level Mixed-Precision Quantization for MoE LLMs

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

GEMQ applies global LP-based expert importance estimation and router fine-tuning within progressive quantization to cut memory and speed inference in MoE LLMs with little accuracy loss.

GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling

cs.CL · 2026-04-20 · unverdicted · novelty 6.0 · 2 refs

GSQ uses Gumbel-Softmax to optimize scalar quantization grids for LLMs, closing most of the accuracy gap to vector methods like QTIP at 2-3 bits per parameter while using symmetric scalar grids compatible with existing kernels.

DynaGraph: Lightweight Multi-Model Interaction Framework via Dynamic Topological Reconfiguration

cs.MA · 2026-05-28 · unverdicted · novelty 5.0

DynaGraph is a multi-model framework that multiplexes PEFT adapters on a shared base model with evaluator-driven dynamic topology reconfiguration and hierarchical self-healing to achieve near-72B performance on reasoning benchmarks using an 8B model while reducing latency and tokens.

citing papers explorer


GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling

cs.CL · 2026-04-20 · unverdicted · none · ref 37 · 2 links

GSQ uses Gumbel-Softmax to optimize scalar quantization grids for LLMs, closing most of the accuracy gap to vector methods like QTIP at 2-3 bits per parameter while using symmetric scalar grids compatible with existing kernels.
