Published as a conference paper at ICLR 2026

Definitions: For any $q \in \mathcal{P} \setminus \{o_1, o_2\}$, we define the activation of the expert $s \in [k]$ by $q$ as $\sigma_q^{(s,t)} := \sum_{r=1}^{m} \mathrm{ReLU}(\langle w_r^{(s,t)}, q \rangle)$.
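As a rough illustration of the definition above, the activation of a single expert by a token q is the sum, over that expert's m neurons, of the ReLU of each neuron's inner product with q. The sketch below assumes each neuron weight and the token are plain lists of floats. The function name `expert_activation` and the sample values are hypothetical and do not come from the paper.

```python
from typing import Sequence


def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return x if x > 0.0 else 0.0


def expert_activation(
    neuron_weights: Sequence[Sequence[float]],
    q: Sequence[float],
) -> float:
    """Sum of ReLU(<w_r, q>) over the m neurons of one expert.

    neuron_weights holds the m weight vectors w_r of a single expert at a
    fixed training step (assumed layout, chosen for illustration only).
    """
    total = 0.0
    for w_r in neuron_weights:
        if len(w_r) != len(q):
            raise ValueError("weight vector and token must have equal length")
        inner = sum(wi * qi for wi, qi in zip(w_r, q))
        total += relu(inner)
    return total


# Hypothetical expert with m = 3 neurons in 2 dimensions.
weights = [[1.0, 0.0], [-1.0, 1.0], [0.5, 0.5]]
token = [2.0, 1.0]
activation = expert_activation(weights, token)
print(activation)
```

Only neurons whose weight vector has a positive inner product with q contribute to the sum. In the sample above, the second neuron has a negative inner product and so adds nothing.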

With high probability (abbreviated as w… · 2026

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees

cs.LG · 2026-04-07 · unverdicted · novelty 5.0

A router-norm and variance-based bit allocation strategy for quantizing MoE models that claims higher accuracy and lower cost than prior mixed-precision methods.
