A Comprehensive Evaluation of Quantization Strategies for Large Language Models
2 Pith papers cite this work.
citing papers explorer
- TENP: Trapezoidal Expert Neuron Pruning For Mixture-of-Experts
  TENP applies trapezoidal expert-neuron pruning to MoE models, retaining key experts while pruning others via projected neuron contribution, yielding only a 1-point accuracy drop at 40% sparsity on DeepSeek with a 10% code-generation gain.
- K-Quantization and its Impact on Output Performance
  An empirical evaluation of quantization effects on eight LLMs across bit widths, showing that performance generally declines at lower precision, but with model-size-dependent resilience and acceptable accuracy at 2 bits in many cases.