A Comprehensive Evaluation of Quantization Strategies for Large Language Models

Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Bin Wang · 2024 · DOI 10.18653/v1/2024.findings-acl.726

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

TENP: Trapezoidal Expert Neuron Pruning For Mixture-of-Experts

cs.LG · 2026-06-03 · unverdicted · novelty 6.0

TENP applies trapezoidal expert-neuron pruning to MoE models, retaining key experts while pruning the others according to projected neuron contribution. At 40% sparsity on DeepSeek, it reports only a 1-point accuracy drop together with a 10% gain in code generation.

K-Quantization and its Impact on Output Performance

cs.CL · 2026-05-19 · unverdicted · novelty 3.0

An empirical evaluation of quantization effects on eight LLMs across bit widths. Performance generally declines at lower precision, but resilience depends on model size, and accuracy remains acceptable at 2 bits in many cases.

citing papers explorer

TENP: Trapezoidal Expert Neuron Pruning For Mixture-of-Experts
cs.LG · 2026-06-03 · unverdicted · none · ref 43
K-Quantization and its Impact on Output Performance
cs.CL · 2026-05-19 · unverdicted · none · ref 29
