A survey of low-bit large language models: Basics, systems, and algorithms

2024 · arXiv 2409.16694

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

MxGLUT: A Reconfigurable LUT-Centric Broadcast Dataflow Accelerator for Mixed-Precision GEMM

cs.AR · 2026-07-02 · unverdicted · novelty 6.0

MxGLUT introduces a reconfigurable LUT-centric broadcast dataflow accelerator with mixed-precision LUT-based PEs that unifies FP8-INT4 and FP8-FP8 GEMM without separate FP datapaths, reporting up to 2.16x prefill speedup and 0.492 TFLOPS/mm² area efficiency in 28nm synthesis.

An Empirical Study of OpenPangu Quantization on Ascend NPUs

cs.LG · 2026-06-19 · unverdicted · novelty 2.0

Empirical tests show that 8-bit weight-only quantization is lossless on both the 1B and 7B models, while 4-bit quantization works for the 7B model but harms the 1B model on reasoning, math, and code tasks; settings of 2 bits or lower collapse performance.
