MxGLUT introduces a reconfigurable LUT-centric broadcast dataflow accelerator with mixed-precision LUT-based PEs that unifies FP8-INT4 and FP8-FP8 GEMM without separate FP datapaths, reporting up to 2.16x prefill speedup and 0.492 TFLOPS/mm² area efficiency in 28nm synthesis.
arXiv preprint arXiv:2409.16694
3 Pith papers cite this work.
years: 2026 (3)
verdicts: UNVERDICTED (3)
representative citing papers
Introduces TQS metric and TQS-PTQ framework that uses dynamical-systems stability to enable a priori, calibration-free mixed-precision post-training quantization for time-series models.
Empirical tests show 8-bit weight-only quantization is lossless on both models while 4-bit works for the 7B but harms the 1B on reasoning/math/code tasks, and 2-bit or lower settings collapse performance.
citing papers explorer
- MxGLUT: A Reconfigurable LUT-Centric Broadcast Dataflow Accelerator for Mixed-Precision GEMM