arXiv preprint arXiv:2409.16694

Gong, Ruihao, et al. · 2024 · arXiv 2409.16694

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

MxGLUT: A Reconfigurable LUT-Centric Broadcast Dataflow Accelerator for Mixed-Precision GEMM

cs.AR · 2026-07-02 · unverdicted · novelty 6.0

MxGLUT introduces a reconfigurable LUT-centric broadcast dataflow accelerator with mixed-precision LUT-based PEs that unifies FP8-INT4 and FP8-FP8 GEMM without separate FP datapaths, reporting up to 2.16x prefill speedup and 0.492 TFLOPS/mm² area efficiency in 28nm synthesis.

Quantizing Time-Series Models As Dynamical Systems: Trajectory-Based Quantization Sensitivity Score

cs.LG · 2026-06-11 · unverdicted · novelty 6.0

Introduces TQS metric and TQS-PTQ framework that uses dynamical-systems stability to enable a priori, calibration-free mixed-precision post-training quantization for time-series models.

An Empirical Study of OpenPangu Quantization on Ascend NPUs

cs.LG · 2026-06-19 · unverdicted · novelty 2.0

Empirical tests show 8-bit weight-only quantization is lossless on both models while 4-bit works for the 7B but harms the 1B on reasoning/math/code tasks, and 2-bit or lower settings collapse performance.

citing papers explorer

MxGLUT: A Reconfigurable LUT-Centric Broadcast Dataflow Accelerator for Mixed-Precision GEMM

cs.AR · 2026-07-02 · unverdicted · none · ref 8
