A white paper on neural network quantization

Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, Mart van Baalen, Tijmen Blankevoort · 2021

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Moshi: a speech-text foundation model for real-time dialogue

eess.AS · 2024-09-17 · accept · novelty 7.0

Moshi is the first real-time full-duplex spoken large language model that casts dialogue as speech-to-speech generation using parallel audio streams and an inner monologue of time-aligned text tokens.

Saliency-Aware Regularized Quantization Calibration for Large Language Models

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

SARQC augments standard PTQ calibration with a saliency-aware regularizer to keep quantized weights closer to original floating-point values, yielding improved perplexity and zero-shot accuracy on dense and MoE LLMs.

citing papers explorer

Moshi: a speech-text foundation model for real-time dialogue

eess.AS · 2024-09-17 · accept · none · ref 70

Saliency-Aware Regularized Quantization Calibration for Large Language Models

cs.AI · 2026-05-07 · unverdicted · none · ref 49
