Moshi is the first real-time full-duplex spoken large language model that casts dialogue as speech-to-speech generation using parallel audio streams and an inner monologue of time-aligned text tokens.
A white paper on neural network quantization
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
representative citing papers
SARQC augments standard PTQ calibration with a saliency-aware regularizer to keep quantized weights closer to original floating-point values, yielding improved perplexity and zero-shot accuracy on dense and MoE LLMs.
citing papers explorer
-
Moshi: a speech-text foundation model for real-time dialogue
Moshi is the first real-time full-duplex spoken large language model that casts dialogue as speech-to-speech generation using parallel audio streams and an inner monologue of time-aligned text tokens.
-
Saliency-Aware Regularized Quantization Calibration for Large Language Models
SARQC augments standard PTQ calibration with a saliency-aware regularizer to keep quantized weights closer to original floating-point values, yielding improved perplexity and zero-shot accuracy on dense and MoE LLMs.